I have a couple of servers (one at home and one at work) and I need to make sure that they are up all the time. However, sometimes a day or two goes by with no one using a server, so I don’t always find out right away when it is down. I need some sort of immediate notification when a server goes down.
I know that various commercial products will do this, but I think it’s better to do it myself: that way I learn more, I am not dependent on some other company to take care of my situation, and I have appropriate privacy controls.
In this posting I’ll explain how I did this as well as provide all of the scripts to do this. Obviously, I’m assuming that all of the computers are running Linux — they are servers, after all.
I’m using a hosted domain as the central computer that the other servers contact (technically it should be called a hosted website). You could also use a VPS or any other server to act as your control server instead. (I am not checking to make sure that my hosted domain is up.) You could also just do this with two servers: ServerA checks to see if ServerB is up and vice versa.
Overview
- Each server sends a simple message to the main computer (basically my hosted domain name, quarkphysics.ca) every 15 minutes
- If the main computer does not detect a message after a certain time, it sends a text/SMS message to my cell phone.
- A web interface on the hosted domain shows the status of each server and has other features for controlling things.
Only the first two items are necessary to get this to work.
Note that at least one server is behind a firewall. You cannot access it from outside nor can you ping it … This server is for internal use only and yet I have to know right away if it goes down.
Note that none of these scripts has any authentication so anyone who knows the path and the script name can control SMS notification of your servers as well as add in fake entries. So I’ve changed the path names, script names and computer names for this document to prevent people from messing with my setup.
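If hiding the path names is not enough for you, one option is a shared secret that every request must carry. This is only a minimal sketch and not part of my scripts: the `TOKEN` parameter name, the `HEARTBEAT_SECRET` constant and the `isAuthorized()` helper are all made up for illustration.

```php
<?php
// Hypothetical shared secret; in practice keep it outside the web root.
const HEARTBEAT_SECRET = 'change-me-to-a-long-random-string';

/**
 * Returns true only when the supplied token matches the shared secret.
 * hash_equals() compares in constant time, which avoids timing leaks.
 */
function isAuthorized(array $query, string $secret): bool
{
    $token = $query['TOKEN'] ?? '';   // TOKEN is an assumed parameter name
    return is_string($token) && hash_equals($secret, $token);
}

// Example use at the top of a script such as aliveStamp.php:
// if (!isAuthorized($_GET, HEARTBEAT_SECRET)) { http_response_code(403); exit('Forbidden'); }
```

Each crontab URL would then need `&TOKEN=...` appended. Keep in mind that over plain http the token travels unencrypted, so this only stops casual tampering.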
Web Interface
In this example, you can see that ServerA is down for more than 30 minutes (status=error). SMS is disabled (light red background behind buttons) so I will not be getting text messages about it. ServerB is up – so the buttons to enable/disable SMS don’t work. ServerC has missed one notification, so it’s been down between 15 and 30 minutes. Since the background under the buttons is light green, it means that I’ll be receiving SMS messages about it every 15 minutes.
Everything is done via filenames and timestamps stored in those files:
Contents of /somepath/ folder
|
FILENAME                 CONTENTS OF FILE (timestamp)
AliveStatusServerA.txt   1467830186
AliveStatusServerB.txt   1467832169
AliveStatusServerC.txt   1467832316
DisableServerA.txt       1467831761
|
None of these are the names of my actual servers, so the Delete button comes in handy here. If you delete a real server, no problem. The entry just gets recreated next time a “ping” arrives.
PHP scripts
Overview
- each server runs a crontab entry
- central hosted domain runs the following scripts
- aliveStamp.php
- notifyServerDown.php
- serverList.php
- disableSMS.php
Part 1a: Server up notification
Each server runs the following command in its crontab:
|
# alive notification every 15 minutes
10,25,40,55 * * * * /usr/bin/curl "http://yourdomain/cgi-bin/aliveStamp.php?COMPUTER=ServerA"
|
NOTES:
- All that is needed is to access ONE webpage
- All you have to do is decide on a computer name for each server and be consistent in its spelling and capitalisation
- I’m not doing this on the 15 minute mark since the hosted domain is doing its checks on the 15 minute mark
- Later on, test everything by commenting out this line in your crontab.
Part 1b: notification script on hosted domain
The hosted domain (or other server) has the following script in its cgi-bin directory:
aliveStamp.php
|
<?php
/* Script to write the date/time stamp when this file is called.
This will be used to tell if the computer is up or not as this should
be called every 15 minutes. */
//LOCATION: cgi-bin
$path="../public_html/yourdomain/somepath/";
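$server = is_string($_GET['COMPUTER'] ?? null) ? $_GET['COMPUTER'] : '';
// Keep only letters, digits, underscores and hyphens so the name cannot change the file path.
// (FILTER_SANITIZE_STRING is deprecated as of PHP 8.1 and does not remove "../".)
$server = preg_replace('/[^A-Za-z0-9_-]/', '', $server);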
if (empty($server)) {
$error_message = "<HTML><BODY>ERROR: NO COMPUTER NAME PROVIDED!</BODY></HTML>";
echo $error_message;
exit;
}
$filename=$path."AliveStatus" . $server . ".txt";
$myfile = fopen($filename, "w") or die("Unable to open file!");
$time = time();
fwrite($myfile, $time );
fclose($myfile);
//This script also deletes any "Disable" flag. I'm also doing this in notifyServerDown.php
$filename=$path . "Disable" . $server . ".txt";
if (file_exists($filename)) unlink($filename); //avoid a warning when there is no flag file
?>
<HTML>
<body>
<?php echo $server; ?> updated <?php echo $time; ?>
</body>
</html>
|
So, every time this is run, it will create/overwrite a file called AliveStatusServerA.txt and put the current time stamp into it, where “ServerA” is the server name. The contents will be something like 1467647701. (This is a Unix timestamp, the number of seconds since the “epoch” of 1 January 1970 UTC, and it can be converted to human-readable time in various ways.)
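One way to do that conversion in PHP is with the DateTime class. This is a small sketch rather than code taken from my scripts; `formatEpoch()` is a hypothetical helper name, and America/Toronto is just the time zone I happen to use.

```php
<?php
/**
 * Formats a UNIX timestamp (seconds since 1970-01-01 00:00:00 UTC)
 * as a readable date in the given time zone.
 */
function formatEpoch(int $epoch, string $zone = 'UTC'): string
{
    $dt = new DateTime('@' . $epoch);          // '@' marks a UNIX timestamp (always UTC)
    $dt->setTimezone(new DateTimeZone($zone)); // shift to the requested zone, DST included
    return $dt->format('Y-m-d H:i:s');
}

echo formatEpoch(1467647701), "\n";                    // 2016-07-04 15:55:01
echo formatEpoch(1467647701, 'America/Toronto'), "\n"; // 2016-07-04 11:55:01
```

Note that the minute is 55, which matches one of the minutes in the crontab entry.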
Part 2: Sending out a text message
- Note: Telus and Koodo have a webpage that allows you to send an SMS message from it. All that you have to do is to fill in the fields correctly. (Submitting that form from a script has since stopped working for me, so the script below emails the Telus SMS gateway instead.)
- I don’t know which other cell phone companies have this. If your provider does not, you can (i) contact them, (ii) add a 3G modem card and a SIM card to one of your computers so that it can send text messages, and use that computer instead of the “hosted domain”, or (iii) change the scripts to send emails instead.
notifyServerDown.php
|
<?php
error_reporting(-1);
//$PATH='../public_html/yourdomain/somepath/';
$PATH='../yourdomain/somepath/';
$data = getData($PATH);
prepareSMS($data);
//============================================//
/* This function gets all of the "AliveStatusXXXX.txt" files and reads the timestamps
* It stores the computer name, timestamp, and status in a 2D array. */
function getData($path) {
$data = [];
foreach (array_filter(glob($path . '*txt'), 'is_file') as $filename) {
$file = fopen($filename, "r") or die("Unable to open file $filename !");
$epoch = fgets($file);
fclose($file);
$computer = parse($filename);
//skip invalid files
if ($computer == FALSE) continue;
if (empty($epoch)) continue;
// Convert the UTC timestamp to local time. The time zone database handles daylight
// saving time, so no manual 4 or 5 hour offset is needed.
// The (int) cast means a non-numeric file shows up as 1970 instead of crashing the script.
$dt = new DateTime('@' . (int)$epoch); // '@' marks a UNIX timestamp (UTC)
$dt->setTimezone(new DateTimeZone('America/Toronto'));
$timestamp = $dt->format('Y-m-d H:i:s'); // output = 2012-08-15 00:00:00
$status = getStatus($epoch);
array_push($data, array($computer,$timestamp,$status));
}
return $data;
}
/* This function finds the computer name from the filename AliveStatusXXX.txt */
function parse($text) {
$pos = strpos($text,"AliveStatus");
if ($pos === false) return false; // strict check: a match at position 0 is still a match
$text = substr($text,$pos+strlen('AliveStatus'));
$text = substr($text,0,strlen($text)-4);
return $text;
}
/* This function determines the status based on the timestamp */
function getStatus($last) {
if (empty($last)) return 'empty'; //(only for this script)
$now=time();
$minutes = $now - $last;
$minutes = $minutes / 60;
if ($minutes < 15) {
$result = 'good';
} elseif ($minutes < 30) {
$result = 'warning';
} else {
$result = 'error';
}
return $result;
}
/* This function prepares the text for the SMS based on the status.
It also handles the enable/disable settings based on the existence of the file*/
function prepareSMS($data) {
global $PATH;
foreach ($data as $row) {
$server = $row[0];
$status = $row[2];
$disableFilename=$PATH."Disable".$server.".txt";
switch ($status) {
case "good":
echo "<br>" .$server. " : Last notification at " . $row[1];
if (file_exists($disableFilename)) unlink($disableFilename); //delete any disable file
break;
case "warning":
echo "<br>WARNING: $server more than 15 minutes ($row[1])";
if(file_exists($disableFilename)) {
echo "<br>SMS disabled!";
break;
}
sendEmail("WARNING: Server " . $server . " is not responding and may be down.\n Last notification at " . $row[1] );
break;
case "error":
echo "<br>ALERT! $server more than 30 minutes ($row[1])";
if(file_exists($disableFilename)) {
echo "<br>SMS disabled!";
break;
}
sendEmail("ALERT: Server " . $server . " is not responding and may be down.\n Last notification at " . $row[1] );
break;
case "empty":
sendEmail("NO TIME STAMP for $server found");
break;
default:
echo "<br>" .$server. " : Last notification at " . $row[1];
}
}
}
/* This method replaces the sendError() method which no longer works.
* We can no longer go to the telus webpage and fill in the data.
* However sending an email works just fine. */
function sendEmail($MSG) {
$AREACODE1=510; $PHONENUM1=1234567; //cell phone number.
$TO=$AREACODE1.$PHONENUM1."@msg.telus.com";
mail($TO,"SERVER DOWN",$MSG);
echo "<br />Error message emailed!!";
}
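/* This function used to send the SMS through the Telus web form.
 * It is no longer called (see sendEmail() above) and is kept for reference only. */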
function sendError($MSG) {
$AREACODE1=510; $PHONENUM1=1234567; //cell phone number.
$AREACODE2=510; $PHONENUM2=7654321; //optional call back number
$MSG = str_replace(' ', '+', $MSG);
//$DATA="&CODE=".$AREACODE1."&NUM=".$PHONENUM1."&MESSAGE=".$MSG."&CALLBACKCODE=".$AREACODE2."&CALLBACK=".$PHONENUM2;
$DATA="&CODE=".$AREACODE1."&NUM=".$PHONENUM1."&MESSAGE=".$MSG;
$URL="http://msg.telus.com/msg/HTTPPostExtMgr"; //YAY TELUS!!
//Method 1: does not work
#$lines = file($URL . "?" . $DATA);
//Method 2: use built in php Curl
//I haven't got this working
/*
//THIS REQUIRES THAT PHP IS COMPILED TO SUPPORT CURL
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $URL);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $DATA);
// grab URL and pass it to the browser
curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
*/
//Method 3: use Curl command line
$command = 'curl -d "'.$DATA.'" '.$URL.' 2>&1';
$output = shell_exec($command);
echo "message sent!!";
}
?>
|
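The 15 and 30 minute thresholds in getStatus() are easy to get wrong by one interval, so here is a standalone sketch for checking them. It is not a drop-in replacement: `classifyHeartbeat()` is a hypothetical name, and it takes the current time as a parameter (instead of calling time() itself) purely so that fixed values can be tested. The "now" value is made up; the three timestamps come from the sample folder listing above.

```php
<?php
/**
 * Classifies a heartbeat by its age. $now is passed in (rather than
 * calling time() inside) so the thresholds can be tested with fixed values.
 */
function classifyHeartbeat(int $last, int $now, int $interval = 900): string
{
    $age = $now - $last;                          // seconds since the last heartbeat
    if ($age < $interval)     return 'good';      // under 15 minutes
    if ($age < 2 * $interval) return 'warning';   // 15 to 30 minutes
    return 'error';                               // 30 minutes or more
}

// Timestamps from the sample /somepath/ listing, checked at an assumed "now".
$now  = 1467832500;
$seen = ['ServerA' => 1467830186, 'ServerB' => 1467832169, 'ServerC' => 1467832316];
foreach ($seen as $name => $last) {
    echo $name, ': ', classifyHeartbeat($last, $now), "\n";
}
```

With that "now", ServerA is 2314 seconds old (error) while ServerB and ServerC are 331 and 184 seconds old (both good).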
Part 3: Server Page – Web Interface
Here’s the script that generates the webpage (see the screenshot above)
serverList.php
|
<?php
//LOCATION: cgi-bin
error_reporting(-1);// Report all PHP errors
date_default_timezone_set("America/Toronto");
$PATH='../public_html/yourdomain/somepath/';
$data = getData($PATH);
/* This function gets all of the "AliveStatusXXXX.txt" files and reads the timestamps
* It stores the computer name, timestamp, and status in a 2D array. */
function getData($path) {
$data = [];
foreach (array_filter(glob($path . '*txt'), 'is_file') as $filename) {
$file = fopen($filename, "r") or die("Unable to open file $filename !");
$epoch = fgets($file);
fclose($file);
$computer = parse($filename);
//skip invalid files
if ($computer == FALSE) continue;
if (empty($epoch)) continue;
$localtime = $epoch - (4*3600);
//this will crash the script if the date is invalid format (eg. a word)
$dt = new DateTime("@$localtime"); // convert UNIX timestamp to PHP DateTime
$timestamp = $dt->format('Y-m-d H:i:s'); // output = 2012-08-15 00:00:00
$status = getStatus($epoch);
$enable = file_exists($path."Disable".$computer.".txt");
array_push($data, array($computer,$timestamp,$status,$enable));
}
return $data;
}
/* This function finds the computer name from the filename AliveStatusXXX.txt */
function parse($text) {
$pos = strpos($text,"AliveStatus");
if ($pos === false) return false; // strict check: a match at position 0 is still a match
$text = substr($text,$pos+strlen('AliveStatus'));
$text = substr($text,0,strlen($text)-4);
return $text;
}
function getStatus($last) {
$now=time();
$minutes = $now - $last;
$minutes = $minutes / 60;
if ($minutes < 15) {
$result = 'good';
} elseif ($minutes < 30) {
$result = 'warning';
} else {
$result = 'error';
}
return $result;
}
?>
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="-1">
<meta http-equiv="Refresh" content="900">
<title>Server Status</title>
<link rel="stylesheet" href="status.css" type="text/css">
</head>
<body bgcolor="#CCCCCC">
<h2>List of servers and their statuses</h2>
<form>
<table border=1 cellpadding=6 cellspacing=0 bgcolor="#FFFFFF">
<tr><th>Server</th><th>Time Stamp</th><th>Alarm Status</th><th>Enable SMS</th><th>Delete Server Entry</th></tr>
<?php
//add in each row's data
foreach ($data as $row) {
$server = $row[0];
?>
<tr>
<td><?php echo $row[0]; ?></td>
<td><?php echo $row[1]; ?></td>
<?php echo '<td class="'.$row[2].'">'.$row[2]; ?></td>
<?php
if (! $row[3]) { //enabled or disabled
echo '<td class="enable">';
} else {
echo '<td class="disable">';
}
if ($row[2] == 'good') {
echo '<input type="button" value="YES" class="btnDisable" />';
echo '<input type="button" value="NO" class="btnDisable" />';
} else {
echo '<input type="button" name="'.$server.'" value="YES" class="btnYes clicker" />';
echo '<input type="button" name="'.$server.'" value="NO" class="btnNo clicker" />';
}
?>
</td>
<td>
<?php
echo '<input type="button" name="'.$server.'" value="DELETE" class="btnRefresh clicker" />';
?>
</td>
</tr>
<?php
}
?>
<tr><td colspan=5>
<!-- this button uses javascript instead of FORM POST method-->
<input type="submit" onClick="history.go(0)" name="btnRefresh" value="Refresh"/>
</td>
</tr>
</table>
</form>
<p> </p>
<table border=1 cellpadding=6 cellspacing=0 bgcolor="#B0C0CF">
<tr><th>NOTES</th></tr>
<tr><td>
<b>Status:</b><br>
good = contact within past 15 minutes<br>
warning = 15 - 30 minutes with no contact<br>
error = more than 30 minutes with no contact
</td></tr>
<tr><td>
Time stamps are in UTC. I've subtracted 4 hours for EDT
</td></tr>
<tr><td>
Enable/disable SMS is only enabled when a system is down.<br> It is so that you don't keep getting repeated SMS messages if you don't want them.
</td></tr>
<tr><td>
The background colour tells you if the current SMS status is enabled or not.
</td></tr>
<tr><td>The delete button is to delete a server from being checked. <br>
This is useful for testing (if you make up a fake server name and then get SMS messages for it).<br>
If the server exists, the aliveStatus file will just be recreated within 15 minutes time.
</td></tr>
</table>
<script>
//This script gets all of the clicker buttons and then calls the correct page to process them.
//var myBtn = document.getElementsByTagName('input');
var myBtn = document.getElementsByClassName('clicker');
var i;
for (i = 0; i < myBtn.length; i++) {
//myBtn.addEventListener('click', function(event) {
myBtn[i].addEventListener('click', function(event) {
var name = this.getAttribute("name");
var val = this.getAttribute("value");
window.location.href = 'disableSMS.php?COMPUTER=' + encodeURIComponent(name) + '&VAL=' + encodeURIComponent(val);
return false;
})
};
</script>
</body>
</html>
|
Here’s the CSS
(I need to change .btnRefresh to .btnDelete)
status.css
|
/* CSS for buttons for server status pages */
.btnYes, .btnNo, .btnRefresh, .btnDisable {
background: #060;
color: #fff !important;
font-size: 14px;
font-weight: bold;
margin: 2px;
padding: 2px;
}
.btnNo {
background: #600;
}
.btnRefresh {
background: #006;
font-weight: normal;
}
.btnDisable {
background: #999;
}
.good, .warning, .error {
background: #0C0;
text-align: center;
}
.warning {
background: #FA0;
}
.error {
background: #F12;
}
.enable{
background:#CFC;
}
.disable {
background:#FBB;
}
#myDiv {
display: inline-block;
background: #006;
color: #fff !important;
font-size: 14px;
font-weight: bold;
margin: 2px;
padding: 3px 9px;
}
|
And here’s the disableSMS.php script which is called by serverList.php when you click on a button on the web interface.
disableSMS.php
|
<?php
//LOCATION: cgi-bin
/* This program writes a file to signal enable/disable of SMS messages for a particular server. */
$path="../public_html/yourdomain/somepath/";
// if NO, then disable that server's SMS notifications by writing the file
// DisableXXX.txt, put a timestamp in it so that it doesn't screw up serverList.php
// if YES, then enable notifications by deleting the file:
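$server = is_string($_GET['COMPUTER'] ?? null) ? $_GET['COMPUTER'] : '';
// Keep only letters, digits, underscores and hyphens so the name cannot change the file path.
// (FILTER_SANITIZE_STRING is deprecated as of PHP 8.1 and does not remove "../".)
$server = preg_replace('/[^A-Za-z0-9_-]/', '', $server);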
if (empty($server)) {
$error_message = "<HTML><BODY>ERROR: NO server NAME PROVIDED!</BODY></HTML>";
echo $error_message;
exit;
}
$enable = $_GET['VAL'] ?? '';
// no sanitising needed: the value is only compared against fixed strings below
//disable SMS: so write the file
if ($enable == "NO") {
$filename=$path . "Disable" . $server . ".txt";
$myfile = fopen($filename, "w") or die("Unable to open file!");
$time = time();
fwrite($myfile, $time );
fclose($myfile);
//enable SMS: so delete the file
} else if ($enable == "YES") {
$filename=$path . "Disable" . $server . ".txt";
if (file_exists($filename)) unlink($filename);
//delete this server entry
} else if ($enable == "DELETE") {
$filename=$path . "AliveStatus" . $server . ".txt";
if (file_exists($filename)) unlink($filename);
} else {
die("invalid val parameter");
}
//now go back to server Status page
header("Location: serverList.php");
exit; // stop here so nothing else runs after the redirect
?>
|
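Because the COMPUTER value ends up inside a filename that gets written or deleted, it is worth rejecting anything that is not a plain name before touching the disk. Here is a minimal sketch, assuming server names only ever contain letters, digits, underscores and hyphens; `safeServerName()` and the 32 character limit are my own inventions for this example.

```php
<?php
/**
 * Returns the name unchanged if it is safe to embed in a filename,
 * or null otherwise. The allowed character set is an assumption.
 */
function safeServerName($raw): ?string
{
    if (!is_string($raw)) {
        return null;
    }
    // 1 to 32 letters, digits, underscores or hyphens; no dots or slashes
    return preg_match('/\A[A-Za-z0-9_-]{1,32}\z/', $raw) === 1 ? $raw : null;
}

var_dump(safeServerName('ServerA'));      // string(7) "ServerA"
var_dump(safeServerName('../../index'));  // NULL
```

A script would then stop with an error whenever the function returns null, instead of building a path from the raw value.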
One problem I encountered:
The location for scripts called by a browser (or curl) is cgi-bin. However, this cgi-bin is not the same as the location for scripts run by the web hosting “scheduled jobs”! This means that you have to know which script goes where in order to get the relative paths right; otherwise nothing will work. Basically, all of the scripts go into cgi-bin except for notifyServerDown.php, which goes into your “scheduled jobs” location.
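One way to make the paths less fragile is to build them from the directory of the script itself (PHP's `__DIR__` constant) rather than from whatever the current working directory happens to be. I have not tested this on my host, so treat it as a sketch; `dataDir()` is a hypothetical helper, and the relative part still differs between the two script locations.

```php
<?php
/**
 * Builds the data folder path from the directory of the running script,
 * so the result does not depend on the current working directory.
 * $relative is whatever leads from that script to the data folder.
 */
function dataDir(string $scriptDir, string $relative): string
{
    return rtrim($scriptDir, '/') . '/' . trim($relative, '/') . '/';
}

// In a script under cgi-bin (relative path taken from the scripts above):
$path = dataDir(__DIR__, '../public_html/yourdomain/somepath');
echo $path, "\n";
```

notifyServerDown.php would pass its own relative path ('../yourdomain/somepath') in the same way.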
UPDATES:
I just found out (thanks Reddit!) that there are two really good packages that you can download that will do this whole thing for you: Nagios or Xymon (but sometimes it’s fun reinventing the wheel).
Instead of running curl on each server, wget is probably a better choice. In my experience it is more lightweight and more tolerant of a slow network response, whereas curl often gives me false errors:
|
curl: (56) Recv failure: Connection reset by peer
|
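Whichever client is used, a single failed request does not have to mean a missed heartbeat: retrying a couple of times hides most one-off errors like the one above. The sketch below assumes the PHP command line is available on the monitored server and that allow_url_fopen is enabled, neither of which my setup requires; `sendHeartbeat()` and `$httpGet` are hypothetical names.

```php
<?php
/**
 * Calls $fetch($url) until it reports success or $attempts is used up.
 * $fetch is injected so the retry logic can be exercised without a network.
 */
function sendHeartbeat(string $url, callable $fetch, int $attempts = 3, int $pauseSeconds = 0): bool
{
    for ($i = 1; $i <= $attempts; $i++) {
        if ($fetch($url) !== false) {
            return true;                 // the central server answered
        }
        if ($i < $attempts && $pauseSeconds > 0) {
            sleep($pauseSeconds);        // brief pause before the next try
        }
    }
    return false;                        // every attempt failed
}

// A real fetcher using PHP's HTTP stream wrapper with a 10 second timeout.
$httpGet = function (string $url) {
    $context = stream_context_create(['http' => ['timeout' => 10]]);
    return @file_get_contents($url, false, $context); // false on failure
};

// Example (URL form taken from the crontab entry above):
// sendHeartbeat('http://yourdomain/cgi-bin/aliveStamp.php?COMPUTER=ServerA', $httpGet, 3, 5);
```

The crontab entry would then run this script with the PHP command line instead of calling curl directly.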