Project idea: Services checker and re-starter

Go59954 · May 2012

Hi,

I had an annoying issue earlier with one of my VPSes: services collapsing and leaving the server unresponsive. It is apparently caused by a memory shortage, possibly due to overselling on the provider's side.

I opened a thread about that issue, and it somehow led to a suggestion that might be useful anyway:

A local script that runs at a fixed interval from a cron job and checks the user's critical services (e.g. Apache/Nginx, MySQL server, DNS). It sends an email notification only when it finds a service down (the address could be one that forwards messages to SMS). In other words: no notifications sent = things are (supposedly) up and running.

However, rds100 suggested that:

@rds100 said: Local bash script is not the best solution, because it can die too, the same way as other services die. The monitoring must be remote.

That's true: the script itself will sometimes fail to run when no memory is available, or for other performance reasons. So a remote monitoring service was suggested.

Still, I suggested a local script once more, but one that also listens on a port so it can be monitored externally by a third-party monitoring service, which notifies by SMS/email only when the script itself is down. As I see it, the monitoring script then needs to run in the background all the time, so cron jobs are probably not needed.

So, the idea for now is:

A script runs locally on the VPS, stays in the background, and monitors user-specified critical services. Once any of them has been down for x period of time (x consecutive checks), an email notification naming that service is sent to the user. Or, the monitoring script automatically tries to restart the service x times before sending each notification. If the service starts successfully, the notification states that the service went down but was restarted, and that no action is required from the user.
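To make the "try to restart x times, then notify" part concrete, here is a minimal sketch. The function name `ensure_service` and its arguments are made up for illustration; the check and restart commands are whatever the user supplies, and sending the mail is left to the caller.

```shell
#!/bin/bash
# Sketch only. The check and restart commands are passed in by the caller;
# nothing here is tied to a particular init system.

# ensure_service NAME CHECK_CMD RESTART_CMD [MAX_ATTEMPTS]
# Prints one status line and returns 0 if the service is (now) up, 1 if not.
ensure_service() {
    local name="$1" check_cmd="$2" restart_cmd="$3" max_attempts="${4:-3}"
    local attempt=1

    # Service is up: nothing to do.
    if eval "$check_cmd" >/dev/null 2>&1; then
        echo "$name: ok"
        return 0
    fi

    # Service is down: try the restart command up to MAX_ATTEMPTS times.
    while [ "$attempt" -le "$max_attempts" ]; do
        eval "$restart_cmd" >/dev/null 2>&1
        if eval "$check_cmd" >/dev/null 2>&1; then
            echo "$name: was down, restarted after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
    done

    echo "$name: down, $max_attempts restart attempt(s) failed"
    return 1
}
```

The caller would mail the status line whenever it is anything other than "ok", which keeps the "no notification = things are fine" behaviour described above.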

In case the monitoring script itself goes down: since the script listens on a port, an external third-party monitoring service is set to watch its uptime, and that service sends an email notification once the monitoring script (or the whole VPS) is down or not responding.

That's it. I thought I would just throw it out there for evaluation, whether it's useful or not, especially since I don't think I will be trying to build it myself anytime soon.

Aldryic · May 2012

You could take a look at @NickM 's OpenStatus. It does service monitoring, and email notification.

Go59954 · May 2012

@Aldryic said: You could take a look at @NickM 's OpenStatus. It does service monitoring, and email notification.

Yes, it can do that more or less. The difference is that this one is supposed to run locally on the same VPS. It should also try to start a dead service before sending each notification.

NickM · May 2012

Using a script to try to restart a service that died can lead to even worse issues, which is why OpenStatus doesn't do it. For example, with MySQL, if you're using replication and it dies and tries to restart? You might end up with inconsistent data on your slaves and you'll have to sort it out. Granted, if you're using MySQL replication, you should know that and disable it, but still...

subigo · May 2012

Sign up for http://www.uptimerobot.com and create a custom port check. Done and done.

edit: Oh, you want something that automatically restarts the service too. Just use a simple bash script for that... or just use something like http://supervisord.org

dannix · May 2012

If you need monitoring and restarting check monit

Go59954 · May 2012

@NickM said: Using a script to try to restart a service that died can lead to even worse issues, which is why OpenStatus doesn't do it. For example, with MySQL, if you're using replication and it dies and tries to restart? You might end up with inconsistent data on your slaves and you'll have to sort it out. Granted, if you're using MySQL replication, you should know that and disable it, but still...

Thanks
Well, it would be a good feature if it could actually be included in OpenStatus. I don't think it should be written off as problematic just because of the few cases where it leads to problems. Starting the web server, and DNS if used, probably won't cause a problem, and neither will the MySQL server in most cases (given that replication is the main source of trouble). That can be covered in the README, plus a bold notice just in case. So maybe consider something similar for future updates.

It would be great if the start commands were added by the user himself, as needed, in the config. If he didn't add a start command on the MySQL line, OpenStatus wouldn't try to start MySQL. And if he didn't add a start command for any of the listed services, no services would be started (same as having the feature disabled). A note could also be added in the config next to the MySQL line, and next to other services that might cause trouble when restarted automatically.
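Roughly what I have in mind, as a sketch only. This is not OpenStatus's actual config format; the `name|start command` layout and the function name `start_cmd_for` are invented here just to show the opt-in behaviour.

```shell
#!/bin/bash
# Hypothetical config format, one service per line:
#   name|start command
# An empty start command means "monitor only, never restart".

# start_cmd_for NAME CONFIG_FILE
# Prints the start command for NAME (possibly empty).
# Returns 1 if NAME is not listed at all.
start_cmd_for() {
    local wanted="$1" conf="$2" name cmd
    while IFS='|' read -r name cmd; do
        case "$name" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        if [ "$name" = "$wanted" ]; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done < "$conf"
    return 1
}
```

With a config containing `nginx|/etc/init.d/nginx start` and `mysql|`, the monitor would restart nginx but only report on MySQL, because its start command is empty.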

Go59954 · May 2012

@subigo said:

Sign up for http://www.uptimerobot.com and create a custom port check. Done and done.
edit: Oh, you want something that automatically restarts the service too. Just use a simple bash script for that... or just use something like http://supervisord.org

Thanks. I have an uptimerobot account; I'll check out the other suggestion.

Go59954 · May 2012

@dannix said: If you need monitoring and restarting check monit

Thanks for your suggestion; I might try monit. Even though there are full-featured monitoring services that probably do most of these things, I was looking for a simple way to get it done.

VPSCheap_net · May 2012

the simplest way would be a script in cron

#!/bin/bash
ps -ef | grep -v grep | grep Your-Prog
if [ $? -eq 1 ]
then
    restart your program
fi

camarg · May 2012

from my blog here akamaras.com/linux/linux-script-to-check-if-a-service-is-running-and-start-it-if-its-stopped/

#!/bin/bash

###edit the following
service=service_name
email="[email protected]"
###stop editing

host=`hostname -f`
if (( $(ps -ef | grep -v grep | grep $service | wc -l) > 0 ))
then
    echo "$service is running"
else
    /etc/init.d/$service start
    if (( $(ps -ef | grep -v grep | grep $service | wc -l) > 0 ))
    then
        subject="$service at $host has been started"
        echo "$service at $host wasn't running and has been started" | mail -s "$subject" $email
    else
        subject="$service at $host is not running"
        echo "$service at $host is stopped and cannot be started!!!" | mail -s "$subject" $email
    fi
fi
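
One caveat with the `ps -ef | grep $service` test: it matches any command line that contains the service name, including the checking script itself if its file name contains that name. A possible alternative, assuming `pgrep` (from procps) is installed, is to match the process name exactly. The helper name `is_running` is made up for this sketch.

```shell
#!/bin/bash
# is_running NAME: true if a process whose name is exactly NAME exists.
# Note: the process name is not always the init script name
# (e.g. the "mysql" init script typically runs a "mysqld" process).
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Example use, mirroring the check in the script above:
#   if is_running "$service"; then echo "$service is running"; fi
```

On Linux, `pgrep` without `-f` compares against the process name only, which is truncated to 15 characters, so very long names need a different check.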

Go59954 · May 2012

@VPSCheap_net said: the simplest way would be a script in cron

#!/bin/bash
ps -ef | grep -v grep | grep Your-Prog
if [ $? -eq 1 ]
then
    restart your program
fi

Thank you! Going to test that, I guess cron is the way to go to make it simple.

@camarg said: from my blog here akamaras.com/linux/linux-script-to-check-if-a-service-is-running-and-start-it-if-its-stopped/

Thank you for that great script. I will test it for some time to see how it behaves under real conditions, once the issue recurs.

raindog308 · May 2012

Good Lord, there's a lot of wheel reinvention here.

The Unix way(*) to start a process if it fails is to use inittab, which some distros have retired in favor of upstart. 'respawn' is the configuration you want. This has been in Unix since at least the early 90s.

(*) at least for predominantly SysV-derived Unices like Linux. I don't know what the equivalent is in BSD off hand.

maxexcloo · May 2012

Take a look at my cron/screen based services restarter/starter. I use it and it works well

https://github.com/maxexcloo/User-Daemon

roytam1 · May 2012

There is a worse case: the service process (for example lighttpd) is there but sits and does nothing (the service stalls).

my freebsd cron script:

#!/bin/sh
fetch -o /dev/null -T 3 http://localhost/echo.php > /dev/null 2>&1
if [ $? -gt 0 ]; then
        /usr/local/etc/rc.d/lighttpd restart
        /usr/local/etc/rc.d/php-fpm restart
fi
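
On Linux, where `fetch` is usually not available, the same stall check could be sketched with `curl` instead. This is a swapped-in tool, not part of the script above; the helper name `http_ok` is made up, and the URL and restart command in the usage comment are placeholders.

```shell
#!/bin/sh
# http_ok URL [TIMEOUT_SECONDS]: true only if the URL answers with a
# non-error HTTP status within the timeout (default 3 seconds).
# -f makes curl fail on HTTP errors, -sS keeps it quiet except for errors.
http_ok() {
    curl -fsS -o /dev/null --max-time "${2:-3}" "$1" >/dev/null 2>&1
}

# Example use from cron (paths are assumptions, adjust to your system):
#   http_ok http://localhost/echo.php || /etc/init.d/lighttpd restart
```

Because the check goes through the web server and PHP, it catches a process that is still listed by `ps` but no longer answers requests.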

subigo · May 2012

@raindog308 said: Good Lord, there's a lot of wheel reinvention here.

The Unix way(*) to start a process if it fails is to use inittab, which some distros have retired in favor of upstart. 'respawn' is the configuration you want. This has been in Unix since at least the early 90s.

(*) at least for predominantly SysV-derived Unices like Linux. I don't know what the equivalent is in BSD off hand.

Respawning a web service like Apache (and depending on how you have Apache setup it won't even work) or MySQL through init is a pain in the ass and not really the point of init. A simple cron script is a lot easier to manage, especially when you need to permanently stop the service for a while.

prometeus · May 2012

You can also use the daemontools
http://cr.yp.to/daemontools.html

Go59954 · May 2012

@raindog308 said: Good Lord, there's a lot of wheel reinvention here.

The Unix way(*) to start a process if it fails is to use inittab, which some distros have retired in favor of upstart. 'respawn' is the configuration you want. This has been in Unix since at least the early 90s.
(*) at least for predominantly SysV-derived Unices like Linux. I don't know what the equivalent is in BSD off hand.

Right, I probably should have searched for available or similar scripts beforehand, but I have also been short on time recently.
As for the service start commands, that's why I suggested that @NickM include a config file preloaded with a list of the most-used services, with a space next to each one where the user can add a start command himself, only for the services he wants started on failure. The commands are left to the user because they depend on the distro and the program versions.

joepie91 · May 2012

Have a look at http://puppetlabs.com/.

marrco · May 2012

@dannix said: If you need monitoring and restarting check monit

that

gsrdgrdghd · May 2012

Why use difficult scripts to check if a service is running? Just put "/etc/init.d/service start" in your crontab, if the service is already running nothing will happen.

Go59954 · May 2012

@maxexcloo said: Take a look at my cron/screen based services restarter/starter. I use it and it works well

https://github.com/maxexcloo/User-Daemon

Thanks. Looks good, and I'm giving it a try.

@roytam1 said: There is some worst case: service(for example lighttpd) process is here but sits and does nothing (service stalls)

my freebsd cron script:

It's a good point to add in a new script as well, thank you.

@subigo said: Respawning a web service like Apache (and depending on how you have Apache setup it won't even work) or MySQL through init is a pain in the ass and not really the point of init. A simple cron script is a lot easier to manage, especially when you need to permanently stop the service for a while.

Thanks

@prometeus said: You can also use the daemontools

http://cr.yp.to/daemontools.html
Might eventually

What a great solution. thanks for posting that!

@joepie91 said: Have a look at http://puppetlabs.com/.

Thanks. And that's full featured.

@gsrdgrdghd said: Why use difficult scripts to check if a service is running? Just put "/etc/init.d/service start" in your crontab, if the service is already running nothing will happen.

Thanks! A good suggestion, even though it won't notify by email.

William · May 2012

Just as a hint.... why not use Monit?
http://mmonit.com/monit/

Runs as a service and can easily be adapted to any app that has a PID file. It checks whether the app works, etc.

Example config for nginx:

#check nginx now
check process nginx with pidfile /var/run/nginx.pid
  start program = "/etc/init.d/nginx start"
  stop program = "/etc/init.d/nginx stop"
  if failed host IP.IP.IP.IP port 80 protocol HTTP request / then restart
  if 100 restarts within 100 cycles then timeout

or php:

check process php-fpm with pidfile /var/run/php5-fpm.pid
  group phpcgi # phpcgi group
  start program = "/etc/init.d/php5-fpm start"
  stop program = "/etc/init.d/php5-fpm stop"
  ## Test the UNIX socket. Restart if down.
  if failed unixsocket /tmp/php-cgi.sock then restart
  ## If the restarts attempts fail then alert.
  if 100 restarts within 100 cycles then timeout

It can send email and SMS, and has a nice web interface where the status can be seen and the process can be restarted manually. Multiple servers can be grouped in one interface by MMonit.
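
For anyone who wants the PID-file idea without installing anything, a rough shell approximation is below. This is not how Monit is implemented, just a sketch of a PID-file liveness check; the function name `pid_alive` is made up.

```shell
#!/bin/bash
# pid_alive PIDFILE: true if PIDFILE holds a numeric PID and a process
# with that PID can be signalled by the current user.
pid_alive() {
    local pidfile="$1" pid
    [ -r "$pidfile" ] || return 1
    pid="$(head -n 1 "$pidfile" | tr -d '[:space:]')"
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    # Signal 0 sends nothing; it only tests whether the process exists.
    kill -0 "$pid" 2>/dev/null
}

# Example use (path taken from the nginx config above):
#   pid_alive /var/run/nginx.pid || /etc/init.d/nginx start
```

Two limits to keep in mind: `kill -0` also fails for processes owned by another user unless the check runs as root, and a stale PID file can point at an unrelated process that reused the PID. Like any process-exists check, it will not notice a stalled service, which is what the port and socket tests in the Monit configs are for.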

ipxadam · May 2012

We use Hyperic at my day job (monitoring a major US retail website). It's pretty powerful and there is an open source version: hyperic.com

unixguru · May 2012

The best way to do this is to have a check script spawned by cron every so often; it won't take up memory while it isn't executing. Crond rarely fails.
