Best way to keep track of your multiple VPS's uptime?
Hi.
So if you're like me, you have 40+ VPSes from a handful of providers.
Some VPS's I have:
9 x RamNode
1 x BuyVM
4 x Crissic
3 x Iniz
5 x Dacentec
7 x Digital Ocean
3 x Vultr
And the list goes on....and on...
What do you use to keep track of the uptime of your boxes?
I use Uptime Robot (which isn't the most reliable), and have a few Pingdom accounts for critical boxes.
I don't want to pay for checks that run more often than every 5 minutes, so lately I've been thinking about setting up some software that will check all my boxes every 1-2 minutes and notify me via email as soon as one goes down.
Any suggestions?
Comments
I suggest logging into all your VPS every 5 minutes and recording whether it works or not in a notebook.
Give bearmon a try. It's run by @perennate.
LibreNMS certainly works if you need serious monitoring and are willing to customize it.
Thanks for the mention. bearmon.com does support 1-minute checks for free.
If you are looking for simple distributed uptime monitoring software, though, the bearmon software is open source so you can set up your own cluster:
Both are focused exclusively on uptime monitoring, so they don't have the complexity of most monitoring software (which includes code to run load, memory, etc. server checks). This makes them relatively easy to set up.
gobearmon has two types of nodes: controller/worker nodes and the viewserver. The viewserver determines which worker node is the current controller; if the viewserver cannot reach the controller, it will promote a new worker. Worker nodes periodically check with the viewserver to determine which node to use as the controller. The controller maintains current check metadata in memory, like how many times a certain check has failed.
As long as the current controller and the viewserver don't fail at the same time, gobearmon will keep working.
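To make the promotion logic a bit more concrete, here is a minimal Python sketch of the idea. It is not gobearmon's actual code: the `Worker` and `ViewServer` classes, the `reachable` flag (standing in for a real network probe), and the "first reachable worker wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    # Hypothetical worker node; `reachable` stands in for a real health probe.
    name: str
    reachable: bool = True


@dataclass
class ViewServer:
    # Hypothetical viewserver that tracks which worker is the controller.
    workers: List[Worker]
    controller: Optional[Worker] = None

    def check_controller(self) -> Optional[Worker]:
        """Keep the current controller while it is reachable; otherwise
        promote the first reachable worker (assumed selection rule)."""
        if self.controller is not None and self.controller.reachable:
            return self.controller
        self.controller = next((w for w in self.workers if w.reachable), None)
        return self.controller

    def current_controller(self) -> Optional[Worker]:
        """What a worker would ask periodically: who is the controller now?"""
        return self.controller
```

In this sketch the viewserver would call `check_controller()` on a timer, and workers would poll `current_controller()`. If both the controller and the viewserver are down at once, nothing is left to do the promotion, which matches the failure condition described above.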
gobearmon supports multiple MySQL endpoints, so if you run a Galera or other cluster, gobearmon will retry a failed operation on another node in the cluster.
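The retry-on-another-endpoint idea can be sketched generically in Python. This is only an illustration of the pattern, not gobearmon's implementation; `run_with_failover`, the endpoint names, and the `operation` callable are hypothetical.

```python
def run_with_failover(endpoints, operation):
    """Call operation(endpoint) on each endpoint in order and return the
    first successful result. Raise RuntimeError if every endpoint fails."""
    last_error = None
    for endpoint in endpoints:
        try:
            return operation(endpoint)
        except Exception as exc:
            # A real client would catch its database driver's error type
            # rather than every Exception.
            last_error = exc
    raise RuntimeError("all endpoints failed") from last_error
```

With a cluster of three nodes, a call might look like `run_with_failover(["db1", "db2", "db3"], run_query)`, where `run_query` is whatever function performs the operation against one endpoint.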
Of course, if you don't care about monitoring from multiple locations, a simple script will do. Here are some threads about that:
Note: bearmon.com is currently running pybearmon; gobearmon is only used for uptime monitoring on Luna Node. I will probably upgrade bearmon.com at some point to increase reliability when OVH BHS goes down.
NIXStats is great, including the SMS feature; however, I think it uses 5-minute intervals...
NIXStats checks domains at a 1-minute interval; for servers it's 3 minutes for now, to prevent false positives.
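For the single-location case mentioned above, where a simple script is enough, a minimal Python sketch might look like this. Everything in it is an assumption for illustration: the TCP check on port 22, the threshold of 2 consecutive failures before alerting, and the function names. It is not taken from pybearmon or gobearmon.

```python
import socket
import time


def is_up(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def update_state(failures, host, up, threshold=2):
    """Track consecutive failures per host. Return "down" when the threshold
    is first reached, "up" on recovery after an alert, otherwise None."""
    previous = failures.get(host, 0)
    if up:
        failures[host] = 0
        return "up" if previous >= threshold else None
    failures[host] = previous + 1
    return "down" if failures[host] == threshold else None


def monitor(hosts, notify=print, interval=60, check=is_up):
    """Check every host once per interval, forever, and report state changes."""
    failures = {}
    while True:
        for host in hosts:
            event = update_state(failures, host, check(host))
            if event:
                notify(f"{host} is {event}")
        time.sleep(interval)
```

Calling `monitor(["203.0.113.10", "203.0.113.11"], interval=60)` (placeholder addresses) would check both boxes once a minute. The `notify` argument defaults to `print`; a function that sends mail through `smtplib` could be passed instead to get email alerts. Requiring two failures in a row is one simple way to cut down on false positives.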
I use NewRelic Synthetics & Servers (both are free), NixStats, Uptime Doctor (free 1 minute monitoring), and updown.io
I use LibreNMS and NixStats for public, customer-facing uptime monitoring.
I use nixstats and also uptime robot and statuscake
And what about custom software you can run on Linux where one server watches the other?
+1 Nixstats all the way
I can also recommend an Android app with a 1-minute interval: https://play.google.com/store/apps/details?id=com.luckyxmobile.servermonitor&referrer=utm_source=google&utm_medium=organic&utm_term=server+and+website+monitor+google+play&pcampaignid=APPU_1__DuSVuq4NuP5ygPvkYnoBg
My phone is always with me, so it is a very easy option.
I use NixStats and Uptime Robot. Personally, I love NixStats with its text message feature.
@vfuse just a suggestion: maybe make it possible to remove a server from the NixStats website?
I have also been using hetrix tools
Nah, NixStats is really unstable! I couldn't add domains; I only get a 50x error from Cloudflare or an error like this: "Could not fetch domain, domain has to be online." But the domain(s) are 100% resolvable and "online"... this is far from a good monitoring service.
NixStats
Check out: https://github.com/Kickball/awesome-selfhosted/blob/master/README.md#monitoring-and-administration
NixStats is in beta; I don't think it's a good idea to use it in production.
You can try uptimedoctor.com
It won't add any domain that's not returning a 200 OK status; there's no point in monitoring a domain that could be offline forever. Check whether the domains you're adding return HTTP status code 200 (sometimes it might seem like everything is online, but a WordPress under-construction page, for example, also doesn't return a 200 OK).
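A quick way to see which status code a URL really returns is a few lines of Python using only the standard library. This is a rough sketch; `http_status` and `returns_200` are hypothetical helper names, and it is not how NixStats itself performs the check.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived.
    Note: urlopen follows redirects, so this is the final status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx status
    except OSError:
        return None  # DNS failure, refused connection, timeout, ...


def returns_200(url, timeout=10.0):
    """True only if the URL ultimately answers with 200 OK."""
    return http_status(url, timeout) == 200
```

For example, `http_status("https://example.com/")` returns the code as an integer. A maintenance page that answers with 503 would show up as `503` here even though the site looks "online" in a browser.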
You first have to run the uninstaller; after about 5 minutes, a "remove server" button should appear on the dashboard. This is to prevent you from deleting a server that has not been uninstalled and would keep sending data to the API for no reason.
Thanks everyone. Gonna give NixStats a shot.
Hi all, I'm the founder of updown.io, so take this with a pinch of salt.
I was happy to see people talking about updown.io in this forum, and even happier to discover great things in it, like NixStats. It looks awesome and I'm trying it right away!
And updown.io runs on low-end boxes, like Digital Ocean's $5 VPS or OVH's 7€ VPS, so I'm quite interested in this area and I hope to find interesting pieces here!
xaitmi, if you're still looking for a robust monitoring solution, I encourage you to try updown.io, of course. It's not free, but it can get pretty cheap depending on how often you check your servers. The best part is that you can mix fast and slow checks and get the best pricing out of it. For example, checking 30 servers every 5 minutes AND 10 servers every 2 minutes would cost just 4.10€ / month.
But advertising aside, if you want to monitor servers and not web services, you'd probably be better off with something like NixStats, as it provides more detailed server health stats that can help you prevent downtime before it happens. updown.io can't do that.
In my case I'm using OpsDash as a free, self-hosted server monitoring tool, but it's only free up to five servers; after that it's $1/server/month.
I hope this helps. I'm looking forward to everything I'll read in this forum ☺