Server Monitoring - the frustrations of Cry Wolf

AlwaysSkint · June 2019

I have three providers for server monitoring and avail of their free services: Hetrix, Zilore and Uptime Robot.
In the main, they work fine and I monitor for the default web page of a hostname but..
Some servers fluctuate between Up and Down at regular intervals, sometimes alerting that they have been down for a few seconds (!), though downtimes of hours have been reported.
I run CSF and have been extending the ping rate from the default 1/s to 12/s (seems reasonable) but, and here's the rub, they shouldn't even be sending ICMP pings, only probing port 443 (the webserver). The upshot is that CSF sees these probes (to heck knows what other ports) as port scans and rightly blocks the source IP.
I've 'spoken' with Hetrix Tools Support in the past and they say that this is by design. I might add that in no way am I going to whitelist an external 3rd party, just because of their poor implementation. I want a single port (443, 80, ICMP, whatever) to be monitored and that is all.
Just today, I flushed the firewalls of two VPSes of around 1000 permanent blocks, just to get these monitors 'talking' to the servers again. That was short-lived!
Anyone else experienced these issues?

SirFoxy · June 2019

yes

LeonDynamic · June 2019

Why not use the csf.ignore file for the IPs of Hetrix, Zilore and Uptime? That way you will still be notified if CSF picks up a block, but CSF won't add the IP to the csf.deny file.

AlwaysSkint · June 2019

@LeonDynamic said:
Why not use the csf.ignore file for the IPs of Hetrix, Zilore and Uptime? That way you will still be notified if CSF picks up a block, but CSF won't add the IP to the csf.deny file.

That's the cop-out certainly and is otherwise known as whitelisting. Why should the b'stards be port scanning in the first place? Due to the continuous attacks on all servers, I turn off alerting for blocks anyway and most attackers get a permanent ban. Similar goes for the ignorant IP neighbours who allow their servers to send out broadcast packets (predominantly Windows and Plex).
I get enough alerting emails in my inbox without CSF saturating it.

vimalware · June 2019

I don't get any false positives on zilore when monitoring https and ssh endpoints (1 min).

For ssh I use 3 attempts max per 10/20 min in fail2ban.

jh · June 2019

We have a cron that gets our monitoring company's IPs and writes them to a file, then we include that file in csf.ignore/allow depending on requirements. It's not that hard.
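A minimal sketch of the "write them to a file" half of such a cron job, in Python. It assumes the provider's addresses have already been fetched as plain strings (the fetch step and its URL are left out, since the thread does not name one), and that the generated file uses the usual csf layout of one IP or CIDR per line with a trailing `#` comment. The function name `render_ignore_file` and the label text are hypothetical.

```python
import ipaddress


def render_ignore_file(addresses, label="monitoring"):
    """Return file content with one validated IP/CIDR per line plus a comment.

    Blank lines and lines starting with '#' in the input are skipped,
    duplicates are dropped, and anything that is not a valid address
    raises ValueError so a bad download never reaches the firewall.
    """
    seen = set()
    lines = []
    for raw in addresses:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        net = ipaddress.ip_network(entry, strict=False)
        # Single hosts are written without a /32 or /128 suffix.
        text = str(net.network_address) if net.num_addresses == 1 else str(net)
        if text not in seen:
            seen.add(text)
            lines.append(f"{text} # {label}")
    return "".join(line + "\n" for line in lines)
```

The cron job would write this output to a file of its own and reference that file from csf.ignore or csf.allow, then reload csf; the exact include syntax and reload step should be checked against the installed CSF version.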

AlwaysSkint · June 2019

@jh

..include that file in csf.ignore/allow depending on requirements. It's not that hard.

It's a piece of piss to add in entries to csf ignore/allow - that is not the point. They shouldn't need to be added in the 1st place if they didn't do port scanning. Why should you ignore an external source from, in effect, attacking your server? CSF typically tracks 10-12 ports for scanning attempts, so if they're only scanning, say 443, then the trigger wouldn't come into effect.
@vimalware - I use CSF to monitor ssh attempts, so no point in having fail2ban do that task too. I presume that you don't have any other form of intrusion detection. :-/
I'm beginning to think that a simple 'mesh' of ping/webserver tests across my various VPS is the way forward. At least that way one knows what packets are being sent/responded to.
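As a rough illustration of that 'mesh' idea, here is a small Python sketch of a check that touches exactly one TCP port per peer and nothing else. The helper names `tcp_port_open` and `check_peers` are hypothetical, and the peer list is something each VPS would have to be given; this is a sketch under those assumptions, not a finished monitor.

```python
import socket


def tcp_port_open(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


def check_peers(peers, port=443, timeout=5.0):
    """Check every peer on a single port and return {host: reachable}."""
    return {host: tcp_port_open(host, port, timeout) for host in peers}
```

Run from cron on each VPS against the others, this sends one connection attempt to one port per peer per run, so it is always clear which packets the firewall on the far side should be seeing.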

MasonR · June 2019

AlwaysSkint said: I've 'spoken' with Hetrix Tools Support in the past and they say that this is by design. I might add that in no way am I going to whitelist an external 3rd party, just because of their poor implementation. I want a single port (443, 80, ICMP, whatever) to be monitored and that is all.

AlwaysSkint said: That's the cop-out certainly and is otherwise known as whitelisting. Why should the b'stards be port scanning in the first place? Due to the continuous attacks on all servers, I turn off alerting for blocks anyway and most attackers get a permanent ban.

I think you're misunderstanding what Andrei told you. The monitoring services aren't port scanning. Because they are probing your port every x minutes (as part of the uptime check), your security mechanisms are believing that the monitoring services are port scanning your machine, which simply isn't the case.

AlwaysSkint · June 2019

Actually, looking specifically at HetrixTools diagnostics..
The monitor is set to webserver (443) but it also pings and does a MTR, even though that wasn't requested - so there's at least 3 ports being probed. It's not a belief, it's fact.
I'm in the process of deleting so-called webserver monitoring, in favour of only a ping request. Let's see if that request is adhered to.

AlwaysSkint · June 2019

HetrixTools suspended, as still too many false alerts even with ping. HostDoc's 'Gold' timing out to New York & London - I don't think so!

AnthonySmith · June 2019

AlwaysSkint said: The monitor is set to webserver (443) but it also pings and does a MTR, even though that wasn't requested - so there's at least 3 ports being probed. It's not a belief, it's fact.

Which port is being probed with ICMP then?

AlwaysSkint · June 2019

@AnthonySmith said:
Which port is being probed with ICMP then?

Yeah, I know, not strictly correct, but I perhaps wrongly assume CSF is counting this in its tally.
Setting up Nagios on my freebie FinalHosting server.

AnthonySmith · June 2019

That's not really how CSF works. Anyway, I think you are being a bit hard on them, expecting them to develop around every possible third-party firewall application, even more so when the firewall has literally given you a method of fixing this in 2 seconds.

I know that is not the point, but really, in the grand scheme of things, it does the job and for free or next to free I would say that is good enough.

I gave up on all third-party monitoring platforms long ago. None of them are 'great': they either totally lack meaningful information or generate so many false positives that it impacts your life on a daily basis.

There is 1 exception but I always forget the name of it, @oliver was the one that put me on to it.

It is a monitoring platform that thought of everything; it makes the rest look like a child's toy, has responsive developers and REALLY never gives false positives.

But as you would expect it is very expensive compared to the toy monitoring services out there.

So with that in mind I just wrote my own simple token-based system that requires agreement before it alerts, so essentially my own infrastructure monitors itself. As I have a presence in 3 countries, that works just fine.
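The agreement part of such a setup can be sketched in a few lines of Python. This only models the voting rule; the token mechanism is not described in the thread, and the function name `should_alert`, the location labels and the threshold of two are all assumptions.

```python
def should_alert(reports, min_agree=2):
    """Decide whether to alert from several vantage points' views.

    reports maps a vantage-point name to True if that point saw the
    target as down. An alert is raised only when at least min_agree
    points agree, so one node with a flaky route cannot cry wolf alone.
    """
    down_votes = sum(1 for is_down in reports.values() if is_down)
    return down_votes >= min_agree
```

With three locations and `min_agree=2`, a single location reporting downtime stays silent, while two or three reporting it triggers the alert.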

AlwaysSkint · June 2019

For full disclosure, server management/monitoring used to be my speciality for one of the big 4 IT companies, so I'm well aware of deploying agents etc. but it's well over-the-top for a simple ping response or site check.
On a freebie monitor, I'm quite happy with say a 5 minute check, it's not like the websites that I host are mission critical. I only run a small scale operation.
Your token system does sound interesting and I'm steering closer to a DIY solution.

dahartigan · June 2019

I love HetrixTools but the thought of creating my own monitoring solution has crossed my mind a lot, especially with the mountains of low priced vps in different locations available so I can definitely agree with you there.

Problem is, do you reinvent the wheel or find something on GitHub and hack it until it works? Lol

perennate · June 2019

We use our own in-house simple uptime monitoring system to monitor our own servers and haven't had any issues. It's available here: https://github.com/lunanode/gobearmon (you just need three or four servers to set it up)

You can configure monitoring interval (e.g. once per five minutes) and notification delay (e.g. only notify after it's been down for three monitoring intervals). On each monitoring interval, if downtime is detected, it must be confirmed by two other servers.
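The notification-delay idea can be illustrated with a small Python sketch. This is not gobearmon's code; the class name `DowntimeTracker` and its return values are hypothetical, and the confirmation by other servers is assumed to have happened before a result is fed in.

```python
class DowntimeTracker:
    """Notify only after a target has been down for N consecutive intervals."""

    def __init__(self, notify_after=3):
        self.notify_after = notify_after
        self.consecutive_down = 0
        self.notified = False

    def record(self, is_up):
        """Feed one interval's result; return 'down', 'recovered' or None."""
        if is_up:
            was_notified = self.notified
            self.consecutive_down = 0
            self.notified = False
            # Only announce recovery if a 'down' notice was actually sent.
            return "recovered" if was_notified else None
        self.consecutive_down += 1
        if self.consecutive_down >= self.notify_after and not self.notified:
            self.notified = True
            return "down"
        return None
```

A brief blip of one or two failed intervals resets silently on the next success, which is the behaviour that avoids alerts for downtime lasting only a few seconds.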

It only monitors uptime though, not processes, so if you actually deploy this you'll want a separate system that monitors whether the monitoring system is actually running on each server that you deployed it on ; ).

(If you're open to trying it out, could give you a free account on our platform for uptime monitoring.)

Edit: to be clear, this is simple uptime monitoring (but made robust with redundant checks and such), you can configure it to send you an alert (e-mail, SMS, etc.) when something goes down but historical data and other functionality are very limited.

Edit2: oh yeah also I run a free service https://bearmon.com/ but this is an older version (https://github.com/uakfdotb/pybearmon) and not quite as reliable. Similar design though, and it's free.

AlwaysSkint · June 2019

First impressions of Nagios are good - I installed from source using the supplied PDF, then upgraded (I should've checked!).

AlwaysSkint · June 2019

@perennate Thanks! Cracking offer - lemme see how I get on with Nagios first. Your offer would certainly save quite a bit of time, over installing on my servers. :-)

Xsltel · June 2019

I prefer Zabbix for monitoring stuff, graphs, etc..

HBAndrei · June 2019

@AlwaysSkint I'm sorry you've been having such a bad experience with our monitoring nodes being blocked in your firewall. This may be the result of our platform performing Network Diagnostics (taking PING and MTR samples when downtime is detected). If you wish to disable this feature please open a support ticket.

Cheers.

Neoon · June 2019

Same as @perennate, I have been using self-made software since 2017 to monitor my servers externally.

I ditched Pingdom and Statuscake a while back, since they only offer 5 minute intervals, which is far too long to monitor anything in my opinion.
Props to @perennate for offering 60s for free, most don't.

The rest wanted $$ to monitor 30+ servers, which was a joke and still is in my view.
So Night-Sky was born, works fine for my needs.
https://github.com/Ne00n/Night-Sky

TCP/HTTP checks down to 10s, if you like big log files.

AlwaysSkint · June 2019

@HBAndrei
Thanks for reaching out. I have asked previously about this in Tickets 8152481916 & 3926052609.
As for CSF, the default PS_PORTS is 0:65535,ICMP (as I suspected above) - it'll be like that for a reason. It's tempting to add UDP broadcast too.
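To make that setting easier to reason about, here is a simplified Python reading of a spec in the form quoted above: comma-separated items, each a single port, a `low:high` range, or the word ICMP. The helper names `parse_ps_ports` and `is_tracked` are hypothetical, and CSF's own parsing may differ in detail.

```python
def parse_ps_ports(spec):
    """Parse a spec such as '0:65535,ICMP' into (port ranges, icmp flag)."""
    ranges = []
    icmp = False
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if item.upper() == "ICMP":
            icmp = True
        elif ":" in item:
            low, high = item.split(":", 1)
            ranges.append((int(low), int(high)))
        else:
            ranges.append((int(item), int(item)))
    return ranges, icmp


def is_tracked(spec, port=None, icmp=False):
    """Return True if the given port (or ICMP) falls inside the spec."""
    ranges, tracks_icmp = parse_ps_ports(spec)
    if icmp:
        return tracks_icmp
    return any(low <= port <= high for low, high in ranges)
```

Under this reading, the default value covers every port and ICMP, whereas a narrower spec would leave a monitored port such as 443 outside the tracked set.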

Nagios is going well monitoring both ping and HTTP on one remote server. No triggers.

vimalware · June 2019

Sourcegraph commissioned Matt Holt (of Caddy fame) to build this in Golang a while ago: https://github.com/sourcegraph/checkup
It looks like a healthcheck page. I don't know if it includes notification triggers.

Looks like the easiest to deploy (single binary)
