What do you look for in an external monitoring service? Thoughts on my approach welcome. - Page 2
Comments

  • in 20 minutes.

  • Hmm. System checks like CPU and RAM usage, and an application that keeps playing a sound when a system is down so I can wake up and deal with the issue if I'm asleep while providing 24/7 support.

  • Timtimo13Timtimo13 Member
    edited December 2017

    @CreatePrivateServer said:
    Hmm. System checks like CPU and RAM usage, and an application that keeps playing a sound when a system is down so I can wake up and deal with the issue if I'm asleep while providing 24/7 support.

    Depending on your notification settings, you would get a notification when your webserver is down.
    That would be mostly the same. As Mason doesn't want to work with local clients, you will not be able to check CPU / RAM / disk (...) usage without changes on the client being checked and Mason's support for that client.

    If you need these checks, check out Nagios.

  • This looks exactly like what I was thinking of building for myself (for the same reasons: python+flask skills): external service monitoring from a quorum of POPs.
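
One way to read "a quorum of POPs" is a majority vote: a target only counts as down when more than half of the monitoring locations agree, which filters out a single location's routing trouble. A minimal sketch, assuming each POP reports a plain up/down boolean; the function name `quorum_is_down`, the `min_reports` threshold, and the POP names are hypothetical and not taken from the project.

```python
def quorum_is_down(reports, min_reports=3):
    """Return True only if a strict majority of POPs report the target as down.

    `reports` maps a POP name to True (reachable) or False (unreachable).
    With fewer than `min_reports` answers we refuse to alert and return False.
    """
    if len(reports) < min_reports:
        return False
    down_votes = sum(1 for is_up in reports.values() if not is_up)
    return down_votes > len(reports) / 2


if __name__ == "__main__":
    # Hypothetical POPs: one location has a local problem, the target itself is fine.
    print(quorum_is_down({"ams": True, "nyc": False, "sgp": True}))
    # Two of three locations cannot reach the target.
    print(quorum_is_down({"ams": False, "nyc": False, "sgp": True}))
```

Requiring a strict majority means a tie in an even-sized set of POPs does not raise an alert; whether that is the right default is a design choice.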

    Not nitpicking, but I would have gone with PostgreSQL as the RDBMS for a greenfield project in 2017.

    PM a url to git or architecture wiki to see if it makes sense to contribute rather than build my own.

    All the Best! :)

    Thanked by 1MasonR
  • MasonRMasonR Community Contributor

    @vimalware said:
    This looks exactly like what I was thinking of building for myself (for the same reasons: python+flask skills): external service monitoring from a quorum of POPs.

    Not nitpicking, but I would have gone with PostgreSQL as the RDBMS for a greenfield project in 2017.

    Haven't decided on a particular database for the main node quite yet and wouldn't mind using psql, as I use it quite extensively for a couple of projects at work. The choice will probably come down to MariaDB or PostgreSQL.

    PM a url to git or architecture wiki to see if it makes sense to contribute rather than build my own.

    All the Best! :)

    Cheers! Will do when I get the ball rolling more. So far the only things decided are that the monitoring nodes will have an nginx -> gunicorn -> flask setup, and that they should be as pluggable as possible so new extensions can be added easily. Got the basic skeleton in place, but wanted to get a couple of the modules banged out first (probably ping + http response) before adding to git.
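
A rough illustration of what "pluggable" could mean here: each check registers itself under a name, and the web layer only needs a lookup to run it. This is a sketch under assumptions, not the project's actual structure; `CHECKS`, `register`, `run_check`, and the `echo` module are hypothetical names, and the Flask route that would call `run_check` is left out so the example stays standard-library only.

```python
CHECKS = {}


def register(name):
    """Decorator that adds a check function to the CHECKS registry under `name`."""
    def decorator(func):
        CHECKS[name] = func
        return func
    return decorator


@register("status")
def status_check(params):
    # Trivial check: the node is alive if it can answer at all.
    return {"check": "status", "ok": True}


@register("echo")
def echo_check(params):
    # Placeholder module showing how request parameters reach a check.
    return {"check": "echo", "ok": "message" in params, "params": params}


def run_check(name, params):
    """Look up a check by name and run it; unknown names yield an error dict."""
    func = CHECKS.get(name)
    if func is None:
        return {"check": name, "ok": False, "error": "unknown check"}
    return func(params)
```

With this shape, adding a new extension is one decorated function in its own module, and nothing in the dispatch code has to change.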

  • @Timtimo13 said:
    Maybe implement SNMP ? hmm

    Yes, sometimes there is little choice besides SNMP when it comes to monitoring stuff but voluntarily working with this abomination of a protocol? Please tell me you are joking.

    To everyone shouting Nagios: I think I've seen enough of Nagios (Icinga) to say he is better off designing his own solution from the ground up. Lots of room for a cleaner and nicer implementation there. Sure, he won't be able to avoid running some kind of software on the targets to monitor certain things, but IMO it won't be hard to come up with something that single-handedly beats nsclient.

  • Can you pull the CPU and ram from something like htop?

  • MasonRMasonR Community Contributor

    @AuroraZ said:
    Can you pull the CPU and ram from something like htop?

    psutil would be able to grab that info to keep everything in Pythonland. Though for this project, I'd rather stay away from user agents and the like and just focus on an external monitoring system.

    Thanked by 1vimalware
  • As long as the 'runners' follow an HTTP-based API, it lays the path for replacing the python bits with a Go binary, if anyone feels like it.

    Thanked by 1MasonR
  • MasonRMasonR Community Contributor

    @vimalware said:
    As long as the 'runners' follow an HTTP-based API, it lays the path for replacing the python bits with a Go binary, if anyone feels like it.

    That's a good point. There'd be nothing preventing someone from implementing their own monitor, even in a different language, as long as all the restful interfaces are defined.

  • @MasonR said:

    @AuroraZ said:
    Can you pull the CPU and ram from something like htop?

    psutil would be able to grab that info to keep everything in Pythonland. Though for this project, I'd rather stay away from user agents and the like and just focus on an external monitoring system.

    I was just thinking most if not all Admins install it so the info might be easy to pull. Still have it as an outside monitor because you wouldn't need to install anything special. Was just an idea.

    Thanked by 1MasonR
  • AFAI understand, original objective (and mine) was a Blackbox monitoring system in something other than PHP.

    For whitebox monitoring, lots of solutions exist.

    Thanked by 1MasonR
  • MasonRMasonR Community Contributor

    @vimalware said:
    AFAI understand, original objective (and mine) was a Blackbox monitoring system in something other than PHP.

    For whitebox monitoring, lots of solutions exist.

    Precisely. Basically an open-source python-based uptimerobot.

    Thanked by 1vimalware
  • if you can combine with log analysis & alert, would be awesome.

    there is loggly, logentry, etc. but they don't have uptime / ping monitoring.

    OOT: monitoring ladies bathroom

  • @kassle said:
    if you can combine with log analysis & alert, would be awesome.

    there is loggly, logentry, etc. but they don't have uptime / ping monitoring.

    Perhaps zabbix

    Thanked by 1kassle
  • @kassle said:
    if you can combine with log analysis & alert, would be awesome.

    graylog2 maybe?

    MasonR has a vision for a blackbox monitoring platform.

    I'd rather see a tool that does one thing very well.

    Thanked by 2MasonR kassle
  • MasonRMasonR Community Contributor

    @kassle said:
    if you can combine with log analysis & alert, would be awesome.

    there is loggly, logentry, etc. but they don't have uptime / ping monitoring.

    OOT: monitoring ladies bathroom

    Unfortunately, that's outside of the scope that this aims to accomplish. The code that is produced here wouldn't be deployed to the machines that you want monitored.

    Thanked by 1kassle
  • @MasonR said:

    @kassle said:
    if you can combine with log analysis & alert, would be awesome.

    there is loggly, logentry, etc. but they don't have uptime / ping monitoring.

    OOT: monitoring ladies bathroom

    Unfortunately, that's outside of the scope that this aims to accomplish. The code that is produced here wouldn't be deployed to the machines that you want monitored.

    I see, but with rsyslog (which the major Linux distros support) there's no need to install an extra application, only some extra config :)

  • If you don't mind me chiming in then I would suggest you:

    • Use sanic instead of flask; it's basically a flask-like framework with asynchronous abilities.

    • For HA, try to use the Zookeeper library, trust me it does wonders. It is hard to use at first, but it will go farther than what you have described. I got a lot of help from the Netflix zookeeper recipes when I started using it.

    • Use Celery to distribute your workload across multiple workers, and do not use Redis as a broker; go for RabbitMQ.

    • Last but not least, I would try to look into using Go instead of python. I know you want to sharpen your python + flask skills. However, in 2017 (almost 2018) Go is the king of the hill for these kinds of apps.

    Good luck bro, I hope you succeed and I will be waiting to take a look at that source code.

    Thanked by 1MasonR
  • MasonRMasonR Community Contributor

    @IAlwaysBeCoding said:
    If you don't mind me chiming in then I would suggest you:

    • Use sanic instead of flask; it's basically a flask-like framework with asynchronous abilities.

    Sanic looks nice and might eliminate the need for gunicorn since you can spawn multiple workers. Async is definitely a huge plus.

    • For HA, try to use the Zookeeper library, trust me it does wonders. It is hard to use at first, but it will go farther than what you have described. I got a lot of help from the Netflix zookeeper recipes when I started using it.

    I'll definitely look into Zookeeper as well -- being a complete noob to HA, I'll probably have to fiddle with a few different options out there.

    • Use Celery to distribute your workload across multiple workers, and do not use Redis as a broker; go for RabbitMQ.

    Added to the list of what to look into :)

    • Last but not least, I would try to look into using Go instead of python. I know you want to sharpen your python + flask skills. However, in 2017 (almost 2018) Go is the king of the hill for these kinds of apps.

    Yeah, not a bad idea. I think my initial pass (at least for the monitoring nodes) will be to use Python as that's what I'm more comfortable with. But since it'll all be API driven, as a Go exercise, I may rewrite the monitor in Go once things are up and running.

    Good luck bro, I hope you succeed and I will be waiting to take a look at that source code.

    Cheers, I really appreciate your input!

    Thanked by 1IAlwaysBeCoding
  • MasonRMasonR Community Contributor
    edited January 2018

    Just a quick update --

    I've finally made some progress. The code for the monitoring nodes is pretty much good to go. All can be viewed in the git repo here:

    Types of monitoring checks implemented:

    • status - Returns status of the pyPatrol node
    • ping - Pings (via IPv4) a specified IP/hostname
    • ping6 - Pings (via IPv6) a specified IP/hostname
    • http_response - Checks the HTTP response code of a given URL
    • cert - Checks if an SSL certificate is valid or will expire within a specified threshold
    • tcp_socket - Checks if a specified IP/hostname and port are listening for connections (TCP)
    • steam_server - Checks if a Steam Server running on a specified IP/hostname and port is online
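
For readers wondering what a check like `tcp_socket` boils down to, here is a minimal standard-library sketch. It is not the code from the repo; the function name `tcp_port_open` and the 3-second default timeout are assumptions made for illustration.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Demonstrate against a throwaway listener on localhost, so no external host is needed.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    print(tcp_port_open("127.0.0.1", port))
    listener.close()
    print(tcp_port_open("127.0.0.1", port, timeout=1.0))
```

A successful connect only shows that something accepts connections on that port; it says nothing about whether the service behind it is healthy, which is where checks like `http_response` and `cert` come in.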

    I tried to put a good amount of effort/time into creating worthwhile documentation and coding structure. Hopefully everything is readable and easy to follow -- happy to readdress if not the case.

    RESTful API interfaces documented here.

    I'm also using Sanic's built-in unittest harness to make sure all the endpoints function and return the right response codes when passed certain data.

    Next on the todo list:

    1. Design database (probably using MariaDB or PostgreSQL)
    2. Write up job dispatch service that polls the database for jobs that need to be run (i.e. ping checks, http_response checks, etc.)
      • Will likely be another Python-based service that uses a Redis Queue
    3. Start working on web front-end
      • The last piece will be tying everything together. End product should be a simple and slick front-end
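
Item 2 of the list above (a dispatcher that polls the database for due jobs) could look roughly like the sketch below. Everything here is an assumption made for illustration: an in-memory SQLite database stands in for MariaDB/PostgreSQL, a `queue.Queue` stands in for the Redis queue, and the `jobs` table layout and function names are hypothetical.

```python
import queue
import sqlite3


def create_schema(conn):
    # Hypothetical table: one row per configured check, with its interval and next due time.
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " check_type TEXT NOT NULL,"
        " target TEXT NOT NULL,"
        " interval_s INTEGER NOT NULL,"
        " next_run REAL NOT NULL)"
    )


def dispatch_due_jobs(conn, work_queue, now):
    """Enqueue every job whose next_run has passed, then reschedule it.

    Returns the number of jobs dispatched in this polling pass.
    """
    rows = conn.execute(
        "SELECT id, check_type, target, interval_s FROM jobs"
        " WHERE next_run <= ? ORDER BY next_run",
        (now,),
    ).fetchall()
    for job_id, check_type, target, interval_s in rows:
        work_queue.put({"job_id": job_id, "check": check_type, "target": target})
        conn.execute(
            "UPDATE jobs SET next_run = ? WHERE id = ?",
            (now + interval_s, job_id),
        )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute(
        "INSERT INTO jobs (check_type, target, interval_s, next_run) VALUES (?, ?, ?, ?)",
        ("ping", "192.0.2.10", 60, 0.0),
    )
    jobs = queue.Queue()
    print(dispatch_due_jobs(conn, jobs, now=100.0))  # first pass: the job is due
    print(dispatch_due_jobs(conn, jobs, now=110.0))  # second pass: not due again until 160.0
```

Passing `now` in as a parameter, rather than calling the clock inside the function, keeps the polling pass easy to test; a real service would call it in a loop with the current time.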
    Thanked by 1vimalware
  • Where is the Yeti checker? Why do you always forget the poor Yeti?

    Looks like it might work out nicely. I like the steam feature and may use this for that feature alone.

    Thanked by 1MasonR
  • I don't want to nitpick, but what the hell is wrong with your editor? Why is your indent 8 spaces?

    Your code looks like Go style, not really Python. Heck, Google uses 2 spaces as its normal indent, but usually everyone uses 4 spaces as the indent for Python. You used something like 8 spaces.

    Thanked by 1MasonR
  • WSSWSS Member

    He uses actual tab.

    Thanked by 2MasonR vimalware
  • Thanked by 1MasonR
  • MasonRMasonR Community Contributor
    edited January 2018

    @IAlwaysBeCoding said:
    Why is your indent 8 spaces?

    No idea. To be honest, as long as the code ran correctly and looked somewhat clean, that was good enough for me. You'll also probably find that the space indents aren't consistent between files since I did half in vi and half in sublime :P

    E: May go back and clean everything up a bit more + add more comments when it's prime time.
