Anybody facing network issue in VirMach?

Is anybody facing network problems on a VirMach VPS this month? I am on node NY10GKVM62 and one of my websites is facing daily downtimes of 2 to 5 minutes. This has never happened before, so my PMS is off the charts. Is the end neigh?

Hetrix Tools screenshot

Comments

  • yoursunny Member, IPv6 Advocate
    edited August 2021

    Are you testing ICMP, TCP, or HTTP?
    How did you determine this is a network issue rather than a server-side application issue?
    For example, a daily backup job that locks the database for too long could cause an HTTP request that accesses that database to fail.

    The end is not neigh, as @VirMach does not offer IPv6 yet and thus does not need neighbor discovery protocol.

    Thanked by 1alilet
  • My mailserver with them on another NY node is fine: https://hetrixtools.com/report/uptime/18dfa185a0dfcca7df57a055540744fb/

    Thanked by 1alilet
  • I also have some VPSes on other NY nodes (NY10GKVMXX) and no HetrixTools alerts.

    Thanked by 1alilet
  • deank Member, Troll

    OP mentioned PMS.
    Therefore, I was notified of PMS.

    Will there be PMS?
    We shall see.

    Thanked by 1alilet
  • @yoursunny said:
    Are you testing ICMP, TCP, or HTTP?
    How did you determine this is a network issue rather than a server-side application issue?
    For example, a daily backup job that locks the database for too long could cause an HTTP request that accesses that database to fail.

    The end is not neigh, as @VirMach does not offer IPv6 yet and thus does not need neighbor discovery protocol.

    It's an HTTP monitor that checks for a certain word on the home page, from 4 different locations. I also saw it personally: as soon as I received the downtime email, I opened my website and it was down.
    Just after posting this thread there have been 3 more downtimes, two of which were 4 minutes each.

    Thanked by 1kkrajk
  • dfroe Member, Host Rep

    @alilet said: It's HTTP monitor

    Then don't title this thread with 'network issues'. ;)

    You should first probe on L3 before stepping up to the higher OSI layers.
    L7 isn't called the 'network layer'. ;)

    Check if you encounter similar issues with an ICMP/Ping probe at the same time.

    Thanked by 1alilet
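
The layered approach suggested here (check L3 reachability before blaming L7) can be sketched as a small decision function. This is only an illustration, not something posted in the thread: `classify_outage` and its three inputs are hypothetical names, and it assumes you already have the results of a ping probe, a TCP connect probe, and an HTTP probe taken at the same moment.

```python
from typing import Optional


def classify_outage(ping_ok: bool, tcp_ok: bool, http_status: Optional[int]) -> str:
    """Guess which layer is failing from three probes taken at the same time.

    ping_ok     -- did an ICMP echo get a reply?
    tcp_ok      -- did a TCP connect to the web port succeed?
    http_status -- HTTP status code, or None if no response arrived in time
    """
    # Nothing reaches the host at all: most likely a network problem.
    if not ping_ok and not tcp_ok:
        return "network (L3): host unreachable"
    # The host is reachable but the port is not: firewall or stopped service.
    if not tcp_ok:
        return "transport (L4): port closed or filtered"
    # From here on TCP works. A failed ping alone may just mean ICMP is filtered.
    if http_status is None:
        return "application (L7): connection accepted but no HTTP response"
    if http_status >= 500:
        return "application (L7): server error " + str(http_status)
    if http_status >= 400:
        return "application (L7): client error " + str(http_status)
    return "ok"
```

With a working ping, a working TCP connect, and a 504 status, `classify_outage(True, True, 504)` points at the application layer rather than the network.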
  • JabJab Member
    edited August 2021

    I think HetrixTools shows the error code in the details: was it a timeout, a network issue, or a 50X from your web server? Make sure those are not 404/50X responses from your code/script/network. Also rule out DNS issues, e.g. requests sometimes ending up on the wrong server due to duplicate A records.

    it was down

    That's like the worst description ever, the same as sending a pigeon letter to a mechanic where all you write is "my car not worky" :D

    Thanked by 1alilet
  • Didn't know you can check error detail in Hetrix Tools :#

    The log says Error 28: Operation timed out after 10001 milliseconds with 0 bytes received (10002ms)

  • Opened a ticket yesterday, and today there has been no downtime so far. Haven't received a response on the ticket yet, but it looks like they fixed something.

  • As per the suggestion from VirMach I added a ping monitor, and the next time the error came I noticed that the ping monitor was working 100% fine while the website monitor threw an error. I am using Cloudflare and the actual error is 504 Gateway Timeout, so I guess something is wrong with my server? I am using nginx on Debian, and when I run top it shows that the machine is working fine with no load issues. Maybe I need to check the nginx logs!?

  • Here's error from log

    2021/08/20 06:18:28 [error] 14657#14657: *1927970 upstream timed out (110: Connection timed out) while reading response header from upstream, client: xx.xx.xx.xx, server: www.testing.com, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php/php7.4-fpm.sock", host: "www.testing.com"

  • @alilet said:
    Here's error from log

    2021/08/20 06:18:28 [error] 14657#14657: *1927970 upstream timed out (110: Connection timed out) while reading response header from upstream, client: xx.xx.xx.xx, server: www.testing.com, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php/php7.4-fpm.sock", host: "www.testing.com"

    Some request is taking too long to be processed. Always helpful to check if something had overloaded the server during that time.
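
    To see how often this happens and at what times, the upstream timeouts could be pulled out of the nginx error log with a short script. A minimal sketch, assuming the log lines follow the same layout as the one quoted above; `parse_upstream_timeout` and `SAMPLE` are made-up names for illustration.

```python
import re
from typing import Dict, Optional

# Timestamp, level, "pid#tid:", optional "*connection", then the message text.
LINE_RE = re.compile(
    r'^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) '
    r'\[(?P<level>\w+)\] \d+#\d+: (?:\*\d+ )?(?P<message>.*)$'
)
# The upstream address appears as: upstream: "fastcgi://..."
UPSTREAM_RE = re.compile(r'upstream: "(?P<upstream>[^"]+)"')

# The log line quoted in the thread, used here as sample input.
SAMPLE = (
    '2021/08/20 06:18:28 [error] 14657#14657: *1927970 upstream timed out '
    '(110: Connection timed out) while reading response header from upstream, '
    'client: xx.xx.xx.xx, server: www.testing.com, request: "GET / HTTP/1.1", '
    'upstream: "fastcgi://unix:/var/run/php/php7.4-fpm.sock", '
    'host: "www.testing.com"'
)


def parse_upstream_timeout(line: str) -> Optional[Dict[str, str]]:
    """Return time, level and upstream for an 'upstream timed out' line, else None."""
    match = LINE_RE.match(line.strip())
    if match is None or "upstream timed out" not in match.group("message"):
        return None
    upstream = UPSTREAM_RE.search(match.group("message"))
    return {
        "time": match.group("time"),
        "level": match.group("level"),
        "upstream": upstream.group("upstream") if upstream else "",
    }
```

    Running this over every line of the error log and grouping the results by hour would show whether the timeouts cluster around a scheduled job or arrive at random.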

  • alilet Member
    edited August 2021

    I restarted the machine thinking it might fix the issue, but it didn't.

    Here's screenshot of when everything was calm:

    And here's screenshot from exact moment when site went down:

    I wonder what is causing that sudden CPU spike. It's a 4-core VPS with 8GB RAM. I don't know which nginx/PHP settings I should use to fix this issue. The issue is with FastCGI. I can increase fastcgi_read_timeout, but this doesn't seem like the right solution.

  • Enable more detail CPU Usage on htop to display IOwait % and steal %

  • Maounique Host Rep, Veteran
    edited August 2021

    I think mysql generates a lot of wa and php waits for it. There might be some layer 7 issue, either an attack or some misconfigured crawler.

  • @chocolateshirt said:
    Enable more detail CPU Usage on htop to display IOwait % and steal %

    I enabled this setting but still I don't see any more CPU detail in htop. It is still showing same information.

  • alilet Member
    edited August 2021

    @Maounique said:
    I think mysql generates a lot of wa and php waits for it. There might be some layer 7 issue, either an attack or some misconfigured crawler.

    I have disabled a couple of plugins (it's WordPress/WooCommerce) so let's see. Here's my nginx config:

    client_body_buffer_size 10K;
    client_header_buffer_size 1k;
    client_max_body_size 8m;
    large_client_header_buffers 4 16k;
    client_body_timeout 12;
    client_header_timeout 12;
    #keepalive_timeout 15;
    send_timeout 10;
    fastcgi_buffers 16 16k;
    fastcgi_buffer_size 32k;
    
  • edited August 2021

    @alilet said:

    @chocolateshirt said:
    Enable more detail CPU Usage on htop to display IOwait % and steal %

    I enabled this setting but still I don't see any more CPU detail in htop. It is still showing same information.

    You need to add the CPU average meter and press the space bar to change its display type.

    Thanked by 1alilet
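
If htop is not cooperating, the same IOwait and steal percentages can be estimated from two readings of the first line of `/proc/stat` on Linux. A rough sketch, assuming the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names and the sample numbers are invented for illustration.

```python
from typing import List, Tuple


def parse_cpu_line(line: str) -> List[int]:
    """Return the first eight counters from the aggregate 'cpu' line of /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(x) for x in parts[1:9]]
    if len(values) < 8:
        raise ValueError("expected at least eight counters")
    return values


def iowait_steal_percent(before: str, after: str) -> Tuple[float, float]:
    """Compute (iowait %, steal %) between two samples of the 'cpu' line."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        return (0.0, 0.0)
    # Index 4 is iowait and index 7 is steal in the assumed field order.
    return (100.0 * deltas[4] / total, 100.0 * deltas[7] / total)


# Invented sample readings taken some time apart.
BEFORE = "cpu  100 0 100 700 50 0 0 50 0 0"
AFTER = "cpu  200 0 200 1200 250 0 0 150 0 0"
```

On a real machine the two strings would come from reading the first line of `/proc/stat` twice, a second or so apart, ideally while the site is timing out. A high iowait share hints at disk contention, while a high steal share hints at a busy host node.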
  • After more than 2 months I finally found out what the issue was. It was related to pm.max_children, which was using the default value of 2. I changed its value along with other related parameters according to my RAM, and now the situation is much better.

    There are almost no errors now, except for an occasional issue that lasts only a second or two, unlike the previous cases of 2+ minutes. I am still tweaking it a little bit as we speak.

    Thanked by 1chocolateshirt
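
The post does not say which formula was used, but a common rule of thumb is to divide the RAM you can spare for PHP by the average size of one PHP-FPM worker. A hedged sketch of that arithmetic; `estimate_max_children` is a hypothetical helper, and the 4096 MB reserve and 64 MB per worker below are assumed numbers, not measurements from this server.

```python
def estimate_max_children(total_ram_mb: int, reserved_mb: int, avg_worker_mb: int) -> int:
    """Rule-of-thumb estimate for pm.max_children.

    total_ram_mb  -- RAM installed in the machine
    reserved_mb   -- RAM kept aside for the OS, database, nginx, caches (assumption)
    avg_worker_mb -- average resident memory of one PHP-FPM worker (measure this)
    """
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    available = total_ram_mb - reserved_mb
    if available <= 0:
        # Nothing to spare: keep at least one worker so the pool can start.
        return 1
    return max(1, available // avg_worker_mb)


# Assumed figures for an 8 GB VPS: half reserved, 64 MB per worker.
suggested = estimate_max_children(8192, 4096, 64)
```

With these assumed figures `suggested` comes out as 64. The per-worker size varies a lot between sites, so it is worth measuring it under real load before settling on a value.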