Anybody facing network issue in VirMach?

alilet · August 2021

Is anybody facing network problems on a VirMach VPS this month? I am on node NY10GKVM62 and one of my websites is facing daily downtimes of 2 to 5 minutes. This has never happened before, so my PMS is off the charts. Is the end neigh?

Hetrix Tools screenshot

yoursunny · August 2021

Are you testing ICMP, TCP, or HTTP?
How did you determine this is a network issue, instead of a server-side application issue?
For example, a daily backup job that locks the database for too long could cause an HTTP request that accesses that database to fail.

The end is not neigh, as @VirMach does not offer IPv6 yet and thus does not need neighbor discovery protocol.

tr1cky · August 2021

My mailserver with them on another NY node is fine: https://hetrixtools.com/report/uptime/18dfa185a0dfcca7df57a055540744fb/

JabJab · August 2021

I also have some VPSes on other nodes in NY (NY10GKVMXX) and no hetrixtools alerts.

deank · August 2021

OP mentioned PMS.
Therefore, I was notified of PMS.

Will there be PMS?
We shall see.

alilet · August 2021

@yoursunny said:
Are you testing ICMP, TCP, or HTTP?
How did you determine this is a network issue, instead of a server-side application issue?
For example, a daily backup job that locks the database for too long could cause an HTTP request that accesses that database to fail.

The end is not neigh, as @VirMach does not offer IPv6 yet and thus does not need neighbor discovery protocol.

It's an HTTP monitor that checks for a certain word on the home page, checking from 4 different locations. I also saw it personally: as soon as I received the downtime email, I opened my website and it was down.
Just after posting this thread there have been 3 more downtimes, two of which lasted 4 minutes each.

dfroe · August 2021

@alilet said: It's HTTP monitor

Then don't title this thread with 'network issues'.

You should first probe on L3 before stepping up to the higher OSI layers.
L7 isn't called the 'network layer'.

Check whether you encounter similar issues with an ICMP/ping probe at the same time.
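One way to follow this advice is to probe a lower layer and the HTTP layer in the same run and compare the results. The sketch below is illustrative only: it uses a TCP connect as a stand-in for an ICMP ping (raw ICMP sockets normally need root), and the function names, port, and timeouts are assumptions rather than anything HetrixTools does.

```python
import http.client
import socket
import urllib.request


def tcp_probe(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_probe(url, keyword, timeout=10.0):
    """Return True if the page loads in time and its body contains keyword."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return keyword in body
    except (OSError, ValueError, http.client.HTTPException):
        return False


def classify(tcp_ok, http_ok):
    """Map the two probe results to a rough guess about the failing layer."""
    if not tcp_ok:
        return "network or host unreachable (L3/L4)"
    if not http_ok:
        return "host reachable, application layer failing (L7)"
    return "ok"
```

For example, `classify(tcp_probe("example.com"), http_probe("https://example.com/", "Example"))` would be run at the moment an alert arrives; the hostname and keyword here are placeholders.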

JabJab · August 2021

I think hetrixtools shows the error code in the details: was it a timeout, a network issue, or a 50X from your webserver? Make sure those are not 404/50X responses from your code/script/network. Any DNS issues, e.g. requests sometimes ending up on the wrong server due to duplicated A entries?

it was down

It's like the worst description ever, the same as sending a pigeon letter to a mechanic where all you write is "my car not worky".

alilet · August 2021

Didn't know you can check error detail in Hetrix Tools

The log says Error 28: Operation timed out after 10001 milliseconds with 0 bytes received (10002ms)

alilet · August 2021

I opened a ticket yesterday and today there has been no downtime so far. I haven't received a response on the ticket yet, but it looks like they fixed something.

alilet · August 2021

As per a suggestion from VirMach I added a ping monitor, and the next time the error came I noticed that the ping monitor was working 100% fine while the website monitor threw an error. I am using Cloudflare and the actual error is 504 Gateway Timeout, so I guess something is wrong with my server? I am using nginx on Debian, and when I run top it shows that the machine is working fine with no load issues. Maybe I need to check the nginx logs!?

alilet · August 2021

Here's the error from the log:

2021/08/20 06:18:28 [error] 14657#14657: *1927970 upstream timed out (110: Connection timed out) while reading response header from upstream, client: xx.xx.xx.xx, server: www.testing.com, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php/php7.4-fpm.sock", host: "www.testing.com"

bulbasaur · August 2021

@alilet said:
Here's the error from the log:

2021/08/20 06:18:28 [error] 14657#14657: *1927970 upstream timed out (110: Connection timed out) while reading response header from upstream, client: xx.xx.xx.xx, server: www.testing.com, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php/php7.4-fpm.sock", host: "www.testing.com"

Some request is taking too long to be processed. Always helpful to check if something had overloaded the server during that time.
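To see whether the timeouts cluster around particular times, the error log can be bucketed by minute and compared against load graphs or cron schedules. This is a minimal sketch that assumes the log lines follow the format quoted above; the function name is made up for illustration, and the log path mentioned afterwards is only the usual Debian default, not something confirmed in this thread.

```python
import re
from collections import Counter

# Matches nginx error-log lines such as:
# 2021/08/20 06:18:28 [error] 14657#14657: *1927970 upstream timed out (...)
TIMEOUT_RE = re.compile(
    r"^(?P<minute>\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} \[error\] .*upstream timed out"
)


def timeouts_per_minute(lines):
    """Count 'upstream timed out' errors, keyed by 'YYYY/MM/DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            counts[match.group("minute")] += 1
    return counts
```

Passing an open file works directly, e.g. `timeouts_per_minute(open("/var/log/nginx/error.log"))`, and `.most_common(10)` on the result lists the worst minutes.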

alilet · August 2021

I restarted the machine thinking it might fix the issue, but it didn't.

Here's a screenshot of when everything was calm:

And here's a screenshot from the exact moment when the site went down:

I wonder what is causing that sudden CPU spike. It's a 4-core VPS with 8GB RAM. I don't know which nginx/PHP settings I should use to fix this issue. The issue is with FastCGI. I could increase fastcgi_read_timeout, but that doesn't seem like the right solution.

chocolateshirt · August 2021

Enable more detailed CPU usage in htop to display IOwait % and steal %.

Maounique · August 2021

I think mysql generates a lot of wa and php waits for it. There might be some layer 7 issue, either an attack or some misconfigured crawler.

alilet · August 2021

@chocolateshirt said:
Enable more detailed CPU usage in htop to display IOwait % and steal %.

I enabled this setting but I still don't see any more CPU detail in htop. It is still showing the same information.

alilet · August 2021

@Maounique said:
I think mysql generates a lot of wa and php waits for it. There might be some layer 7 issue, either an attack or some misconfigured crawler.

I have disabled a couple of plugins (it's WordPress/WooCommerce) so let's see. Here's my nginx config:

client_body_buffer_size 10K;
client_header_buffer_size 1k;
client_max_body_size 8m;
large_client_header_buffers 4 16k;
client_body_timeout 12;
client_header_timeout 12;
#keepalive_timeout 15;
send_timeout 10;
fastcgi_buffers 16 16k;
fastcgi_buffer_size 32k;

chocolateshirt · August 2021

@alilet said:

@chocolateshirt said:
Enable more detailed CPU usage in htop to display IOwait % and steal %.

I enabled this setting but I still don't see any more CPU detail in htop. It is still showing the same information.

You need to add the CPU average meter, then press the space bar to change its type.
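If htop won't show it, the same IOwait and steal figures can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. The following is a rough sketch, assuming a reasonably modern Linux kernel that reports at least the first eight counters (user, nice, system, idle, iowait, irq, softirq, steal); the function names are invented for this example.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return dict(zip(FIELDS, (int(value) for value in parts[1:9])))


def cpu_percentages(before, after):
    """Return the share of CPU time per state between two samples, in percent."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {name: second[name] - first[name] for name in first}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("no CPU time elapsed between the two samples")
    return {name: 100.0 * ticks / total for name, ticks in deltas.items()}
```

Read the first line of `/proc/stat`, wait a few seconds, read it again, and pass both strings in. A high `iowait` value points at disk contention, while a high `steal` value points at the host node being oversubscribed.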

alilet · October 2021

After more than 2 months I finally found out what the issue was. It was related to PHP-FPM's pm.max_children, which was using the default value of 2. I changed its value, along with other related parameters, according to my RAM, and now the situation is much better.

There are almost no errors now, except for an occasional issue that lasts only a second or two, unlike the previous cases of 2+ minutes. I am still tweaking it a little bit as we speak.
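For anyone sizing pm.max_children from RAM, a common rule of thumb is to divide the memory left over for PHP by the average size of one PHP-FPM worker. The sketch below only illustrates that arithmetic; the function name, the reserved amount, and the per-worker size are assumptions, and the actual values used in this case were not posted.

```python
def suggest_max_children(total_ram_mb, reserved_mb, avg_worker_mb):
    """Estimate pm.max_children as (RAM left for PHP) / (average worker size).

    reserved_mb is memory set aside for the OS, nginx, MySQL and so on;
    avg_worker_mb should be measured on the real server, not guessed.
    """
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    # Never suggest fewer than one worker, even if the reservation is too large.
    return max(1, usable_mb // avg_worker_mb)
```

With hypothetical numbers for an 8GB VPS, `suggest_max_children(8192, 2048, 64)` gives 96. Treat the result as an upper bound to test against, since WooCommerce workers can be considerably larger than 64MB.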
