VPS going down for unknown reason related to RAM

jimaek · November 2016

I have 3 VPS servers with a VPS provider. All 3 of them are in different cities.

Each server has 2GB of RAM and 1GB of swap, with CentOS 7 installed.

The problem is that all 3 servers go down multiple times per week, and the only way to fix them is to reboot from the SolusVM panel.

This is what a crashed server looks like in SolusVM: https://dl.dropboxusercontent.com/s/xpt68acvcv48ole/chrome_2016-11-17_11-51-51.png

Note the null/null. Why is that?

After rebooting, if I check the logs I get this: https://gist.githubusercontent.com/jimaek/7d4826beb825d44a5181c28fbe1c383f/raw/7e3c18567233b7e0cce8947c4414063850e40ce4/gistfile1.txt

Lots of errors related to "Cannot allocate memory", and then my reboot, which fixes everything.

Support says that it's a problem with my software and not theirs.

I have a lot of servers deployed. They all run identical software and config. Only servers with this provider have this problem.
Most other servers have 512MB RAM and run just fine without errors.

Any ideas what is wrong? If it's the provider's problem, what proof can I give them?

cociu · November 2016

What is this? KVM or OpenVZ?

cociu · November 2016

Also, you can delete the "madrid-ginernet".

jimaek · November 2016

Sorry, forgot to mention that. It's OpenVZ.

cociu · November 2016

I think this is a misconfiguration on the node...

Foul · November 2016

Sounds like Ginernet is overselling really badly.

LiteServer · November 2016

Might be the host node running out of memory. Have you already asked the provider in question to dig into their logs? "Support says that it's a problem with my software and not theirs." sounds a bit like they're just trying to push the problem onto your side instead of looking for the cause.
They should be able to pull more useful information from the logs stored on the OpenVZ host node.

jimaek · November 2016

Support:

"we are very sorry, we have analized but we are not able to determine the source of your issue.

Our server nodes have a lot of memory free and there is not any error in our logs."

I considered overselling of RAM an issue as well but they denied it.

Foul · November 2016

jimaek said: I considered overselling of RAM an issue as well but they denied it.

I've run into the null memory issue before, when a node was oversold on RAM and all of it was in use.

rm_ · November 2016

"Cannot allocate memory", what else do you think can be the cause, other than the node being oversold and running out of memory? Add to that a provider who's way too busy with their intimate "analizing" process to admit the problem, and is instead just lying to you.

The root issue however is that people still use OpenVZ in 2016...

LiteServer · November 2016

@jimaek said:
"Our server nodes have a lot of memory free and there is not any error in our logs."
I considered overselling of RAM an issue as well but they denied it.

They are most likely heavily overselling their nodes, as "Foul" already mentioned, but it's to be expected that the host in question won't admit that.
"is not any error in our logs." pretty much confirms that they haven't checked their logs. The errors you have should also have shown up in the logs of the OpenVZ host node.

jimaek · November 2016

I asked for my ticket to be escalated to their management to see what happens. Thanks for your feedback.

I wonder if they will admit or fix the problem in any way.

Streamer · November 2016

How long did they take to answer your ticket?

jimaek · November 2016

It took 4 days to get the first response (I had to remind them). After that they were very fast.

rds100 · November 2016

rm_ said: "Cannot allocate memory", what else do you think can be the cause, other than the node being oversold and running out of memory?

ulimit?

Anyway, there is a lot that the user can do to try to debug the problem.
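
One concrete place to start, assuming the container exposes it: on OpenVZ, `/proc/user_beancounters` lists each resource limit with a `failcnt` column that counts how many times a request for that resource was refused. A non-zero `failcnt` is one possible source of "Cannot allocate memory" even when the memory graph looks flat. The sketch below is only an illustration; the function name `failed_beancounters` is made up here, and reading the file usually requires root inside the container.

```shell
# Print every beancounter whose failcnt (last column) is non-zero.
# Usage: failed_beancounters [file]   (defaults to /proc/user_beancounters)
failed_beancounters() {
    awk '
        # Data rows have 6 fields, or 7 when the row starts with "<ctid>:".
        # The numeric test on the last field skips the two header lines.
        NF >= 6 && $NF ~ /^[0-9]+$/ && $NF + 0 > 0 {
            print $(NF - 5), $NF    # resource name, failcnt
        }
    ' "${1:-/proc/user_beancounters}"
}
```

Called with no argument, it prints one `resource failcnt` pair per line and prints nothing if no limit has been hit.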

jmginer · November 2016

Here our RAM oversold

Foul · November 2016

jmginer said: Here our RAM oversold

Then your support must be incompetent.

The null of null issue comes from the host node being out of memory.

jmginer · November 2016

There is no OOM error in our logs.

[root@bcn1-ovz1 ~]# cat /var/log/messages|grep OOM
[root@bcn1-ovz1 ~]#

Our internal CTs running on our nodes are fine.

No related issue has been reported by any other customer.

We would appreciate it if anybody who knows how to debug this would let us know.
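
A caveat on the check above: `grep OOM` is case-sensitive, and mainline Linux kernels log out-of-memory kills with phrases such as "Out of memory" and "invoked oom-killer", which that pattern would not match. The exact wording on an OpenVZ host kernel may differ, so the following is only a sketch of a broader search. The function name `count_oom_lines` is hypothetical, and the default log path is taken from the command above.

```shell
# Count log lines that mention an OOM event, ignoring case.
# Usage: count_oom_lines [logfile]   (defaults to /var/log/messages)
count_oom_lines() {
    # -i: ignore case; -w: match whole words only, so "room" is not a hit;
    # -c: print the number of matching lines instead of the lines themselves.
    grep -ciwE 'oom|out of memory' "${1:-/var/log/messages}"
}
```

Note that `grep` exits non-zero when the count is 0. A count of 0 also may not rule out per-container limits, since an allocation refused by a container limit is not necessarily logged as an OOM kill.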

Arttu_Rantanen · November 2016

I have not seen another shit company like Ginernet.

cociu · November 2016

Arttu_Rantanen said: I have not seen another shit company like Ginernet.

Proof?

Foul · November 2016

jmginer said: We would appreciate it if anybody who knows how to debug this would let us know.

This right here shows that you don't know how to check OpenVZ logs.

Why are you in business?

Arttu_Rantanen · November 2016

@cociu

cociu · November 2016

Arttu_Rantanen said: @cociu

Good one!

PieHasBeenEaten · November 2016

I ran into this issue before and it's not a memory issue. The easy way to fix it is to delete and recreate the VPS. It is what it is!

jimaek · November 2016

I guess I will try to recreate them. But after that I have no idea what else to do.

The null/null makes me think it's the provider's problem. Plus, I run the exact same software on 170 servers with as little as 512MB RAM without problems, including OpenVZ servers.

Anyway, I will post here if recreating fixes anything.

racksx · November 2016

Maybe try top or free when you have the error, so you can see your used resources.

WHT · November 2016

@jmginer said:
There is no OOM error in our logs.

> [root@bcn1-ovz1 ~]# cat /var/log/messages|grep OOM
> [root@bcn1-ovz1 ~]#

Our internal CTs running on our nodes are fine.

No related issue has been reported by any other customer.

We would appreciate it if anybody who knows how to debug this would let us know.

Increase his memory for one week and see if the server crashes again.

jimaek · November 2016

@racksx said:
Maybe try top or free when you have the error, so you can see your used resources.

I can't connect to the server at all when this happens, so that's not possible.
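
If the server cannot be reached once the errors start, one workaround is to record memory figures on a schedule beforehand, so the last entries written before the hang can be read after the reboot. The sketch below assumes the standard `/proc/meminfo` field names; the function name `mem_snapshot` and the log path in the usage note are placeholders.

```shell
# Print one line with the main memory figures (in kB) from a meminfo file.
# Usage: mem_snapshot [file]   (defaults to /proc/meminfo)
mem_snapshot() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { free    = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END {
            # "used" excludes buffers and page cache, which the kernel can reclaim.
            printf "total=%d free=%d cached=%d used=%d\n",
                   total, free, cached, total - free - buffers - cached
        }
    ' "${1:-/proc/meminfo}"
}
```

For example, a cron entry that runs every minute could append `$(date '+%F %T') $(mem_snapshot)` to a file such as `/var/log/mem-snapshot.log` (a hypothetical path).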

jimaek · November 2016

Here is my RAM usage: https://dl.dropboxusercontent.com/u/13590841/ShareX/2016/11/chrome_2016-11-22_18-35-39.png

I checked all servers and it doesn't go above 50MB. The rest is just cache, which is normal for Linux.

It looks like Linux caches everything it can until it fills a certain amount of RAM, after which it crashes completely. It feels like Linux thinks there is 2GB of RAM while in reality there is less, and that results in all these errors. At least that's how it looks to me.

jmginer · November 2016

@WHT said:
Increase his memory for one week and see if the server crashes again.

That makes no sense: @jimaek has a VPS with 2 GB RAM and his memory graph is a flat line at 600MB, so his server has enough free memory. There is no point in adding more.

Another customer on the same node has a 4 GB RAM VPS and can reach 100% memory usage without any reported issue while running heavy apps inside...

We can't find any issue on our side. We have removed swap on @jimaek's servers, just to give it a try, but I don't expect this to solve the issue.

jimaek · December 2016

If anyone cares, the problem continues.

https://dl.dropboxusercontent.com/s/8bu6p9ig92wfpgv/chrome_2016-12-19_15-16-40.png

If someone has any ideas on what exactly it may be please let me know.
