VPS going down for unknown reason related to RAM

jimaek Member
edited November 2016 in Help

I have 3 VPS servers with one VPS provider, each in a different city.

Each server has 2GB of RAM and 1GB of swap, with CentOS 7 installed.

The problem is that all 3 servers go down multiple times per week and the only way to fix them is to reboot from SolusVM panel.

This is what a crashed server looks like in SolusVM: https://dl.dropboxusercontent.com/s/xpt68acvcv48ole/chrome_2016-11-17_11-51-51.png

Note the null/null. Why is that?

After rebooting, this is what I find in the logs: https://gist.githubusercontent.com/jimaek/7d4826beb825d44a5181c28fbe1c383f/raw/7e3c18567233b7e0cce8947c4414063850e40ce4/gistfile1.txt

Lots of "Cannot allocate memory" errors, followed by my reboot, which fixes everything.

Support says that it's a problem with my software, not theirs.

  1. I have a lot of servers deployed. They all run identical software and config. Only the servers with this provider have this problem.
  2. Most of the other servers have 512MB of RAM and run just fine without errors.

Any ideas what is wrong? If it's the provider's problem, what proof can I give them?

Comments

  • cociu Member, Provider
    edited November 2016
  • Sorry, forgot to mention that. It's OpenVZ.

  • cociu Member, Provider

    I think this is a misconfiguration on the node...

  • Sounds like ginernet is overselling really badly.

  • LiteServer Member, Provider
    edited November 2016

    Might be the host node running out of memory. Have you already contacted the provider in question and asked them to dig into their logs? "Support says that it's a problem with my software, not theirs." sounds a bit like they're just trying to shift the problem to your side instead of looking for the cause.
    They should be able to pull more usable information from the logs stored on the OpenVZ host node.
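Inside an OpenVZ container the per-container limits are exposed in /proc/user_beancounters, and a non-zero failcnt column is the usual sign that an allocation was refused because a limit (for example physpages, privvmpages or kmemsize) was hit. Such refusals do not always show up as an "OOM" line in syslog, so grepping for that word can come up empty. A minimal Python sketch that lists the counters with failures, assuming Python 3 is installed in the container and the script runs as root (the file is normally readable by root only); the function names are made up for this sketch:

```python
from pathlib import Path

BEANCOUNTERS = Path("/proc/user_beancounters")


def parse_beancounters(text):
    """Parse the text of /proc/user_beancounters into {resource: counters}.

    Expected layout inside one container: a "Version:" line, a header line
    starting with "uid", then one row per resource; the first row is
    prefixed with "<ctid>:".
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        if fields[0].endswith(":"):  # drop the "<ctid>:" prefix on the first row
            fields = fields[1:]
        if len(fields) != 6:  # resource + held maxheld barrier limit failcnt
            continue
        name = fields[0]
        held, maxheld, barrier, limit, failcnt = (int(v) for v in fields[1:])
        rows[name] = {
            "held": held,
            "maxheld": maxheld,
            "barrier": barrier,
            "limit": limit,
            "failcnt": failcnt,
        }
    return rows


def failed_resources(rows):
    """Return {resource: failcnt} for every resource whose failcnt is non-zero."""
    return {name: r["failcnt"] for name, r in rows.items() if r["failcnt"] > 0}


if __name__ == "__main__":
    try:
        rows = parse_beancounters(BEANCOUNTERS.read_text())
    except OSError as exc:  # missing file (not OpenVZ) or not running as root
        print(f"cannot read {BEANCOUNTERS}: {exc}")
    else:
        failures = failed_resources(rows)
        for name, count in sorted(failures.items()):
            r = rows[name]
            print(f"{name}: failcnt={count} maxheld={r['maxheld']} limit={r['limit']}")
        if not failures:
            print("no beancounter failures recorded")
```

The failcnt values are cumulative counters, so comparing a reading taken right after a reboot with one taken after the next round of errors is more telling than a single reading.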

    LiteServer.nl - Since 2007 the place where quality meets you!
    NL located // AS60404 // KVM based NVMe, SSD and HDD Storage VPSes

  • Support:

    "we are very sorry, we have analized but we are not able to determine the source of your issue.

    Our server nodes have a lot of memory free and there is not any error in our logs."

    I considered overselling of RAM an issue as well but they denied it.

  • jimaek said: I considered overselling of RAM an issue as well but they denied it.

    I've run into the null memory issue when the node was oversold on RAM and it was all being used.

  • "Cannot allocate memory", what else do you think can be the cause, other than the node being oversold and running out of memory? Add to that a provider who's way too busy with their intimate "analizing" process to admit the problem, and who is instead just lying to you.

    The root issue however is that people still use OpenVZ in 2016...

  • LiteServer Member, Provider

    @jimaek said:
    Our server nodes have a lot of memory free and there is not any error in our logs."
    I considered overselling of RAM an issue as well but they denied it.

    They are most likely heavily overselling their nodes, as "Foul" already mentioned, but it's to be expected that the host in question won't admit that.
    "is not any error in our logs." pretty much confirms that they haven't checked their logs. The errors you have should also have shown up in the logs of the OpenVZ host node.

  • I asked my ticket to be escalated to their management to see what happens. Thanks for your feedback.

    I wonder if they will admit the problem or fix it in any way.

  • How long have they been answering your ticket?

  • It took 4 days for the first response (I had to remind them). After that they were very fast.

  • rds100 Member
    edited November 2016

    rm_ said: "Cannot allocate memory", what else do you think can be the cause, other than the node being oversold and running out of memory?

    ulimit?

    Anyway there is a lot that the user can do to try to debug the problem.
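One quick way to rule the ulimit idea in or out: a process that hits its RLIMIT_AS or RLIMIT_DATA soft limit gets ENOMEM ("Cannot allocate memory") from brk()/mmap(). A small Python sketch that prints those two limits for the current process, assuming Python 3 is available in the container (helper names are made up). Limits are per process and inherited, so daemons such as systemd may run with different values than your login shell.

```python
import resource

# Soft limits whose violation makes brk()/mmap() fail with ENOMEM.
CHECKED = {
    "RLIMIT_AS": "virtual address space",
    "RLIMIT_DATA": "data segment",
}


def fmt(value):
    """Render a limit in MB, or 'unlimited' for RLIM_INFINITY."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // (1024 * 1024)} MB"


def memory_limits():
    """Return {limit_name: (soft, hard)} for the limits available on this system."""
    limits = {}
    for name in CHECKED:
        which = getattr(resource, name, None)
        if which is not None:
            limits[name] = resource.getrlimit(which)
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in memory_limits().items():
        print(f"{name} ({CHECKED[name]}): soft={fmt(soft)} hard={fmt(hard)}")
```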

  • jmginer Member, Provider
    edited November 2016

    Here our RAM oversold

    Voxility DDoS protected BGP starting from 250 EUR/month. Contact us.
    VPS in Spain ☛ 5.99€/month ★ We accept Bitcoins! ★ DMCA ignore ★
  • jmginer said: Here our RAM oversold

    Then your support must be incompetent.

    The "null of null" issue comes from the host node running out of memory.

  • jmginer Member, Provider
    edited November 2016

    There are no OOM errors in our logs.

    [… ~]# cat /var/log/messages|grep OOM
    [… ~]#

    Our internal CTs running on our nodes are fine.

    No related issues have been reported by any other customer.

    We would appreciate it if anybody who knows how to debug this could let us know.

  • I have not seen another shit company like Gigernet.

  • cociu Member, Provider

    Arttu_Rantanen said: I have not seen another shit company like Gigernet.

    Proof?

  • jmginer said: We would appreciate it if anybody who knows how to debug this could let us know.

    This right here explains that you don't know how to check openvz logs.

    Why are you in business?

  • PieHasBeenEaten Member, Moderator

    I've run into this issue before and it's not a memory issue. The easy way to fix it is to delete and recreate the VPS. It is what it is!

  • I guess I will try to recreate them. But after that I have no idea what else to do.

    The null/null makes me think it's the provider's problem. Plus I run the exact same software on 170 servers with as little as 512MB RAM without problems, including OpenVZ servers.

    Anyway, I will post here if recreating will fix anything.

  • racksx Member without signature

    Maybe try top or free when you have the error, so you can see your used resources.

  • @jmginer said:
    There are no OOM errors in our logs.

    > [… ~]# cat /var/log/messages|grep OOM
    > [… ~]#

    Our internal CTs running on our nodes are fine.

    No related issues have been reported by any other customer.

    We would appreciate it if anybody who knows how to debug this could let us know.

    Increase his memory for one week and see if the server crashes again.

  • @racksx said:
    Maybe try top or free when you have the error, so you can see your used resources.

    I can't connect to the server at all when this happens, so that's not possible.
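Since logging in is impossible once it happens, one option is to record the memory state to disk periodically, so the last snapshots before the crash survive the reboot. A minimal Python sketch under these assumptions: Python 3 in the container, root privileges (needed to read /proc/user_beancounters on OpenVZ), and a hypothetical log path /var/log/memsnap.log; function and file names are made up. Writing can itself fail once the container is out of memory, so the loop just reports the error and retries.

```python
import os
import sys
import time
from datetime import datetime
from pathlib import Path

SOURCES = ("/proc/meminfo", "/proc/user_beancounters")  # the second exists only on OpenVZ
LOGFILE = "/var/log/memsnap.log"  # hypothetical location; any persistent path works
INTERVAL = 60  # seconds between snapshots


def format_snapshot(now, contents):
    """Build one timestamped text block from {path: file contents}."""
    lines = [f"=== {now.isoformat(timespec='seconds')} ==="]
    for path, text in contents.items():
        lines.append(f"--- {path}")
        lines.extend(text.rstrip("\n").splitlines())
    return "\n".join(lines) + "\n"


def take_snapshot():
    """Read every source file that exists and format them as one block."""
    contents = {}
    for path in SOURCES:
        p = Path(path)
        if p.exists():
            contents[path] = p.read_text()
    return format_snapshot(datetime.now(), contents)


def run_forever():
    while True:
        try:
            block = take_snapshot()
            with open(LOGFILE, "a") as fh:
                fh.write(block)
                fh.flush()
                os.fsync(fh.fileno())  # push the data to disk before a possible hard reset
        except OSError as exc:  # e.g. ENOMEM or a permission problem; keep trying
            print(f"snapshot failed: {exc}", file=sys.stderr)
        time.sleep(INTERVAL)


if __name__ == "__main__" and "--run" in sys.argv:
    run_forever()
```

Started with something like `nohup python3 memsnap.py --run &` (file name hypothetical), the tail of the log after the next reboot shows what the container looked like in its last minutes.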

  • Here is my RAM usage https://dl.dropboxusercontent.com/u/13590841/ShareX/2016/11/chrome_2016-11-22_18-35-39.png

    I checked all servers and usage doesn't go above 50MB. The rest is just cache, which is normal for Linux.

    It looks like Linux is caching everything it can until it fills a certain amount of RAM, after which the server crashes completely. To me, at least, it feels like Linux thinks there are 2GB of RAM while in reality there is less, and that results in all these errors.
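The "used vs. cache" split can be checked from /proc/meminfo with the classic free arithmetic, used = MemTotal - MemFree - Buffers - Cached. A minimal Python sketch (helper names are made up). Keep in mind that inside an OpenVZ container /proc/meminfo is a virtualized view, so it may not reflect what the host node can actually supply.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def breakdown_mb(info):
    """Split memory into used (excluding buffers/cache), buffers/cache and free, in MB."""
    total = info["MemTotal"]
    free = info["MemFree"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    used = total - free - cache
    return {
        "total": total // 1024,
        "used_excl_cache": used // 1024,
        "buffers_cache": cache // 1024,
        "free": free // 1024,
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        print(breakdown_mb(parse_meminfo(meminfo.read_text())))
```

For example, with made-up values MemTotal 2097152 kB, MemFree 1048576 kB, Buffers 10240 kB and Cached 987136 kB, breakdown_mb reports 50 MB used excluding cache and 974 MB of buffers/cache out of 2048 MB.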

  • jmginer Member, Provider

    @WHT said:
    Increase his memory for one week and see if the server crashes again.

    It makes no sense: @jimaek has a VPS with 2 GB RAM and his memory graph is a flat line at 600MB, so his server has enough free memory. There is no point in adding more memory.

    Another customer on the same node has a 4 GB RAM VPS and can reach 100% memory usage without any reported issue, while running heavy apps inside...

    We can't find any issue on our side. We have removed the swap on @jimaek's servers, just to give it a try, but I don't expect this to solve the issue.

  • If anyone cares, the problem continues.

    https://dl.dropboxusercontent.com/s/8bu6p9ig92wfpgv/chrome_2016-12-19_15-16-40.png

    If someone has any ideas on what exactly it may be please let me know.

  • @jimaek said:
    If anyone cares, the problem continues.

    https://dl.dropboxusercontent.com/s/8bu6p9ig92wfpgv/chrome_2016-12-19_15-16-40.png

    If someone has any ideas on what exactly it may be please let me know.

    Just ask the provider to destroy your container and recreate it manually from SolusVM admin?

    My comments are mine and mine alone, and do not reflect the opinion of my business

  • This is totally an overselling issue. Paxhosting is having the exact same symptoms right now.

  • Bad memory module on host node maybe?

  • PieHasBeenEaten Member, Moderator

    @stefeman, please explain how you know this is overselling. Do you have root access to the node? Do you know how much memory and disk space is in use and free? Just because it smells like a pig doesn't mean it's a pig.

  • @PieNotEvenEaten said:
    Just because it smells like a pig doesn't mean it's a pig.

    Now I'm curious. What smells like a pig but not a pig? Please provide examples.

  • PieHasBeenEaten Member, Moderator

    @yura I smell like a pig but I'm not a pig.

  • @PieNotEvenEaten said:
    @yura I smell like a pig but I'm not a pig.

    And how do we know that?

  • @PieNotEvenEaten said:
    @yura I smell like a pig but I'm not a pig.

    Personal anecdotes are hardly convincing evidence. But I appreciate your honesty and personal story.

  • stefeman Member
    edited December 2016

    @PieNotEvenEaten said:

    @stefeman, please explain how you know this is overselling. Do you have root access to the node? Do you know how much memory and disk space is in use and free? Just because it smells like a pig doesn't mean it's a pig.

    If you balloon the memory on the host node and peak load exceeds RAM capacity, VMs will go down randomly and the end-user machines will get allocation errors. As for why I suspect this is the case: I've done this disgraceful thing called "overloading" myself when aiming for better profits, lol. What I resold, though, was a 29€/m E3 dedi from online.net via hackforums. A typical summerhost, one might say. Not long after, I ran out of users, refunded the remaining one and bailed, but it was a worthwhile experience.
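The arithmetic behind that argument, as a Python sketch with hypothetical numbers (not GINERNET's actual node sizes): overselling goes unnoticed while the containers' combined real usage stays below the node's physical RAM, and something has to give (allocation failures or OOM kills) once simultaneous usage exceeds it. Function names are made up.

```python
def overcommit_ratio(physical_mb, sold_mb):
    """Memory sold to containers divided by the node's physical RAM."""
    return sum(sold_mb) / physical_mb


def shortfall_mb(physical_mb, in_use_mb):
    """How far simultaneous real usage exceeds physical RAM (0 if it fits)."""
    return max(0, sum(in_use_mb) - physical_mb)


# Hypothetical node: 64 GB of RAM, 48 containers sold with 2 GB each.
NODE_MB = 64 * 1024
SOLD = [2048] * 48

if __name__ == "__main__":
    print(f"overcommit ratio: {overcommit_ratio(NODE_MB, SOLD):.2f}")  # 1.50
    print(f"shortfall at 1 GB each: {shortfall_mb(NODE_MB, [1024] * 48)} MB")  # 0
    print(f"shortfall at 1.5 GB each: {shortfall_mb(NODE_MB, [1536] * 48)} MB")  # 8192
```

At a 1.5 ratio the node copes while the average container uses 1 GB, but if every container reaches 1.5 GB at once the node is 8 GB short.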

  • @stefeman said:
    This is totally an overselling issue. Paxhosting is having the exact same symptoms right now.

    Did you purchase from Paxhosting? Aren't they the guys that @gcat made a post about?

  • jar Provider
    edited December 2016

    After reading this thread it seems weird to me that no one has suggested this possibility:

    That the VM is running out of memory and the host is not graphing the memory usage correctly.

    Someone care to fill me in on why this isn't the default assumption? Graphs, especially in solusvm, are not perfect. A VM running out of memory is usually just that...a VM running out of memory.

    The whole "it doesn't do this elsewhere" is an interesting thing to note but it's not a valid point when used to draw the conclusion that it can't be what's happening here. If all servers worked 100% consistently the same or differences were clear and predictable, sysadmins would have less work to do.

    MagicSpam blackmails providers into buying their software, and ServerHub is a professional spam organization.

  • Not a blame post, actual help:



    Write a simple C program that allocates some RAM every few seconds (I'm sure there are already programs that do this; just look it up on Google). Make sure it prints messages showing how much memory is being allocated. Monitor the container and see if you get the same error messages you're getting now. If you get those messages before reaching your allocated memory, you should let your host know. You can also mount part of your RAM as a disk (ramdisk) and put files in that mount point to see if you get the expected behavior.
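The allocation test described above, sketched in Python instead of C (a substitution for convenience; the idea is the same). It allocates a fixed step at a time, writes to every page so the memory is really used, prints the running total, and stops on MemoryError or an optional cap. The 64 MB step, the 2-second delay and the function name are arbitrary choices for this sketch. Depending on how the container is limited, the process may be killed by the OOM killer instead of seeing MemoryError, so watch the container's logs (and /proc/user_beancounters on OpenVZ) while it runs.

```python
import sys
import time

PAGE = 4096  # typical x86 page size


def allocate_until_failure(step_mb=64, limit_mb=None, delay=2.0, report=print):
    """Allocate step_mb at a time until MemoryError or until limit_mb is reached.

    Returns the number of MB successfully allocated and kept alive.
    """
    blocks = []  # keep references so nothing is freed
    total = 0
    while limit_mb is None or total + step_mb <= limit_mb:
        try:
            block = bytearray(step_mb * 1024 * 1024)
        except MemoryError:
            report(f"MemoryError after {total} MB")
            break
        for i in range(0, len(block), PAGE):  # write to each page so it is really used
            block[i] = 1
        blocks.append(block)
        total += step_mb
        report(f"allocated {total} MB")
        time.sleep(delay)
    return total


if __name__ == "__main__" and "--run" in sys.argv:
    allocate_until_failure()
```

On a 2GB container, errors showing up in syslog well before the printed total approaches 2048 MB would be the kind of evidence worth sending to the host.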

  • I created a ramdisk of 2GB and filled it. I was able to use 100% of RAM without crashing the server.

    Maybe I was lucky and if I repeat this test in 2 days it will crash. I will try to repeat it again when I can.

  • Nov 17 03:57:01 madrid-ginernet systemd[1]: Failed to create cgroup /user.slice/user-0.slice/session-159938525.scope: Cannot allocate memory
    Nov 17 03:57:01 madrid-ginernet systemd[1]: Failed to start Session 159938525 of user root.
    Nov 17 03:57:01 madrid-ginernet systemd-logind[113]: Failed to save session data /run/systemd/sessions/159938525: Cannot allocate memory
    Nov 17 03:57:01 madrid-ginernet systemd-logind[113]: Failed to save user data /run/systemd/users/0: Cannot allocate memory
    Nov 17 03:57:01 madrid-ginernet systemd-logind[113]: Failed to save session data /run/systemd/sessions/159938525: Cannot allocate memory
    Nov 17 03:57:01 madrid-ginernet crond[10235]: pam_systemd(crond:session): Failed to create session: Start job for unit session-159938525.scope failed with 'failed'
    

    Is it me, or is that session number -really- high?
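To see when the "Cannot allocate memory" errors start and which processes hit them, a short Python sketch that summarizes a syslog-format file such as /var/log/messages on CentOS 7. It assumes the traditional "Mon DD HH:MM:SS host process[pid]: message" line format seen in the excerpt above; lines in other formats are skipped, and the function name is made up.

```python
import re
import sys
from collections import Counter

# "Nov 17 03:57:01 madrid-ginernet systemd-logind[113]: ..." -> minute + process name
LINE = re.compile(r"^(?P<minute>\w{3} +\d{1,2} \d\d:\d\d):\d\d \S+ (?P<proc>[^\s\[:]+)")
NEEDLE = "Cannot allocate memory"


def enomem_summary(lines, needle=NEEDLE):
    """Return (first matching line, Counter per minute, Counter per process)."""
    first = None
    per_minute, per_process = Counter(), Counter()
    for line in lines:
        if needle not in line:
            continue
        m = LINE.match(line)
        if m is None:
            continue
        first = first or line.rstrip("\n")
        per_minute[m["minute"]] += 1
        per_process[m["proc"]] += 1
    return first, per_minute, per_process


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], errors="replace") as fh:
        first, per_minute, per_process = enomem_summary(fh)
    print("first:", first)
    for minute, count in per_minute.items():  # log order is chronological
        print(minute, count)
    print(per_process.most_common())
```

Lining up the first error per boot with the memory graph (or with beancounter snapshots) shows whether the failures start while the container still reports plenty of free memory.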

  • It's amazing how many people blame the host without showing the slightest doubt that something else might be wrong :)

  • OVZ is a weird beast. It is a collection of patches stitched together, with unpredictable behaviour.
    We also had many problems with it until we understood that small containers in large numbers lead to many threads and soft lockups, and crazy wa (I/O wait) while iotop shows KBs rather than MBs. However, this is the first time I have seen a problem like this one.

    That does not mean it is impossible. We currently have an issue with a KVM VM that locks up so badly it cannot be killed in any way short of restarting the node. It happens only for that customer and only for that installation, even though he has multiple DNS servers with us. It also happens only with Debian installations on that node, not CentOS or Ubuntu. After a few days of debugging and 2 node restarts, and after it locked up a third time in another VM we gave him as a replacement, we ended up giving him an IWStack instance.
    With hundreds of nodes and thousands of customers, you should expect the unexpected; nobody can test all the combinations out there on the node and VM side.

    Extremist conservative user, I wish to preserve human and civil rights, free speech, freedom of the press and worship, rule of law, democracy, peace and prosperity, social mobility, etc. Now you can draw your guns.

  • That's why I only do business with big companies
