A really weird memory usage

So I have a Strato vServer which is used for production (hosting the site infrastructure of a client). It's basically running SSHD, Nginx, MySQL, PHP and Fail2Ban (the latter currently turned off while I fix a few things).

For the last few weeks the RAM usage has been getting quite weird. The whole software setup is using far less memory than what free -m reports as used.

ps_mem.py result:

 Private  +   Shared  =  RAM used       Program

196.0 KiB + 116.5 KiB = 312.5 KiB       dbus-daemon
352.0 KiB +  90.0 KiB = 442.0 KiB       udevd
288.0 KiB + 222.5 KiB = 510.5 KiB       init
212.0 KiB + 371.5 KiB = 583.5 KiB       mysqld_safe
440.0 KiB + 375.5 KiB = 815.5 KiB       bash
652.0 KiB + 191.0 KiB = 843.0 KiB       crond
  2.5 MiB + 182.5 KiB =   2.7 MiB       php-fpm
  2.2 MiB + 703.0 KiB =   2.9 MiB       sshd (2)
  1.4 MiB +   1.7 MiB =   3.1 MiB       nginx (2)
  7.8 MiB + 333.5 KiB =   8.2 MiB       mysqld
---------------------------------
                         20.2 MiB
=================================
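
For reference, a ps_mem-style total can be roughly reproduced by summing the Pss field from /proc/[pid]/smaps for every process; Pss charges shared pages fractionally to each process that maps them. The sketch below is not the actual ps_mem.py code, just a simplified approximation, and it needs root to read every process:

#!/usr/bin/env python
# Simplified approximation of a ps_mem-style total (NOT the real ps_mem.py):
# sum the Pss ("proportional set size") values from /proc/[pid]/smaps.
import os

total_kib = 0
for pid in filter(str.isdigit, os.listdir('/proc')):
    try:
        with open('/proc/%s/smaps' % pid) as smaps:
            for line in smaps:
                if line.startswith('Pss:'):
                    total_kib += int(line.split()[1])  # values are in kB
    except IOError:
        pass  # process exited or is not readable
print('%.1f MiB attributable to userland processes' % (total_kib / 1024.0))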

Free result:

             total       used       free     shared    buffers     cached
Mem:          2048        783       1264          0          0        126
-/+ buffers/cache:        656       1391
Swap:            0          0          0
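
The gap is easier to see if you redo free's arithmetic straight from /proc/meminfo: on this procps version "used" is just MemTotal minus MemFree, and the "-/+ buffers/cache" row additionally subtracts Buffers and Cached, so neither figure comes anywhere near the ~20 MiB the processes account for. A minimal sketch, assuming the standard /proc/meminfo fields:

#!/usr/bin/env python
# Recompute the two "used" figures that free prints, from /proc/meminfo.
meminfo = {}
with open('/proc/meminfo') as f:
    for line in f:
        key, rest = line.split(':', 1)
        meminfo[key] = int(rest.split()[0])  # values are reported in kB

used = meminfo['MemTotal'] - meminfo['MemFree']
used_minus_cache = used - meminfo['Buffers'] - meminfo['Cached']
print('used:                     %d MiB' % (used / 1024))
print('used minus buffers/cache: %d MiB' % (used_minus_cache / 1024))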

Top:

top - 09:43:04 up 44 days,  2:11,  1 user,  load average: 0.00, 0.00, 0.00
Tasks:  24 total,   1 running,  23 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.7%us,  0.4%sy,  0.0%ni, 98.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2097152k total,   803108k used,  1294044k free,        0k buffers
Swap:        0k total,        0k used,        0k free,   129844k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      20   0  2900 1132  940 S  0.0  0.1   0:00.09 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd/218414
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.00 khelper/2184143
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.00 rpciod/2184143/
    5 root      20   0     0    0    0 S  0.0  0.0   0:00.00 rpciod/2184143/
    6 root      20   0     0    0    0 S  0.0  0.0   0:00.00 rpciod/2184143/
    7 root      20   0     0    0    0 S  0.0  0.0   0:00.00 rpciod/2184143/
    8 root      20   0     0    0    0 S  0.0  0.0   0:00.00 rpciod/2184143/
    9 root      20   0     0    0    0 S  0.0  0.0   0:00.00 rpciod/2184143/
   10 root      20   0     0    0    0 S  0.0  0.0   0:00.00 rpciod/2184143/
   11 root      20   0     0    0    0 S  0.0  0.0   0:00.00 rpciod/2184143/
   12 root      20   0     0    0    0 S  0.0  0.0   0:00.00 nfsiod/2184143
  147 root      16  -4  2464  624  296 S  0.0  0.0   0:00.01 udevd
  491 dbus      20   0  3000  528  332 S  0.0  0.0   0:00.00 dbus-daemon
 1041 root      20   0  7436 1176  580 S  0.0  0.1   0:08.84 crond
20932 root      20   0 12004 3360 2620 S  0.0  0.2   0:00.06 sshd
20934 root      20   0  6640 1716 1428 S  0.0  0.1   0:00.03 bash
20980 root      20   0 21964 1916  396 S  0.0  0.1   0:00.00 nginx
20981 nginx     20   0 22648 3352 1224 S  0.0  0.2   0:00.01 nginx
20997 root      20   0 33028 3088  968 S  0.0  0.1   0:00.02 php-fpm
21049 root      20   0  6528 1432 1224 S  0.0  0.1   0:00.03 mysqld_safe
21400 mysql     20   0 23952 9752 5516 S  0.0  0.5   0:00.16 mysqld
21450 root      20   0  2564  968  776 R  0.0  0.0   0:00.00 top
25046 root      20   0  8944 1032  524 S  0.0  0.0   0:00.00 sshd

I have been double-checking to make sure nothing other than those processes is running, and everything was as expected. Only the software I installed is running, and its memory usage is way, way lower than you'd guess from the free result. No one other than me and the client has access to the server. The client just uploads files and occasionally tests the game servers he has written for his own games (he isn't doing that at the moment, though).

I don't have a clue what the heck is going on. Not even the software running on the sites could use that much memory, especially when the memory limit is 128 MB.

Comments

  • kkrajk Member
    edited September 2014

    memcache...

    innodb page file (or whatever it is called)

    log file allocation, etc.

    And I'm pretty sure you need to run fcgiwrap when you use nginx + php-fpm, which you are not doing currently (experts, please correct me if I'm wrong). EDIT - just found out it's not necessarily needed.

    Or... probably something to do with the VPS memory allocation itself (no idea, just wild guessing)...
    I used Strato way back in 2006; they were solid with the offering and the support as well, though that was shared hosting, not a VPS.

  • If this is OpenVZ or some other kind of chroot on steroids, you probably had processes OOM-killed whose memory usage didn't get removed from the container's own accounting.

    Rebooting it will solve it.

    To confirm, ask your provider to give you the log of OOM'd processes.

  • @serverian it is Virtuozzo. The container OS is CentOS 6.5 32-bit, up to date. I doubt they'll give me logs, as Strato AG is a huge company in Germany and so on; they'd rather not give out these details even to their clients. It's not like, say, RamNode or BuyVM. It's something huge with their own DC...

    I'm wondering how I got OOM'd when the normal memory usage used to be around 50 MB. And I repeat: I have nothing running other than what you see in the top/ps_mem lists. I disabled Fail2ban myself until I put together the proper rules for the software we use (WP, etc.) and configure it properly. To be honest, I can't imagine having OOM'd processes.

    Let's say OOM isn't the problem (and I really doubt it is): what else could it be?

  • @SandwichBagGhost said:
    serverian it is Virtuozzo. The container OS is CentOS 6.5 32-bit, up to date. I doubt they'll give me logs, as Strato AG is a huge company in Germany and so on; they'd rather not give out these details even to their clients. It's not like, say, RamNode or BuyVM. It's something huge with their own DC...

    I'm wondering how I got OOM'd when the normal memory usage used to be around 50 MB. And I repeat: I have nothing running other than what you see in the top/ps_mem lists. I disabled Fail2ban myself until I put together the proper rules for the software we use (WP, etc.) and configure it properly. To be honest, I can't imagine having OOM'd processes.

    Let's say OOM isn't the problem (and I really doubt it is): what else could it be?

    It's 99.99% the OOM killer then. Some processes may have peaked in memory usage (most likely an attack, like an sshd brute force if you're using the default port without any ACL on it) and got killed at a time you didn't notice. Reboot and the memory usage should display correctly.

  • edited September 2014

    I am monitoring the node with Strato AG's internal monitoring service. There was never any kind of serious attack that could have caused this. The load logs etc. are fine. Not even the WordPress brute-forcers managed to push the memory usage that high; it mostly stayed under 100 MB, even though the load went up, of course. That stopped a while ago. Now I may get just one to five attempts that try a few times and then stop/get auto-blocked. SSH is running on a different port and only accepts keys.

    Is there any software that can log OOMs of processes?

  • @SandwichBagGhost said:
    I am monitoring the node with Strato AG's internal monitoring service. There was never any kind of serious attack that could have caused this. The load logs etc. are fine. Not even the WordPress brute-forcers managed to push the memory usage that high; it mostly stayed under 100 MB, even though the load went up, of course. That stopped a while ago. Now I may get just one to five attempts that try a few times and then stop/get auto-blocked. SSH is running on a different port and only accepts keys.

    Is there any software that can log OOMs of processes?

    You don't run the kernel, so you can't see its logs. Your host can see them by running dmesg | grep "Out of memory in UB [your container id]".
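
    From inside the container you can at least check the per-resource failure counters yourself, even without the host's dmesg. A minimal sketch, assuming the container exposes /proc/user_beancounters (as Virtuozzo/OpenVZ containers normally do); a non-zero failcnt on privvmpages or oomguarpages is the usual sign that allocations failed or processes got killed:

    #!/usr/bin/env python
    # List every beancounter whose failcnt is non-zero. Run as root inside the
    # container. The file has a "Version:" line, a header row, and one row per
    # resource; the container ID is prefixed to the first resource row.
    with open('/proc/user_beancounters') as f:
        for line in f:
            fields = line.split()
            if not fields or fields[0] in ('Version:', 'uid'):
                continue                # skip the two header lines
            if fields[0].endswith(':'):
                fields = fields[1:]     # drop the "ctid:" prefix
            resource, failcnt = fields[0], fields[-1]
            if failcnt.isdigit() and int(failcnt) > 0:
                print('%s: failcnt=%s' % (resource, failcnt))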

  • And there we have the problem. Even as their client I won't get these logs from them. Strato is really business-focused. You can't even downgrade without cancelling your old VM and ordering a totally new one (Virtuozzo/OpenVZ can actually down/upgrade on the fly... all they'd need to do is adjust the next payment). It's very complicated.

  • kkrajk Member
    edited September 2014

    just please reboot, check, and update (edit: here) like the man says...
    it's not going to take you more than a minute

  • edited September 2014

    @ez2uk it's a production server, which I am not going to reboot for this. It's pointless, because I am pretty sure it would happen again anyway: the last time it went back to normal (two or three days ago), it happened again, as you can see in the first post.

    If you keep bumping your production servers up and down, that's your thing. I will keep mine up.

  • @SandwichBagGhost said:
    ez2uk it's a production server, which I am not going to reboot for this. It's pointless, because I am pretty sure it would happen again anyway: the last time it went back to normal (two or three days ago), it happened again, as you can see in the first post.

    You shouldn't have a "production server" in an environment where taking a single VM offline takes your entire site (or whatever) offline. For MySQL, for example, there is such a thing as "MySQL multi-master" replication, which is incredibly trivial to set up and even works over great distances (it actually handles internet latencies quite well). For nginx, there is obviously something like a load balancer, such as the "upstream" configuration block that comes with stock nginx.

  • edited September 2014

    @GoodHosting let's put it this way: my client's budget is small. There's no way to build a cluster with this budget. The Strato vServer is rock solid. We've never had any unplanned downtime or similar, only node reboots because of kernel updates (announced days before they happened).

    If they hadn't done those updates and reboots, the total uptime would be over 420 days. The network is top-notch, too.

  • Fair enough then, but the same uptime can be accomplished on 4 LES nodes as an example.

  • I am not going to take this any further. I just do what my client wants me to do. So if he wants an in-country (Germany) solution with a dedicated IPv4 and IPv6 and a moderate amount of RAM to test his stuff, he will get it within his budget, along with everything else he wanted (domain, etc.). I've been serving him since 2010 and he's always happy.

This discussion has been closed.