kernel:NMI watchdog: BUG: Soft lockup- CPU#3 stuck for 22s! [mysqld:4001920]
Hi there,
For the past 3 weeks my server has been becoming unresponsive, to the point where I had to reboot it to get it working again. Today I was able to diagnose the issue:
message from syslogd@servername at [date]
kernel:NMI watchdog: BUG: Soft lockup- CPU#3 stuck for 22s! [mysqld:4001920]
I assume this is due to not having enough CPU to handle MySQL, but that is strange: the server is usually only using 25% CPU. What else could it be?
Comments
As far as I know, a soft lockup means the lockup happens at the kernel level, so it might be related to an I/O issue or high wait time.
Are you running on a dedicated or a virtual server? It could also be related to some kind of overcommitment, where there are simply no more resources available.
Lemme guess, quadcore cpu? 25% means 1 full core then.
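To make that arithmetic concrete, here is a minimal Python sketch (not from the thread) that works out per-core usage from two readings of /proc/stat. The function names and the sample numbers are made up for illustration; the field order (user, nice, system, idle, iowait, ...) is the standard /proc/stat layout.

```python
def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} for the per-core lines."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep "cpu0", "cpu1", ... and skip the aggregate "cpu" line and everything else.
        if len(fields) < 6 or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(x) for x in fields[1:]]
        idle = ticks[3] + ticks[4]   # idle + iowait
        total = sum(ticks[:8])       # guest columns are already counted in user/nice
        cores[fields[0]] = (total - idle, total)
    return cores


def per_core_usage(before, after):
    """Percentage of non-idle time per core between two parsed samples."""
    usage = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before[name]
        delta = total1 - total0
        usage[name] = 100.0 * (busy1 - busy0) / delta if delta else 0.0
    return usage


# Made-up samples: four cores, only cpu3 does any work between the two readings.
BEFORE = """cpu0 0 0 0 1000 0 0 0 0
cpu1 0 0 0 1000 0 0 0 0
cpu2 0 0 0 1000 0 0 0 0
cpu3 1000 0 0 0 0 0 0 0"""
AFTER = """cpu0 0 0 0 1100 0 0 0 0
cpu1 0 0 0 1100 0 0 0 0
cpu2 0 0 0 1100 0 0 0 0
cpu3 1100 0 0 0 0 0 0 0"""

usage = per_core_usage(parse_proc_stat(BEFORE), parse_proc_stat(AFTER))
average = sum(usage.values()) / len(usage)
print(usage)    # cpu3 is fully busy, the other three are idle
print(average)  # the overall figure a monitoring graph would show
```

On a real box you would read /proc/stat twice a few seconds apart and feed both texts in; an overall 25% can hide one core sitting at 100%.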
It is a dedicated server
What's your kernel and distribution? I'll throw the danger money on a 2.6 based CentOS 6.
CloudLinux 7.6
3.10.0-962.3.2.lve1.5.24.9.el7.x86_64
Well damn, there goes that theory, even if it is CentOS (kind-of). You may have a hardware issue, but I'd check your current RAM and HDD before going any further- if you're not perpetually running swap, then I'd start looking into hardware.
Only using around 20-30% RAM and 0% swap, and using 45% of disk space. What's the best way of checking HDD health?
NMI is likely to be a hardware problem. Open a ticket with the provider.
This will be a good start to understand CPU soft lockup http://www.inetservicescloud.com/knowledgebase/what-is-a-cpu-soft-lockup/
If it's SATA, install smartmontools (sometimes packaged as smartmon-tools) and run smartctl -a /dev/sdX for each drive (sda, sdb, sdc, etc.).
There's a darn good chance this is either the CPU or the motherboard, though. I'd suggest syncing everything and running stress to see what you end up with.
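For the smartctl check, a rough Python sketch of what to look for in the attribute table (the part that smartctl -A prints, and that is also included in smartctl -a). It assumes an ATA/SATA drive with the usual ten-column table; the sample text is made up for illustration, and the choice of attributes 5, 197 and 198 is my own assumption about what is worth watching, not something from the thread.

```python
# Attribute IDs that commonly point at failing sectors (an assumption, not a full list).
WATCHED_IDS = {5, 197, 198}


def suspicious_attributes(smart_output):
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    found = {}
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        if int(fields[0]) in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
            found[fields[1]] = int(raw)
    return found


# Made-up sample in the usual smartctl attribute layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21034
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

print(suspicious_attributes(SAMPLE))
```

An empty result does not prove the disk is healthy; it only means none of these counters has moved yet.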
I have also checked whether RAID 1 is working fine, and it seems to be.
All tests passed, I guess I will need to get in touch with @Hetzner_OL? Hardware issue?
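On the RAID 1 check: if the array is Linux software RAID (md), which is an assumption since the thread does not say, the state can be read from /proc/mdstat. Below is a small Python sketch that flags degraded arrays; the function name and the sample text are made up for illustration.

```python
import re

# Matches the status part of an mdstat line, e.g. "[2/2] [UU]" or "[2/1] [U_]".
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
ARRAY_NAME = re.compile(r"(md\d+)\s*:")


def degraded_arrays(mdstat_text):
    """Return the names of md arrays that are missing at least one member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        name = ARRAY_NAME.match(line)
        if name:
            current = name.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            expected, active, flags = status.groups()
            if expected != active or "_" in flags:
                degraded.append(current)
            current = None
    return degraded


# Made-up sample: md0 is healthy, md1 has lost one of its two members.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      523712 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[0]
      1952854080 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE_MDSTAT))
```

With a hardware RAID controller this file will not exist, and the vendor's own tool is the place to look.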
Shut down and back up your DB, because they'll probably nuke your box.
I'd install 'stress' and let it run for a while, and turn it over to them if it hangs up again.
No, she's in a different part of the company than the part that deals with stuff like this. Open a ticket.
Yeah, I meant opening a ticket, thanks.
I do daily backups thanks, will also run stress, thanks for the help
Can you share the log lines that come after:
message from syslogd@servername at [date]
kernel:NMI watchdog: BUG: Soft lockup- CPU#3 stuck for 22s! [mysqld:4001920]
? Maybe on pastebin or similar; just remove any sensitive information first.
Sometimes the cause of the lockup shows up in the log right after the NMI watchdog warning.
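If the log is long, something like this Python sketch can pull out each soft lockup warning together with the lines that follow it. The function name and the default of 20 lines are arbitrary choices of mine, and the placeholder lines in the sample stand in for whatever the real log contains.

```python
import re

# Case-insensitive so it matches both "soft lockup" and "Soft lockup".
LOCKUP = re.compile(r"soft lockup", re.IGNORECASE)


def lockup_context(log_lines, after=20):
    """Return [(line_number, following_lines)] for every soft lockup warning."""
    hits = []
    for i, line in enumerate(log_lines):
        if LOCKUP.search(line):
            # Line numbers are 1-based; keep up to `after` lines following the warning.
            hits.append((i + 1, log_lines[i + 1 : i + 1 + after]))
    return hits


# The warning is the one quoted in this thread; the other lines are placeholders.
sample_log = [
    "<earlier log line>",
    "kernel:NMI watchdog: BUG: Soft lockup- CPU#3 stuck for 22s! [mysqld:4001920]",
    "<next log line 1>",
    "<next log line 2>",
]

for number, following in lockup_context(sample_log, after=2):
    print(number, following)
```

On CloudLinux/CentOS 7 the kernel messages normally end up in /var/log/messages, so reading that file with readlines() and passing the list in would be the usual starting point; adjust the path if your syslog setup differs.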
Since I rebooted, there is not much information. However, in the March 11 log there is something I do not understand from just before the downtime: https://pastebin.com/mDtrwW0B