Kernel panic on Hetzner

tommmy Member
edited December 2020 in Help

Hi. I recently reinstalled my server, which had awesome uptime on Ubuntu, and installed Debian.

My Debian install has now had a kernel panic twice. Both happened at around the same time, 04:20-04:25 GMT+8, according to my monitoring tool's reports. A simple reboot brings the server back up.

This is all I got from the KVM console. There are no errors in the logs under /var/log either.

Can anyone tell me what I need to do? Since the server runs normally again after a reboot and no logs help, I think this will be hard to track down. I had just started to love Debian Buster, and now this happens.

Thanks.

Comments

  • SplitIce Member, Host Rep

    The important part of a kernel panic is the stack trace, which is above your output.

    I recommend setting up netconsole to capture it.
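
For reference, netconsole is configured with a single parameter string. The kernel's netconsole documentation gives its format as [src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr], with source port 6665 and target port 6666 as defaults. The Python helper below is a minimal sketch that assembles that string; the function name and every address and interface name in it are hypothetical placeholders.

```python
def netconsole_param(tgt_ip, src_ip="", dev="", src_port=6665, tgt_port=6666, tgt_mac=""):
    """Build a netconsole= option in the documented format:
    [src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]

    Empty src_ip, dev or tgt_mac leave those fields blank so the kernel
    falls back to its defaults (interface address, eth0, broadcast MAC).
    """
    if not tgt_ip:
        raise ValueError("a target IP address is required")
    return f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


if __name__ == "__main__":
    # Hypothetical values: replace with your server's interface/IP and your log receiver.
    print(netconsole_param("203.0.113.10", src_ip="198.51.100.5", dev="enp3s0"))
```

The printed string can then be passed when loading the module, e.g. 'modprobe netconsole netconsole=6665@198.51.100.5/enp3s0,6666@203.0.113.10/'. If the receiving machine is not on the same network segment, the target MAC usually has to be the local gateway's MAC rather than the broadcast default.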

  • Contact Hetzner to verify that this was not caused by a hardware issue.

  • @tommmy said: There is no error in logs inside /var/log either

    Did you check kern.log and dmesg? It may not be in syslog, but sometimes you'll find it in kern.log.
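
If the kernel survives long enough to write something (an oops or soft lockup rather than a hard panic), kern.log usually contains recognisable marker strings. A minimal Python sketch that scans a log file such as /var/log/kern.log for them; the marker list is an assumption and is not exhaustive, and after a true panic these lines often never reach the disk at all.

```python
import re
import sys

# Strings that commonly appear in kernel oops/panic/lockup/MCE reports.
# This list is an assumption and is not exhaustive.
MARKERS = re.compile(
    r"Kernel panic|BUG:|Oops|Call Trace|soft lockup|hard LOCKUP"
    r"|general protection fault|Hardware Error"
)


def find_kernel_errors(lines):
    """Return (line_number, text) for every line matching one of MARKERS."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if MARKERS.search(line)
    ]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/kern.log"
    with open(path, errors="replace") as fh:
        for number, text in find_kernel_errors(fh):
            print(f"{number}: {text}")
```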

  • SplitIce Member, Host Rep

    @Daniel15 said: Did you check kern.log and dmesg? It may not be in syslog but sometimes you'll find it in kern.log

    It's a panic. That kernel be dead.

    netconsole is what he needs.

  • @SplitIce said:

    @Daniel15 said: Did you check kern.log and dmesg? It may not be in syslog but sometimes you'll find it in kern.log

    It's a panic. That kernel be dead.

    netconsole is what he needs.

    Oh yeah, you're right. I skimmed the post too quickly and thought it may have been a CPU bug / softlock (which is usually written to kern.log).

  • NetDynamics24 Member, Host Rep

    Two questions if I may ask:
    1. What are the server specs?
    2. Why did you abandon Ubuntu?

  • @SplitIce said:
    The important part of a kernel panic is the stack trace, which is above your output.

    I recommend setting up netconsole to capture it.

    Thanks. This looks complex. I'll take my time reading the tutorial.

  • @LTniger said:
    Contact Hetzner to verify that this was not caused by a hardware issue.

    Do I need to wait for the kernel panic to occur?

  • @tommmy said:

    @LTniger said:
    Contact Hetzner to verify that this was not caused by a hardware issue.

    Do I need to wait for the kernel panic to occur?

    No. You need to provide them with clear logs. There have been previous reports of random kernel panics on a specific line of Hetzner servers.

  • @NetDynamics24 said:
    Two questions if I may ask:
    1. What are the server specs?
    2. Why did you abandon Ubuntu?

    1. Intel Core i7-4770, 32 GB RAM, 3x 2 TB drives
    2. I had been using the server since 18.04 was the latest LTS. Since the server is for my personal use, I tested a lot of stuff on it. At some point it stopped booting. I contacted Hetzner to confirm whether the problem was in the system or the hardware. The Hetzner team did some checks and told me nothing was wrong, but they noticed the system booted very slowly, so they moved the disks to a new server as a precaution. I took that chance to try Debian instead.

  • @LTniger said:

    No. You need to provide them with clear logs. There have been previous reports of random kernel panics on a specific line of Hetzner servers.

    Which logs are you talking about? I checked every single log in /var/log, and I can only see entries from before the system went offline and after it booted again. There is a gap between them covering the time the server was offline.

  • @tommmy said:

    Which logs are you talking about? I checked every single log in /var/log, and I can only see entries from before the system went offline and after it booted again. There is a gap between them covering the time the server was offline.

    I believe you have to turn on kernel crash dumping for it to save a log at the time of the crash; otherwise nothing will be saved at all.

    See: https://www.linuxjournal.com/content/oops-debugging-kernel-panics-0
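
Kernel crash dumps (kdump) can only be captured if memory for the capture kernel was reserved at boot with the 'crashkernel=' parameter; on Debian this is typically arranged by installing the kdump-tools package. A minimal sketch that checks the running kernel's command line for that reservation; the helper name is hypothetical.

```python
def crashkernel_reservation(cmdline):
    """Return the value of crashkernel= from a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open("/proc/cmdline") as fh:
        value = crashkernel_reservation(fh.read())
    if value is None:
        print("No crashkernel= reservation: kdump cannot capture a dump on this boot.")
    else:
        print(f"crashkernel={value} is reserved for the capture kernel.")
```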

  • @Trav said:

    I believe you have to turn on kernel crash dumping for it to save a log at the time of the crash; otherwise nothing will be saved at all.

    See: https://www.linuxjournal.com/content/oops-debugging-kernel-panics-0

    Thanks. I'm trying netconsole right now, since it looks much simpler on the Ubuntu wiki page.
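
On the receiving machine, netconsole messages arrive as plain UDP datagrams (target port 6666 by default), which is why a netcat listener is commonly used. As an alternative, here is a minimal Python listener sketch that appends each message to a local file with a timestamp; the file name 'netconsole.log' and the 0.0.0.0 bind address are assumptions, and the receiver's firewall must allow inbound UDP on the chosen port.

```python
import socket
from datetime import datetime, timezone

NETCONSOLE_PORT = 6666  # netconsole's default target port


def open_listener(host="0.0.0.0", port=NETCONSOLE_PORT):
    """Bind a UDP socket that will receive netconsole datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_message(sock, bufsize=4096):
    """Wait for one datagram and return (sender_ip, decoded_text)."""
    data, (sender_ip, _sender_port) = sock.recvfrom(bufsize)
    return sender_ip, data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    sock = open_listener()
    # Flush after every message so the last lines sent before a panic
    # are on disk even if this script is interrupted.
    with open("netconsole.log", "a", encoding="utf-8") as out:
        while True:
            sender_ip, text = receive_message(sock)
            line = text.rstrip("\n")
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            out.write(f"{stamp} {sender_ip} {line}\n")
            out.flush()
```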

  • momkin Member
    edited December 2020

    Most likely your problem is caused by the r8169 driver. Try downgrading to r8168 by following this tutorial:

    https://community.hetzner.com/tutorials/installing-the-r8168-driver?title=Installation_des_r8168-Treibers/en
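
Before switching drivers, it is worth confirming which driver the NIC is actually bound to ('ethtool -i <interface>' reports this). The same information is exposed in sysfs as the /sys/class/net/<interface>/device/driver symlink; a minimal Python sketch reading it, where the helper name is hypothetical:

```python
import os

SYSFS_NET = "/sys/class/net"


def nic_driver(iface, sysfs_root=SYSFS_NET):
    """Return the driver bound to a network interface, e.g. 'r8169'.

    Reads the <sysfs_root>/<iface>/device/driver symlink and raises OSError
    if it does not exist (virtual interfaces such as 'lo' have none).
    """
    link = os.path.join(sysfs_root, iface, "device", "driver")
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    for iface in sorted(os.listdir(SYSFS_NET)):
        try:
            print(f"{iface}: {nic_driver(iface)}")
        except OSError:
            print(f"{iface}: no driver link (virtual interface?)")
```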

  • @momkin said:
    Most likely your problem is caused by the r8169 driver. Try downgrading to r8168 by following this tutorial:

    https://community.hetzner.com/tutorials/installing-the-r8168-driver?title=Installation_des_r8168-Treibers/en

    Thanks! I'm not sure what the issue is yet, because the kernel panic hasn't occurred again since I set up netconsole. That is weird, because before this it happened twice at around the same time on different days. I'll refer to this link if that turns out to be the cause.

  • @momkin said:
    Most likely your problem is caused by the r8169 driver. Try downgrading to r8168 by following this tutorial:

    https://community.hetzner.com/tutorials/installing-the-r8168-driver?title=Installation_des_r8168-Treibers/en

    It seems the bug mentioned in that article affects kernels older than 4.17. Mine is 4.19.0-13-amd64. Should I try it anyway?
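
A quick way to compare the running kernel against the '<4.17' condition is to parse the release string (the same string 'uname -r' prints). Minimal sketch, with the 4.17 threshold taken from the question above; distribution kernels may backport fixes, so the version number alone is only a rough guide.

```python
import platform
import re


def kernel_version(release):
    """Turn a release string like '4.19.0-13-amd64' into (4, 19, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. '5.10') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def older_than(release, threshold=(4, 17, 0)):
    """True if the release's numeric version is below the threshold."""
    return kernel_version(release) < threshold


if __name__ == "__main__":
    release = platform.release()  # same string as 'uname -r'
    print(release, "is older than 4.17" if older_than(release) else "is 4.17 or newer")
```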

  • tommmy Member
    edited December 2020

    I captured the log. This is a trimmed version with IP addresses removed. The log is 3.6 MiB, so I think sharing it as a file is better.
    https://mega.nz/file/GBx0la5T#FYVhjy2nUeL8zJksCPTyUmFqMyghXDpFBg13cPvEOoI

    Log lines earlier than the first timestamp in the log above only contain

    Dec 13 08:19:32 xxx.xxx.xxx.xxx [ 45.369468] systemd-journald[347]: Journal effective settings seal=no compress=yes compress_threshold_bytes=512B

    so I removed them.

    Does anyone know what is going on in this log? I don't understand at all. Should I forward this to Hetzner?

  • tommmy Member
    edited December 2020

    I forgot to post an update. I asked Hetzner to do a full hardware check, and this was their reply after 10 hours of checking:

    The full hardware check of your server has finished now:
    ----------------->%-----------------
    DMESG: Ok
    CPUFREQ-CHECK: Ok
    STRESSTEST-CPU-TEMP: Warning
    FANCHECK: Ok
    STRESSTEST: Ok
    MCE-CHECK: Ok

    HDDTEST Z1X0EEC5: Ok
    HDDTEST Z1X0EEQC: Ok
    HDDTEST Z1Y0ASCM: Ok
    -----------------%<-----------------
    We have replaced the CPU-cooler and booted the server back into the installed OS.

    That was 4 days ago, and I have had no kernel panic since. I'm not sure whether that was the cause, but I will keep watching.

  • NetDynamics24 Member, Host Rep

    You can use the command 'sensors' (from the lm-sensors package) to monitor the CPU temperature.
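
'sensors' reads the kernel's hwmon interface, where each temp*_input file under /sys/class/hwmon/hwmon*/ holds a temperature in millidegrees Celsius. For scripted monitoring (for example logging temperatures periodically), a minimal Python sketch reading those files directly; the exact sysfs layout can vary by driver, so treat it as an assumption to verify on your machine.

```python
import glob
import os


def _read(path):
    with open(path) as fh:
        return fh.read().strip()


def read_temperatures(hwmon_root="/sys/class/hwmon"):
    """Return {(chip, label): degrees_celsius} from hwmon temp*_input files."""
    temps = {}
    for chip_dir in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        name_file = os.path.join(chip_dir, "name")
        chip = _read(name_file) if os.path.exists(name_file) else os.path.basename(chip_dir)
        for input_file in sorted(glob.glob(os.path.join(chip_dir, "temp*_input"))):
            base = input_file[: -len("_input")]  # e.g. .../temp1
            label_file = base + "_label"
            label = _read(label_file) if os.path.exists(label_file) else os.path.basename(base)
            try:
                # Values are millidegrees Celsius.
                temps[(chip, label)] = int(_read(input_file)) / 1000
            except (OSError, ValueError):
                continue  # some sensors refuse reads or report non-numeric data
    return temps


if __name__ == "__main__":
    for (chip, label), celsius in sorted(read_temperatures().items()):
        print(f"{chip:12} {label:20} {celsius:5.1f} °C")
```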

  • Vitaly Member, Host Rep

    Unfortunately, even large hosting providers have difficulties sometimes.
    But Hetzner is a good provider, with good uptime overall.

    Also, try forcing a filesystem check with fsck:
    log in to the hosting provider's console and add these fsck values to the kernel command line in GRUB:

    fsck.mode=force fsck.repair=yes

    Then boot.
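
To make the fsck parameters persistent instead of typing them at the GRUB prompt, they can be appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by 'update-grub' and a reboot; fsck.mode= and fsck.repair= are read by systemd-fsck. Minimal sketch of the text edit, assuming the double-quoted line found in Debian's default file; the function name is hypothetical, and since these parameters force a check on every boot you would likely remove them again afterwards.

```python
import re
import shutil

CMDLINE_RE = re.compile(r'^GRUB_CMDLINE_LINUX_DEFAULT="([^"]*)"', re.MULTILINE)


def add_kernel_params(grub_default, params):
    """Return /etc/default/grub text with params appended to
    GRUB_CMDLINE_LINUX_DEFAULT, skipping any that are already present."""
    match = CMDLINE_RE.search(grub_default)
    if match is None:
        raise ValueError('no GRUB_CMDLINE_LINUX_DEFAULT="..." line found')
    current = match.group(1).split()
    updated = current + [p for p in params if p not in current]
    return grub_default[: match.start(1)] + " ".join(updated) + grub_default[match.end(1):]


if __name__ == "__main__":
    path = "/etc/default/grub"
    shutil.copy2(path, path + ".bak")  # keep a backup; needs root
    with open(path) as fh:
        text = fh.read()
    with open(path, "w") as fh:
        fh.write(add_kernel_params(text, ["fsck.mode=force", "fsck.repair=yes"]))
    print("Updated", path, "- now run update-grub and reboot.")
```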
