Debian/ hard drive missing

dnwk Member

Not sure when it happens, but I suddenly found /dev/sda and /dev/sdc cannot be found. /dev/sdb is working fine. How do I find any trace of what happened?

Comments

  • Server still running? Check the logs!

  • Can you run df -h please?

  • dnwk Member

    @noosVPS said:
    Server still running? Check the logs!

    Yes, still running. Are there any keywords I should search for?

    @0xdragon said:
    Can you run df -h please?

    df -h only shows me the partition that's working right now.

  • @dnwk said:
    df -h only shows me the partition that's working right now.

    Check dmesg.
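
On the question of which keywords to search for: here is a minimal sketch of a log filter, not a definitive list. The patterns (`ataN:` messages, "I/O error", "link down", "offline device", `md:`) are my own assumptions about what is worth looking at for a disappearing SATA disk, and the helper names `suspicious_lines` and `scan_file` are made up for this example.

```python
import re

# Assumed keyword list for a vanished SATA disk; extend it as needed.
PATTERNS = [
    r"\bata\d+(?:\.\d+)?:",   # libata port/device messages such as "ata1:" or "ata3.00:"
    r"I/O error",             # block layer and filesystem write failures
    r"link down",             # SATA link lost
    r"offline device",        # SCSI layer rejecting requests
    r"\bmd:",                 # software RAID messages
]
KEYWORDS = re.compile("|".join(PATTERNS), re.IGNORECASE)


def suspicious_lines(lines):
    """Return the log lines that match any of the assumed keywords."""
    return [line.rstrip("\n") for line in lines if KEYWORDS.search(line)]


def scan_file(path):
    """Print matching lines from a log file (the path is up to the caller)."""
    with open(path, errors="replace") as handle:
        for line in suspicious_lines(handle):
            print(line)
```

Calling `scan_file` on a saved copy of the kernel log (wherever your system keeps it) prints only the matching lines, which is usually enough to find the first timestamp where things went wrong.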

  • dnwk Member
    edited August 2014

    Found it:

    Aug 5 21:21:14 kvm kernel: ata1: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen

    Aug 5 21:21:14 kvm kernel: ata1: irq_stat 0x00400040, connection status changed

    Aug 5 21:21:14 kvm kernel: ata1: SError: { HostInt PHYRdyChg 10B8B DevExch }

    Aug 5 21:21:14 kvm kernel: ata1: hard resetting link

    Aug 5 21:21:14 kvm kernel: ata3: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen

    Aug 5 21:21:14 kvm kernel: ata3: irq_stat 0x00400040, connection status changed

    Aug 5 21:21:14 kvm kernel: ata3: SError: { HostInt PHYRdyChg 10B8B DevExch }

    Aug 5 21:21:14 kvm kernel: ata3: hard resetting link

    Aug 5 21:21:14 kvm kernel: ata3: SATA link down (SStatus 0 SControl 300)

    Aug 5 21:21:19 kvm kernel: ata3: hard resetting link

    Aug 5 21:21:19 kvm kernel: ata1: SATA link down (SStatus 0 SControl 300)

    Aug 5 21:21:24 kvm kernel: ata1: hard resetting link

    Aug 5 21:21:25 kvm kernel: ata3: SATA link down (SStatus 0 SControl 300)

    Aug 5 21:21:25 kvm kernel: ata3: limiting SATA link speed to 1.5 Gbps

    Aug 5 21:21:30 kvm kernel: ata3: hard resetting link

    Aug 5 21:21:30 kvm kernel: ata1: SATA link down (SStatus 0 SControl 300)

    Aug 5 21:21:30 kvm kernel: ata1: limiting SATA link speed to 1.5 Gbps

    Aug 5 21:21:35 kvm kernel: ata1: hard resetting link

    Aug 5 21:21:35 kvm kernel: ata3: SATA link down (SStatus 0 SControl 310)

    Aug 5 21:21:35 kvm kernel: ata3.00: disabled

    Aug 5 21:21:35 kvm kernel: ata3: EH complete

    Aug 5 21:21:35 kvm kernel: ata3.00: detaching (SCSI 2:0:0:0)

    Aug 5 21:21:35 kvm kernel: ata1: SATA link down (SStatus 0 SControl 310)

    Aug 5 21:21:35 kvm kernel: ata1.00: disabled

    Aug 5 21:21:35 kvm kernel: ata1: EH complete

    Aug 5 21:21:35 kvm kernel: sd 0:0:0:0: rejecting I/O to offline device

    Aug 5 21:21:35 kvm kernel: sd 0:0:0:0: [sda] killing request

    Aug 5 21:21:35 kvm kernel: sd 0:0:0:0: rejecting I/O to offline device

    Aug 5 21:21:35 kvm kernel: md: super_written gets error=-5, uptodate=0

  • dnwk Member

    On the same day, the other hard drive shows these messages:

    Aug 5 21:21:41 kvm kernel: Buffer I/O error on device sdc1, logical block 60852517

    Aug 5 21:21:41 kvm kernel: lost page write due to I/O error on sdc1

    Aug 5 21:21:41 kvm kernel: Aborting journal on device sdc1-8.

    Aug 5 21:21:41 kvm kernel: Buffer I/O error on device sdc1, logical block 60850176

    Aug 5 21:21:41 kvm kernel: lost page write due to I/O error on sdc1

    Aug 5 21:21:41 kvm kernel: JBD2: I/O error detected when updating journal superblock for sdc1-8.

  • dnwk Member

    Something must have happened. It is too much of a coincidence that both hard drives failed at the same time.
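
To put the "same time" observation on firmer ground, here is a rough sketch that pulls the first kernel message per ATA port out of the log and checks whether the ports faulted in the same second. The sample lines are copied from the log posted above; the regular expression assumes the syslog layout seen there (`Mon D HH:MM:SS host kernel: ataN: ...`), and the function name `first_event_per_port` is hypothetical.

```python
import re

# Assumes the syslog layout shown in the posted log.
LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"(?P<port>ata\d+)(?:\.\d+)?:\s+(?P<msg>.*)$"
)


def first_event_per_port(lines):
    """Map each ATA port to the (timestamp, message) of its first log entry."""
    first = {}
    for line in lines:
        match = LINE.match(line.strip())
        if match and match.group("port") not in first:
            first[match.group("port")] = (match.group("stamp"), match.group("msg"))
    return first


# Lines taken from the log posted earlier in this thread.
LOG = """\
Aug 5 21:21:14 kvm kernel: ata1: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
Aug 5 21:21:14 kvm kernel: ata3: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
Aug 5 21:21:35 kvm kernel: ata3.00: disabled
Aug 5 21:21:35 kvm kernel: ata1.00: disabled
""".splitlines()

events = first_event_per_port(LOG)
for port, (stamp, msg) in sorted(events.items()):
    print(port, stamp, msg)

# True when every port logged its first fault with the same timestamp.
same_moment = len({stamp for stamp, _ in events.values()}) == 1
print("all ports faulted in the same second:", same_moment)
```

Two ports raising the same exception in the same second does not prove anything by itself, but it is the kind of evidence that makes a shared cause worth checking before blaming two independent disks.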

  • Not looking good for you. Either the hard drive (most likely) or cable is failing :-(

  • dnwk Member

    @noosVPS said:
    Not looking good for you. Either the hard drive (most likely) or cable is failing :-(

    sdc is pretty old, so its failure is expected. But sda (first log post) is less than 1 year old.

  • dnwk said: but sda (first log post) is less than 1 year old

    A hardware device can fail at any time, even in the first week if you are really (un)lucky. Less than 1 year old means you can probably get it replaced within its warranty period.

  • What does smartctl say about the hard drive that is still working? Look especially at the "Temperature Celsius" line.

  • dnwk Member

    @rds100 said:
    What does smartctl say about the still working hard drive, especially look at the "Temperature Celsius" line?

    194 Temperature_Celsius 0x0022 026 040 000 Old_age Always - 26 (0 25 0 0)

  • dnwk Member

    190 Airflow_Temperature_Cel 0x0022 074 069 045 Old_age Always - 26 (Min/Max 26/29)

  • dnwk Member
    edited August 2014

    What does Pre-fail mean?

    1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 120185880

    3 Spin_Up_Time 0x0003 098 098 000 Pre-fail Always - 0

    4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 6

    5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0

    7 Seek_Error_Rate 0x000f 072 060 030 Pre-fail Always - 19665357
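
On the Pre-fail question: in `smartctl -A` output, Pre-fail is the attribute's type, not its current status. It marks attributes where a normalized value at or below the threshold would indicate imminent failure, whereas Old_age attributes track normal wear. Below is a minimal sketch that parses lines like the ones posted and applies that comparison. The column order (ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) is the usual smartctl header, which the post did not include, and treating a threshold of 0 as "never failing" is a simplifying assumption on my part. `SmartAttr`, `parse_attr` and `is_failing` are names invented for this example.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int    # current normalized value
    worst: int    # lowest normalized value seen so far
    thresh: int   # vendor threshold
    kind: str     # "Pre-fail" or "Old_age"
    raw: str      # vendor-specific raw value, kept as text


def parse_attr(line):
    """Parse one attribute row (assumed standard smartctl -A column order)."""
    parts = line.split(None, 9)
    return SmartAttr(
        attr_id=int(parts[0]),
        name=parts[1],
        value=int(parts[3]),
        worst=int(parts[4]),
        thresh=int(parts[5]),
        kind=parts[6],
        raw=parts[9],
    )


def is_failing(attr):
    """True when the normalized value has dropped to or below the threshold.

    Assumption: a threshold of 0 is treated as "never failing".
    """
    return attr.thresh > 0 and attr.value <= attr.thresh


# Rows taken from the posts above, joined onto one line each.
ROWS = [
    "1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 120185880",
    "5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0",
    "194 Temperature_Celsius 0x0022 026 040 000 Old_age Always - 26 (0 25 0 0)",
]

for row in ROWS:
    attr = parse_attr(row)
    print(attr.name, attr.kind, "FAILING" if is_failing(attr) else "ok")
```

The large raw numbers on Raw_Read_Error_Rate and Seek_Error_Rate are vendor-encoded, so the sketch compares only the normalized columns and leaves the raw value alone.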

  • @dnwk the drive is OK and is not hot. I was thinking that because the other drives failed suddenly maybe the server was overheating, but obviously the temperature is OK.

  • netomx Moderator, Veteran

    Maybe a damaged PSU?

  • dnwk Member
    edited August 2014

    @netomx said:
    Maybe a damaged PSU?

    A damaged PSU would destroy more than that, I think.

  • qps Member, Host Rep

    Please post the full smart output of the drives (without the serial numbers). From that, we can probably tell you if there is a problem with the drives.

  • dnwk Member

    Current condition: 3 hard drives, 1 of which is not recognized by the BIOS. It was on software RAID 1, but the machine is not booting up right now.

  • My money is on dead disk controller...

  • dnwk Member

    Now I am thinking maybe it's a motherboard issue.

  • Ask the provider to pull this HDD and test if it's recognized in another server.

  • dnwk Member

    @rds100 said:
    Ask the provider to pull this HDD and test if it's recognized in another server.

    This is what sentris said
    "if hdd not detected, not sure how taking it out to test outside the server is any different."

    I guess it's true. If it is not a hard drive failure, I am in big trouble. I will probably need to replace the server.

  • dnwk Member

    Any rescue tool ISO recommended?

  • What kind of server is it? Is it your hardware (colocated) or leased? Are the HDDs hot swappable?

  • dnwk Member

    @rds100 said:
    What kind of server is it? Is it your hardware (colocated) or leased? Are the HDDs hot swappable?

    DELL L5420. Yes, the HDDs are swappable, and it is colocated.

  • dnwk Member

    I am on soft RAID 1. Now that 1 drive has failed, how do I boot the system?

  • Go in to IPMI and change the boot device to the bad drive's mirrored twin.
    Hopefully you remembered to install grub on it too.

    Otherwise you're gonna be booting from an emergency disk and manually fixing things.

    Good luck!
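
If it does come to an emergency disk, one way to confirm which software RAID arrays are running with missing members is to read `/proc/mdstat`. The sketch below only parses text in that format. The sample is a hypothetical illustration of a degraded two-disk RAID 1 (not output from the server in this thread), and `degraded_arrays` is a name invented here. It assumes arrays are named `mdN` and that the status line carries the usual `[expected/active]` counter.

```python
import re


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active)} for arrays with missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        head = re.match(r"^(md\d+)\s*:", line)
        if head:
            # Header line such as "md0 : active raid1 sdb1[1]".
            current = head.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                result[current] = (expected, active)
            current = None
    return result


# Hypothetical sample in /proc/mdstat format, for illustration only.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1]
      976630336 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

On a live rescue system the same function could be fed the contents of `/proc/mdstat` once the arrays have been assembled.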

  • dnwk Member

    @joshin said:
    Go in to IPMI and change the boot device to the bad drive's mirrored twin.
    Hopefully you remembered to install grub on it too.

    Otherwise you're gonna be booting from an emergency disk and manually fixing things.

    Good luck!

    Already changed, but it doesn't work. I already salvaged the data before the reboot.
