

Dr. Server RAID Failure - All Data Lost - Page 2


24 Comments

  • deankdeank Member, Troll

    Does not have to, but it will.

  • msg7086msg7086 Member
    edited August 2018

    @jsg said:

    When mirroring the remaining drive does NOT have to work harder.

    If we assume the drive that died first was the one working hard, then the remaining one will have to work harder.

  • jsgjsg Member, Resident Benchmarker

    @msg7086 said:

    @jsg said:

    When mirroring the remaining drive does NOT have to work harder.

    If we assume the drive that died first was the one working hard, then the remaining one will have to work harder.

    No. Mirroring ("having one's data doubled") simply writes out the buffer twice. The Raid overhead is minimal and the remaining disk works exactly as hard as if the other disk were still online.

  • msg7086msg7086 Member

    @jsg said:

    No. Mirroring ("having one's data doubled") simply writes out the buffer twice. The Raid overhead is minimal and the remaining disk works exactly as hard as if the other disk were still online.

    Don't you ... read from disks?

  • jsgjsg Member, Resident Benchmarker

    @msg7086 said:

    @jsg said:

    No. Mirroring ("having one's data doubled") simply writes out the buffer twice. The Raid overhead is minimal and the remaining disk works exactly as hard as if the other disk were still online.

    Don't you ... read from disks?

    Mirroring (Raid 1) means that any data is written to both disks - if available. If one disk isn't available it's taken out of the Raid and data is read from and written to just the remaining disk.

    Note that Raid 1 (unlike e.g. Raid 5 or 6) does no striping and no other magic except for minimal housekeeping.
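    jsg's description of Raid 1 above can be sketched in a few lines. This is a toy model for illustration only (the class and method names are made up, not any real driver's API): every write goes to each online disk, and a failed disk is simply dropped from the set while the survivor keeps serving reads and writes.

    ```python
    # Toy model of Raid 1 (mirroring): writes go to every online disk,
    # reads come from any one online disk, a failed disk is dropped.
    class Raid1:
        def __init__(self, n_disks=2):
            # each "disk" is just a dict of block number -> data
            self.disks = [dict() for _ in range(n_disks)]
            self.online = [True] * n_disks

        def write(self, block, data):
            # the same buffer is written out once per online disk
            for disk, up in zip(self.disks, self.online):
                if up:
                    disk[block] = data

        def read(self, block):
            # any online disk holds an identical copy
            for disk, up in zip(self.disks, self.online):
                if up:
                    return disk[block]
            raise IOError("no online disks")

        def fail(self, i):
            # take a dead disk out of the Raid
            self.online[i] = False

    r = Raid1()
    r.write(0, b"hello")
    r.fail(0)          # first disk dies
    print(r.read(0))   # survivor still serves the data: b'hello'
    ```

    Note that the survivor's `write` path does exactly what it did before the failure, which is the point jsg is making about write load.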

  • deankdeank Member, Troll
    edited August 2018

    I've rarely seen a raid that survived a disk crash, in server environments at least.

    "When it rains, it pours" applies perfectly to raid incidents. When a drive dies, another or even a 3rd one soon follows and a major disaster occurs.

    It is partially the fault of raid cards, from what I've observed. They begin to act up once a drive goes down and sometimes corrupt themselves.

    Thanked by 1MrH
  • FHRFHR Member, Host Rep

    jsg said: Mirroring (Raid 1) means that any data is written to both disks - if available. If one disk isn't available it's taken out of the Raid and data is read from and written to just the remaining disk.

    When you read with mirroring, data is read from all disks in the array.

  • MikeAMikeA Member, Patron Provider

    I think I'll stick with software raid.

  • jsgjsg Member, Resident Benchmarker

    @deank said:
    I've rarely seen a raid that survived a disk crash, in server environments at least.

    "When it rains, it pours" applies perfectly to raid incidents. When a drive dies, another or even a 3rd one soon follows and a major disaster occurs.

    It is partially the fault of raid cards, from what I've observed. They begin to act up once a drive goes down and sometimes corrupt themselves.

    I have in fact seen extremely few disasters with Raid, no matter whether hardware or software Raid. One exception: the 410 adapters @AnthonySmith mentioned; I have learned not to trust them.

    In fact I know of just one single disaster. In all other cases the Raid volumes could be rebuilt without any remaining damage. I do agree though that a system should be (otherwise) inactive during a rebuild. With Raid 1 it's probably less critical than with the striped varieties, but I personally always recommend not taking the risk.

  • jsgjsg Member, Resident Benchmarker

    @FHR said:
    When you read with mirroring, data is read from all disks in the array.

    Depends on the implementation and the situation. Unless a read request is very large it doesn't make sense anyway nowadays.

  • dahartigandahartigan Member

    Yikes. Lucky I have backups.

  • williewillie Member

    jsg said: Unless a read request is very large it doesn't make sense anyway nowadays.

    It's a VPS node, there's lots of concurrent requests and they get split across the drives.
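    The read-balancing FHR and willie describe can be sketched as a simple round-robin dispatcher. This is only an illustration of the idea, not any particular driver's algorithm (real implementations use more elaborate heuristics, such as picking the idle disk or the one with the nearest head position):

    ```python
    # Round-robin sketch: independent concurrent read requests are
    # served alternately by the two mirror members, halving per-disk load.
    from itertools import cycle

    disks = ["sda", "sdb"]       # hypothetical mirror members
    next_disk = cycle(disks)

    def dispatch_reads(n_requests):
        # each request goes to the next disk in turn
        return [next(next_disk) for _ in range(n_requests)]

    print(dispatch_reads(4))     # ['sda', 'sdb', 'sda', 'sdb']
    ```

    Once one member fails, the same request stream lands entirely on the survivor, which is why a degraded mirror does see roughly double the read load under concurrent traffic.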

  • HarambeHarambe Member, Host Rep

    @dahartigan said:
    Yikes. Lucky I have backups.

    Was this on one of the new offer machines? That's just shitty if it's the case. Can't control hardware just deciding to eat it though.

  • dahartigandahartigan Member
    edited August 2018

    @Harambe said:

    @dahartigan said:
    Yikes. Lucky I have backups.

    Was this on one of the new offer machines? That's just shitty if it's the case. Can't control hardware just deciding to eat it though.

    My VPS wasn't affected by this, which is one of those deals you're referring to. I just checked.. phew. I do have a dedicated server that has had shit uptime for the past few days though.

    Edit: but seriously.. backups guys. Damn.

  • jsgjsg Member, Resident Benchmarker

    @willie said:

    jsg said: Unless a read request is very large it doesn't make sense anyway nowadays.

    It's a VPS node, there's lots of concurrent requests and they get split across the drives.

    (a) Reads aren't the main stress factor for a drive; writes are.
    (b) The major factor in a node's load is the OS caches and read/write ordering.
    (c) You will note that with LARGE requests even a hw Raid controller cache doesn't speed things up considerably.
    (d) Who says that nodes have more concurrent requests than, say, a company server?

    @All

    I'm not interested in belief systems and even less in wars based on them. I wrote what I know and what I have experienced. If some here WANT to believe that the remaining drive in a Raid 1 works oh so much harder (that it often soon dies too), just ignore me and accept my apologies.

  • HarambeHarambe Member, Host Rep

    @jsg said: (a) Reads aren't the main stress factor for a drive; writes are

    Yeah, but reads are the main stress put on remaining drives in a rebuild.. and other drives die during rebuilds all the time.

    That's the main point that everyone else is making.
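    Harambe's point about rebuild reads is easy to quantify: a Raid 1 rebuild must read the entire surviving member in one sustained pass to populate the replacement. A back-of-envelope calculation, where the sustained read rate is an assumed, illustrative figure:

    ```python
    # Rough rebuild-duration estimate for a 1 TB mirror member.
    # 150 MB/s is an assumed sustained read rate under concurrent load.
    disk_bytes = 1 * 10**12        # 1 TB surviving member
    rebuild_rate = 150 * 10**6     # assumed bytes/s of rebuild reads

    hours = disk_bytes / rebuild_rate / 3600
    print(f"rebuild keeps the survivor busy for ~{hours:.1f} h")
    ```

    Hours of continuous full-speed reads, on top of normal traffic, is exactly the window in which a marginal second drive tends to give up.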

  • Kudos though to @radi for his offer to compensate those affected by refunding them. Not many providers would take ownership like that.

  • jsgjsg Member, Resident Benchmarker

    @Harambe said:

    @jsg said: (a) Reads aren't the main stress factor for a drive; writes are

    Yeah, but reads are the main stress put on remaining drives in a rebuild.. and other drives die during rebuilds all the time.

    That's the main point that everyone else is making.

    Maybe the real stress is to have a system running during a rebuild as if nothing happened...

  • deankdeank Member, Troll
    edited August 2018

    I have seen some. Made brave refunds and then went down a few months later.

  • drserverdrserver Member, Host Rep

    willie said: I wonder what the array size was, what size and brand of drives, whether a recovery was in progress when the 2nd drive failed, etc.

    Array was 6x1TB (Samsung 850 Pro), less than six months in production, with an Intel onboard raid controller. That was a 4-node Fat Twin from Supermicro. We lost the same pair of drives.

    Thanked by 2willie Aidan
  • drserverdrserver Member, Host Rep

    desfire said: Are they really using BoxBilling as client area?

    we are using hostbill for our billing system

  • drserverdrserver Member, Host Rep

    dahartigan said: My VPS wasn't affected by this

    Only one node had a raid issue, affecting 35 clients.

  • drserverdrserver Member, Host Rep

    Thank you all for understanding. If there is anything else you would like to know about this, please send me a PM or open a ticket. I would also like to apologise to the affected users.

  • What are backups? Sounds delicious, like something you can eat but that gives you food poisoning.

  • 2 drives in an array from the same lot # are worse than 1 drive, because you have a false sense of security: you think you have 2 when you really have 1

    Thanked by 1MrH
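    sidewinder's point can be made concrete with a toy probability model. Both numbers below are made-up illustrative assumptions; only the comparison between the independent case and the correlated same-lot case matters:

    ```python
    # If drive failures were independent, losing both mirror members in
    # the same window would be rare. Same-lot drives with identical wear
    # are correlated, so the twin is far more likely to die mid-rebuild.
    p_any_week = 0.01          # assumed chance a given disk dies this week
    p_twin_in_rebuild = 0.20   # assumed chance the same-lot twin dies
                               # during the rebuild window

    print(f"independent model, both die: {p_any_week * p_any_week:.4f}")
    print(f"same-lot model, both die:    {p_any_week * p_twin_in_rebuild:.4f}")
    ```

    With correlated failures the mirror buys far less protection than the naive independent-failure arithmetic suggests, which is the "false sense of security" being described.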
  • sidewinder said: 2 drives in an array from the same lot # are worse than 1 drive, because you have a false sense of security: you think you have 2 when you really have 1

    I agree

  • FranciscoFrancisco Top Host, Host Rep, Veteran
    edited August 2018

    @drserver said:

    willie said: I wonder what the array size was, what size and brand of drives, whether a recovery was in progress when the 2nd drive failed, etc.

    Array was 6x1TB (Samsung 850 Pro), less than six months in production, with an Intel onboard raid controller. That was a 4-node Fat Twin from Supermicro. We lost the same pair of drives.

    See?! I told people the 1TB 850s were terrible!

    Francisco

    Thanked by 2FHR vimalware
  • AnthonySmithAnthonySmith Member, Patron Provider
    edited August 2018

    Francisco said: See?! I told people the 1TB 850s were terrible!

    Don't the Pros usually fail into read-only mode, though?

  • letboxletbox Member, Patron Provider

    That's why we stopped using Raid and use replication instead, like we do in our new services. If one server fails, another will come up automatically with the data.

  • letboxletbox Member, Patron Provider

    @drserver said:

    willie said: I wonder what the array size was, what size and brand of drives, whether a recovery was in progress when the 2nd drive failed, etc.

    Array was 6x1TB (Samsung 850 Pro), less than six months in production, with an Intel onboard raid controller. That was a 4-node Fat Twin from Supermicro. We lost the same pair of drives.

    Pro and consumer SSDs are not made for server workloads; they won't last long. You may want to stop using those; you would be better off with DC SSDs instead.
