Corrupted data probability on md RAID vs. ZFS raid - real experiences

miu · April 2021

Good day.

Proxmox staff mention:
https://pve.proxmox.com/wiki/Software_RAID

    mdraid has zero checks for bitrot, data integrity, and most filesystems on top do not provide that either.

    That means, if some data gets corrupted, which happens on any long-running system sooner or later, and you normally do not notice until it's too late.

I have a few dedicated servers that have been running for 3+ years without an OS reinstall, with md software RAID (RAID1 or RAID10) for the boot, root and home partitions, and I do not remember a single case of what they describe: "some data gets corrupted, which happens on any long-running system sooner or later".

I would like to hear providers' real experiences, opinions and recommendations on this matter.

Is this an exaggerated, paranoid view of the md RAID error rate?
Or has anyone really had such bad experiences, and is there really a strong reason to use ZFS RAID instead? (I am only asking about the risk of data errors on md RAID here, not about other ZFS advantages such as copy-on-write clones, using an SSD for cache, etc.)
Should I really expect a significant data-corruption disaster, as they predict, if I run several high-load production servers on md?

(BTW: I am also not asking specifically whether it is good to install Proxmox on ZFS (there I assume yes, sure). I mean ZFS in general for a non-Proxmox host with no HW RAID controller, say a LAMP server or a database server.)

Thanks in advance for sharing any relevant experiences, opinions and useful recommendations!

rcxb · April 2021

I have certainly seen aging drives quietly corrupting sectors, so there's the possibility of losing data if it just sits idle on drives long term, but more likely you'll see bigger problems, like bad sectors stalling out your disk performance, instead of silent corruption.

My solution with any type of raid is to do something like a patrol-read or scrub on all drives.

One option: cat /dev/sd? > /dev/null ; RET=$?
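A minimal sketch of the same patrol-read idea, wrapped in a function so a cron job can see which device failed. The function name `patrol_read` is made up for illustration; it only uses `cat` exactly as in the one-liner above and never writes to the disks.

```shell
#!/bin/sh
# patrol_read DEVICE...: read every given device (or file) end to end
# and report any that return a read error. Read-only.
patrol_read() {
    failed=0
    for dev in "$@"; do
        if cat "$dev" > /dev/null 2>&1; then
            echo "OK   $dev"
        else
            echo "FAIL $dev"
            failed=$((failed + 1))
        fi
    done
    # Exit status is non-zero if at least one device could not be read.
    [ "$failed" -eq 0 ]
}
```

Run as root it could be called as `patrol_read /dev/sd?`, and the exit status plays the role of `RET` above.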

Another option, if you're backing up with something like rsync, just add the -c option to one run per week so it'll read the entire contents of every file, which will trigger read errors if sectors are going bad. Other types of "full" backups will accomplish the same.

If you really want to be paranoid or have very important data, set up Tripwire or a similar application (AIDE, Samhain, etc.) to verify every file on your disk. It won't just restore a good version of your file for you like RAID-Z will, but you'll get alerted early, and can restore a good copy from backup. These tools will also perform their primary function of alerting you to security issues/intrusions.
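To illustrate what such a verifier does at its core, here is a rough sketch using only `find` and GNU `sha256sum`. It is not how Tripwire or AIDE are configured; the function names are hypothetical, and it assumes the manifest is given as an absolute path stored outside the directory being checked.

```shell
#!/bin/sh
# make_manifest DIR MANIFEST: record a SHA-256 checksum for every file under DIR.
# MANIFEST must be an absolute path outside DIR.
make_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + ) > "$2"
}

# verify_manifest DIR MANIFEST: re-read every file and compare it with the
# manifest. Prints only files that no longer match; exit status is non-zero
# if any file differs or cannot be read.
verify_manifest() {
    ( cd "$1" && sha256sum --quiet -c "$2" )
}
```

A file that was changed on purpose is reported the same way as a corrupted one, so the manifest has to be regenerated after legitimate changes.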

How valuable is your data to you? What odds of quiet corruption are you willing to take?

miu · April 2021

@rcxb said:

    One option: cat /dev/sd? > /dev/null ; RET=$?

    Another option, if you're backing up with something like rsync, just add the -c option to one run per week so it'll read the entire contents of every file, which will trigger read errors if sectors are going bad. Other types of "full" backups will accomplish the same.

    If you really want to be paranoid or have very important data, set up Tripwire or a similar application to verify every file on your disk. It won't just restore a good version of your file for you like RAID-Z will, but you'll get alerted early, and can restore a good copy from backup. These tools will also perform their primary function of alerting you to security issues/intrusions.

Thank you very much for the useful response and the great suggestions; this sounds good and makes sense.

    How valuable is your data to you? What odds of quiet corruption are you willing to take?

Most data (or servers) is backed up once a week; it is client data (websites, DBs, files, etc.).

For example:

A) If only one user is affected (say their /home folder gets corrupted, which holds on average 20, 30 or 50 GB of data), I can restore it from the regular backup quite easily and relatively fast.

B) But if a whole server collapses, it is a different story. Either MySQL stops working for all users with all DBs down (this has already happened to me twice on RAID5, when all DBs and tables became completely unreadable at once, within a minute; since then I never use RAID5 for DBs), or, even worse, the whole /home folder or partition becomes unreadable. That would be a total disaster, because I usually use several large SATA disks (4, 6 or 8 TB) in RAID with large partitions (12 or 16 TB, and currently even 24 TB in RAID5 on md RAID on one server).
Restoring such a whole partition would mean a very, very long outage (just copying the data over a 1 Gbit network from the external/remote backups would take several days).
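As a back-of-envelope check on that restore time, a small sketch follows. It assumes decimal units (1 TB = 10^12 bytes), a link running at full line rate and no protocol or disk overhead, so real restores will be slower. The function name `restore_hours` is made up for illustration.

```shell
#!/bin/sh
# restore_hours TB GBIT: whole hours (rounded down) needed to move TB terabytes
# over a GBIT gigabit/s link at full line rate.
# TB * 10^12 bytes * 8 bits / (GBIT * 10^9 bit/s) = TB * 8000 / GBIT seconds.
restore_hours() {
    echo $(( $1 * 8000 / $2 / 3600 ))
}

restore_hours 12 1   # prints 26
restore_hours 24 1   # prints 53
```

So even in the best case the 24 TB partition needs more than two days of pure transfer time over 1 Gbit.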

I must admit I was feeling safe, because I thought that as long as everything is on mirrored or at least parity-based RAID, I am relatively well protected against a large data loss all at once. But after reading the Proxmox claims above about the probability/danger of data consistency errors on md, I have become quite scared and concerned.

When the budget allows, I always prefer a HW RAID controller (I expect it to work not only much faster but also more precisely, safely and better than md soft RAID), but that is not yet possible for all the large SATA storage servers (there I use md soft RAID only, for now).

Thanks for all opinions and suggestions, appreciated!

miu · April 2021

@rcxb said:

BTW: what is your opinion on md RAID consistency checking?
(/sys/block/mdX/md/sync_action check)
(It is scheduled by cron by default on Debian, I think once a month.)

I assume this is a similar option and serves the same purpose (mdadm should check every sector, and when it finds a data inconsistency it automatically repairs/corrects it in the background, am I right?)

miu · April 2021

From online docs:

Checkarray verifies the consistency of RAID-disks with reading operations. It compares the corresponding blocks of each disk in the array. If, however, while reading, a read error occurs, the check will trigger the normal response to read errors which is to generate the 'correct' data and try to write that out - so it is possible that a 'check' will trigger a write. However in the absence of read errors it is read-only.

This automatic repair function is also mentioned by Neil Brown: "All you need to do is get md/raid5 to try reading the bad block. Once it does that it will get a read error and automagically try to correct it. [...] 'check' (i.e. echo check > /sys/block/mdXX/md/sync_action) will cause md/raid5 to read all blocks on all devices, thus auto-repairing any unreadable blocks."

So I assume that regularly scheduled md sync_action checks would also be quite enough to keep the md RAID member disks in a good state, without silent corruption/errors caused by data not being read for a long time (file inactivity).
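A small sketch of starting such a check by hand and looking at the result. The sysfs files `sync_action` and `mismatch_cnt` are the kernel's md interface; the function names and the `SYSFS_ROOT` override are assumptions added here for illustration. Note that `check` rewrites unreadable blocks as the quoted docs say, but blocks that read fine and merely differ between members are only counted in `mismatch_cnt`; writing `repair` instead of `check` is what rewrites those.

```shell
#!/bin/sh
# Root of sysfs; overridable so the functions can be tried against a test tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# start_md_check ARRAY (e.g. md0): ask the kernel to start a consistency
# check of the array. Needs root on a real system.
start_md_check() {
    echo check > "$SYSFS_ROOT/block/$1/md/sync_action"
}

# md_check_status ARRAY: print the current sync action and the mismatch
# counter recorded by the most recent check.
md_check_status() {
    action=$(cat "$SYSFS_ROOT/block/$1/md/sync_action")
    mismatches=$(cat "$SYSFS_ROOT/block/$1/md/mismatch_cnt")
    echo "$1: action=$action mismatch_cnt=$mismatches"
}
```

Typical use would be `start_md_check md0`, then `md_check_status md0` once the action has gone back to idle.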

psb777 · April 2021

Yes, you should do scrubbing (aka sync_action check) regularly, and the odds of encountering inconsistencies are very low. Hard drives have CRC checks, so in the case of bitrot, a drive is much more likely to report an I/O error than to silently return incorrect data. md-raid will try to repair the error by writing the correct data to the bad block.

That said, if you are really paranoid, you can add a layer of dm-integrity under md-raid.

https://raid.wiki.kernel.org/index.php/Dm-integrity
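A rough dry-run sketch of that layering: one dm-integrity device per disk, then a RAID1 built on the mapped devices. It only prints the commands and runs nothing, because `integritysetup format` wipes the target device. The function name, the `int-` mapping prefix and the device names in the usage line are hypothetical, and the default integrity settings are assumed.

```shell
#!/bin/sh
# plan_integrity_mirror MD_DEV DISK...: print (do not run) the commands that
# would put a dm-integrity layer on each disk and build a RAID1 on top of them.
plan_integrity_mirror() {
    md=$1
    shift
    members=""
    n=0
    for disk in "$@"; do
        name="int-$(basename "$disk")"
        echo "integritysetup format $disk"
        echo "integritysetup open $disk $name"
        members="$members /dev/mapper/$name"
        n=$((n + 1))
    done
    echo "mdadm --create $md --level=1 --raid-devices=$n$members"
}
```

For example, `plan_integrity_mirror /dev/md0 /dev/sdx1 /dev/sdy1` prints two format/open pairs followed by the `mdadm --create` line, which can be reviewed before running anything by hand.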

darkimmortal · April 2021

It’s vanishingly rare to see corruption that mdraid wouldn’t catch (ie not reported by disk/controller) on a stable system (no random crashes / symptoms of bad ram).

9/10 cases of corruption caught by btrfs in my experience also bubbled up as SATA errors from the disk, so mdraid would have handled them just as well. The remaining cases were bad RAM and would have been resolved by ECC.

I’ve never experienced corruption that makes btrfs/zfs essential, but I still use btrfs for peace of mind

Levi · April 2021

If ZFS fails, you are dead in the water.

miu · April 2021

@LTniger said:
    If ZFS fails, you are dead in the water.

Do you have any practical experience of your own with cases where ZFS failed or collapsed, say it was not able to rebuild a failed redundant RAID member onto the replacement disk, or the array collapsed when one drive failed?
Or any knowledge / more info about it?

Thanks

miu · April 2021

Or does anyone else have experience with any unexpected general ZFS failures?

Levi · April 2021

@miu said:

    @LTniger said:
        If ZFS fails, you are dead in the water.

    Do you have any practical experience of your own with cases where ZFS failed or collapsed, say it was not able to rebuild a failed redundant RAID member onto the replacement disk, or the array collapsed when one drive failed?
    Or any knowledge / more info about it?

    Thanks

Yes, I do. Just set ZFS as the root file system, feed it ten million small video files and stream them. A few months and you are toast.

miu · April 2021

@LTniger said:
    Yes, I do. Just set ZFS as the root file system, feed it ten million small video files and stream them. A few months and you are toast.

Interesting. It sounds like ZFS has problems with long-term intensive reading of a large number of small files.

In any case, thanks for sharing your experience.

rcxb · April 2021

@miu said:
    When the budget allows, I always prefer a HW RAID controller

I was only answering the question about data corruption. I would never recommend MD-RAID for a critical server. Not because of data loss, but because of availability. It seems like it was never designed for real, enterprise use.

When a hard drive in a hardware RAID array starts acting up, the controller quickly shuts off the drive and awaits a replacement. MD-RAID seems to just continue trying to read from a bad drive, forever, stalling disk I/O on your server until you notice and see the log spew. That is just terrible behavior, to let the disks continue to be dragged down by one drive when MD-RAID knows that drive is having errors and knows there is a good copy of its data.
