
    Don't build SSD in RAID5

    concerto49 Member
    edited April 2013 in General

    I think it's been talked about here by some providers. So don't do it!

    RAID5 = writing to all the SSDs multiple times per write, which kills them. Doesn't help. Hurts performance. Hurts reliability.

    Full article: http://thessdreview.com/daily-news/latest-buzz/skyera-reveals-raid-5-hinders-reliability-of-ssd-arrays/

    Serving you the best VPS, Web hosting, dedicated servers and more - Cloud Shards | Query Foundry
    We operate the network AS62638 | Available in Syd AU and Dallas, Los Angeles and NYC USA

    Comments

    • MrAndroid Member
      edited April 2013

      IMHO for an SSD RAID, you'd be better off with RAID 1 or RAID 10, and unless your RAID card supports TRIM, use software RAID.

      The Original Daniel.

    • And even with RAID1 or RAID10, make sure you use a 4K block size; otherwise you are also causing write amplification.
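      To make the write-amplification point concrete, here is a minimal sketch (a hypothetical worst-case model: it assumes the controller rewrites a whole NAND page for every sub-page IO, with no coalescing; real firmware caches and combines writes, so the true factor is lower):

```python
def write_amplification(io_size_bytes, nand_page_bytes):
    # Worst-case model: every IO smaller than a NAND page forces the
    # controller to rewrite a whole page (no coalescing, no cache).
    return max(1.0, nand_page_bytes / io_size_bytes)

# 4K filesystem blocks on a drive with 8K native pages:
print(write_amplification(4096, 8192))  # worst case: 2.0x the writes
```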


    • @rds100 said: And even with RAID1 or RAID10, make sure you use a 4K block size; otherwise you are also causing write amplification.

      Most SSDs have used 8K native pages for a while now, and the more recent ones have 16K pages.

      @MrAndroid said: IMHO for an SSD RAID, you'd be better off with RAID 1 or RAID 10, and unless your RAID card supports TRIM, use software RAID.

      I agree, and I've never promoted RAID5 or even RAID6 on SSDs. It's just that I've read some providers suggesting it here, so hopefully this serves as a warning.

    • @concerto49 said: Most SSDs have used 8K native pages for a while now, and the more recent ones have 16K pages

      Which ones? Any official source, table, classification, etc.?


    • @rds100 said: Which ones? Any official source, table, classification, etc.?

      Since 20nm it's 16K. Since maybe 25nm it's 8K. Can't remember. As for an official source, Anand has a nice table:

      http://anandtech.com/show/6884/crucial-micron-m500-review-960gb-480gb-240gb-120gb

      Scroll down; page size is in the table.

      This is for Intel/Micron.

      Sandisk/Toshiba have had a similar transition. They make Toggle NAND.

    • @concerto49 interesting, thanks for the table! Now if we could convince the filesystem to use 8/16K blocks... oh, well.


    • MrAndroid Member
      edited April 2013

      @concerto49 said: I agree, and I've never promoted RAID5 or even RAID6 on SSDs. It's just that I've read some providers suggesting it here, so hopefully this serves as a warning.

      RAID 6 on SSDs sounds horrible. I can imagine them dying very quickly.


    • Reading it, the article appears to be the usual piece from a company selling something.

      "Of COURSE this already existing technology is going to kill your SSDs and set your house on fire... the only solution is to buy our proprietary hardware."

      So it may not be pure gospel here.

    • marcm Member

      @concerto49 - Thanks for the tip. I have never liked RAID 5 anyway, even with HDDs. Dual-parity RAID 6 with several hot spares is the way to go if you want a halfway decent RAID array for storage and/or backup.

    • @Damian said: "Of COURSE this already existing technology is going to kill your SSDs and set your house on fire... the only solution is to buy our proprietary hardware."

      Ignore the marketing parts. It's still a good warning about RAID5. There are probably other sources saying the same, but let's not go there.

    • Maounique Member
      edited April 2013

      If we were to worry about the wear and tear of SSDs, we wouldn't be using any SSD-cached nodes, would we? I can't think of a more horrible write-cycle generator than that. Yet they are in production and work well for months, if not more than a year, in various setups, including hardware ones such as CacheCade.
      When the drives fail, they will be replaced. This is like saying: hey, don't drive your ATV on rough roads, it will break down faster.
      By the same logic we shouldn't use servers for virtualization; that means higher CPU load, a higher average temperature and a faster breakdown.
      Servers and enterprise-grade SSDs are meant for heavy duty. However, we are now considering phasing out local storage and moving to SAN wherever possible.
      "DD tests" will give lower results, but I think enterprise-grade storage is better for everyone, and whoever needs consistency and reliability will choose it over local storage every day. We don't like RAID failures, and as the number of nodes increases they become more and more likely.


      Extremist conservative user, I wish to preserve human and civil rights, free speech, freedom of the press and worship, rule of law, democracy, peace and prosperity, social mobility, etc. Now you can draw your guns.

    • Shados Member

      The reality is that with wear leveling and redundant internal spare area, most SSDs will far outlast HDDs even in write-heavy environments: you essentially have to write every block up to its maximum number of per-block writes before any of them start to fail. Well, ideally, anyway.

      Besides, there are better reasons to avoid RAID5:
      1. The RAID5 write hole
      2. Performance loss, because partial-stripe writes are horrible (synchronously read the stripe, modify it and generate new parity, then write out, instead of just generating parity and writing out as in a full-stripe write)

      Although if you really want it, you could always just use ZFS and go with RAID-Z, which neatly solves both those particular issues.

    • FRCorey Member

      Intel DC S3700s are what people should be using. They have onboard DRAM caches backed by capacitors, and they are HET (High Endurance Technology) drives, designed to be abused.

    • tjb Member

      Raid 5 and 6 do not require a full-stripe write. Only the blocks that have changed, along with the parity blocks, need to be updated. Controllers optimised for HDDs try to do full-stripe writes because it avoids the expensive reads from the other drives in the stripe, but only if the OS/app sends writes that allow this.

      If the OS/app doesn't update all the blocks in a stripe, the controller will read in those blocks not already cached so that the parity can be calculated; then the changed blocks and the parity can be written out.

      There's no value in writing the blocks on disks that haven't changed; in fact it's detrimental, because those drives could be servicing reads from other IO threads at the time.

      Raid 5 should have the same number of disk writes per host write as raid 10, or fewer. Consider the case where a stripe consisting of 5 data blocks and one parity block has two of those data blocks updated: a total of 3 blocks need to be written, while for raid 10 you would have 4 writes. For raid 6 it will vary, depending on how often you update multiple blocks per stripe.
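      The accounting in this post can be sketched as a back-of-the-envelope model (function names are illustrative; it assumes read-modify-write parity updates and ignores controller caching):

```python
def raid5_block_writes(updated_blocks):
    # Read-modify-write: the changed data blocks plus 1 recalculated parity block.
    return updated_blocks + 1

def raid10_block_writes(updated_blocks):
    # Every changed block is written to both halves of its mirror pair.
    return 2 * updated_blocks

# Two data blocks updated in a 5-data + 1-parity stripe, as in the example:
print(raid5_block_writes(2))   # 3 backend writes
print(raid10_block_writes(2))  # 4 backend writes
```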

    • In short, it depends on the level of redundancy and the number of drives.
      Leaving necromancy aside, I would be interested in some statistics from providers: what percentage of their drives are SSDs, and what the failure rate is in each category. We have had no SSD failures yet, but quite a few HDD failures. On the other hand, those were in places where we colo or rent, so it probably depends a lot on quality too.


    • Gunter Member
      edited March 2014

      Damn it DigitalOcean. °-_-

    • Just had our first SSD failure yesterday. Apparently the SSD went completely dead - not detectable at all.


    • rds100 said: the SSD went completely dead - not detectable at all.

      So that looks like a board problem, not really the cells wearing out.


    • Yes, I don't think it saw much wear; it was a relatively new SSD. It's even still under warranty.


    • nonuby Member
      edited March 2014

      There was an interesting article on HackerNews, or perhaps via Twitter: after a talk, the lead devops at a major startup was questioned by a feisty young hardware guy, who raised a number of points about SSD usage on their database cluster, challenging statements about brands, firmware, OS tuning and general reliability, and alluding to the unreliability and poor long-term cost-effectiveness of SSDs, among other negative implications.

      The devops thought for a while and responded: "I don't care, I just rent them." A moment of enlightenment had occurred.

      You shouldn't expect anything to last for a particular duration, whether it's 2 months, 2 years or 8 years; drives can fail at any time in any configuration, and there's no such thing as single-node durability in absolute terms. A lot of effort (days of review reading, testing, tuning) often goes into selecting SSD brands/models for optimal reliability, and then when disaster does occur there is complete shock that the RAID 10 of Samsungs has failed, followed by 3 days of painful, poorly planned recovery via some untested R1Soft or other poorly thought-out backup solution.

      In the startup's case, the failure wasn't even noticeable: the cluster kicked the node out, AWS marked it as degraded (or dead, in the instant-failure scenario), their orchestration software noticed this via the API, a new node was spun up, and a little light changed from green to red to green on some 100x100 matrix somewhere.

      tl;dr - failure can happen at any time; do you put the same amount of effort into disaster recovery (and into testing that solution)?

      In practical terms: do you have standby SSD drives ready (warranty turnaround won't suffice), or standby cold nodes, or capacity on the grid to migrate guest VMs, and a method to recover quickly with minimal disruption to clients? So that WHEN a failure does occur, it's a minor blip rather than a shitstorm.

    • nonuby said: So that WHEN a failure does occur, it's a minor blip rather than a shitstorm

      Yep, shit will happen; what matters is how you get out of it. So far we have had one total RAID failure and some 3-4 disks replaced. The failed drive was an SSD, but the controller was to blame, so it wasn't an SSD-vs-HDD issue. The failed RAID meant one node had to be restored from a roughly 12-hour-old backup onto a stand-by node, which took a couple of hours or so.

      The cloud now has at least N+2; I mean, when it approaches that load, we will add more pods. But strictly speaking, the SAN can fail too, even if it is one of those expensive Hitachi ones with redundant everything, from controllers to firmware. If that happens, we will need to ask Hitachi to come and restore it; they do not allow outside intervention. However, the data should be safe even in those situations, and it would take at most 24 hours to get back on track. We also have another, recently bought, even more expensive one, with a 100% SLA compared to the current one's 99.99%, but the data is not replicated there; it costs 1 million EUR and only the enterprise sector is using it.


    • Both (R1 and R5) will do two writes for every single host write. The real difference between them is that in R5 you must read both blocks (the old data and the old parity) before writing them, so:

      • R1: one host write means two write IOPS
      • R5: one host write means two read IOPS and two write IOPS

      So I am not sure that using R1 will improve disk longevity. R6, however, would do three reads and three writes, so in that case the drives would last less.
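      The same accounting, extended to RAID6, can be tabulated (a simplified model of one small host write, assuming read-modify-write for the parity levels and ignoring caches and full-stripe optimizations):

```python
# Backend IOPS generated by ONE small host write, in the simplified
# read-modify-write model (ignores controller caches and full stripes).
BACKEND_IOPS = {
    "raid1": {"reads": 0, "writes": 2},  # write both mirrors
    "raid5": {"reads": 2, "writes": 2},  # read old data + old parity first
    "raid6": {"reads": 3, "writes": 3},  # read old data + both parities first
}

for level, io in BACKEND_IOPS.items():
    print(level, "->", io["reads"] + io["writes"], "backend IOPS per host write")
```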

    • Two more IOPS under heavy disk access won't help, but with SSDs you do not need many disks; most people will use 4, and in that case you are better off with RAID 10. If you need large capacity with many disks, say over 6-8, then you will do RAID 5 or 6 because you need the space; the extra reads and writes are spread across many disks, and with that many IOPS available it won't really matter.


    • pcan Member

      For what it's worth, I recently put into service a $100K IBM Power server (a high-performance database server) with 6 SSD drives. I was aware of the issue raised by the OP, and I escalated a support request about the preferred RAID level to the IBM main offices. The answer was that RAID5 is the best choice for SSD drives on that machine; RAID 10 was explicitly not recommended.
      I believe the RAID5 issue described in the original paper is either outdated or biased to promote other products.

      In my experience, the SSD failure rate is far lower than that of mechanical drives. Out of about 100 drives, I have had one failure due to a known Intel SSD firmware bug and another almost immediately after putting the drive into production (an obvious manufacturing flaw).

    • RAID setups aside, which are the best SSDs these days?

      Or is it still the same: Intel > Samsung > any other?

      Anyone selling Blesta Owned Lifetime Under $250? | Regards.

    • I said that from the LONGEVITY perspective it should be similar. Normally one host write IOP converts to two backend write IOPS, but with RAID5 it can sometimes be a full-stripe write, and in that case RAID5 has the advantage. For example:

      • RAID 10 with 4+4 (for example): 8 sequential host writes become 16 backend write IOPS.
      • RAID 5 with 8+1 (for example): 8 sequential host writes can (in some cases) become 9 backend write IOPS.

      I mean that RAID 5 can have some advantages and write less.

      But of course, with a random small-write pattern it is better to be on RAID 1; that holds for SSD, SAS, NL-SAS, FC and every other kind of drive.

      Really, in every environment, depending on its size, the best approach is a combination of R1, R5 and R6, distributing the load across them depending on the IO pattern.

      In a small environment, if you do not know the pattern, it is safest to use R1.
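      The 16-versus-9 example works out as follows (a sketch with hypothetical function names, assuming the RAID10 is 4+4 mirrors, the RAID5 is 8 data disks plus 1 parity disk, and the 8 sequential writes fill exactly one full stripe):

```python
def raid10_seq_writes(host_writes):
    # Every host write is duplicated to its mirror.
    return 2 * host_writes

def raid5_full_stripe_writes(data_disks):
    # A full-stripe write needs no reads: one block per data disk + 1 parity.
    return data_disks + 1

print(raid10_seq_writes(8))        # 16 backend writes on 4+4 RAID10
print(raid5_full_stripe_writes(8)) # 9 backend writes on 8+1 RAID5
```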

    • When ganging up multiple SSDs for striping, more often than not it's the controller that becomes the bottleneck, so you're never going to realize the "theoretical" IOPS and/or throughput anyway. Sometimes an R5 makes sense (empirically) so you can utilize the added capacity. Adamantly going with an R10 in this case may gain you nothing but lost capacity.

      I'm guessing this is why IBM recommended the R5 in the previous post.

      I'm sure someone is going to say "change the controller", but chasing bottlenecks can be cost-prohibitive in the real world. So sometimes one just needs to make the best of what he has and call it a day.

    • I'd go with the "who cares" argument.

      SSD prices are falling, capacities are increasing and anything important should have redundancy and backups.

      So when the SSD does die, you replace it and get on with life, just as you would if an HDD died.

    • debinski Member
      edited March 2015

      In our case I guess we don't care if it dies; it's covered under Dell Gold support. But we do care about squeezing maximum throughput out of a given set of hardware. I'm presently setting up a multi-user environment (a terminal server) that runs an application that is never CPU- or memory-bound, but always IO-bound. We've got everything from 16 SLC SSDs on two controllers (which is actually the slower tier), to (2) 1.2TB Fusion IO PCIe 16x cards, to a 384GB RAM drive (the fastest tier) for temp files, plus 384GB of RAM for the CPUs (768GB total). The server cost $100 grand. The application cost more. The application uses all of these paths simultaneously while processing. It can really move stuff around. (Not a "lowend" box.)

      The point I was making, in a nutshell, was that sometimes a RAID5 makes sense (over a RAID10), especially if the workload is mostly read IOPS or if the controller supporting R10 can't deliver - hence the recommendation from IBM. (Not the case with the Dell R720.)

    • Sorry if this is off-topic, but @Maounique, won't one SAN thingy failing for a load of nodes be worse than local storage failing in one node?

    • Maounique Member
      edited March 2015

      linuxthefish said: Sorry if this is off-topic, but @Maounique, won't one SAN thingy failing for a load of nodes be worse than local storage failing in one node?

      Yes, it would. However, our older SAN has active-active dual controllers and a lot of built-in redundancy, and it is under warranty by Toshiba for 99.99% uptime and zero data loss if operated correctly (and they are the only ones allowed to operate it, including installing disks, and only certain types of disks). It was 150k.
      Our newer one has everything included, even 24-hour UPSes, has huge redundancies and is guaranteed 100% uptime (even a firmware update won't take it down), and it costs 1 mil.

      Yeah, it may fail, but I believe the chance of that happening is much lower than the chance of all the nodes' local storage failing in a given time. We offer free and paid backup options, including snapshots, stored in a different storage system. If it fails and people have no backups, that won't be our problem; we did everything we could, including huge, costly investments.

      dragon2611 said: SSD prices are falling, capacities are increasing

      Yeah, but this often happens at the expense of reliability. We are going with MLC drives only, for now. They are expensive and not that big, but much more reliable and fast, IMO.



    • Let me make a simple calculation.

      I need, say, 1TB net. A 250GB SSD is 100$, a 500GB one is 200$ (assumed). As a hoster I need, say, 100 sets.

      With RAID5 I'll use 5 250GB disks to get 1 TB net. -> 5 x 100$ -> 500$
      With RAID1 I'll use 4 500GB disks to get 1 TB net. -> 4 x 200$ -> 800$

      Times 100 (for 100 systems) -> (500 disks) 50.000$ and (400 disks) 80.000$

      Let's assume the failure rate is 5%/year, i.e. 5 disks in 100 disks will go belly up each year. So in my RAID5 setup I'll have to replace 25 disks/year (2.500$) and in my RAID1 setup it's 20 disks (4.000$).
      Let's assume, naively, that prices don't change over time (which is OK, because any change would be reflected in both setups), and let's say I calculate my solution for a lifetime of 4 years (a typical bookkeeping lifetime).

      So, all in all, the RAID5 solution will be 50k$ + 3 x 2.500$ -> 57.5 k$. The RAID1 solution will be 80k$ + 3 x 4.000$ -> 92 k$.

      But, you say, wear-out will be higher with RAID5. OK, let's increase the yearly failure/replacement rate to 15% for RAID5 then. That makes 50k + 3 x 7.5k -> 72.5 k$.

      Turning that (with the 15% rate) into systems, I'd arrive at 100 RAID1 systems of 1 TB net for 4 years at 92 k$, or at 126 RAID5 systems of 1 TB net for 4 years, also at 92 k$.

      Or, you know what, let's go amok and assume that SSDs in RAID5 fail 5 times as often as in RAID1 systems. Then we arrive at 87.5 k$. Still cheaper than RAID1.

      I think we can agree that when one and the same amount of money buys more RAID5 systems than RAID1 systems, we'd choose the solution that brings us more systems to earn money with (or that is cheaper), wouldn't we?
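      The arithmetic above can be checked with a short script (same assumed prices and failure rates as in the post; nothing here is measured data):

```python
def four_year_cost(disks, disk_price, yearly_failure_rate, years=4):
    # Purchase price up front, plus replacement disks for the remaining years.
    purchase = disks * disk_price
    yearly_replacements = disks * yearly_failure_rate * disk_price
    return purchase + (years - 1) * yearly_replacements

print(four_year_cost(500, 100, 0.05))  # RAID5 baseline: 57500.0
print(four_year_cost(400, 200, 0.05))  # RAID1: 92000.0
print(four_year_cost(500, 100, 0.15))  # RAID5, triple wear: 72500.0
print(four_year_cost(500, 100, 0.25))  # RAID5, 5x failures: 87500.0
```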


      My favourite prime number is 42. - \forall cpu in {intel, amd, arm}: cpu->speed -= cpu->speed/100 x irandom(15, 30) | state := hacked

    • Maounique Member
      edited March 2015

      TBH, the original article does not look very legit.
      Anyway, it is not compared with RAID 10, where the writing is greater, since the system keeps two full copies instead of one calculated parity.
      So I do not understand the issue. Are more frequent small writes better than fewer larger writes? Overall, the write cycles per cell are higher with RAID 10 on average.

