How are NVME drives holding up in long term?

Considering replacing all our SSDs and spinning disks with NVMe, but concerned about how these drives will hold up over several years. Anyone running NVMe for over 2-3 years? Anyone had any NVMe drives 'crash' after being in service for over 6 months?

Thanks!

Comments

  • WebProjectWebProject Host Rep, Veteran

    NVMe should be more reliable than any SATA SSD or HDD. I managed to burn out my first SSD on my home PC within 6 months, but that was a manufacturer fault, so I got a replacement.

  • vfusevfuse Member, Host Rep

    We've had 2 NVMe drives give out after about 220TB written to them (~9 months of usage) at Hetzner on our logging cluster. We now replace the servers after ~200TB written.
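
As a rough sketch of what those figures imply, the average write rate and the time to reach the ~200TB replacement point can be worked out as below. The function names are illustrative, a month is assumed to be 30 days, and the write load is assumed to be steady, which a real logging cluster may not be.

```python
# Figures from the post above: ~220 TB written over ~9 months, replacement at ~200 TB.
# Assumptions (not from the thread): 30-day months, steady write load, decimal TB.

def daily_write_rate_tb(total_tb: float, months: float, days_per_month: float = 30.0) -> float:
    """Average terabytes written per day over the given period."""
    return total_tb / (months * days_per_month)


def days_until_threshold(threshold_tb: float, rate_tb_per_day: float) -> float:
    """Days of service before a drive reaches the given write threshold."""
    return threshold_tb / rate_tb_per_day


rate = daily_write_rate_tb(220, 9)
print(f"average write rate: {rate:.2f} TB/day")
print(f"days to reach 200 TB: {days_until_threshold(200, rate):.0f}")
```

At that pace a ~200TB replacement policy works out to swapping drives roughly every eight months.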

  • edited March 2020

    vfuse said: We've had 2 NVMe drives give out after about 220TB written to them (~9 months of usage) at Hetzner

    What models are they? 220TB seems low unless they are 256GB drives.
    Did you run a SMART tool to check them regularly?
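
One way to do that kind of regular check is to parse the JSON that `smartctl` can emit (for example `smartctl -j -a /dev/nvme0`, where the device path is an assumption). The sketch below reads a few fields from the NVMe health log; the sample values are made up for illustration and are not from any drive in this thread, and the helper name is hypothetical.

```python
import json

# NVMe reports "data units" in thousands of 512-byte blocks, i.e. 512,000 bytes each.
BYTES_PER_DATA_UNIT = 512 * 1000


def summarize_nvme_health(smartctl_json: str) -> dict:
    """Pull wear-related fields out of smartctl's JSON output for an NVMe device."""
    report = json.loads(smartctl_json)
    log = report["nvme_smart_health_information_log"]
    return {
        "tb_written": log["data_units_written"] * BYTES_PER_DATA_UNIT / 1e12,
        "percentage_used": log["percentage_used"],
        "temperature_c": log["temperature"],
        "media_errors": log["media_errors"],
    }


# Made-up sample in the shape smartctl produces; real output has many more fields.
sample = json.dumps({
    "nvme_smart_health_information_log": {
        "data_units_written": 429_687_500,
        "percentage_used": 73,
        "temperature": 60,
        "media_errors": 0,
    }
})

print(summarize_nvme_health(sample))
```

Logging these values on a schedule makes it easier to see a drive approaching its rated endurance before it fails.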

  • PureVoltagePureVoltage Member, Patron Provider

    No issues yet other than one DOA.
    Using enterprise U.2 drives however.

  • NVMe drives went to 5-year warranties faster than SATA SSDs did. Lower component count typically leads to higher MTBF.

  • vfusevfuse Member, Host Rep

    @greattomeetyou said:

    vfuse said: We've had 2 NVMe drives give out after about 220TB written to them (~9 months of usage) at Hetzner

    What models are they? 220TB seems low unless they are 256GB drives.
    Did you run a SMART tool to check them regularly?

    They're mainly SAMSUNG MZVLB512HAJQ-00000 (consumer NVMe drives). It could also have to do with the temperature (sensor 1 averages 60°C, sensor 2 averages 95°C).
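
To put the 220TB figure in context against drive size, here is a small sketch that converts it into full drive writes. It assumes the "512" in the part number means a 512GB drive and uses decimal units; the function name is illustrative.

```python
def full_drive_writes(tb_written: float, capacity_gb: float) -> float:
    """Number of times the drive's whole capacity has been written (decimal units)."""
    return tb_written * 1000.0 / capacity_gb


# 256 GB is the size guessed earlier in the thread; 512 GB is assumed from the part number.
for capacity_gb in (256, 512):
    writes = full_drive_writes(220, capacity_gb)
    print(f"{capacity_gb} GB drive: about {writes:.0f} full drive writes")
```

On a 512GB drive that is roughly 430 full overwrites, about half the count the same 220TB would mean on a 256GB drive.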

  • mehargagsmehargags Member
    edited March 2020

    NVMe drives tend to misbehave or fail sooner if the temperature is not optimal. @vfuse, you might want to report it to Hetzner so they can check the rack cooling.

  • vfusevfuse Member, Host Rep

    We already reported it when they failed. The only thing we noticed is that the servers in Helsinki are much cooler than the ones in Falkenstein. They're all really hot in Falkenstein even though they're in different DCs for HA.

  • PulsedMediaPulsedMedia Member, Patron Provider

    vfuse said: They're mainly SAMSUNG MZVLB512HAJQ-00000 (consumer NVMe drives). It could also have to do with the temperature (sensor 1 averages 60°C, sensor 2 averages 95°C).

    I have always been under the impression that SSDs are not that sensitive to high temperatures; cold yes, but not heat. There seems to be a consensus that the flash chips themselves work better at higher temperatures, which makes cooling solutions difficult when the controller needs to be kept cool but the actual flash hotter.

    vfuse said: We already reported it when they failed. The only thing we noticed is that the servers in Helsinki are much cooler than the ones in Falkenstein. They're all really hot in Falkenstein even though they're in different DCs for HA.

    It's actually located in Tuusula ;) It's newer, probably has a lower utilization %, and the Finnish climate is rather cold year round: not many 30C+ days, but lots around the 0C mark :)
    We have a DC in Helsinki, and for something like 7 months of the year outside air circulation is pretty much all that is needed; it's only in the 2-3 midsummer months that we need to crank up the AC.

  • FranciscoFrancisco Top Host, Host Rep, Veteran

    @mehargags said:
    NVMe drives tend to misbehave or fail sooner if the temperature is not optimal. @vfuse, you might want to report it to Hetzner so they can check the rack cooling.

    We put double sided heatsinks on all of ours just to be safe.

    Lots of air moving over them too.

    Francisco

  • PulsedMediaPulsedMedia Member, Patron Provider

    Francisco said: We put double-sided heatsinks on all of ours just to be safe.

    We typically use one-sided ones with the typical chinesium "rubber band" mount, which we replace with a zip tie that puts pressure on the controller chip, not the flash chips. What kind of heatsinks are you using?

  • FranciscoFrancisco Top Host, Host Rep, Veteran

    @PulsedMedia said:

    Francisco said: We put double-sided heatsinks on all of ours just to be safe.

    We typically use one-sided ones with the typical chinesium "rubber band" mount, which we replace with a zip tie that puts pressure on the controller chip, not the flash chips. What kind of heatsinks are you using?

    https://www.amazon.com/gp/product/B07PS9S2DZ/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1

    Each pack includes one of each. One 'band' based if you want, and one 'full double sided enclosure' based.

    These kits have 2 thermal pads, one for each side, so it squeezes it together like a sandwich.

    Francisco

  • PulsedMediaPulsedMedia Member, Patron Provider

    Thanks. I've seen those but haven't bought any so far. I've tried many of the cheaper ones though. I'll buy some and do metrics on them too :)

  • webguyzwebguyz Member
    edited March 2020

    We use a heatsink brand called Warship, which you can find on eBay. We also turn up the fans a bit on the Supermicros we use them in. Average temp is 28C; it might hit 30C if the disk is very busy. Kind of pricey at around 5 bucks, but a worthwhile investment I think. Everything I have read suggests excessive heat is a real killer and causes throttling. We have about 25 VMs on each Hyper-V server and they are very fast. Really like the Sabrent 2TB NVMe models.

  • PulsedMediaPulsedMedia Member, Patron Provider

    @webguyz said:
    We use a heatsink brand called Warship, which you can find on eBay. We also turn up the fans a bit on the Supermicros we use them in. Average temp is 28C; it might hit 30C if the disk is very busy. Kind of pricey at around 5 bucks, but a worthwhile investment I think. Everything I have read suggests excessive heat is a real killer and causes throttling. We have about 25 VMs on each Hyper-V server and they are very fast. Really like the Sabrent 2TB NVMe models.

    https://www.ebay.com/itm/WARSHIP-M-2-NGFF-PCIE-NVMe-2280-SSD-Heatsink-Cooling-Fin-Radiator-Thermal-Pads/273027268702?epid=14009596637&hash=item3f91b1805e:g:OKMAAOSwFOZbIShb

    This one? I could not find it for $5.

    Has anyone tried those full copper ones?

  • PulsedMediaPulsedMedia Member, Patron Provider

    Cool, and I see quantity discounts too. I'll buy a few of those as well! :)
    We're planning to ramp up the ZEN MiniDedis by the end of this year once we can automate them; these might make for a good standardized solution if the thermals hold up.

  • NDTNNDTN Member, Patron Provider, Top Host

    Use U.2 enterprise NVMe drives like the Intel P4610. We have been deploying a lot of NVMe servers over the past years and none have given us issues. For example, the endurance of the Intel P4610 1.6TB is 12.25PBW, while a consumer NVMe drive like the Intel 660P 2TB is rated for only 400TBW.
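
Because the two drives differ in capacity, the endurance gap is easier to compare as drive writes per day (DWPD). The sketch below uses the ratings quoted above and assumes a 5-year rating period for both drives, which is not stated in the thread; the function name is illustrative.

```python
def dwpd(rated_tbw: float, capacity_tb: float, warranty_years: float = 5.0) -> float:
    """Drive writes per day: rated endurance spread over the warranty period.

    warranty_years defaults to 5, an assumption rather than a figure from the thread.
    """
    return rated_tbw / (capacity_tb * warranty_years * 365)


# (rated endurance in TB written, capacity in TB), as quoted in the post above
drives = {
    "Intel P4610 1.6TB": (12_250, 1.6),
    "Intel 660P 2TB": (400, 2.0),
}

for name, (tbw, capacity) in drives.items():
    print(f"{name}: {dwpd(tbw, capacity):.2f} DWPD")
```

Under that assumption the enterprise drive is rated for about 4.2 full drive writes per day versus about 0.11 for the consumer drive.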

  • Hetzner_OLHetzner_OL Member, Top Host

    As a general rule of thumb, if you ever think there is a recurring issue with the performance of the hardware, please communicate with our team about it. In some situations, our team may need to document potential hardware issues to see whether they are part of a larger problem. --Katie

  • TimboJonesTimboJones Member
    edited March 2020

    @PulsedMedia said:

    vfuse said: They're mainly SAMSUNG MZVLB512HAJQ-00000 (consumer NVMe drives). It could also have to do with the temperature (sensor 1 averages 60°C, sensor 2 averages 95°C).

    I have always been under the impression that SSDs are not that sensitive to high temperatures; cold yes, but not heat. There seems to be a consensus that the flash chips themselves work better at higher temperatures, which makes cooling solutions difficult when the controller needs to be kept cool but the actual flash hotter.

    I'd be curious where you heard that consensus; it sounds like nonsense.

    (On a side note, I'd think that would make cooling solutions easier, when you have one side that needs cooling and one side that can take the heat.)

    Edit: Found an article that likely refers to the high temperature for NAND controllers you were talking about. https://www.eeweb.com/profile/eli-tiomkin/articles/industrial-temperature-and-nand-flash-in-ssd-products

  • PulsedMediaPulsedMedia Member, Patron Provider

    TimboJones said:

    I'd be curious where you heard that consensus from, that sounds like nonsense.

    Higher than the controller chip.

    I did not save links, but I've seen this multiple times in regard to M.2 cooling, especially with the new PCIe Gen 4 drives, the difficulty of cooling them, and why most drives don't have coolers on them: NAND chips that are too cold are no good either, and they will wear out faster at colder temps.

  • @PulsedMedia said:

    TimboJones said:

    I'd be curious where you heard that consensus from, that sounds like nonsense.

    Higher than the controller chip.

    I did not save links, but I've seen this multiple times in regard to M.2 cooling, especially with the new PCIe Gen 4 drives, the difficulty of cooling them, and why most drives don't have coolers on them: NAND chips that are too cold are no good either, and they will wear out faster at colder temps.

    Last paragraph of the link I posted:

    The best way to optimize the data retention of a NAND-based SSD is to limit the temperature at which the NAND flash is stored. When the drive has reached or is approaching its end of life, limiting the time of exposure to high temperature will also help extend the data retention.

  • bacloudbacloud Member, Patron Provider

    We started offering NVMe VPS in July 2017 on Intel P3600 drives. All the NVMe drives are OK; none have died.

  • PulsedMediaPulsedMedia Member, Patron Provider

    TimboJones said: The best way to optimize the data retention of a NAND-based SSD is to limit the temperature at which the NAND flash is stored. When the drive has reached or is approaching its end of life, limiting the time of exposure to high temperature will also help extend the data retention.

    That talks about data retention and does not define high temperature. Is high temperature in this regard 40C? 100C? 200C? 300C?

    I was talking about overall write cycles. Data retention usually refers to the number of years the data stays safe on an unpowered device.

  • pluushpluush Member
    edited March 2020

    I have used NVMe for almost 2 years, but on a desktop PC, not in a server environment. I bought a (supposedly) non-retail SM961, which has 2-bit MLC (instead of 3-bit TLC) NAND. No problems so far and, IIRC, never a disk-related crash. I expect it to last longer than TLC SATA drives. I'd be kind of disappointed if it gave up at 200TB...

    Even my 4-year-old tablet PC's SATA M.2 SSD has written 15TB to NAND without me actively abusing it.
