Proxmox & NVME huge data written numbers in SMART - anyone else experiencing this?
Proxmox & NVME huge data written numbers in SMART - anyone else experiencing this?

Cdoe Member
edited January 2019 in General

Hello,
anyone else experiencing huge numbers in SMART? I'm running Proxmox on a server with 4 Debian virtual machines. I've checked the NVMe SMART data, and the Data Units Written figure is way too high. I've got two Intel SSDPE2MX450G7 drives running in RAID 1.

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 26 Celsius
Available Spare: 97%
Available Spare Threshold: 10%
Percentage Used: 20%
Data Units Read: 27,836,670 [14.2 TB]
Data Units Written: 724,552,635 [370 TB]
Host Read Commands: 320,954,705
Host Write Commands: 7,553,830,582
Controller Busy Time: 50
Power Cycles: 24
Power On Hours: 5,576
Unsafe Shutdowns: 3
Media and Data Integrity Errors: 0
Error Information Log Entries: 0

Is this a firmware bug? Because when I divide the data written by the number of power-on seconds, it averages out to 180 MB/s, which is not possible as the VMs are mostly idling.
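For reference, the NVMe spec counts one "data unit" as 1,000 sectors of 512 bytes (512,000 bytes), which is how smartctl arrives at the bracketed TB figure. A quick sketch of the conversion using the values from the log above (the MB/s figure is the lifetime device-level average, not a live measurement):

```python
# One NVMe "Data Unit" = 1,000 * 512-byte sectors = 512,000 bytes;
# smartctl multiplies by this to produce the bracketed [370 TB] figure.
DATA_UNIT_BYTES = 512 * 1000

units_written = 724_552_635   # Data Units Written from the SMART log
power_on_hours = 5_576        # Power On Hours from the SMART log

total_bytes = units_written * DATA_UNIT_BYTES
tb_written = total_bytes / 1000**4                          # decimal TB
avg_mb_s = total_bytes / (power_on_hours * 3600) / 1000**2  # decimal MB/s

print(f"{tb_written:.1f} TB written, ~{avg_mb_s:.1f} MB/s lifetime average")
```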

edit:

zpool iostat rpool 60

              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool        342G  74.3G      0     91  10.0K  1.95M
rpool        342G  74.3G      0     90  7.80K  1.95M
rpool        342G  74.3G      0    107  7.60K  2.91M
rpool        342G  74.3G      0     85  22.1K  2.15M
rpool        342G  74.3G      0     92  8.47K  2.16M
rpool        342G  74.3G      0     90  6.67K  1.71M
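As a rough cross-check (a sketch, not a diagnosis): Data Units Written times 512,000 bytes over the drive's 5,576 power-on hours works out to roughly 18-19 MB/s at the device, while the iostat samples above show about 2 MB/s at the pool. If both are right, the ratio hints at write amplification somewhere between ZFS and the drive (mirror writes, metadata and sync overhead, ashift/blocksize mismatch). Using the approximate figures from this thread:

```python
# Approximate figures from this thread: ~2 MB/s pool-level writes
# (zpool iostat) vs ~18.5 MB/s device-level average implied by SMART.
pool_mb_s = 2.0
smart_mb_s = 18.5

# If both readings are accurate, each pool-level byte turns into
# roughly this many bytes at the device.
amplification = smart_mb_s / pool_mb_s
print(f"~{amplification:.1f}x apparent write amplification")
```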

Comments

  • rm_ IPv6 Advocate, Veteran

    Cdoe said: I've got two Intel SSDPE2MX450G7 running in raid 1.

    Did you own them from the start? Did it show zero written right after purchase?

    Also does it keep increasing at the same rate right now? You can check with iotop if there's any disk activity.

  • proxmox version ?

  • Falzo Member
    edited January 2019

    Cdoe said: I've checked the nvme smart

    how? where does this output come from, and does it show that [370TB] number directly or did you add it?

    however, it is simply wrong. according to this https://www.intel.com/content/dam/support/us/en/documents/solid-state-drives/Intel_SSD_Smart_Attrib_for_PCIe.pdf

    'data units written' is the number of 512 byte units written.

    724,552,635 * 512 bytes makes it 370GB instead.
    Either you or the software used to calculate that data got it wrong by a factor of 1000 ;-)

    I may be too stupid to read my own reference.

    Still, I'd also doubt the numbers are correct. Maybe it comes down to part of the filesystem using another blocksize, and the conversion for that number, which should be done by the controller, not working as intended...

  • Cdoe Member
    edited January 2019

    @rm_ said:

    Cdoe said: I've got two Intel SSDPE2MX450G7 running in raid 1.

    Did you own them from the start? Did it show zero written right after purchase?

    Also does it keep increasing at the same rate right now? You can check with iotop if there's any disk activity.

    Yes, both disks were brand new. The SMART values are pretty similar on both drives:

    Data Units Read: 27,836,766 [14.2 TB]
    Data Units Written: 724,543,634 [370 TB]
    Data Units Read: 27,838,911 [14.2 TB]
    Data Units Written: 724,576,211 [370 TB]

    iotop looks pretty normal for a low-usage server: 500 kB/s with 10-15 MB/s spikes. Nothing even close to 180 MB/s average.

    I don't know how it grows over time, I just noticed it today.
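    A sketch for measuring how fast the counter actually grows: take two smartctl readings some interval apart and convert the delta. The readings and the one-hour interval below are hypothetical placeholders, not values from this thread:

```python
# One NVMe data unit = 1,000 * 512-byte sectors = 512,000 bytes.
DATA_UNIT_BYTES = 512 * 1000

def write_rate_mb_s(units_before, units_after, interval_s):
    """Average MB/s written between two Data Units Written readings."""
    return (units_after - units_before) * DATA_UNIT_BYTES / interval_s / 1000**2

# Hypothetical readings taken an hour apart:
rate = write_rate_mb_s(724_552_635, 724_560_000, 3600)
print(f"~{rate:.2f} MB/s over the interval")
```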

    @cociu said:
    proxmox version ?

    It's been Proxmox 5 since the beginning, 5.3-7 currently.

    @Falzo said:

    Cdoe said: I've checked the nvme smart

    how? where does this output come from, and does it show that [370TB] number directly or did you add it?

    The output comes from smartctl -a /dev/nvme[0-1]n1

    On some forums I've found this:

    According to page 9 of this Intel SSD Data Center release notes doc, firmware 8DV10130/8B1B012D fixed a "Drive reports higher than actual data units read and write" issue.

    Back then it was a bug, but the firmware update came out in 2015, so I don't think it affects my server.

  • Rhys Member, Host Rep
    edited January 2019

    @Cdoe said:
    Back then it was a bug, but the firmware update came out in 2015, so I don't think it affects my server.

    Doesn't mean your drive got that firmware update.

  • @Rhys said:

    @Cdoe said:
    Back then it was a bug, but the firmware update came out in 2015, so I don't think it affects my server.

    Doesn't mean your drive got that firmware update.

    Well, this model was launched in Q3 2016, so I think it's safe to assume the firmware is already patched.

  • eol Member

    @Cdoe said:
    Well, this model was launched in Q3 2016, so I think it's safe to assume the firmware is already patched.

    Nope.

  • Cdoe Member
    edited January 2019

    @eol said:

    @Cdoe said:
    Well, this model was launched in Q3 2016, so I think it's safe to assume the firmware is already patched.

    Nope.

    Yep. Both NVMes have firmware MDV10290, which is the latest available for this model.

  • eol Member

    @Cdoe said:

    @eol said:

    @Cdoe said:
    Well, this model was launched in Q3 2016, so I think it's safe to assume the firmware is already patched.

    Nope.

    Yep. Both NVMes have firmware MDV10290, which is the latest available for this model.

    Nice.

  • @eol said:

    @Cdoe said:

    @eol said:

    @Cdoe said:
    Well, this model was launched in Q3 2016, so I think it's safe to assume the firmware is already patched.

    Nope.

    Yep. Both NVMes have firmware MDV10290, which is the latest available for this model.

    Nice.

    Sorry, I'm not familiar with the forum's clowns. I answered you because I thought you could help me with the issue. My bad!

  • eol Member

    @Cdoe said:
    Sorry, I'm not familiar with the forum's clowns.

    Me neither.

  • perennate Member, Host Rep
    edited January 2019

    Cdoe said: Is this a firmware bug? Because when I divide data written by number of power on seconds it would be 180MB/s on average, which is not possible as the VMs are mostly idling.

    Cdoe said: iotop is pretty normal for low usage server, 500kB/s with 10-15MB/s spikes. Nothing even close to 180MB/s avg.

    Where did you get 180 MB/s? Seems to be 20 MB/s average.

    370 TB * (1024^2 MB/TB) / (5576 power-on hours * 3600 s/hour) ≈ 20 MB/s

    One of us sucks at maths; just need to figure out if it's you or me.

  • @perennate said:

    Where did you get 180 MB/s? Seems to be 20 MB/s average.

    370 TB * (1024^2 MB/TB) / (5576 power-on hours * 3600 s/hour) ≈ 20 MB/s

    One of us sucks at maths; just need to figure out if it's you or me.

    Yup, my bad. Anyway, it's still way more than it should be according to zpool iostat.

  • perennate Member, Host Rep
    edited January 2019

    Cdoe said: Yup, my bad. Anyway it's still way more than it should be according to zpool iostat

    2 MB/s is still a lot if you just have 4 idling VMs. That means the VMs are writing 1 TB every six days. Might make sense to try to find the source of the high I/O.

    But otherwise check the SMART info again tomorrow and see what the rate of change is, as rm_ suggested.
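    The "1 TB every six days" back-of-envelope checks out; a minimal sketch of the arithmetic, assuming ~2 MB/s sustained and decimal units throughout:

```python
# ~2 MB/s sustained pool writes, converted to GB/day and TB over six days.
pool_mb_s = 2.0
gb_per_day = pool_mb_s * 86_400 / 1000   # 86,400 seconds per day, MB -> GB
tb_in_six_days = gb_per_day * 6 / 1000
print(f"{gb_per_day:.0f} GB/day -> {tb_in_six_days:.2f} TB in six days")
```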

  • eol Member

    iotop.
