
NVMe Drives Dying at Incredible Rate



I have an Unraid system consisting of 8x 8TB Sabrent Rocket 4 Plus drives. Counting my parity disk, which crapped out today, 3 of the 8 drives have died within 3 months of light use (Plex server, Home Assistant VM). The drives that have died so far have all been the ones getting the most use: the parity disk and the most heavily trafficked data disks. This makes me slightly suspicious that Unraid might be at least partially responsible for their infant mortality problems.

So far, Sabrent support is replacing the drives as they die. But I'm quickly becoming an extremely expensive customer, and I just want to make sure I'm not doing something silly here. Whatever is going wrong is causing the drives to fail so catastrophically that multiple computers' BIOSes can no longer detect that the drives even exist. (In one case, the drive would show up so intermittently and briefly that you couldn't even get a SMART report off it.)

I know that I'm going against the grain by using NVMe drives as my data pool, which is not recommended or supported. But it's not unsupported because of firmware-corrupting, catastrophic problems like this, right?


I am guessing that you are not exceeding, or even getting close to, the drives' TBW rating; that is fairly hard to do in such a short time.

So, discounting that, do you see any anomalies like high temperatures or SMART issues before they die? Sending the data to a metrics backend like InfluxDB might help with analysis.
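For anyone who wants to try the SMART logging idea, here is a minimal Python sketch of the collection side. It assumes smartmontools 7.0 or newer (for `smartctl --json`) and that the drive appears as `/dev/nvme0`; the function names and the choice of fields are illustrative, not something from this thread. Shipping the resulting dict to InfluxDB or another backend is left out.

```python
import json
import subprocess

# NVMe reports "data units" in thousands of 512-byte blocks, i.e. 512,000 bytes each.
BYTES_PER_DATA_UNIT = 512_000


def summarize_nvme_health(report: dict) -> dict:
    """Pick a few wear and temperature fields out of a parsed smartctl JSON report."""
    log = report.get("nvme_smart_health_information_log", {})
    units_written = log.get("data_units_written", 0)
    return {
        "temperature_c": log.get("temperature"),
        "percentage_used": log.get("percentage_used"),
        "media_errors": log.get("media_errors"),
        "critical_warning": log.get("critical_warning"),
        "terabytes_written": units_written * BYTES_PER_DATA_UNIT / 1e12,
    }


def read_nvme_health(device: str) -> dict:
    """Run smartctl against a device (assumed path, usually needs root) and summarize it."""
    # smartctl's exit status is a bit mask, so a non-zero code can still come with
    # usable JSON on stdout; for that reason check=True is deliberately not used.
    result = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True,
        text=True,
    )
    return summarize_nvme_health(json.loads(result.stdout))
```

Calling `read_nvme_health("/dev/nvme0")` on a schedule (cron, or the User Scripts plugin) and keeping the history would at least show whether temperature or `percentage_used` moved before a drive dropped off the bus.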

It is also possible that you ended up with a bad batch. Have any of the replacement drives died too?

Unraid itself doesn't seem to put excessive wear on drives. My parity drives are mostly found spun down, because I have a read-heavy workload. But it really depends on how heavy your use is.

Edited by apandey

No. I keep a pretty close eye on temps. This is all in an ITX form factor, but one with quite good cooling (three Noctua 120mm fans). The drives tend to idle at around 30C and get up to around 55-60C under load, which is pretty rare. Like you, I have set up a very read-heavy environment. The parity drive likely did one complete write when it was initialized and has had a pretty light workload since, probably a few hundred gigs written since parity was first built.
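As a rough sanity check on the TBW point, the write volume described above can be put against an endurance rating with a couple of lines of Python. The 8 TB initial pass comes from the drive size; the 0.3 TB figure is just one reading of "a few hundred gigs"; and the rating used here is a made-up placeholder, not Sabrent's number, so the real value should come from the datasheet.

```python
def tbw_fraction_used(written_tb: float, rated_tbw: float) -> float:
    """Return the share of a drive's rated endurance consumed so far (0.0 to 1.0+)."""
    if rated_tbw <= 0:
        raise ValueError("rated_tbw must be positive")
    return written_tb / rated_tbw


initial_parity_write_tb = 8.0     # one full pass over an 8TB parity drive
later_writes_tb = 0.3             # assumption: "a few hundred gigs" taken as ~300 GB
HYPOTHETICAL_RATED_TBW = 1000.0   # placeholder only, NOT the real rating for this drive

used = tbw_fraction_used(
    initial_parity_write_tb + later_writes_tb,
    HYPOTHETICAL_RATED_TBW,
)
print(f"{used:.2%} of the assumed endurance rating")
```

Whatever the real rating turns out to be, a single full-drive write plus a few hundred gigabytes is a small number of full-drive passes, which is consistent with the guess earlier in the thread that wear-out is an unlikely cause.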
