NVMe Parity fails repeatedly

Hi All,

I wanted to get some guidance on a nasty problem I am facing.

 

I have an all-flash Unraid box: 2x 2TB SSDs and 2x NVMe (1x 500GB, 1x 2TB).

I have the bigger NVMe assigned as the Parity Drive.

 

The Parity Drive has now failed for the second time within 4 months. (1st failure after 1-2 months, got it replaced by the vendor; the 2nd failure has now occurred.)

 

Is there something fundamental that I am missing when using NVMe Parity Drives?

 

I assumed that I fried the first NVMe because it was mounted beneath the MoBo, so I mounted the second one in a position exposed to the CPU cooler.

Is an NVMe heatsink a must? Are there other things I am missing?

 

Thanks for your help!

Solved by JorgeB

  • Community Expert

How much is/was the total write amount on that drive?

 

Parity gets as many writes as all other drives combined, and is always "100% full" as far as the drive is concerned which means it can't make use of most of its lifetime-extending tricks. 

 

Not a recommended setup, SSDs should be in a pool.

Edited by Kilrah

  • Author

So the idea would be to always have spinning rust as the parity? :(

I wanted to get a low energy box, so spinning rust was out of the picture.

 

Is there anything I could do to avoid spinning rust?

 

The total write amount wasn't too high - but the read amount was high

Edited by ivangoetelek

  • Community Expert

The diags are from after a reboot. Did you test power cycling the server in case the device dropped offline? If it did, just rebooting is usually not enough to get it back.

  • Author

Indeed, I power cycled the machine.

 

I have issued an RMA, but I have no idea what to do differently so I won't end up with another bricked NVMe in 2 months...

 

Any ideas?

  • Community Expert

ZFS pool instead of array, but that'll be a poor solution for your selection of drives and would need emptying everything out first...

Edited by Kilrah

  • Community Expert

What model is the device? I've used an NVMe device as parity for some time and never had issues. Of course, if you are writing extreme amounts of data there could be. You should be able to check on SMART how much was written to the other array devices, then add that up to see if it comes anywhere close to the parity device's max TBW.
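The adding-up suggested here can be sketched in a few lines of Python. This is only a rough sketch under assumptions: NVMe drives report "Data Units Written" in units of 1000 x 512 bytes, while SATA SSDs often expose a Total_LBAs_Written attribute whose unit is vendor-specific (512 bytes is assumed below). The function names and all the numbers in the example are made-up placeholders, not values from this thread; the rated TBW has to come from the drive's datasheet.

```python
# One NVMe "Data Unit" is 1000 * 512 bytes.
NVME_UNIT_BYTES = 512_000


def nvme_units_to_tb(data_units_written: int) -> float:
    """Convert the NVMe SMART 'Data Units Written' counter to terabytes."""
    return data_units_written * NVME_UNIT_BYTES / 1e12


def lbas_to_tb(total_lbas_written: int, sector_bytes: int = 512) -> float:
    """Convert a SATA 'Total_LBAs_Written' counter to terabytes.

    Assumption: the counter is in 512-byte sectors; some vendors use other units.
    """
    return total_lbas_written * sector_bytes / 1e12


def parity_write_estimate_tb(data_drive_tb: list[float]) -> float:
    """Parity sees roughly the sum of the writes to all data drives."""
    return sum(data_drive_tb)


def fraction_of_tbw(written_tb: float, rated_tbw_tb: float) -> float:
    """Share of the rated endurance (TBW) that has been used."""
    return written_tb / rated_tbw_tb


# Made-up placeholder counters for three data drives (NOT from this thread).
data_drives_tb = [
    nvme_units_to_tb(4_000_000),
    lbas_to_tb(3_000_000_000),
    lbas_to_tb(5_000_000_000),
]
estimate = parity_write_estimate_tb(data_drives_tb)
# 1000 TB is a placeholder rating; look up the real TBW for your model.
print(f"estimated parity writes: {estimate:.2f} TB, "
      f"{fraction_of_tbw(estimate, 1000):.1%} of rated TBW")
```

If the estimate is a small fraction of the rated TBW, write wear is probably not what killed the drive.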

  • Community Expert
10 hours ago, ivangoetelek said:

Is an NVMe heatsink a must? Are there other things I am missing?

What were the temps? Flash memory actually loves heat, but their controllers do not. I could peg tmax on my NVMe in a well ventilated install with little effort, so I always add a heatsink.

 

For a ~4TB low power NAS, I'd just forgo parity and rely on backups for uptime, unless you really need uptime and can't afford a few hours of downtime.

  • Author
3 hours ago, Kilrah said:

ZFS pool instead of array, but that'll be a poor solution for your selection of drives and need emptying everything out...

I feel that this can't be the solution

2 hours ago, JorgeB said:

What model is the device? I've used an NVMe device as parity for some time and never had issues. Of course, if you are writing extreme amounts of data there could be. You should be able to check on SMART how much was written to the other array devices, then add that up to see if it comes anywhere close to the parity device's max TBW.

Verbatim Vi3000 2TB - just some garbage NVMe - not much of anything, but it was cheap

In general I do not write that much data to the array... I feel that I could write a lot more :D

I can't really figure out the SMART reports, so I have no idea how much data I have actually written...

 

1 hour ago, Michael_P said:

What were the temps? Flash memory actually loves heat, but their controllers do not. I could peg tmax on my NVMe in a well ventilated install with little effort, so I always add a heatsink.

 

For a ~4TB low power NAS, I'd just forgo parity and rely on backups for uptime, unless you really need uptime and can't afford a few hours of downtime.

Temps were up to 85°C - 90°C while doing the parity check.

No parity also seems like a non-option; any downtime is just a huge pain in the behind.

 

 

 

Isn't there anything I can do?

Is it possible that I had 2 consecutive bad devices? Sounds like a challenge for high school statistics class :D

  • Community Expert

Unlikely, but for that use, given the drive is seriously mistreated, you'd really want a good-quality TLC drive, not the cheapest of the bunch.

  • Community Expert
11 hours ago, ivangoetelek said:

Verbatim Vi3000 2TB - just some garbage NVMe - not much of anything, but it was cheap

Possibly just bad devices.

  • Author

hmpf - okay, so I'll just try again with the next Vi3000, and if that fails again in some months, I shall revive this topic...

 

Is there anything I can do when it comes to logging / diagnostics, so if the next device fails, I can easily isolate the actual problem? I understood that rebooting after the drive failed didn't help much.

  • Community Expert
Just now, ivangoetelek said:

understood that rebooting after the drive failed didn't help much.

As you surmised, the syslog in the diagnostics is the RAM version that starts afresh every time the system is booted. You should enable the syslog server (probably with the option to Mirror to Flash set) to get a syslog that survives a reboot so we can see what leads up to a crash. The mirror to flash option is the easiest to set up (and if used, the file is then automatically included in any diagnostics), but if you are worried about excessive wear on the flash drive you can put your server's address into the remote server field.

 

2 minutes ago, ivangoetelek said:

so I'll just try again with the next Vi3000, and if that fails again in some months, I shall revive this topic...

It could be that the brand is the issue? Perhaps you should try another one that might be of better quality?

  • Community Expert
21 minutes ago, ivangoetelek said:

try again with the next Vi3000

I would recommend using a better quality device

  • Community Expert
28 minutes ago, ivangoetelek said:

Is there anything I can do when it comes to logging / diagnostics, so if the next device fails, I can easily isolate the actual problem? I understood that rebooting after the drive failed didn't help much.

If it fails, save the diagnostics before rebooting. The diagnostics package will include the SMART information (which you can also see by clicking on the drive slot name and going to the Attributes tab).
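Once a syslog that survives the reboot has been saved, narrowing it down to the part where the drive dies can be done with a small filter. This is a minimal sketch, not an Unraid tool: the keyword list is a guess at what kernel messages around a failing device tend to contain, and `nvme_error_lines` and `scan_file` are hypothetical helper names.

```python
import re

# Assumption: these keywords cover the interesting kernel messages.
# Adjust the list to whatever actually shows up in your log.
ERROR_WORDS = re.compile(r"error|timeout|reset|offline", re.IGNORECASE)


def nvme_error_lines(lines, device="nvme"):
    """Return the lines that mention the device and one of the error keywords."""
    return [
        line.rstrip("\n")
        for line in lines
        if device in line and ERROR_WORDS.search(line)
    ]


def scan_file(path, device="nvme"):
    """Read a saved syslog file and return the suspicious lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return nvme_error_lines(handle, device)
```

Calling `scan_file("Syslog_nvme_dying.log")` on a saved log would return only the lines worth posting. The first hit is usually the most useful one, since everything after it tends to be follow-on errors.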

  • Author
22 minutes ago, itimpi said:

You should enable the syslog server

I do have that enabled - could this be the cause of the parity failing?

 

I have found the section where the drive died - but I can't make much of it...

Syslog_nvme_dying.log

  • Community Expert
1 minute ago, ivangoetelek said:

I have found the section where the drive died - but I can't make much of it...

That shows that you appeared to complete the parity check successfully but then, several hours later, suddenly started getting read and write errors on the parity drive. I can see no indication as to why.

  • Author

But a parity check is read-intensive rather than write-intensive, which suggests to me that heat was the issue, not the TBW. What do you think?

  • Community Expert

Could be if you said it was reaching 90°C. A decent drive would throttle to limit temp, but...
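To see whether the next drive gets that hot again during a parity check, the temperature can be read from smartctl's JSON output and checked against a limit. This is a sketch under assumptions: it expects smartmontools 7.0 or newer (for `--json`) and a top-level `temperature.current` field in the output, and the 70°C limit is an arbitrary placeholder, not a figure from the drive's datasheet. The function names are hypothetical.

```python
import json
import subprocess


def read_smart_json(device: str) -> dict:
    """Run smartctl with JSON output for one device (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes as status bit flags
    )
    return json.loads(result.stdout)


def current_temp_c(smart: dict):
    """Return the current temperature in °C, or None if it is not reported."""
    return smart.get("temperature", {}).get("current")


def is_too_hot(smart: dict, limit_c: int = 70) -> bool:
    """True if a temperature is reported and is at or above the limit."""
    temp = current_temp_c(smart)
    return temp is not None and temp >= limit_c
```

Running `is_too_hot(read_smart_json("/dev/nvme0"))` every minute or so while a parity check is in progress would show whether a heatsink keeps the drive below the chosen limit. The device path is an example and may differ on your system.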

  • Author

So I RMA'd the faulty drive - let's see what they say/do. I have also ordered a beefed-up NVMe cooler - maybe that helps.

 

But seriously - is it possible that the syslog is responsible for a huge amount of writes?

Other than that I cannot imagine much writing at all.

 

Is there any recommendation for e.g. appdata, such as keeping it away from the parity drive?

  • Community Expert

No, syslog is negligible.

  • Community Expert
48 minutes ago, ivangoetelek said:

also I have ordered a beefed up NVMe Cooler - maybe that helps

 

Doesn't need to be fancy, most anything will do

  • Community Expert

I will say, don't use the two piece ones (bottom plate and top heatsink), just use the top heatsink and leave the bottom as it is
