Parity drive error unRaid 6.11.5


itlists
Go to solution Solved by trurl,

Posted (edited)

Hello,

 

Server has been running fine for years. Recently upgraded to 6.11.5 and that has been trouble-free as well.

 

A few minutes ago, I got alerts that one of the parity drives and one data drive have errors, but there are no details about the errors. Both drives show as disabled.

SMART self-test on the parity drive comes back with no errors.

Performed server reboot and problem is still there.

 

Diagnostics attached.

 

Thanks for your help!

 

Edited by itlists
  • Solution
Posted
1 hour ago, itlists said:

Performed server reboot and problem is still there.

 

Diagnostics attached.

Diagnostics includes the current syslog, which is in RAM like the rest of the OS. Diagnostics can tell us how things are now, but can't tell us anything about what happened before boot.

 

Disk3 has

199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    1347

These are recorded by the drive when it receives inconsistent data as determined by checksum. These are almost always connection problems. Often these won't cause a problem because the data is resent. And connection problems often don't result in CRC errors since the drive never receives any data to checksum. You should be getting a SMART warning ( 👎) for this disk on the Dashboard page. You can click on it to acknowledge and it will warn again if it increases.
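
For anyone who wants to track this counter outside the Dashboard, here is a minimal Python sketch that parses an attribute row in the format quoted above and checks whether the raw CRC count has gone up since a previously recorded value. The function names (`parse_smart_row`, `crc_errors_increased`) and the column layout assumed by the regular expression are my own illustration based on the single line shown in this post, not anything Unraid itself runs.

```python
import re

# Assumed column layout, taken from the row quoted in the post:
# ID, name, six flag characters, value, worst, threshold, fail, raw value.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flags>[-A-Z]{6})\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<fail>\S+)\s+(?P<raw>\d+)"
)


def parse_smart_row(line):
    """Return the fields of one attribute row as a dict, or None if it does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "id": int(m.group("id")),
        "name": m.group("name"),
        "flags": m.group("flags"),
        "value": int(m.group("value")),
        "worst": int(m.group("worst")),
        "thresh": int(m.group("thresh")),
        "raw": int(m.group("raw")),
    }


def crc_errors_increased(previous_raw, line):
    """True if the row is the CRC counter and its raw value exceeds previous_raw."""
    row = parse_smart_row(line)
    return (
        row is not None
        and row["name"] == "UDMA_CRC_Error_Count"
        and row["raw"] > previous_raw
    )


line = "199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    1347"
row = parse_smart_row(line)
print(row["name"], row["raw"])
```

The raw value is the number to watch. A count that stays put after reseating cables suggests the connection problem is fixed, while a count that keeps climbing suggests it is not.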

 

Other than that, SMART for disk3 looks OK, and SMART for parity looks OK.

1 hour ago, itlists said:

SMART self-test on the parity drive comes back with no errors.

According to the SMART reports, parity has had no self-test run. Disk3 did pass some short tests, but that was a couple of years ago. Neither has had an extended test.

 

Unraid disables a disk when a write to it fails for any reason. But the failed write updates parity so it can be recovered by rebuilding. And even though one of the disabled disks is parity, parity2 was updated. So the disks are now out-of-sync with the array and have been "kicked out".

 

After a disk is disabled, it isn't used again until rebuilt. It is instead emulated by parity. Reads from the disk are emulated from the parity calculation by reading all other disks, and writes to the disk are emulated by updating parity so the emulated write can be read. The initial failed write is emulated, and any subsequent writes are emulated, and these can all be recovered by rebuilding. (In your case, the only parity still being read or updated is parity2 since parity is disabled).
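
To illustrate the idea of emulation, here is a minimal Python sketch using simple XOR parity across three tiny "disks". It is only meant to show the principle for a single XOR parity. Parity2 uses a different, more involved calculation that is not reproduced here, and the helper names (`xor_blocks`, `emulated_read`, `emulated_write`) and the byte values are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def emulated_read(other_disks, parity):
    """Reconstruct the missing disk's block from all other disks plus parity."""
    return xor_blocks(other_disks + [parity])


def emulated_write(other_disks, new_data):
    """Return the parity block that makes an emulated read return new_data."""
    return xor_blocks(other_disks + [new_data])


# Hypothetical two-byte blocks standing in for the same sector on each disk.
disk1 = bytes([0x11, 0x22])
disk2 = bytes([0x33, 0x44])
disk3 = bytes([0x55, 0x66])  # this is the disk that gets disabled

parity = xor_blocks([disk1, disk2, disk3])

# Read from the disabled disk: computed from the other disks and parity.
assert emulated_read([disk1, disk2], parity) == disk3

# Write to the disabled disk: only parity changes, the disk is not touched.
new_data = bytes([0xAA, 0xBB])
parity = emulated_write([disk1, disk2], new_data)
assert emulated_read([disk1, disk2], parity) == new_data
```

A rebuild is the same emulated read done for every sector and written to the physical disk, which is why the rebuilt disk ends up with the emulated contents, including any writes made while it was disabled.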

 

Bad connections are much more common than bad disks, and that is probably what happened here, but unless you have syslog from before reboot, can't say for sure. No obvious problems currently.

 

Emulated disk3 is mounted and has plenty of data, so that's all good. The emulated contents are what you will get when you rebuild.

 

Your configuration looks good. Most people would consider dual parity overkill since you only have 3 data disks in the array.

 

It's usually safer to rebuild to spares and keep the originals in case of problems, but it should be OK to rebuild onto the same disks after checking all connections.

 

  • Thanks 1
Posted
10 hours ago, trurl said:

Diagnostics includes the current syslog, which is in RAM like the rest of the OS. Diagnostics can tell us how things are now, but can't tell us anything about what happened before boot.

 

Disk3 has

199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    1347

These are recorded by the drive when it receives inconsistent data as determined by checksum. These are almost always connection problems. Often these won't cause a problem because the data is resent. And connection problems often don't result in CRC errors since the drive never receives any data to checksum. You should be getting a SMART warning ( 👎) for this disk on the Dashboard page. You can click on it to acknowledge and it will warn again if it increases.

 

Other than that, SMART for disk3 looks OK, and SMART for parity looks OK.

According to SMART reports, parity has had no self-test run. Disk3 did pass some short tests, but that was a couple of years ago. Neither have had extended tests.

Thanks for the comprehensive reply!

 

Yes, disk3 has had many CRC errors in the past. I've reseated the drive previously and it's been fine. It has never been disabled or kicked out of the array before.

 

10 hours ago, trurl said:

Unraid disables a disk when a write to it fails for any reason. But the failed write updates parity so it can be recovered by rebuilding. And even though one of the disabled disks is parity, parity2 was updated. So the disks are now out-of-sync with the array and have been "kicked out".

 

After a disk is disabled, it isn't used again until rebuilt. It is instead emulated by parity. Reads from the disk are emulated from the parity calculation by reading all other disks, and writes to the disk are emulated by updating parity so the emulated write can be read. The initial failed write is emulated, and any subsequent writes are emulated, and these can all be recovered by rebuilding. (In your case, the only parity still being read or updated is parity2 since parity is disabled).

So does this mean that I have to go into 'New Config' and re-add the 'failed' parity drive and disk3 back into the array?

I've physically removed the parity drive to test it in an external enclosure, and it's picked up fine by a laptop. So most likely the drive is good.

 

 

10 hours ago, trurl said:

Bad connections are much more common than bad disks, and that is probably what happened here, but unless you have syslog from before reboot, can't say for sure. No obvious problems currently.

 

Emulated disk3 is mounted and has plenty of data, so that's all good. The emulated contents is what you will get when you rebuild.

 

Your configuration looks good. Most people would consider dual parity overkill since you only have 3 data disks in the array.

 

It's usually safer to rebuild to spares and keep the originals in case of problems, but it should be OK to rebuild onto the same disks after checking all connections.

 

I don't have spare drives to rebuild onto, so I will have to do it in place on the existing disks.

See my question above - this requires doing 'New Config' and re-adding the parity and disk3 drives?

 

Thanks!

Posted
26 minutes ago, itlists said:

See my question above - this requires doing 'New Config' and re-adding the parity and disk3 drives?

NO! If you use New Config you give up the option to rebuild the existing contents, and I do not think that is what you want.

 

The process of rebuilding a disk onto itself is covered here in the online documentation, accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page.

  • Thanks 1
Posted (edited)
32 minutes ago, itimpi said:

NO!   If you use New Config you are giving up the option to rebuild the existing contents and I do not think this is what you want?

 

the process of rebuilding a disk onto itself is covered here in the online documentation accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page.

Gotcha! Thanks for the link. Will attempt this today.

 

 

Rebuild has started... will take a day and a bit. Hopefully all good after that.

 

Edited by itlists
Posted
2 hours ago, trurl said:

Click on the disk to get to its page, go to Self-Test section or tab, click the button to do short test or extended test.

Oh yes, the self-test was done and didn't show any errors, yet unRaid reported *some* error.

Anyway, the rebuild is in progress... another 12 hrs to go.

Posted (edited)

Read errors again this morning. This time on all 5 drives :o

 

Diagnostics attached.

 

Going to re-assign the drives and rebuild the array as I did previously; hopefully it will work again.

 

After stopping the array and rebooting the server, the devices are showing up as 'missing', and I can neither unassign the slots nor start the array.

Any suggestions please?

 

1370887841_Screenshot2023-03-04at8_10_40AM.png.3d4b989bfb8278501f168f3546690f97.png

n3supernas-diagnostics-20230304-0801.zip

Edited by itlists
Posted

I would suspect something that is common to all these drives.   

 

Things that occur to me are the power cabling and, if the drives are attached to an HBA, whether it is properly seated in the motherboard.
