itlists Posted February 24, 2023 (edited February 25, 2023)

Hello,

The server has been running fine for years. I recently upgraded to 6.11.5, and that has been trouble-free as well. A few minutes ago, I got alerts that one of the parity drives and one data drive have errors, but there are no details about the errors. Both drives show as disabled. A SMART self-test on the parity drive comes back with no errors. I rebooted the server and the problem is still there. Diagnostics attached. Thanks for your help!
Solution trurl Posted February 24, 2023

1 hour ago, itlists said: Performed server reboot and problem is still there. Diagnostics attached.

Diagnostics includes the current syslog, which is in RAM like the rest of the OS. Diagnostics can tell us how things are now, but can't tell us anything about what happened before the reboot.

Disk3 has:

199 UDMA_CRC_Error_Count -O-R-- 200 200 000 - 1347

These are recorded by the drive when it receives inconsistent data, as determined by checksum. They are almost always connection problems. Often these won't cause a problem because the data is resent. And connection problems often don't result in CRC errors at all, since the drive never receives any data to checksum.

You should be getting a SMART warning (👎) for this disk on the Dashboard page. You can click on it to acknowledge, and it will warn again if the count increases. Other than that, SMART for disk3 looks OK, and SMART for parity looks OK.

1 hour ago, itlists said: SMART self-test on the parity drive comes back with no errors.

According to SMART reports, parity has had no self-test run. Disk3 did pass some short tests, but that was a couple of years ago. Neither has had an extended test.

Unraid disables a disk when a write to it fails for any reason. But the failed write still updates parity, so the data can be recovered by rebuilding. And even though one of the disabled disks is parity, parity2 was updated. So the disks are now out of sync with the array and have been "kicked out".

After a disk is disabled, it isn't used again until rebuilt. It is instead emulated by parity. Reads from the disk are emulated by the parity calculation, reading parity and all other disks, and writes to the disk are emulated by updating parity so the emulated write can be read back. The initial failed write is emulated, any subsequent writes are emulated, and these can all be recovered by rebuilding. (In your case, the only parity still being read or updated is parity2, since parity is disabled.)
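The emulation and rebuild described above can be illustrated with single XOR parity, where the parity block is the byte-wise XOR of all data disks. This is a simplified sketch under that assumption only: Unraid's parity2 uses a different, more complex calculation that is not modeled here, and the function names are hypothetical. A read of the disabled disk XORs parity with every other data disk, an emulated write changes only parity, and a rebuild writes the emulated contents back to the disk.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_disks: List[bytes]) -> bytes:
    """Single parity: XOR of every data disk."""
    return xor_blocks(data_disks)


def emulated_read(data_disks: List[bytes], parity: bytes, missing: int) -> bytes:
    """Reconstruct the disabled disk from parity plus all *other* data disks.
    The disabled disk's own (stale) contents are never read."""
    others = [d for i, d in enumerate(data_disks) if i != missing]
    return xor_blocks(others + [parity])


def emulated_write(data_disks: List[bytes], parity: bytes, missing: int,
                   new_data: bytes) -> bytes:
    """'Write' to the disabled disk by updating parity only; returns new parity."""
    old = emulated_read(data_disks, parity, missing)
    # Swap old for new inside the XOR sum: P' = P ^ old ^ new
    return xor_blocks([parity, old, new_data])


def rebuild(data_disks: List[bytes], parity: bytes, missing: int) -> List[bytes]:
    """Rebuild = write the emulated contents back onto the disk."""
    rebuilt = list(data_disks)
    rebuilt[missing] = emulated_read(data_disks, parity, missing)
    return rebuilt


if __name__ == "__main__":
    disks = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
    parity = compute_parity(disks)
    print(emulated_read(disks, parity, 2) == disks[2])      # True
    parity = emulated_write(disks, parity, 2, b"\x00\xff")  # disk 2 itself untouched
    print(rebuild(disks, parity, 2)[2] == b"\x00\xff")      # True
```

Note that the stale bytes still sitting in `disks[2]` after the emulated write are never consulted, which mirrors "it isn't used again until rebuilt".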
Bad connections are much more common than bad disks, and that is probably what happened here, but without the syslog from before the reboot, I can't say for sure. No obvious problems currently. Emulated disk3 is mounted and has plenty of data, so that's all good. The emulated contents are what you will get when you rebuild.

Your configuration looks good. Most people would consider dual parity overkill since you only have 3 data disks in the array.

It's usually safer to rebuild to spares and keep the originals in case of problems, but it should be OK to rebuild onto the same disks after checking all connections.
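The Dashboard behavior trurl describes for the CRC count (warn, let you acknowledge, warn again only if it increases) can be modeled in a few lines. This is a minimal sketch, not Unraid's implementation: it assumes the attribute row layout quoted above for disk3 (ID, name, flags, normalized value, worst, threshold, when-failed, raw value) and a raw value that is a plain integer, as it is for attribute 199. `parse_smart_row` and `crc_warning` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    flags: str
    value: int        # normalized value
    worst: int
    thresh: int
    when_failed: str  # "-" if the attribute has never failed
    raw: int


def parse_smart_row(line: str) -> SmartAttribute:
    """Parse one attribute row laid out as:
    ID NAME FLAGS VALUE WORST THRESH WHEN_FAILED RAW_VALUE
    Assumes the raw value is a plain integer (true for attribute 199)."""
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(f"unexpected SMART row: {line!r}")
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        flags=fields[2],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        when_failed=fields[6],
        raw=int(fields[7]),
    )


def crc_warning(current_raw: int, acknowledged_raw: Optional[int]) -> bool:
    """Warn on any nonzero count until acknowledged, then only if it grows."""
    if acknowledged_raw is None:
        return current_raw > 0
    return current_raw > acknowledged_raw


if __name__ == "__main__":
    row = parse_smart_row("199 UDMA_CRC_Error_Count -O-R-- 200 200 000 - 1347")
    print(row.name, row.raw)           # UDMA_CRC_Error_Count 1347
    print(crc_warning(row.raw, None))  # True: not yet acknowledged
    print(crc_warning(row.raw, 1347))  # False: acknowledged, unchanged
    print(crc_warning(1350, 1347))     # True: count increased
```

Tracking the delta rather than the absolute value fits the point made above: a large historical count like 1347 matters less than whether it is still climbing after the connections have been checked.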
itlists (Author) Posted February 24, 2023

10 hours ago, trurl said: Disk3 has 199 UDMA_CRC_Error_Count -O-R-- 200 200 000 - 1347 These are recorded by the drive when it receives inconsistent data as determined by checksum. These are almost always connection problems.

Thanks for the comprehensive reply! Yes, disk3 has had many CRC errors in the past. I reseated the drive previously and it's been fine since. It has never been disabled or kicked out of the array before.

10 hours ago, trurl said: After a disk is disabled, it isn't used again until rebuilt. It is instead emulated by parity.

So does this mean that I have to go into 'New Config' and re-add the 'failed' parity drive and disk3 to the array? I've physically removed the parity drive to test it in an external enclosure, and it's picked up fine by a laptop, so most likely the drive is good.

10 hours ago, trurl said: It's usually safer to rebuild to spares and keep the originals in case of problems, but it should be OK to rebuild onto the same disks after checking all connections.

I don't have spare drives to rebuild onto, so I will have to do it in place on the existing array. See my question above: does this require doing 'New Config' and re-adding the parity and disk3 drives? Thanks!
itimpi Posted February 24, 2023

26 minutes ago, itlists said: See my question above - this requires doing 'New Config' and re-adding the parity and disk3 drives?

NO! If you use New Config, you give up the option to rebuild the existing contents, and I do not think that is what you want. The process of rebuilding a disk onto itself is covered here in the online documentation, accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page.
itlists (Author) Posted February 24, 2023 (edited February 24, 2023)

32 minutes ago, itimpi said: NO! If you use New Config you are giving up the option to rebuild the existing contents and I do not think this is what you want?

Gotcha! Thanks for the link. Will attempt this today.

The rebuild has started... it will take a day and a bit. Hopefully all good after that.
trurl Posted February 24, 2023

2 hours ago, itlists said: physically removed the parity drive to test it in an external enclosure

Generally simpler and much safer to test in Unraid.
itlists (Author) Posted February 24, 2023

2 hours ago, trurl said: Generally simpler and much safer to test in Unraid.

How do I do that?
trurl Posted February 24, 2023

Click on the disk to get to its page, go to the Self-Test section or tab, and click the button to run a short or extended test.
itlists (Author) Posted February 25, 2023

2 hours ago, trurl said: Click on the disk to get to its page, go to Self-Test section or tab, click the button to do short test or extended test.

Oh yes, the self-test was done and didn't show any errors, yet unRaid reported *some* error. Anyway, the rebuild is in progress... another 12 hours to go.
trurl Posted February 25, 2023

51 minutes ago, itlists said: unRaid reported *some* error

That could mean lots of things. Diagnostics taken before the reboot might have given details.
itlists (Author) Posted February 25, 2023

14 hours ago, trurl said: lots of things that might mean. Diagnostics before reboot might have given details.

OK, will keep that in mind. I didn't know a reboot clears the data needed for diagnostics.
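Since the syslog lives in RAM and is lost on reboot, the simplest habit is the one trurl suggests: grab Diagnostics before rebooting. As a hypothetical alternative, the log can be copied to persistent storage first. The Python sketch below assumes the live log is at `/var/log/syslog` and that `/boot/logs` on the flash drive survives a reboot; both paths are assumptions to verify on your own system, and `preserve_syslog` is not an Unraid tool.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional

# ASSUMED paths (hypothetical; verify on your own system):
# the live syslog is in RAM-backed /var/log and is lost on reboot,
# while the flash drive mounted at /boot persists.
DEFAULT_SYSLOG = Path("/var/log/syslog")
DEFAULT_DEST_DIR = Path("/boot/logs")


def preserve_syslog(
    syslog: Path = DEFAULT_SYSLOG,
    dest_dir: Path = DEFAULT_DEST_DIR,
    now: Optional[datetime] = None,
) -> Path:
    """Copy the current syslog to a timestamped file in dest_dir and return its path."""
    now = now or datetime.now()
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"syslog-{now:%Y%m%d-%H%M%S}.txt"
    shutil.copy2(syslog, target)  # copies contents and file timestamps
    return target
```

Calling `preserve_syslog()` right after an alert, before any reboot, keeps the entries that would explain *why* a disk was disabled; the timestamped name avoids overwriting an earlier copy.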
itlists (Author) Posted February 25, 2023

Quick update: the parity check completed a few minutes ago without errors. Looks all good now. Thank you!
itlists (Author) Posted March 4, 2023 (edited March 4, 2023)

Read errors again this morning, this time on all 5 drives. Diagnostics attached. I'm going to re-assign the drives and rebuild the array like before, and hopefully it will work again.

After stopping the array and rebooting the server, the devices are showing up as 'missing', and I can't unassign the slots, nor can I start the array. Any suggestions, please?

n3supernas-diagnostics-20230304-0801.zip
itimpi Posted March 4, 2023

I would suspect something that is common to all these drives. Things that occur to me are the power cabling and, if they are attached to an HBA, whether it is properly seated in the motherboard.
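itimpi's reasoning (when every drive errors at once, look at what they share) amounts to intersecting the components each failing drive depends on. A minimal Python sketch follows; the topology in it is entirely hypothetical, since the actual cabling and controller layout of this server isn't known from the thread, and `shared_components` is an illustrative helper.

```python
from typing import Dict, Iterable, Optional, Set


def shared_components(failing: Iterable[str],
                      topology: Dict[str, Set[str]]) -> Set[str]:
    """Return the components that every failing drive depends on."""
    common: Optional[Set[str]] = None
    for disk in failing:
        deps = topology[disk]
        common = set(deps) if common is None else common & deps
    return common or set()


if __name__ == "__main__":
    # HYPOTHETICAL layout: all five drives share one PSU and one HBA,
    # but are split across two power cables.
    topology = {
        "parity":  {"psu", "hba0", "power-cable-A"},
        "parity2": {"psu", "hba0", "power-cable-A"},
        "disk1":   {"psu", "hba0", "power-cable-B"},
        "disk2":   {"psu", "hba0", "power-cable-B"},
        "disk3":   {"psu", "hba0", "power-cable-B"},
    }
    print(sorted(shared_components(list(topology), topology)))  # ['hba0', 'psu']
```

In this made-up layout, only the PSU and the HBA are common to all five drives, so they would be the first things to check; if only the two parity drives had failed, power-cable-A would join the list.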