January 5, 2021

I searched around and could not find an answer, or possibly could not craft the correct search string. Either way, here is my situation.

I have a system set up with 2 parity drives and 5 data drives. These are spread across 3 motherboard SATA ports and 4 SATA ports on a PCIe card. This system has been a rock for years and was recently upgraded to v6.

During a Parity-Check with write corrections to parity enabled, the drives on the motherboard ports all stopped working (parity 1, disk 3 and disk 4). The sync error count was getting ridiculously high, as was the read error count on the drives that had disconnected. I also noticed that temps were no longer being displayed for the affected disks. I took a look at the array and saw that a bunch of files were missing. I immediately shut the system down cleanly and started searching the net to find out what the problem could be.

After reading about the Marvell issue, I powered the system on, went into the BIOS and disabled virtualization extensions. Upon booting into unRAID, all the data and parity disks were accounted for and green, the array came online as usual, and the missing files were present and intact.

I then started a Parity-Check with write corrections to parity disabled. It is not very far in and I have zero read errors, but the sync errors are in the 5 digits and climbing towards the massive number I saw before I initially shut down the system.

I need a quick sanity check to make sure I understand the Parity-Check / write corrections to parity function. If I restart my Parity-Check with write corrections enabled, will it simply overwrite the parity errors with the correct parity? Or do I need to worry about it corrupting currently valid files on the array? Since I did not have to rebuild, I assume the former is correct, but I want to make sure.

Any advice would be much appreciated.

Side note: an LSI card is now on the way.
January 5, 2021, Community Expert

If possible, before rebooting and preferably with the array started, go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
January 5, 2021, Community Expert

Parity check doesn't change data. Since you had a bad correcting parity check, it isn't surprising there are sync errors. Go ahead and let the non-correcting parity check finish, just as a test that your BIOS changes have cured your Marvell problems for now. But you will still need to correct parity afterwards.
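To illustrate the point that a parity check reads data but only ever writes parity, here is a minimal sketch of a single XOR parity check. It is a simplified model and not unRAID's actual implementation: the function names `xor_parity` and `parity_check`, the `correct` flag, and the tiny byte strings standing in for disks are all assumptions made for illustration, and a second parity drive is not modeled.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(data_disks, parity, correct=False):
    """Compare stored parity against parity recomputed from the data.

    Returns (sync_errors, parity). The data blocks are only read here;
    when correct=True, only the parity is replaced.
    """
    expected = xor_parity(data_disks)
    sync_errors = sum(1 for stored, exp in zip(parity, expected) if stored != exp)
    if correct:
        parity = expected  # hypothetical "write corrections to parity"
    return sync_errors, parity


# Three toy "data disks" and a parity block with one byte deliberately damaged.
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\x0f\x0f\x0f"]
good_parity = xor_parity(data)
stale_parity = bytes([good_parity[0] ^ 0xFF]) + good_parity[1:]

# Non-correcting check: counts the mismatch, leaves parity as it was.
errors, unchanged = parity_check(data, stale_parity, correct=False)

# Correcting check: counts the same mismatch, returns rebuilt parity.
errors_fix, fixed = parity_check(data, stale_parity, correct=True)
```

In this model both runs report the same sync error, and only the correcting run brings parity back in line with the data. The data blocks are identical before and after either run.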
January 5, 2021, Community Expert

Diagnostics look fine, except of course for the sync errors in the syslog. Since you have WD Red drives, be sure to add SMART attributes 1 (and 200, if it exists) to be monitored for each of those. Do you have Notifications set up to alert you immediately by email or another agent as soon as a problem is detected?
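For anyone who wants to look at those two attributes outside the unRAID GUI, here is a rough sketch that pulls the raw values for IDs 1 and 200 out of an attribute table in the layout printed by `smartctl -A`. The sample table is made up for illustration and is not taken from the poster's drives. The helper names `watched_raw_values` and `nonzero_attributes` are hypothetical, and treating any nonzero raw value as worth a look is an assumption of this sketch, not a rule stated in the thread.

```python
WATCHED_IDS = {1, 200}  # attribute IDs suggested above for WD Red drives

# Illustrative sample only; real output would come from `smartctl -A /dev/sdX`.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       3
"""


def watched_raw_values(smart_table, watched=WATCHED_IDS):
    """Return {attribute_id: raw_value} for the watched IDs in the table."""
    result = {}
    for line in smart_table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id in watched:
            result[attr_id] = int(fields[9])  # first token of RAW_VALUE
    return result


def nonzero_attributes(raw_values):
    """Return only the watched attributes whose raw value is not zero."""
    return {attr_id: raw for attr_id, raw in raw_values.items() if raw != 0}


raw = watched_raw_values(SAMPLE_OUTPUT)
flagged = nonzero_attributes(raw)
```

With the sample table, `raw` holds both watched attributes and `flagged` contains only attribute 200. On a real system the built-in notifications remain the simpler way to be alerted when these values change.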
January 5, 2021, Author

I do now, and those definitely look like good attributes to monitor. I wish every support community was this helpful.
January 5, 2021, Community Expert

If it happens again, grab diags before rebooting, but it sounds like the typical IOMMU-related Ryzen on-board controller issue.
January 5, 2021, Author

Same thing happened again overnight, so I guess that is not my issue.

alphao-diagnostics-20210105-0804.zip
January 5, 2021, Community Expert

Looks like the same issue, just without the IOMMU error since it's disabled. Look for a BIOS update; v6.9 also works better for some with this issue.
This topic is now archived and is closed to further replies.