squarepeg Posted January 5, 2021
I searched around and could not find an answer, or possibly could not craft the correct search string. Either way, here is my situation:
I have a system set up with 2 parity drives and 5 data drives. These are spread across 3 motherboard SATA ports and 4 SATA ports on a PCIe card. This system has been a rock for years and was recently upgraded to v6.
During a Parity-Check with write corrections to parity enabled, the motherboard ports all stopped working (parity 1, disk 3 and disk 4). The sync error count was getting ridiculously high, as was the read error count on the drives that had disconnected. I also noticed the temps were no longer being displayed for the affected disks. I took a look at the array and saw that a bunch of files were missing. I immediately shut the system down cleanly and started searching the net to find out what the problem could be.
After reading about the Marvell issue I powered the system on, went into the BIOS and disabled virtualization extensions. Upon booting into unRAID, all the data and parity disks were accounted for and green, the array came online as usual, and the missing files were present and intact.
I started a Parity-Check with write corrections to parity disabled. It is not very far in and I have zero read errors, but the sync errors are in the 5 digits and climbing towards the massive number I saw before I initially shut down the system.
I need a quick sanity check to make sure I understand the Parity-Check / write corrections to parity function. If I restart my Parity-Check with write corrections enabled, will it simply overwrite the parity errors with the correct parity? Or do I need to worry about it corrupting currently valid files on the array? Since I did not have to rebuild I assume the former is correct, but I want to make sure.
Any advice would be much appreciated.
Side note: an LSI card is now on the way.
trurl Posted January 5, 2021
If possible, before rebooting and preferably with the array started, go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
squarepeg Posted January 5, 2021
Here is my diag zip, thanks!
alphao-diagnostics-20210104-2217.zip
trurl Posted January 5, 2021
Parity check doesn't change data. Since you had a bad correcting parity check, it isn't surprising there are sync errors. Go ahead and let the noncorrecting parity check finish, just as a test that your BIOS changes have cured your Marvell problems for now. After that, though, you must run a correcting check to fix parity.
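To illustrate the point that a parity check only ever touches parity, here is a minimal sketch assuming plain single XOR parity over tiny in-memory "disks". It is not unRAID's actual code, and it ignores the second parity drive, which uses a different calculation. The names `compute_parity` and `parity_check` are hypothetical.

```python
from functools import reduce
from typing import List, Tuple


def compute_parity(data_disks: List[bytes]) -> bytes:
    """XOR the bytes found at the same offset on every data disk."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_disks))


def parity_check(data_disks: List[bytes], parity: bytes, correct: bool) -> Tuple[int, bytes]:
    """Count sync errors; return the parity to keep afterwards.

    Data disks are only read. With correct=True the parity is replaced by
    the value computed from the data; with correct=False it is left alone.
    """
    expected = compute_parity(data_disks)
    sync_errors = sum(1 for e, p in zip(expected, parity) if e != p)
    return sync_errors, (expected if correct else parity)


# Hypothetical three-byte "disks" and a parity disk left stale by a bad check.
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
stale_parity = b"\x00\x00\x00"
data_before = list(data)

errors_nc, parity_after_nc = parity_check(data, stale_parity, correct=False)
errors_c, parity_after_c = parity_check(data, stale_parity, correct=True)
errors_recheck, _ = parity_check(data, parity_after_c, correct=False)
```

In this model the noncorrecting pass reports the mismatches and leaves parity stale, the correcting pass rewrites parity from the data, and a follow-up check finds nothing. The flip side is that a correcting check trusts whatever the data disks return, so reads from a misbehaving controller can leave parity wrong, which would be consistent with the sync errors seen here.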
squarepeg Posted January 5, 2021
Great, will do, thanks again!
trurl Posted January 5, 2021
Diagnostics look fine, except of course for the sync errors in syslog.
Since you have WD Red drives, be sure to add SMART attributes 1 (and 200 if it exists) to be monitored for each of those.
Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?
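For anyone who wants to look at those two attributes outside the unRAID GUI, here is a rough sketch that pulls the raw values of attributes 1 and 200 out of text in the attribute-table layout printed by `smartctl -A`. The sample text is made up for illustration and is not taken from the diagnostics in this thread; the helper name `raw_values` is hypothetical.

```python
from typing import Dict

# Attribute IDs suggested above for WD Red drives.
WATCHED_IDS = {1, 200}


def raw_values(smartctl_output: str) -> Dict[int, int]:
    """Return {attribute_id: raw_value} for the watched attributes."""
    found: Dict[int, int] = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the tenth column (index 9) is RAW_VALUE.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in WATCHED_IDS and fields[9].isdigit():
                found[attr_id] = int(fields[9])
    return found


# Illustrative sample only (invented values).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       3
"""

result = raw_values(SAMPLE)
```

A raw value that keeps rising between checks is the thing to watch for; the built-in notifications do this comparison for you once the attributes are added.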
squarepeg Posted January 5, 2021
I do now, and those definitely look like good attributes to monitor. I wish every support community was this helpful.
JorgeB Posted January 5, 2021
If it happens again, grab diags before rebooting, but it sounds like the typical IOMMU-related Ryzen on-board controller issue.
squarepeg Posted January 5, 2021
The same thing happened again overnight, so I guess that is not my issue.
alphao-diagnostics-20210105-0804.zip
JorgeB Posted January 5, 2021
Looks like the same issue, just without the IOMMU error since it's disabled. Look for a BIOS update; v6.9 also works better for some with this issue.
squarepeg Posted January 11, 2021
Dropped in an LSI 9211 in IT mode and that seems to have solved it.