taalas Posted October 1, 2019

Hi, I have been using my Unraid server for a couple of years now. Last night, during a monthly parity check, errors occurred while reading from one of the data drives (the Main page shows 917 errors for that drive). The system log shows those read errors as well:

Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965144
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965152
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965160
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965168
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965176

The parity check that the scheduler runs monthly is a correcting parity check. When it finished, it reported that 2 parity errors were corrected. I assume that parity at these 2 positions is now wrong, since the data could not be read from the data drive, but I am not sure.

Is there any way to determine which files are actually affected by these read errors? What is the best way to proceed now? Normally I would have replaced the defective drive with a new one and let it rebuild from parity, but since parity was corrected (possibly with wrong values) I am not so sure anymore. Should I stop the mover schedule for now to prevent further writes to the faulty drive? Any other recommendations?
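For anyone wanting to see which sectors are involved before deciding how to proceed, the read errors in the log excerpt above can be tallied with a few lines of Python. This is only a rough sketch: the regular expression is modelled on the five lines quoted in this thread, and the helper names `read_error_sectors` and `summarize` are made up for illustration. It assumes the syslog has been saved or pasted as plain text.

```python
import re

# Log lines copied from the excerpt quoted in this thread.
LOG = """\
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965144
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965152
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965160
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965168
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965176
"""

# Pattern modelled on the quoted lines (an assumption, not an official format).
PATTERN = re.compile(r"md: disk(\d+) read error, sector=(\d+)")


def read_error_sectors(log_text):
    """Return {disk_number: sorted list of unique failing sectors}."""
    found = {}
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            disk, sector = int(match.group(1)), int(match.group(2))
            found.setdefault(disk, set()).add(sector)
    return {disk: sorted(sectors) for disk, sectors in found.items()}


def summarize(sectors):
    """Return (count, first, last) for a non-empty sorted sector list."""
    return len(sectors), sectors[0], sectors[-1]


errors = read_error_sectors(LOG)
for disk, sectors in errors.items():
    count, first, last = summarize(sectors)
    print(f"disk{disk}: {count} read errors, sectors {first}..{last}")
```

Run against the full syslog instead of this excerpt, the same loop would show whether the 917 reported errors sit in one contiguous region of the disk or are scattered across it.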
Frank1940 Posted October 1, 2019

First thing: get a Diagnostics file ( Tools >>> Diagnostics ) and upload it in a NEW post. This gives the gurus some real meat to work on.

Second thing: I, personally, would not write any new data to the array until this is resolved. (Your data is 'safe' on the cache drive at this point.)

Question: Did the parity check actually complete, or was it aborted?
taalas Posted October 2, 2019

The parity check completed (stating that it corrected 2 parity errors), hence my thought that it might have read faulty data from the failing drive and written wrong parity in those 2 places. It is a correcting parity check that runs on the first of every month.

Since I won't be home for the next few days, I am thinking about shutting down the server until I have a plan for how to proceed.
taalas Posted October 10, 2019

Any advice on how to proceed with fixing this would be really appreciated. I would like to replace the failing drive as soon as possible; I just want to make sure I am doing it correctly.
JorgeB Posted October 10, 2019

On 10/1/2019 at 6:26 PM, taalas said: "The parity check that is run monthly by the scheduler is a correcting parity check."

Change that; it should be non-correcting.

On 10/1/2019 at 6:26 PM, taalas said: "Is there any way to determine which files are actually affected by this read operation?"

Not easily, unless you have pre-existing checksums or are using btrfs. You can just rebuild the disk and then use ddrescue on the old one; it will identify the affected files, and you can then replace them from a backup if available.
taalas Posted October 10, 2019

Thanks! So if I replace the drive and let it rebuild from parity, what damage should I expect, given that the last (correcting) parity check detected 2 parity errors (and possibly made 2 faulty corrections on the parity disk)? 2 files, 2 sectors? The log showed a lot more read errors (917), but the parity check ended with 2 errors.
JorgeB Posted October 10, 2019

2 sectors, most likely in the same file; with some luck they might not even be in a file.
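To illustrate why the expected damage is limited to the positions where parity is wrong, here is a toy model of single XOR parity in Python. It is a sketch under stated assumptions, not Unraid's actual implementation: the three "disks", their values, and the two bad positions are all invented, and the helpers `compute_parity` and `rebuild` are hypothetical names.

```python
from functools import reduce
from operator import xor


def compute_parity(disks):
    """XOR the values at each position across all data disks."""
    return [reduce(xor, column) for column in zip(*disks)]


def rebuild(missing_index, disks, parity):
    """Rebuild one disk from parity plus the remaining data disks."""
    others = [d for i, d in enumerate(disks) if i != missing_index]
    return [reduce(xor, column, p) for p, column in zip(parity, zip(*others))]


# Hypothetical toy array: 3 data disks with 6 positions ("sectors") each.
disks = [
    [0x11, 0x22, 0x33, 0x44, 0x55, 0x66],
    [0xA1, 0xB2, 0xC3, 0xD4, 0xE5, 0xF6],
    [0x0F, 0x1E, 0x2D, 0x3C, 0x4B, 0x5A],
]
parity = compute_parity(disks)

# Assumption for illustration: parity was overwritten with wrong values
# at exactly 2 positions.
bad_positions = [2, 3]
for pos in bad_positions:
    parity[pos] ^= 0xFF

# Rebuild the third disk (index 2) as if it had been replaced.
rebuilt = rebuild(2, disks, parity)
damaged = [i for i, (old, new) in enumerate(zip(disks[2], rebuilt)) if old != new]
print("positions that differ after rebuild:", damaged)
```

In this model each rebuilt position depends only on the parity and the other disks at that same position, so every position with intact parity comes back unchanged and only the two with wrong parity differ.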