schmidtflo, posted July 23:
Hey, here is my situation. My Unraid server is on version 6.11.5 (I wanted to get the array back into a good state before updating).
- My Unraid server was running fine. I built it two years ago with 2x 18TB (1 parity) and a 14TB added later; it also uses 4x 6TB drives from my old Synology (which already have over 7 years of runtime).
- The first disk started to show some errors (very few); parity was always able to correct them, and the disk was never disabled.
- A second disk (disk3 in the logs) started to show CRC errors now and then; I read that this can just happen and is nothing to worry about.
- I bought another 18TB, precleared it and kept it as a hot spare.
- A few days ago I finally decided to retire the first disk with the errors and replace it with the hot spare 18TB. I followed the normal guide for that, then started to rebuild the array.
- Suddenly, after a few hundred GB, disk3 started to throw a lot of errors (nearly as many errors as reads). The rebuild dropped to 2MB/s for some hours. Later on, it continued normally at the expected rebuild speed. It finished a few hours ago, reporting a lot of errors but indicating that parity was still valid.
- Directly afterwards, disk3 was disabled because of the error count. After a reboot (where I switched to a different SATA cable), it is not even mountable anymore ("Unmountable: Wrong or no file system"). The SMART report shows nothing strange (from what I can see), except for CRC errors. An extended SMART test is running now.
What should I do now? Might there be some data loss? I still have the old removed HDD and have done nothing to it yet. The new 18TB HDD could hold the data of both failed disks without any issues; is there a way to move the data?
Attachment: galileo-diagnostics-20240723-1912.zip
schmidtflo, posted July 23:
Oh, and I just noticed that Fix Common Problems also reports "Unable to write to disk5: Drive mounted read-only or completely full." That is the disk I just replaced.
JorgeB, posted July 23:
14 minutes ago, schmidtflo said: "Suddenly, after a few hundred GB, disk3 started to throw a lot of errors (nearly as many errors as reads)."
At this point you should have stopped the rebuild, since it was rebuilding a corrupt disk. Disk3 looks OK; it has lots of CRC errors, so it probably just needs a new SATA cable. Is old disk5 still readable?
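Why a rebuild with a misbehaving disk3 produces a bad disk5: with single parity, the parity disk holds the bytewise XOR of all data disks, so a missing disk is reconstructed by XORing parity with every other data disk. A minimal Python sketch of that arithmetic; the disk contents are hypothetical 4-byte blocks, and the zeroed block is only an arbitrary stand-in for data that came back wrong, not a claim about what Unraid substitutes on a read error.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (single-parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(parity, surviving):
    """Reconstruct the missing disk's block from parity plus every surviving data disk."""
    return xor_blocks([parity, *surviving])


# Hypothetical 4-byte blocks for three data disks.
disk3 = bytes([0x10, 0x20, 0x30, 0x40])
disk4 = bytes([0x01, 0x02, 0x03, 0x04])
disk5 = bytes([0xAA, 0xBB, 0xCC, 0xDD])  # the disk being replaced
parity = xor_blocks([disk3, disk4, disk5])

# With clean reads from disk3 and disk4, the rebuild reproduces disk5 exactly.
assert rebuild_missing(parity, [disk3, disk4]) == disk5

# If disk3 returns wrong data during the rebuild (stand-in: a zeroed block),
# the reconstructed disk5 is wrong even though parity itself was fine.
bad_disk3 = bytes(4)
rebuilt = rebuild_missing(parity, [bad_disk3, disk4])
print(rebuilt == disk5)  # False
```

The result of the bad rebuild is disk3 XOR disk5 rather than disk5, which is why every other data disk has to read correctly for a rebuild to be trustworthy.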
schmidtflo, posted July 23 (edited July 23):
The old parity check history shows nothing too strange except for the rebuild.
Quoting JorgeB: "Disk3 looks OK, lots of CRC errors so probably just needs a new SATA cable"
I already switched to a different SATA port at the reboot; afterwards it was not mountable anymore.
Quoting JorgeB: "Is old disk5 still readable?"
I have removed it from the server. Should I reinstall it to check? It was not disabled and was still readable when I removed it.
JorgeB, posted July 23:
30 minutes ago, schmidtflo said: "afterwards it was not mountable anymore."
That's because it's disabled. See if it mounts with UD (the Unassigned Devices plugin).
30 minutes ago, schmidtflo said: "Should I reinstall it again to check?"
Yes, check it with UD as well, with the array stopped for this one.
schmidtflo, posted July 23:
Both mounted successfully. The GUI popup reported an error while mounting, but the syslog seems fine; both are listed as mounted in the GUI and can be accessed via the terminal (browsing the folder structure shows no obvious errors).
In the meantime I actually replaced the SATA cable. Before, I had just stopped using it, but when I reinstalled the old disk on it, that disk also threw CRC errors, so I switched to a factory-new cable; now no errors are shown anymore.
So to me, both disks seem to be in a fine-ish state. How do I continue now? Presumably disk3 (with the high error count) can be mounted again without errors now that the SATA cable has been swapped? Is everything then fine, or do I have to "restart" the rebuild?
schmidtflo, posted July 23:
As an addendum: the old disk (the one I replaced with the new 18TB) shows 11 CRC errors and a Reallocated Sector Count of 15, which is why I wanted to replace it.
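For reading the SMART reports discussed here: CRC errors (attribute 199, UDMA_CRC_Error_Count) are counted when data is corrupted on the SATA link, which is why a new cable is the usual first fix, while reallocated, pending and offline-uncorrectable sectors (attributes 5, 197 and 198) describe the platters themselves. A minimal Python sketch that separates the two groups from the attribute table printed by `smartctl -A`, assuming the standard ten-column ATA layout with RAW_VALUE last; the sample rows are hypothetical except for the raw values 15 and 11, which are the counts reported for the old disk.

```python
# Counters that point at the media versus the cable/connection.
MEDIA_IDS = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector",
             198: "Offline_Uncorrectable"}
LINK_IDS = {199: "UDMA_CRC_Error_Count"}


def parse_attributes(text):
    """Map attribute ID -> integer RAW_VALUE for the rows of a `smartctl -A` table."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[int(fields[0])] = int(fields[9])
    return values


def classify(values):
    """Split non-zero counters into media problems and link (cable) problems."""
    media = {MEDIA_IDS[i]: v for i, v in values.items() if i in MEDIA_IDS and v > 0}
    link = {LINK_IDS[i]: v for i, v in values.items() if i in LINK_IDS and v > 0}
    return media, link


# Hypothetical excerpt; only the raw values 15 and 11 come from the thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       15
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       11
"""

print(classify(parse_attributes(SAMPLE)))
```

A disk with only link counters rising is a cabling suspect; a disk with growing media counters, like the old disk here, is the one to replace.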
JorgeB, posted July 24:
Post new diags so we can see SMART for old disk5.
schmidtflo, posted July 24:
Sure, the disk5 I originally removed shows up as sdf in the SMART logs.
Attachment: galileo-diagnostics-20240724-0956.zip
JorgeB, posted July 24:
SMART is showing some issues; run an extended SMART test.
schmidtflo, posted July 24:
Yep, it seems to be failing.
My strategy would be: mount disk3 (the one with the broken SATA cable) again (how do I do this, since Unraid marks it as disabled?). Then, do I have to rebuild disk5 (the one I actually exchanged) again, for example by formatting the new disk5 and starting a rebuild? Or is it fine, since the system didn't cancel the rebuild even though the errors occurred?
Attachment: galileo-smart-20240724-1446.zip
trurl, posted July 24:
19 hours ago, schmidtflo said: "Old parity history shows nothing too strange except for the rebuild."
Any parity errors at all need to be understood, not dismissed as "nothing too strange".
The screenshot suggests you have configured scheduled parity checks to correct parity errors. It's better to schedule non-correcting checks. If errors are found, determine the cause; after eliminating the cause, then correct parity. You don't want to allow a bad disk or connection to corrupt parity.
The small number of parity errors on previous checks could be due to unclean shutdowns, or possibly RAM.
trurl, posted July 24:
If you were running a correcting check when you had the large number of parity errors, then you have probably corrupted parity, so a rebuild will not be good.
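trurl's point about correcting checks can be shown with the same single-parity XOR arithmetic. A minimal Python sketch, assuming a check simply recomputes parity from whatever the data disks return and, in correcting mode, writes that result over the stored parity; `parity_check` and the flaky read are hypothetical illustrations, not Unraid code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (single-parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(parity, data_disks, correct):
    """Return (parity_after_check, mismatch_found).

    A correcting check overwrites stored parity with parity recomputed from
    whatever the data disks returned, including bad reads.
    """
    computed = xor_blocks(data_disks)
    mismatch = computed != parity
    if mismatch and correct:
        return computed, True
    return parity, mismatch


disk3, disk5 = bytes([1, 2, 3]), bytes([7, 8, 9])
good_parity = xor_blocks([disk3, disk5])
flaky_disk3 = bytes([1, 0, 3])  # hypothetical bad read over a failing cable

# Non-correcting check: the mismatch is reported and parity stays intact.
kept, found = parity_check(good_parity, [flaky_disk3, disk5], correct=False)
# Correcting check: parity is "fixed" to match the bad read.
overwritten, _ = parity_check(good_parity, [flaky_disk3, disk5], correct=True)

# Later rebuild of disk5, with disk3 reading correctly again:
print(xor_blocks([kept, disk3]) == disk5)         # True
print(xor_blocks([overwritten, disk3]) == disk5)  # False
```

The non-correcting check still flags the problem, which leaves time to find the cause (here the cable) before anything is written to parity.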
schmidtflo, posted July 24:
Quoting trurl: "If you were running a correcting check when you had the large number of parity errors, then probably you have corrupted parity so rebuild will not be good."
The large number of errors only occurred while rebuilding the array after replacing the faulty disk. So my idea was to bring the disk with the SATA cable errors back into the array (the array should then be good, except for the new disk, which was rebuilt with wrong data), then format the new disk to start another rebuild. What do you think, is this a viable solution?
schmidtflo, posted July 24:
Or should I check a few files on disk5 (the replacement disk) to see whether they are fine and not broken?
JorgeB, posted July 24:
Since disk5 is failing, you can mount it with UD and try to copy what you can to new disk(s), or try to clone it with ddrescue. When done, do a new config with the remaining disks to re-enable disk3.
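The ddrescue route, sketched: the GNU ddrescue manual's whole-disc example runs two passes against one shared mapfile, first `-f -n` (allow overwriting the output device, skip the slow scraping phase) and then `-d -f -r3` (direct disc access, three retry passes), so the second pass only revisits areas the first could not read. A minimal Python sketch that only builds and prints those commands; /dev/sdf is where old disk5 appeared in the July 24 diagnostics, while /dev/sdX and the mapfile name are hypothetical placeholders, and device letters can change between boots.

```python
import shlex


def ddrescue_passes(source, target, mapfile):
    """Return the two ddrescue command lines for cloning source onto target.

    Pass 1 (-n) skips scraping to copy the easy areas quickly; pass 2 (-d -r3)
    uses direct disc access and retries bad areas three times. Both share the
    mapfile, so pass 2 only revisits what pass 1 could not read.
    """
    if source == target:
        raise ValueError("source and target must be different devices")
    first = ["ddrescue", "-f", "-n", source, target, mapfile]
    second = ["ddrescue", "-d", "-f", "-r3", source, target, mapfile]
    return [first, second]


# Hypothetical target and mapfile; -f lets ddrescue overwrite the whole target,
# so confirm both devices by serial number before running anything.
for cmd in ddrescue_passes("/dev/sdf", "/dev/sdX", "disk5.mapfile"):
    print(shlex.join(cmd))
```

Keeping the mapfile also means an interrupted clone can be resumed by rerunning the same command instead of starting over.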
schmidtflo, posted July 24:
50 minutes ago, JorgeB said: "Since disk5 is failing you can mount it with UD and try to copy what you can to new disk(s), or try to clone it with ddrescue, when done do a new config with the remaining disks to re-enable disk3."
Okay, just to make sure I understood correctly:
- Clone old disk5 with ddrescue to a brand-new disk (I'll just go and buy one).
- Take the new disk5 (18TB) that I just installed out of the NAS and don't use it anymore (for now).
- Use the cloned disk5 as the regular disk5 in the array from now on, and create a new config with all disks (including disk3, which is then usable again with the new config, as well as the cloned disk5). This will regenerate the parity on the parity drive.
Or can I use the new disk5 (18TB), format it, ddrescue old disk5 onto it, and then use it as the regular disk5 from now on?
JorgeB, posted July 24:
11 minutes ago, schmidtflo said:
  - Clone old disk5 with ddrescue to a brand-new disk (I'll just go and buy one).
  - Take the new disk5 (18TB) that I just installed out of the NAS and don't use it anymore (for now).
  - Use the cloned disk5 as the regular disk5 in the array from now on, and create a new config with all disks (including disk3, which is then usable again with the new config, as well as the cloned disk5). This will regenerate the parity on the parity drive.
Correct. The cloned disk should have the same capacity as the old disk5, or it won't mount in the array; it will still mount with UD if needed.
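The capacity rule can be checked before buying or cloning. A minimal Python sketch, assuming a Linux host where /sys/block/<name>/size reports the whole-device size in 512-byte units (the kernel uses that unit regardless of the drive's physical sector size); `clone_target_verdict`, its wording, and the round example sizes are hypothetical.

```python
from pathlib import Path

SECTOR_BYTES = 512  # sysfs reports block-device sizes in 512-byte units


def device_bytes(name, sysfs=Path("/sys/block")):
    """Size in bytes of a whole block device such as 'sdf', read from sysfs."""
    return int((sysfs / name / "size").read_text()) * SECTOR_BYTES


def clone_target_verdict(source_bytes, target_bytes):
    """Classify a ddrescue target against the disk it clones (labels are hypothetical)."""
    if target_bytes < source_bytes:
        return "too small: the clone would be truncated"
    if target_bytes > source_bytes:
        return "larger: mountable with UD, but not as the array disk5"
    return "same size: usable as disk5 in the array"


# Hypothetical round sizes: a 6TB source against a 6TB and an 18TB target.
six_tb = 6 * 10**12
print(clone_target_verdict(six_tb, six_tb))
print(clone_target_verdict(six_tb, 18 * 10**12))
```

On the server itself, `device_bytes("sdf")` and the equivalent call for the new disk would supply the two arguments instead of the round numbers.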