rts.empire Posted September 28, 2023
Hi all,
A couple of days ago I had a disk failure: disk 5 was disabled. No problem, I installed a new drive and went to rebuild, but the rebuild was estimated to take 180+ days. I checked my disk logs and found disk 3 was coming up with a bunch of I/O errors. I cancelled the rebuild, changed out the cables and checked the connections for disk 3 in case I had bumped something when I put the new drive in. Now disk 3 is active but states "unmountable / no file system".
I've done a few restarts while trying a few other things with the dead disk to see if its data was recoverable. Disk 3 will sometimes mount and be readable (but with a bunch of I/O errors and very slow read rates), and now it is back to "unmountable".
My plan is to replace the two failed disks with two new ones arriving tomorrow and reconfigure (with lost data). But after a couple of read errors on a third disk, I want to know whether these issues are likely from something else (LSI card or PSU), from my cancelling the rebuild when it was running slow, or just bad luck with two drives failing at once.
I have attached two diagnostics. The first one is from the initial failure. The second one is from just now, with the second disk's issues (the first failed disk is not mounted).
Attachments: unraid-diagnostics-first.zip, unraid-diagnostics-second.zip
JorgeB Posted September 28, 2023
There are still issues with disk3:
Sep 28 12:02:02 UNRAID kernel: sd 4:0:2:0: [sdk] tag#2187 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=DRIVER_OK cmd_age=0s
Sep 28 12:02:02 UNRAID kernel: sd 4:0:2:0: [sdk] tag#2187 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 00 08 00 00
Sep 28 12:02:02 UNRAID kernel: I/O error, dev sdk, sector 64 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Sep 28 12:02:02 UNRAID kernel: md: disk3 read error, sector=0
Try swapping cables with another disk to see where the problem follows. There's also no SMART report for that disk, so see if you can get one manually after that.
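For anyone working through their own syslog, the two error formats quoted above can be counted per device with a short script. This is only a sketch: the function name `summarize_errors` is made up for illustration, the patterns are written against the exact lines in this thread, and the "write error" variant of the md message is an assumption rather than something shown in these diagnostics.

```python
import re
from collections import Counter

# Kernel block-layer errors, e.g.
#   "... kernel: I/O error, dev sdk, sector 64 op 0x0:(READ) flags 0x0 ..."
IO_ERROR = re.compile(
    r"I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+) op \S+:\((?P<op>\w+)\)"
)

# Unraid md-driver errors, e.g.
#   "... kernel: md: disk3 read error, sector=0"
# The "write" alternative is an assumption; only "read" appears in this thread.
MD_ERROR = re.compile(
    r"md: (?P<disk>disk\d+) (?P<op>read|write) error, sector=(?P<sector>\d+)"
)


def summarize_errors(lines):
    """Count error lines, keyed by (device or array slot, operation)."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[(match["dev"], match["op"].lower())] += 1
            continue
        match = MD_ERROR.search(line)
        if match:
            counts[(match["disk"], match["op"])] += 1
    return counts
```

Feeding it the lines of a syslog (for example `summarize_errors(open(path))`) shows whether errors stay with one device letter or move around, which is the same question the cable swap is meant to answer.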
rts.empire Posted September 28, 2023
I've removed disk5 and changed cables on disk3, and it still says unmountable. It did run a short SMART test, though (attached).
At this stage I can accept that the drive is probably dead, but two drives went in quick succession, so I want to be sure that's all it is (and not something caused by other hardware or faults) before I pop more drives in.
Thanks for your help
Attachments: unraid-smart-20230928-1920 (1).zip, unraid-diagnostics-20230928-1930.zip
JorgeB Posted September 28, 2023
Disk3 is still reporting read errors, and they are logged as a disk problem. SMART also shows some issues. Since there's already a disabled disk, the best bet to try and recover the data would be to clone disk3 with ddrescue. If you use a destination disk of the same capacity, you can then try to rebuild the currently disabled disk. I can post the complete instructions later if you want.
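The exact ddrescue command used was not posted in this thread. As a rough sketch only, the helper below checks the same-capacity condition mentioned above by reading the sector counts Linux exposes under /sys/block, then builds a basic GNU ddrescue invocation. The function names, the device names passed in, and the map file location are all hypothetical; `-f` is the ddrescue option that permits overwriting an output device.

```python
from pathlib import Path


def size_in_sectors(dev, sysfs_root="/sys/block"):
    """Return the device size in 512-byte sectors, as reported by sysfs."""
    return int(Path(sysfs_root, dev, "size").read_text().strip())


def ddrescue_command(src, dst, mapfile, sysfs_root="/sys/block"):
    """Build a ddrescue argument list after checking both disks match in size.

    src and dst are bare device names such as "sdk"; they are placeholders
    here and must be checked against the real system before running anything.
    """
    src_size = size_in_sectors(src, sysfs_root)
    dst_size = size_in_sectors(dst, sysfs_root)
    if src_size != dst_size:
        raise ValueError(
            f"{dst} has {dst_size} sectors but {src} has {src_size}; "
            "the clone is meant to go to a disk of the same capacity"
        )
    # -f lets ddrescue overwrite an existing block device; the map file
    # records progress so an interrupted copy can be resumed.
    return ["ddrescue", "-f", f"/dev/{src}", f"/dev/{dst}", mapfile]
```

Printing the returned list before running it (for example with `subprocess.run`) gives a chance to confirm that source and destination are the right way around, since swapping them would overwrite the failing disk.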
rts.empire Posted September 28, 2023
Thanks, that would be great!
JorgeB Posted September 28, 2023
Once you have the cloned disk (it needs to be the same capacity as the original or it won't mount) you can try this. Note that it will only work if parity is valid:
-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments and assign any missing disk(s) if needed, including the cloned disk3 and the new disk5. The replacement disk5 should be the same size as or larger than the old one
-IMPORTANT - Check both "parity is already valid" and "maintenance mode" and start the array (note that the GUI will still show that data on the parity disk(s) will be overwritten; this is normal as it doesn't account for the checkbox, but nothing will be overwritten as long as it's checked)
-Stop array
-Unassign disk5
-Start array (in normal mode now) and post new diags
rts.empire Posted September 29, 2023 (edited September 29, 2023)
Currently trying to complete the clone using ddrescue. My array is stopped and all the disks are unmounted so I can clone disk 3 to another disk of the same size.
However, while doing so, my parity drive (which was fine when I restarted unRAID and first stopped the array) suddenly dropped out of the array. The parity slot now shows a "missing disk", and the parity drive is in unassigned devices, unable to be assigned to the array and showing a heap of read errors. Meanwhile, the new disk I am attempting to clone disk 3 to is dropping in and out of unassigned devices.
I could accept bad luck with two drives (which are older), but surely the parity drive (which is near new) hasn't given up the ghost too?
Attachment: unraid-diagnostics-20230929-2359.zip
JorgeB Posted September 29, 2023
That looks more like a power/connection issue. Do you have another PSU you could use?
rts.empire Posted September 29, 2023
I'll have to borrow the one from my desktop tomorrow and see how that goes.
rts.empire Posted October 2, 2023
I now have a new PSU and the replacement disk 5 installed, and disk 3 has been cloned as per the instructions above. I have posted diagnostics below, but it all appears to be working well, so I will now rebuild disk 5. After that I'm thinking I will run extended SMART tests on all my disks to check they are okay.
Thank you very much for your help!
Attachment: unraid-diagnostics-20231002-0921.zip