JimmyC Posted March 28, 2022

I was performing a routine upgrade from an old 1.5 TB drive to my previous 8 TB parity drive (I upgraded parity to 14 TB about a week back). Upon boot, I saw that I needed to assign disk 3, as expected. However, once I started the array and the data rebuild commenced, disk 1 immediately entered an error state. Unraid alerts post to a Slack channel I set up:

Files Davis 4:46 PM
Warning [TOWER] - Disk 3, drive not ready, content being reconstructed WDC_WD80EMAZ-00WJTA0_7JJYY38C (sdd)
4:47
Alert [TOWER] - Disk 1 in error state (disk dsbl) WDC_WD30EFRX-68AX9N0_WD-WMC1T3212961 (sdb)

I then lost remote access to the system completely (web/SSH/ping), although the server was still powered on. I run it headless and didn't have a monitor handy, so I power-cycled first to see if anything would change. The server booted, but I lost ping again shortly after and never got to the GUI on this boot.

I pulled the server, swapped the SATA cable for disk 1 with a spare while migrating disk 3 back to the original drive, and booted again. I'm still seeing errors on disk 1, and disk 3 now shows not installed; I'm guessing that's because I did commit the prior change before the disk 1 problem. I started the array in maintenance mode to run a file system check as recommended in the wiki. Results of reiserfsck on disk 1:

reiserfsck 3.6.27
Will read-only check consistency of the filesystem on /dev/md1
Will put log info to 'stdout'

The problem has occurred looks like a hardware problem. If you have bad blocks, we advise you to get a new hard drive, because once you get one bad block that the disk drive internals cannot hide from your sight, the chances of getting more are generally said to become much higher (precise statistics are unknown to us), and this disk drive is probably not expensive enough for you to risk your time and data on it. If you don't want to follow that advice, then if you have just a few bad blocks, try writing to the bad blocks and see if the drive remaps the bad blocks (that means it takes a block it has in reserve and allocates it for use in place of that block number). If it cannot remap the block, use the badblock option (-B) with the reiserfs utils to handle this block correctly.

bread: Cannot read the block (2): (Input/output error).

I'm at a point where I'm not sure what my next step should be to reduce the potential for data loss. Since I've only been running single parity, I currently have an unrecoverable array, but I do still have the 1.5 TB drive with whatever data it contained, and I believe it to be in a working state. Diagnostics attached, albeit from my most recent boot only. I have not shut down or made any further array changes.

tower-diagnostics-20220327-1856.zip
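The "bread: Cannot read the block" line above comes from the device layer, which points at hardware or cabling rather than filesystem corruption. A minimal sketch of triaging a saved reiserfsck log from the console; the log path is hypothetical, and the sample heredoc text stands in for output you would capture with something like `reiserfsck --check /dev/md1 2>&1 | tee <log>` while in maintenance mode:

```shell
# Sample log text standing in for a real captured reiserfsck log
# (hypothetical path; lines taken from the output quoted in the post).
cat > /tmp/reiserfsck-disk1.log <<'EOF'
reiserfsck 3.6.27
Will read-only check consistency of the filesystem on /dev/md1
bread: Cannot read the block (2): (Input/output error).
EOF

# Count low-level read failures; any hits suggest the drive (or its
# cable/controller) could not deliver the block at all, as opposed to
# reiserfsck finding inconsistent filesystem metadata.
errors=$(grep -c 'Input/output error' /tmp/reiserfsck-disk1.log)
echo "low-level read errors: $errors"
```

If the count is non-zero, checking cables and SMART data before touching the filesystem is usually the safer order of operations.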
JorgeB Posted March 28, 2022 (Solution)

Single parity can't emulate two disks, so there's no point in trying a filesystem check. You can force-enable disk1 to try again to rebuild disk3. This will only work if parity is still valid:

-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments and assign any missing disk(s) if needed, including disk3
-IMPORTANT - Check both "parity is already valid" and "maintenance mode" and start the array (note that the GUI will still show that data on parity disk(s) will be overwritten; this is normal, as it doesn't account for the checkbox, but nothing will be overwritten as long as it's checked)
-Stop the array
-Unassign disk3
-Start the array (in normal mode now); ideally the emulated disk3 will now mount and its contents will look correct. If it doesn't, you should run a filesystem check on the emulated disk or post new diags
-If the emulated disk mounts and contents look correct, stop the array
-Re-assign the disk to rebuild and start the array to begin
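Before re-assigning, the "emulated disk3 mounted" check in the steps above can also be confirmed from the console. A minimal sketch, assuming the standard Unraid layout where disk3 mounts at /mnt/disk3; the sample heredoc stands in for the live /proc/mounts:

```shell
# Sample mount table standing in for the live /proc/mounts on the server.
cat > /tmp/mounts.sample <<'EOF'
/dev/md1 /mnt/disk1 reiserfs rw 0 0
/dev/md3 /mnt/disk3 reiserfs rw 0 0
EOF

# A missing /mnt/disk3 entry after starting the array in normal mode
# would mean the emulated disk did not mount, in which case a filesystem
# check on the emulated device is the next step per the list above.
if grep -q ' /mnt/disk3 ' /tmp/mounts.sample; then
  status="emulated disk3 mounted"
else
  status="disk3 NOT mounted - consider a filesystem check on the emulated disk"
fi
echo "$status"
```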
JimmyC Posted March 28, 2022 (Author)

I believe parity is still valid at this point, and I will proceed with this suggestion and report back. To clarify, would you want me to create the new config with new or old Disk 3 connected? I was thinking that if I could restore the previous config with valid parity and emulate the possibly failing disk 1, I could then use my spare 8 TB to replace the failed 3 TB disk rather than the smaller 1.5 TB for now.
JorgeB Posted March 28, 2022

2 minutes ago, JimmyC said:
To clarify, would you want me to create the new config with new or old Disk 3 connected?

That is with the new disk. If that fails, or if you still have the old disk and it's healthy, you can do a new config with it, sync parity, then try upgrading again.
JimmyC Posted March 28, 2022 (Author)

I already had the old Disk 3 connected, so I tried the new config operation as outlined, and I am no longer seeing Disk 1 disabled in maintenance mode. I have a parity check (no corrections) running currently just to verify where everything stands.

I've reviewed Disk 1's SMART data and am not terribly concerned about the overall health of that drive. My guess is that I bumped the SATA cable when I was in the case and caused a temporary comms issue on it.

I feel like you have me on the right track here, and I will report back in a day or two when the check and rebuild have finished. I also feel now like this is some simple troubleshooting I should have known about, but (knock on wood) I've been running Unraid for about a decade and this is the largest problem I've encountered. Pretty solid code.
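A hedged sketch of the kind of SMART review mentioned here: a non-zero UDMA_CRC_Error_Count alongside zero reallocated and pending sectors is the classic signature of a cabling problem rather than failing media. The sample attribute lines below are illustrative stand-ins, not the actual values from this drive, which you would get with `smartctl -A /dev/sdb`:

```shell
# Sample SMART attribute lines standing in for real smartctl -A output.
cat > /tmp/smart.sample <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
EOF

# Pull out the three attributes that separate "bad cable" from "bad disk":
# CRC errors count link-level transfer failures (cable/connector), while
# reallocated and pending sectors reflect the health of the media itself.
result=$(awk '$2 ~ /Reallocated_Sector_Ct|Current_Pending_Sector|UDMA_CRC_Error_Count/ {printf "%s=%s\n", $2, $NF}' /tmp/smart.sample)
echo "$result"
```

With values like these, reseating or replacing the SATA cable and watching whether the CRC count keeps rising is a reasonable next step; the raw CRC counter never resets, so only further increases matter.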
JimmyC Posted April 4, 2022 (Author)

Just to update: I was able to use this guide to revive my array with no data loss.