v6.9.2 Disk 1 entered error state during Disk 3 upgrade - Data Lost?



I was performing a routine upgrade, replacing an old 1.5 TB drive with my previous 8 TB parity drive (I upgraded parity to 14 TB about a week back).  Upon boot, I saw that I needed to assign Disk 3, as expected.  However, once I started the array and the data rebuild commenced, Disk 1 immediately entered an error state.  Unraid alerts post to a Slack channel I set up:


Files Davis  4:46 PM
Warning [TOWER] - Disk 3, drive not ready, content being reconstructed
WDC_WD80EMAZ-00WJTA0_7JJYY38C (sdd)
4:47
Alert [TOWER] - Disk 1 in error state (disk dsbl)
WDC_WD30EFRX-68AX9N0_WD-WMC1T3212961 (sdb)
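
(For context on the alert plumbing: the messages reach Slack through an incoming webhook.  A minimal sketch of the kind of call involved, with a placeholder webhook URL rather than my real one:

curl -s -X POST -H 'Content-type: application/json' \
     --data '{"text":"Alert [TOWER] - Disk 1 in error state (disk dsbl)"}' \
     https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX )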

 

I then lost remote access to the system completely (web/SSH/ping), although the server was still powered on.  I run it headless and didn't have a monitor handy, so I power-cycled first to see if anything would change.  The server booted, but I lost ping again shortly after and never reached the GUI on that boot.  I pulled the server, swapped the SATA cable for Disk 1 with a spare while migrating Disk 3 back to the original drive, and booted again.  I'm still seeing errors on Disk 1, and Disk 3 now shows "not installed"; I'm guessing that's because I did commit the drive change before the Disk 1 problem.  I started the array in maintenance mode to run a filesystem check, as recommended in the wiki.  Results of reiserfsck on Disk 1:

 

reiserfsck 3.6.27

 

Will read-only check consistency of the filesystem on /dev/md1
Will put log info to 'stdout'

 

The problem has occurred looks like a hardware problem. If you have
bad blocks, we advise you to get a new hard drive, because once you
get one bad block  that the disk  drive internals  cannot hide from
your sight,the chances of getting more are generally said to become
much higher  (precise statistics are unknown to us), and  this disk
drive is probably not expensive enough  for you to you to risk your
time and  data on it.  If you don't want to follow that follow that
advice then  if you have just a few bad blocks,  try writing to the
bad blocks  and see if the drive remaps  the bad blocks (that means
it takes a block  it has  in reserve  and allocates  it for use for
of that block number).  If it cannot remap the block,  use badblock
option (-B) with  reiserfs utils to handle this block correctly.

 

bread: Cannot read the block (2): (Input/output error).
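
For reference, that was the read-only pass, run from the console with the array started in maintenance mode (which is what creates the /dev/md1 device); the invocation was roughly:

reiserfsck --check /dev/md1

Given the bread I/O error above, I did not attempt a repair pass (--fix-fixable / --rebuild-tree), since those write to a disk that may not be reading reliably.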

 

I'm at a point where I'm not sure what my next step should be to minimize the potential for data loss.  Since I've only been running single parity, the array is currently unrecoverable as it stands, but I do still have the 1.5 TB drive with whatever data it contained, and I believe it to be in working order.  Diagnostics attached, albeit from my most recent boot only.  I have not shut down or made any further array changes.

tower-diagnostics-20220327-1856.zip

  • Solution (posted by JorgeB)

Single parity can't emulate two disks, so there's no point in running a filesystem check right now; you can force-enable disk1 to try the disk3 rebuild again.
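
As a toy sketch of why single parity stops at one failure: the parity byte is just the XOR of the corresponding byte on every data disk, so you can solve for exactly one unknown disk (the values below are made up):

d1=0xA5; d2=0x3C; d3=0x5F        # toy bytes from three data disks
parity=$(( d1 ^ d2 ^ d3 ))       # what the single parity disk stores
echo $(( parity ^ d2 ^ d3 ))     # prints 165 (0xA5): d1 recovered when only d1 is missing
# With d1 and d3 both unknown, parity ^ d2 gives only d1 ^ d3: one equation, two unknowns.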

 

This will only work if parity is still valid:

 

-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments and assign any missing disk(s) if needed, including disk3
-IMPORTANT - Check both "parity is already valid" and "maintenance mode", then start the array (the GUI will still warn that data on the parity disk(s) will be overwritten; that's normal, since the warning doesn't account for the checkbox, and nothing is overwritten as long as it's checked)
-Stop the array
-Unassign disk3
-Start the array (in normal mode now); ideally the emulated disk3 will mount and its contents will look correct, and if it doesn't, run a filesystem check on the emulated disk or post new diags
-If the emulated disk mounts and its contents look correct, stop the array
-Re-assign disk3 and start the array to begin the rebuild.

 

 


I believe parity is still valid at this point, so I will proceed with this suggestion and report back.

 

To clarify, would you want me to create the new config with the new or the old Disk 3 connected?  I was thinking that if I can restore the previous config with valid parity and emulate the possibly-failing Disk 1, I could then use my spare 8 TB drive to replace the failed 3 TB disk rather than the smaller 1.5 TB for now.


I already had the old Disk 3 connected, so I tried the New Config operation as outlined, and Disk 1 is no longer showing as disabled in maintenance mode.  I have a parity check (no corrections) running currently, just to verify where everything stands.
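
(Side note, in case it helps anyone working from the console: I believe a non-correcting check can also be started and watched with Unraid's mdcmd, though I only used the GUI; treat the exact syntax below as from memory:

mdcmd check NOCORRECT            # start a parity check without writing corrections
mdcmd status | grep -i resync    # rough progress indication )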

 

I've reviewed Disk 1's SMART data and am not terribly concerned about the overall health of that drive.  My guess is that I bumped the SATA cable while I was in the case and caused a temporary comms issue on it.
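
For anyone wanting to do the same check from the console, this is roughly what I looked at (sdb was Disk 1's device in the alert above, but device letters can shift between boots, so confirm yours first):

smartctl -a /dev/sdb | grep -Ei 'reallocated|pending|uncorrect|crc'

Rising UDMA_CRC_Error_Count with zero reallocated/pending sectors tends to point at the cable or connector rather than the platters.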

 

I feel like you have me on the right track here; I'll report back in a day or two once the check and rebuild have finished.

 

I also feel now like this is some simple troubleshooting I should have known about, but (knock on wood) I've been running Unraid for about a decade and this is the largest problem I've encountered.  Pretty solid code.

