Two drives went disabled and loads of errors all of a sudden. What to do from here?


detz

Diagnostics attached.

Got a warning yesterday morning which I forgot about until I tried to use Plex. Logged in to find two drives disabled and a lot of errors on other drives. I rebooted; one drive was then missing and the other (parity) was still disabled. Decided to shut down to be safe and ordered two new drives. What should I try before just swapping in the new drives?

 

1208765276_ScreenShot2021-08-16at8_35_50PM.thumb.png.f3e1f0cb8a7e8be2de935a519e65ad8b.png

cylon-diagnostics-20210816-2028.zip

Link to comment

Do the diagnostics correspond to the screenshot? The screenshot does not show any drive missing, but the diagnostics suggest that one is not online. Since the diagnostics show that parity and disks 4 and 5 are also offline, I suspect the disks are fine and that the problem is something common to those 4 drives, such as SATA or power cabling.

 

I would suggest powering down, checking all cabling, rebooting, starting the array, and posting new diagnostics. That might give a better clue on how to advise you to continue.

Link to comment
3 hours ago, itimpi said:

Do the diagnostics correspond to the screenshot? The screenshot does not show any drive missing, but the diagnostics suggest that one is not online.

Is that screenshot perhaps with the array started in Maintenance mode? No disk is showing a filesystem or size, for example.

Link to comment

Shut down, rechecked everything (even swapped some of the cables around to see if other drives would report bad), and I have the same results.

It wasn't missing, it was Unmountable, sorry. 

 

I ran SMART on each (fast) and they both checked okay. They also show fine when clicking on the disk info icon for each. Should I just start a rebuild? I'm assuming it's a controller issue and will be researching a new one, as both of these were on the motherboard controller.

 

129936123_ScreenShot2021-08-17at12_34_11PM.thumb.png.8d1d4b746113120ef8766630f827202c.png

cylon-diagnostics-20210817-1233.zip

Edited by detz
Link to comment
24 minutes ago, itimpi said:

A rebuild will not clear an ‘unmountable’ state - the rebuilt disk would also be unmountable.

 

Handling of unmountable disks is covered here in the online documentation accessible via the ‘Manual’ link at the bottom of the Unraid GUI.
 

 

Okay, so I think I should try a repair first on disk3. I ran the check and got the output below. Should I remove the -n and try a repair?

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

 

What options do I have for the parity drive? It doesn't have these same options. After reading the forums it looks like I can rebuild parity, but I should probably get the array stable first, right? I'm nervous that if this is a controller issue, the longer I have an unprotected array the more likely it will flake out again and I'll start losing data.

Edited by detz
Link to comment
21 minutes ago, detz said:

What options do I have for the parity drive? It doesn't have these same options. After reading the forums it looks like I can rebuild parity, but I should probably get the array stable first, right?

Parity doesn't have a filesystem to repair.

 

You have to rebuild both disabled disks. Usually repair before rebuilding is recommended if you are rebuilding to the same disk. If you are rebuilding to a new disk then rebuild before repair would be OK and might be a good test to see if your hardware problems are fixed.

 

It is possible that hardware problems are causing the unmountable condition, since all disks must be read to emulate the disabled disk.

Link to comment
17 minutes ago, detz said:

 

Okay, so I think I should try a repair first on disk3. I ran the check and got the output below. Should I remove the -n and try a repair?

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

 

What options do I have for the parity drive? It doesn't have these same options. After reading the forums it looks like I can rebuild parity, but I should probably get the array stable first, right? I'm nervous that if this is a controller issue, the longer I have an unprotected array the more likely it will flake out again and I'll start losing data.

 

Yes - go for the repair and if prompted for it add the -L option.

 

If the repair goes well, then when you restart the array in normal mode the disk should mount and you should see all your files. If that is not the case, or anything unexpected happens, report back for further advice.

 

For the parity drive it is a case of rebuilding it to remove the disabled state.

 

As to what caused the original problem, it could be anything, and may have been just a momentary glitch, particularly if a cable was not perfectly seated.
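For reference, the check-then-repair sequence described above can be sketched as a small helper that only builds the `xfs_repair` command lines (it does not run them). This is an illustrative sketch, not the exact procedure the GUI uses: the device path `/dev/md3` is an assumption (disk3 with the array started in Maintenance mode), and `build_xfs_repair_cmd` is a hypothetical helper name. `-n` (no modify) and `-L` (zero the log) are standard `xfs_repair` options.

```python
import shlex


def build_xfs_repair_cmd(device, check_only=True, zero_log=False):
    """Return the argv list for an xfs_repair run (nothing is executed).

    check_only=True adds -n, so the run reports problems without writing.
    zero_log=True adds -L, which discards the filesystem log; only use it
    when xfs_repair itself asks for it.
    """
    # Design choice of this sketch: a read-only check never needs -L.
    if check_only and zero_log:
        raise ValueError("this sketch does not combine -n with -L")
    cmd = ["xfs_repair"]
    if check_only:
        cmd.append("-n")
    if zero_log:
        cmd.append("-L")
    cmd.append(device)
    return cmd


# Assumed device for disk3 in Maintenance mode; adjust for your system.
DEVICE = "/dev/md3"

steps = [
    build_xfs_repair_cmd(DEVICE),                                   # 1. read-only check
    build_xfs_repair_cmd(DEVICE, check_only=False),                 # 2. actual repair
    build_xfs_repair_cmd(DEVICE, check_only=False, zero_log=True),  # 3. only if prompted
]
for step in steps:
    print(shlex.join(step))
```

Running the check again after the repair (step 1 a second time) is a cheap way to confirm that nothing further is reported.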

Link to comment
29 minutes ago, itimpi said:

 

Yes - go for the repair and if prompted for it add the -L option.

 

If the repair goes well, then when you restart the array in normal mode the disk should mount and you should see all your files. If that is not the case, or anything unexpected happens, report back for further advice.

 

For the parity drive it is a case of rebuilding it to remove the disabled state.

 

As to what caused the original problem, it could be anything, and may have been just a momentary glitch, particularly if a cable was not perfectly seated.

 

Okay, I think that helped. I ran the repair and it fixed a bunch of stuff; then after running the check again I got:

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 4
        - agno = 3
        - agno = 2
        - agno = 5
        - agno = 6
        - agno = 7
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

 

And the drive shows up but it's still disabled. I did a couple of checks and the files list out, but I think it's still emulated?

233387855_ScreenShot2021-08-17at2_42_24PM.thumb.png.4681130fb77625d7615a46c451cc40d2.png

 

Edited by detz
Link to comment

Oof, there is a lot (TB) of data in lost+found and free space is now almost 7 TB, where the drive was almost full before. I'm assuming this means it wasn't able to repair? What other options do I have here? 😞

Edited by detz
Link to comment
35 minutes ago, detz said:

Oof, there is a lot (TB) of data in lost+found and free space is now almost 7 TB, where the drive was almost full before. I'm assuming this means it wasn't able to repair? What other options do I have here? 😞


One option is to try to mount the original disk using the UD plugin to see if it is in a better state than the emulated one (one reason not to rush into a rebuild over its contents). You may have to use the option in the UD settings to change the UUID on the drive to avoid it clashing with the emulated drive.

Link to comment
10 minutes ago, itimpi said:


One option is to try to mount the original disk using the UD plugin to see if it is in a better state than the emulated one (one reason not to rush into a rebuild over its contents). You may have to use the option in the UD settings to change the UUID on the drive to avoid it clashing with the emulated drive.

 

If the lost and found is showing everything I can live with it being gone, but I'm confused why I'm losing data if I should be able to withstand a 2 drive failure. What did I do wrong here?

Link to comment
2 minutes ago, detz said:

 

If the lost and found is showing everything I can live with it being gone, but I'm confused why I'm losing data if I should be able to withstand a 2 drive failure. What did I do wrong here?

The problem is that a drive going unmountable means there is file system corruption (basically a write appeared to work, but the data written was wrong for some reason) and parity does not protect against this: it simply reproduces whatever was written, corruption included.
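To illustrate why parity cannot help with corruption, here is a toy sketch of single XOR parity. It is a simplification: a dual-parity array uses a second, more complex calculation, and real arrays work on whole disks rather than short byte strings. The names `xor_parity` and `rebuild_missing` are made up for this example.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks (simplified single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Reconstruct the one missing block from the survivors plus parity."""
    return xor_parity(list(surviving_blocks) + [parity])


disk1 = b"good data 1"
disk2 = b"good data 2"
# disk3 received a bad write: its filesystem metadata is garbage,
# but as far as parity is concerned it is just bytes.
disk3 = b"CORRUPT!!!!"

# Parity is computed from whatever is actually on the disks.
parity = xor_parity([disk1, disk2, disk3])

# disk3 drops offline; emulate it from the other disks plus parity.
emulated_disk3 = rebuild_missing([disk1, disk2], parity)
print(emulated_disk3 == disk3)  # the emulated disk is bit-for-bit the corrupt one
```

The emulated (or rebuilt) disk is an exact copy of what was on the failed disk, so a filesystem that was corrupt before the failure is still corrupt afterwards and needs a filesystem repair, not a rebuild.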

Link to comment
56 minutes ago, itimpi said:


One option is to try to mount the original disk using the UD plugin to see if it is in a better state than the emulated one (one reason not to rush into a rebuild over its contents). You may have to use the option in the UD settings to change the UUID on the drive to avoid it clashing with the emulated drive.

 

I used UD to mount it with the array stopped and I can browse the drive without issues. It's hard to tell if anything is gone, though, as files are so fragmented across the drives that I'm not sure what should be here. Is there a way to put this drive back in operation and see what happens?

Link to comment
1 hour ago, itimpi said:

free space is now almost 7 TB

1 hour ago, detz said:

If the lost and found is showing everything I can live with it being gone

The lost+found would have been created on that same disk which you say has a lot more free space than before so it must not have everything you thought was there.

 

2 minutes ago, detz said:

So after I did the UD I added it back to the array and it's unmountable again.

What exactly do you mean by "added it back to the array"? There is more than one way to interpret that and I'm not sure any of them are good ideas.

Link to comment
3 minutes ago, detz said:

So after I did the UD I added it back to the array and it's unmountable again


A bit worried that you said this.    Simply adding the disk back into the array is likely to have lost its contents as it probably triggered a rebuild - thus destroying its current contents and putting the emulated contents there instead.

Link to comment

I stopped the array and unassigned it (using the dropdown)

I then mounted it in UD and browsed around in ssh (pushed the Mount button as it showed up down there when I unselected it above)

I didn't start the array though.

I then re-selected it back in the dropdown "added it back" and started the array

It's unmountable

Edited by detz
Link to comment
9 hours ago, detz said:

I stopped the array and unassigned it (using the dropdown)

I then mounted it in UD and browsed around in ssh (pushed the Mount button as it showed up down there when I unselected it above)

I didn't start the array though.

I then re-selected it back in the dropdown "added it back" and started the array

It's unmountable

That was not the right thing to do - and you have probably caused the physical drive (that was OK) to be overwritten with the contents of the emulated drive (which was not).

 

EDIT:  actually on reading more carefully, if you did not start the array with the disk unassigned you may be OK, as that is the point at which Unraid commits drive assignments, and you also did not mention a rebuild starting on the drive. I would try unassigning it again and checking that it can still be mounted by UD. If so, keep it unassigned until you get further advice.

 

Link to comment
19 minutes ago, detz said:

I stopped the array and unassigned it (using the dropdown)

I then mounted it in UD and browsed around in ssh (pushed the Mount button as it showed up down there when I unselected it above)

I didn't start the array though.

I then re-selected it back in the dropdown "added it back" and started the array

It's unmountable

Can you mount Unassigned Disks without the array started? I've never tried.

 

If the array was never started while the disk was Unassigned then maybe it doesn't start rebuild. Unless you mounted it read-only in UD it will be slightly out-of-sync though.

 

Post a screenshot and new diagnostics.

Link to comment
9 hours ago, itimpi said:

That was not the right thing to do - and you have probably caused the physical drive (that was OK) to be overwritten with the contents of the emulated drive (which was not).

 

EDIT:  actually on reading more carefully, if you did not start the array with the disk unassigned you may be OK, as that is the point at which Unraid commits drive assignments, and you also did not mention a rebuild starting on the drive. I would try unassigning it again and checking that it can still be mounted by UD. If so, keep it unassigned until you get further advice.

 

 

Stopped array again, changed to 'no device' in dropdown (didn't start array) and the drive was mounted in UD correctly. I can browse the drive fine in ssh. I copied some files around and scp'd one locally and it was fine. Again, not sure if everything is there but it appears to be working.

 

9 hours ago, trurl said:

Can you mount Unassigned Disks without the array started? I've never tried.

 

If the array was never started while the disk was Unassigned then maybe it doesn't start rebuild. Unless you mounted it read-only in UD it will be slightly out-of-sync though.

 

Post a screenshot and new diagnostics.

 

I didn't mount it read-only but it doesn't appear to be re-built.

 

 

Screen Shot 2021-08-18 at 3.21.30 AM.png

Screen Shot 2021-08-18 at 3.21.34 AM.png

cylon-diagnostics-20210818-0321.zip

Link to comment

Do you have a spare disk you could put in as disk3? If so, you could take the following approach:

  • Run a check/repair on the emulated disk3. I am not sure of the current state of things, but running an extra check/repair will not do any harm. At this point the emulated disk should mount when you restart the array in normal mode, although you may have some (or even a lot of) files in the lost+found folder.
  • Put the spare disk in as disk3 and let Unraid rebuild the emulated contents onto that physical disk.
  • After the rebuild is finished, (optionally) discard the contents of the lost+found folder.
  • Use rsync (or any similar copying software) to copy all files that do not exist on the new disk3 from the old disk3 mounted in UD to the new disk3. Hopefully this will pick up any files that ended up in lost+found.
  • If you did not do it earlier, you can probably now discard the contents of the lost+found folder.

If you have no spare disk, you can take a similar approach, although in this case you do not do the rebuild at the point mentioned above and instead copy the contents of the old disk3 mounted in UD onto the emulated disk before attempting any rebuild. When that completes, the old disk3 is free to be used for rebuild purposes.
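The "copy only what is missing" step above can be sketched in Python as follows. This is a minimal illustration rather than a replacement for rsync (whose `--ignore-existing` option does much the same job): it compares by path only, never overwrites a file that already exists on the destination, and defaults to a dry run. The function names are hypothetical, and any mount points you pass in (for example the old disk3 mounted by UD and the new disk3 in the array) depend on your system.

```python
import os
import shutil


def missing_files(old_root, new_root):
    """Yield relative paths that exist under old_root but not under new_root."""
    for dirpath, _dirnames, filenames in os.walk(old_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), old_root)
            if not os.path.lexists(os.path.join(new_root, rel)):
                yield rel


def copy_missing(old_root, new_root, dry_run=True):
    """Copy files missing on new_root; return the sorted list of their paths.

    With dry_run=True (the default) nothing is written, so the list can be
    reviewed first. Existing files on new_root are never overwritten.
    """
    copied = []
    for rel in missing_files(old_root, new_root):
        copied.append(rel)
        if not dry_run:
            dest = os.path.join(new_root, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            # copy2 also preserves timestamps and permission bits.
            shutil.copy2(os.path.join(old_root, rel), dest)
    return sorted(copied)
```

A cautious way to use it would be to call `copy_missing(old, new)` first, check that the reported list looks sensible, and only then repeat the call with `dry_run=False`.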

Link to comment
