Power Failure Recovery Help (please?)



Hi,

I am running V6.9.2.

Had a power failure overnight and system went down.

Initially my 2 data disks did not mount.  I started the array in maintenance mode and ran xfs_repair on both disks via Check Filesystem Status.  I restarted the array after this, but the disks still show red crosses against them.  I can access the shares and their contents, so I'm a bit confused.  The Dashboard says the disks are disabled, and I can't start Docker.

Any suggestions?

Thanks

Rob


You are likely to get better-informed feedback if you post your system's diagnostics zip file.
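
If you are at the console or on an SSH session, Unraid can generate that zip for you with its built-in diagnostics command; the output location below is the usual default:

# Run from the Unraid console or SSH; writes a timestamped diagnostics
# zip to the flash drive (normally under /boot/logs/), ready to attach
# to a forum post.
diagnostics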

 

If a disk shows a red 'x' then this means that a write to it failed and Unraid has stopped using it (put it into a 'disabled' state).   If you have sufficient parity drives then Unraid will be emulating the drive and showing its contents, using all the other array drives plus parity to work out what should be on that drive.
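
For what it's worth, here is a toy shell sketch of why that emulation works. Unraid's first parity (P) is a plain bitwise XOR of all the data disks, so any one missing disk can be recomputed from parity plus the survivors; the second parity (Q) uses a different Reed-Solomon-style calculation, but the principle is the same. The byte values below are made up purely for illustration:

# Parity written during normal operation: the XOR of every data disk's byte.
d1=$(( 0xA5 )); d2=$(( 0x3C ))
p=$(( d1 ^ d2 ))
# Disk 1 goes missing: "emulate" its byte from parity plus the remaining disk.
rebuilt=$(( p ^ d2 ))
printf 'original d1=0x%02X  emulated d1=0x%02X\n' "$d1" "$rebuilt"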

 

The normal way to clear the disabled state is to rebuild the physical drive to match the emulated one.  Do the emulated drive(s) show the expected content, and do they have a lost+found folder?    Your answers plus the diagnostics would help with deciding whether this is the best next step.
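
A quick way to answer those questions from the console, assuming the array is started normally so the emulated disks are mounted at /mnt/disk1, /mnt/disk2, and so on:

# Top-level folders on each emulated disk - do they look as expected?
ls /mnt/disk1 /mnt/disk2
# Any lost+found folders left behind by xfs_repair?
ls -d /mnt/disk*/lost+found 2>/dev/null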

 


 


Is that the latest diagnostics? Both disabled/emulated drives are mountable, and since you have dual parity both can be rebuilt.

 

Since you rebooted after the disks became disabled, I can't see anything about that in syslog. The disks are fairly new and there is nothing in the SMART reports that suggests disk problems. Most likely bad connections.

 

Syslog is still showing corruption on disk1 though. There is no lost+found share.

 

Check filesystems again and capture the output so you can post it along with new diagnostics.
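
If you prefer the command line, a sketch of doing that from the console on 6.9.x; with the array started in Maintenance mode, disk1's emulated device should be /dev/md1 (adjust the number to match the disk):

# -n reports problems without changing anything; drop -n to actually repair.
# tee keeps a copy on the flash drive so the output can be posted afterwards.
xfs_repair -n /dev/md1 2>&1 | tee /boot/xfs_repair_disk1.txt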


I've updated the diagnostics and re-run xfs_repair on disk 1:

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
bad CRC for inode 2147483777
bad CRC for inode 2147483777, will rewrite
Bad mtime nsec 4182577976 on inode 2147483777, resetting to zero
Bad ctime nsec 4182577976 on inode 2147483777, resetting to zero
cleared inode 2147483777
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
        - agno = 3
clearing reflink flag on inodes when possible
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

 

thez00-diagnostics-20220522-1356.zip


Hi - just pre-empting the next steps... is the procedure below, from the manual, appropriate?

 

Rebuilding a drive onto itself

There are cases where the reason a disk was disabled turns out to be an external factor and the disk drive itself appears to be fine. In such a case you need to follow a slightly modified process to get Unraid to rebuild a 'disabled' drive back onto the same drive.

  1. Stop array
  2. Unassign disabled disk
  3. Start array so the missing disk is registered
  4. Important: If the drive to be rebuilt is a data drive then check that the emulated drive is showing the content you expect to be there, as the rebuild process simply makes the physical drive match the emulated one. If this is not the case then you may want to ask in the forums for advice on the best way to proceed.
  5. Stop array
  6. Reassign disabled disk
  7. Start array to begin rebuild. If you start the array in Maintenance mode you will need to press the Sync button to start the rebuild.
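
And pre-empting one step further: once the rebuild starts I gather progress shows on the Main page, but something like the below should show it from the console too (the mdcmd variable names are my assumption from other forum posts, so worth verifying):

# Position vs. size gives percent complete; the action line shows what is running.
mdcmd status | grep -E 'mdResyncPos|mdResyncSize|mdResyncAction'
# Any fresh read/write errors during the rebuild will show up in syslog.
tail -f /var/log/syslog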

 


Those diagnostics look OK, no lost+found, but your docker.img is corrupt, probably due to the corruption on disk1 since you have no cache. You will have to recreate it, but I recommend you disable Docker and VM Manager in Settings until you get your array back to normal. Do you plan to add a fast pool (cache) soon?
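
Recreating it is straightforward; a sketch of the console route, assuming the common default image location (check Settings -> Docker for the actual path on your system):

# With Docker disabled in Settings -> Docker, remove the corrupt image...
rm /mnt/user/system/docker/docker.img
# ...then re-enable Docker and a fresh image is created automatically.
# Containers can be re-added from Apps -> Previous Apps, which reuses the
# saved templates, so nothing needs configuring from scratch.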

 

Do you have backups of anything important and irreplaceable?

 

 


Luckily I am just transitioning across slowly and learning as I go, so nothing is totally lost except my time.

Once I get a new CPU to run the show (instead of the crappy old Dell lappy), the plan is to set up a cache on an SSD, but that is for the future when time and $$ permit (and a working UPS).

In the meantime, if I run the procedure above, will that get me back to square one?

Or is there more to do with Docker? (I don't run any VMs.)

Thanks

