
Data rebuild failure following disabled drives after power loss


DaveW42


Hi, I am running Unraid 6.12.4 with two parity disks, 16 disks in the array, an NVMe cache drive, and a few Unassigned Devices drives. Most of the drives are connected to one of two LSI Logic SAS9211-8i 8-port internal 6Gb/s SATA+SAS PCIe 2.0 cards, with the rest connected to SATA ports on the motherboard. I also have an IO Crest internal 5-port non-RAID SATA III 6Gb/s M.2 B+M key adapter card in one of the two NVMe slots on my motherboard (Asus ROG STRIX X570-E Gaming), which gives me the option of adding a few more SATA drives. Before the problems emerged, no drives were connected to that IO Crest card.

 

I had the case open and had installed an SSD (SK hynix 1TB) to (I believe) a SATA port on the motherboard and, separately, an NVMe drive (WD 2TB) to one of the open USB ports on the motherboard using an Inateck USB NVMe enclosure, with the intention of using one of them for a new gaming VM. After sealing things up, putting the system back in place, and turning it on, the system lost power briefly. When it came back up, Disk 1 and Parity 2 showed as disabled, with the contents of Disk 1 being emulated. I ran an extended SMART test on the parity disk, and there were no errors. I tried adding another HDD to the system (connected to the IO Crest card via the backplane on my computer case) with the intention of using it to replace Disk 1, but the new disk did not show up in Unassigned Devices. I ran a regular (short) SMART test on Disk 1, and there were no errors. Given this, I rebooted in safe mode, unassigned the disabled drives (Disk 1 and Parity 2), and started the array to make sure those drives were removed. I then rebooted in safe mode again, shut down the array, assigned Disk 1 and Parity 2 back to their original positions, and started the parity sync and data rebuild process.

 

Things went badly very quickly at this point, with Disk 9 and Disk 16 almost immediately coming up as disabled. I briefly saw a message flash about CRC errors involving an unassigned device. At around 26.6% of the Disk 1 data rebuild, I was no longer able to interact with the Unraid system. However, I could see that a Windows 10 virtual machine on the server was still running without issue. I didn't touch anything, and about 15 minutes later the system became responsive again and I could click on menus, etc. The system currently shows the following:

 

- Parity 2: red X (parity device is disabled)
- Disk 1: green, but lists as "unmountable: unsupported or no file system"
- Disk 9: red X, lists as "unmountable: unsupported or no file system"
- Disk 16: green, but lists as "unmountable: unsupported or no file system"

 

I don’t believe that Parity 2 has seen much/any real activity as a result of the rebuild.  In its disabled state it shows 1 read, 4 writes, and 2 errors.

 

Attached is the diagnostic file. In terms of next steps, should I power down the system, check all the cables, and make sure that the LSI cards are properly seated in the motherboard? Help would be greatly, greatly appreciated.

 

Dave

nas24-diagnostics-20230912-0015.zip


Thanks, JorgeB.  Didn't realize I was supposed to use the GUI, and am happy to use it. 

 

Here is the output for Disk 9 with the -v option specified.

 

 

Phase 1 - find and verify superblock...
        - block cache size set to 6101544 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 3794193 tail block 3794187
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

 

Here is the output for Disk 1 with the -v option specified.

 

 

Phase 1 - find and verify superblock...
        - block cache size set to 6071976 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 116970 tail block 116966
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
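
For reference, I ran both checks from the GUI file system check on each disk, with -v added alongside the default -n (no-modify) option. My understanding is that the command-line equivalent would be roughly the following; the md device paths are my assumption for how Disk 1 and Disk 9 map on 6.12:

# Read-only verbose checks of the emulated disks (no changes are made with -n).
# Device paths are assumptions -- on Unraid 6.12 array Disk 1 and Disk 9 would
# typically be /dev/md1p1 and /dev/md9p1.
xfs_repair -nv /dev/md1p1
xfs_repair -nv /dev/md9p1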

 

Thanks!

 

Dave


Thanks!

 

Below are the results for Disk 9.

 

Dave

 

Phase 1 - find and verify superblock...
        - block cache size set to 6101544 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 3794193 tail block 3794187
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
sb_fdblocks 125178818, counted 127618391
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 8
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 1
        - agno = 7
        - agno = 9
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:3794198) is ahead of log (1:2).
Format log to cycle 4.

        XFS_REPAIR Summary    Mon Sep 18 09:49:15 2023

Phase        Start        End        Duration
Phase 1:    09/18 09:44:43    09/18 09:44:43
Phase 2:    09/18 09:44:43    09/18 09:45:14    31 seconds
Phase 3:    09/18 09:45:14    09/18 09:46:41    1 minute, 27 seconds
Phase 4:    09/18 09:46:41    09/18 09:46:41
Phase 5:    09/18 09:46:41    09/18 09:46:42    1 second
Phase 6:    09/18 09:46:42    09/18 09:48:03    1 minute, 21 seconds
Phase 7:    09/18 09:48:03    09/18 09:48:03

Total run time: 3 minutes, 20 seconds
done

 

 


Thanks!

 

Below are the results for Disk 1.

 

Dave

 

 

Phase 1 - find and verify superblock...
        - block cache size set to 6071976 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 116970 tail block 116966
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 7
        - agno = 12
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 9
        - agno = 1
        - agno = 10
        - agno = 8
        - agno = 11
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (11:116995) is ahead of log (1:2).
Format log to cycle 14.

        XFS_REPAIR Summary    Mon Sep 18 10:05:02 2023

Phase        Start        End        Duration
Phase 1:    09/18 10:03:08    09/18 10:03:08
Phase 2:    09/18 10:03:08    09/18 10:03:37    29 seconds
Phase 3:    09/18 10:03:37    09/18 10:03:49    12 seconds
Phase 4:    09/18 10:03:49    09/18 10:03:49
Phase 5:    09/18 10:03:49    09/18 10:03:50    1 second
Phase 6:    09/18 10:03:50    09/18 10:03:59    9 seconds
Phase 7:    09/18 10:03:59    09/18 10:03:59

Total run time: 51 seconds
done
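
For completeness, both repairs above were run from the GUI check with the -n option removed and -L added, as the earlier output instructed. I believe the command-line equivalent would be roughly the following, again assuming the 6.12 md device naming, since repairs are supposed to target the md devices so that parity stays in sync:

# Destroy the log and repair, per the earlier xfs_repair ERROR message.
# /dev/md1p1 and /dev/md9p1 are assumed device paths for Disk 1 and Disk 9
# on Unraid 6.12; running against the md devices keeps parity updated.
xfs_repair -vL /dev/md1p1
xfs_repair -vL /dev/md9p1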
 
 


Thanks, JorgeB! The data rebuild is commencing as indicated.

 

As an additional data point, in case anyone is curious: despite having so many drives, the rebuild process is only drawing about 30 additional watts (80 Plus Platinum PSU).

 

Dave

 

 

