DaveW42 Posted September 12, 2023

Hi, I am running Unraid 6.12.4 with two parity disks, 16 disks in the array, an NVMe cache drive, and a few unassigned-devices drives. Most of the drives are connected to one of two LSI SAS9211-8i 8-port internal 6Gb/s SATA+SAS PCIe 2.0 cards, with the others connected to SATA ports on the motherboard. I also have an IO CREST internal 5-port non-RAID SATA III 6Gb/s M.2 B+M key adapter card in one of the two NVMe slots on my motherboard (Asus ROG STRIX X570-E GAMING) to give me the option of adding a few more SATA drives. Before the problems emerged, no drives were connected to that IO Crest adapter.

I had the case open and had installed an SSD (SK Hynix 1TB) to (I believe) a SATA port on the motherboard, and separately an NVMe drive (WD 2TB) to one of the open USB ports on the motherboard using an Inateck USB NVMe enclosure, with the intention of using either of these for a new gaming VM. After sealing things up, putting the system back in place, and turning it on, the system lost power briefly. When it came back up, Disk 1 and Parity 2 were disabled, with the contents of Disk 1 being emulated. I ran an extended SMART test on the parity disk and there were no errors. I tried adding another HDD to the system (connected to the IO Crest card via the backplane in my case) with the intention of using it to replace Disk 1, but the new disk did not show up in Unassigned Devices. I ran a regular SMART test on Disk 1, and there were no errors.

Given this, I rebooted the server in safe mode, unassigned the disabled drives (Disk 1 and Parity 2), and then started the array to make sure those drives were removed. I then rebooted in safe mode again, stopped the array, added Disk 1 and Parity 2 back in their original positions, and started the parity sync and data rebuild. Things went badly very quickly at this point: Disk 9 and Disk 16 almost immediately came up as disabled, and I briefly saw a message flash about CRC errors involving an unassigned device. At around 26.6% of the Disk 1 data rebuild I was no longer able to interact with the Unraid system, although I could see that a Windows 10 virtual machine on the server was still running without issue. I didn't touch anything, and about 15 minutes later the system became responsive again and I could click on menus etc.

The system currently shows the following:

· Parity 2: red X (parity device is disabled)
· Disk 1: green, but listed as "Unmountable: unsupported or no file system"
· Disk 9: red X, listed as "Unmountable: unsupported or no file system"
· Disk 16: green, but listed as "Unmountable: unsupported or no file system"

I don't believe that Parity 2 has seen much, if any, real activity as a result of the rebuild; in its disabled state it shows 1 read, 4 writes, and 2 errors. Attached is the diagnostics file.

In terms of next steps, should I power down the system, check all cables, and make sure that the LSI cards are properly seated in the motherboard? Help would be greatly, greatly appreciated.

Dave

nas24-diagnostics-20230912-0015.zip
JorgeB Posted September 12, 2023

There are read errors on at least four disks across both controllers, which suggests a power/connection problem. Check/replace the cables and/or try a different PSU if one is available, then post new diags after array start.
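If you want a quick look from the console while you're checking cables, something along these lines will show the SMART counters that usually distinguish a cabling/power problem from a failing drive (the device letter below is just an example; map letters to drives on the Main page or via /dev/disk/by-id):

# UDMA_CRC_Error_Count climbing points at a bad SATA cable or backplane contact;
# Reallocated_Sector_Ct / Current_Pending_Sector point at the drive itself.
ls -l /dev/disk/by-id/ | grep -v part
smartctl -A /dev/sdb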
DaveW42 Posted September 13, 2023

Thanks, JorgeB. Will do. You've got me thinking that my current PSU might not be enough given the number of devices I am running (I think the current PSU is 800W). I will buy a new PSU, install it, and then post when I have the new diags (might be two or three days).

Dave
DaveW42 Posted September 16, 2023

OK, I purchased a new 1500W PSU (Corsair HX1500i), which should be more than enough power for the system; my previous PSU was 850W. I also checked the cables and the seating of the LSI cards, and these looked fine. Attached is the new diagnostics file. Thanks!

Dave

nas24-diagnostics-20230915-2332.zip
JorgeB Posted September 16, 2023

No disk errors so far, but you need to check the filesystem on disk1 and the emulated disk9; run it without -n.
DaveW42 Posted September 16, 2023

I ran the following in a terminal in maintenance mode:

xfs_repair -v /dev/md9
xfs_repair -v /dev/md1

In both cases I receive error messages saying "No such file or directory" and "fatal error -- couldn't initialize XFS library".

Dave
JorgeB Posted September 17, 2023

That's for v6.11 and older; the link explains how to use the GUI to run it. If you prefer the CLI, add p1, e.g.:

xfs_repair -v /dev/md9p1
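For the two disks in question that would be something like the following (a minimal sketch assuming both filesystems are XFS and the array is started in maintenance mode; these are just the commands you already tried with the p1 suffix added):

# same check/repair as before, pointed at the first partition of each md device
xfs_repair -v /dev/md1p1
xfs_repair -v /dev/md9p1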
DaveW42 Posted September 17, 2023

Thanks, JorgeB. Didn't realize I was supposed to use the GUI, and am happy to use it. Here is the output for Disk 9 with the -v option specified.

Phase 1 - find and verify superblock...
        - block cache size set to 6101544 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 3794193 tail block 3794187
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

Here is the output for Disk 1 with the -v option specified.

Phase 1 - find and verify superblock...
        - block cache size set to 6071976 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 116970 tail block 116966
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

Thanks!

Dave
trurl Posted September 18, 2023

4 hours ago, DaveW42 said:
    If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair

Unraid has already determined the filesystem is unmountable, so you have to use -L.
DaveW42 Posted September 18, 2023

Thanks, trurl! Just to confirm, so I should go back to the GUI and this time use the following options for both Disk 9 and Disk 1?

-vL

Thanks,
Dave
itimpi Posted September 18, 2023

2 hours ago, DaveW42 said:
    Thanks, trurl! Just to confirm, so I should go back to the GUI and this time use the following options for both Disk 9 and Disk 1? -vL

You need the -L; the -v is optional.
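If you end up running it from the CLI instead of the GUI, the equivalent would be something like this (same p1 suffix and maintenance mode as before):

# -L zeroes the XFS log before repairing; only used here because the
# filesystem cannot be mounted to replay the log first
xfs_repair -vL /dev/md9p1
xfs_repair -vL /dev/md1p1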
DaveW42 Posted September 18, 2023

Thanks! Below are the results for Disk 9.

Dave

Phase 1 - find and verify superblock...
        - block cache size set to 6101544 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 3794193 tail block 3794187
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
sb_fdblocks 125178818, counted 127618391
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 8
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 1
        - agno = 7
        - agno = 9
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:3794198) is ahead of log (1:2).
Format log to cycle 4.

        XFS_REPAIR Summary    Mon Sep 18 09:49:15 2023

Phase           Start           End             Duration
Phase 1:        09/18 09:44:43  09/18 09:44:43
Phase 2:        09/18 09:44:43  09/18 09:45:14  31 seconds
Phase 3:        09/18 09:45:14  09/18 09:46:41  1 minute, 27 seconds
Phase 4:        09/18 09:46:41  09/18 09:46:41
Phase 5:        09/18 09:46:41  09/18 09:46:42  1 second
Phase 6:        09/18 09:46:42  09/18 09:48:03  1 minute, 21 seconds
Phase 7:        09/18 09:48:03  09/18 09:48:03

Total run time: 3 minutes, 20 seconds
done
JorgeB Posted September 18, 2023

It should mount now.
DaveW42 Posted September 18, 2023

Thanks! Below are the results for Disk 1.

Dave

Phase 1 - find and verify superblock...
        - block cache size set to 6071976 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 116970 tail block 116966
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 7
        - agno = 12
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 9
        - agno = 1
        - agno = 10
        - agno = 8
        - agno = 11
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (11:116995) is ahead of log (1:2).
Format log to cycle 14.

        XFS_REPAIR Summary    Mon Sep 18 10:05:02 2023

Phase           Start           End             Duration
Phase 1:        09/18 10:03:08  09/18 10:03:08
Phase 2:        09/18 10:03:08  09/18 10:03:37  29 seconds
Phase 3:        09/18 10:03:37  09/18 10:03:49  12 seconds
Phase 4:        09/18 10:03:49  09/18 10:03:49
Phase 5:        09/18 10:03:49  09/18 10:03:50  1 second
Phase 6:        09/18 10:03:50  09/18 10:03:59  9 seconds
Phase 7:        09/18 10:03:59  09/18 10:03:59

Total run time: 51 seconds
done
JorgeB Posted September 18, 2023

Should also mount.
DaveW42 Posted September 18, 2023

Thanks, JorgeB! So I should restart the array (i.e., not in maintenance mode)?

Thanks,
Dave
itimpi Posted September 18, 2023

Just now, DaveW42 said:
    Thanks, JorgeB! So I should restart the array (i.e., not in maintenance mode)?

Yes. The disks should now mount fine.
DaveW42 Posted September 18, 2023

Thanks!
DaveW42 Posted September 18, 2023

Attached is the new diagnostics file. Thanks!

Dave

nas24-diagnostics-20230918-1036.zip
JorgeB Posted September 18, 2023

If the emulated disk9 contents look correct you can rebuild on top, and re-sync parity at the same time: https://docs.unraid.net/unraid-os/manual/storage-management#rebuilding-a-drive-onto-itself
DaveW42 Posted September 18, 2023

Contents of Disk 1 and Disk 9 look great, thank you!!! When rebuilding the drive back onto itself, should I unassign both Disk 9 and Parity 2, or just unassign Disk 9?

Thanks again!!!!

Dave
JorgeB Posted September 18, 2023

You can do both at the same time.
DaveW42 Posted September 18, 2023

Got it, thank you. I will unassign both Disk 9 and Parity 2. Thank you!

Dave
DaveW42 Posted September 18, 2023

Thanks, JorgeB! Data rebuild is commencing as indicated. As an additional data point in case anyone is curious: despite having so many drives, the rebuild is only drawing an additional 30 watts of power (80 Plus Platinum PSU).

Dave
DaveW42 Posted September 20, 2023

The rebuild process and parity check have completed, and everything looks great (no data loss!). Thanks so much for all the help, JorgeB, itimpi, and trurl!!!! It is greatly appreciated.

Dave