Adrian Posted July 18, 2022
So something happened and two drives are marked red. I think one is actually dead and the other is fine. I'm running an extended SMART test on one of them, then the other. I have dual parity. What's the process to recover from the two red drives if one or both are bad? Do I simply replace the bad drive(s) and rebuild? I've never done this with more than one drive.
trurl Posted July 18, 2022
attach diagnostics to your NEXT post in this thread
Adrian Posted July 18, 2022
mediaserver-diagnostics-20220718-1723.zip
FYI, this is unfortunately captured after I rebooted the server.
trurl Posted July 18, 2022
Both disks look fine; neither has completed the extended test yet. You will probably have to disable spindown on the disks to get that to complete.
15 minutes ago, Adrian said: captured after I rebooted
Didn't notice anything in the current syslog; can't say what happened earlier, of course. And I can't tell whether any drives are unmountable since you haven't started the array. Do that and post new diagnostics.
Adrian Posted July 18, 2022
7 minutes ago, trurl said: Both disks look fine... Do that and post new diagnostics.
Yeah, that's what I'm hoping. I'm currently running an extended test on one of the drives; it'll probably take another 6-8 hours to complete. When that's done, I'll try the other drive. And yup, I disabled the spindown.
OK, so if the extended test passes for both, just start the array and then upload new diagnostics. Got it.
trurl Posted July 18, 2022
8 minutes ago, Adrian said: extended test
Will take many hours, similar to a parity check, since those drives are the same size as parity.
itimpi Posted July 19, 2022
9 hours ago, Adrian said: ok, so if the extended test passes for both, just try to start the array and then upload new diagnostics.
This is what I would recommend. There is no reason not to run the extended test on both drives in parallel, as the test is completely internal to the drive. The process for rebuilding the drives is covered in the online documentation, accessible via the 'Manual' link at the bottom of the GUI, but it would be a good idea to wait for feedback on the diagnostics after the extended tests before going ahead with that.
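For anyone finding this later, the same parallel test can also be kicked off from the console. This is only a sketch: /dev/sdX and /dev/sdY are placeholders for the two suspect drives (check Main or the diagnostics for the real device names), and the smartctl lines are commented out so nothing runs by accident:

```shell
# Start an extended (long) SMART self-test on both suspect drives at once.
# The test runs entirely inside each drive, so running them in parallel is fine.
# /dev/sdX and /dev/sdY are placeholders, not real device names.
for dev in /dev/sdX /dev/sdY; do
    echo "starting extended test on $dev"
    # smartctl -t long "$dev"    # uncomment on the server to actually start it
done

# Hours later, check the self-test log (keep spindown disabled, or the
# test may never finish):
# smartctl -l selftest /dev/sdX
# smartctl -l selftest /dev/sdY
```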
Adrian Posted July 19, 2022
11 hours ago, itimpi said: No reason not to be running the extended test on both drives in parallel as the test is completely internal to the drive.
Good to know for next time, if it ever happens again. Both extended tests completed and it looks like no errors were reported. With both tests done, I started the array. Attached are the diagnostics generated after starting it.
mediaserver-diagnostics-20220719-1417.zip
trurl Posted July 19, 2022
Disks 1 and 14 are both disabled/emulated, and both emulated disks are unmountable, as you should see on Main. There are disks 16, 17, and 18, but nothing assigned as disk 15. Is that as it should be?
We always recommend repairing the emulated filesystems and checking the results of the repair before rebuilding on top of the same disks. Even better would be to rebuild to spares after repairing the emulated filesystems, so you keep the originals as they are as another possible way to recover files. Do you have any spares?
Adrian Posted July 19, 2022
1 hour ago, trurl said: Both disks 1,14 disabled/emulated... Do you have any spares?
Yes, disks 1 and 14 show disabled on Main. Disk 15 isn't used; I do have a physical disk there, but I never added it to the array. I think I precleared it and then forgot about it.
I do have spares. Would I replace both disk 1 and disk 14 with the spares at the same time and then rebuild?
trurl Posted July 19, 2022
1 hour ago, trurl said: We always recommend repairing the emulated filesystems and checking the results of the repair before rebuilding
You want to rebuild a mountable filesystem whether you are rebuilding on top of the same disks or to spare disks. When you rebuild, you get exactly what the emulated disk has, which is currently unmountable. You could rebuild that unmountable disk, but then you would have to repair the resulting rebuild.
When rebuilding to spares, if you repair first, you can check the results of the repair against the contents of the original disks, mounted as Unassigned Devices (after repairing them too, if necessary). If the original disks turn out better than the emulated disks, you could put them back into the array and rebuild parity instead. And you get two different versions of the disks that you can copy somewhere off the array if you don't have backups for those files.
So, next step: check filesystem on both disabled/emulated disks.
trurl Posted July 19, 2022
1 hour ago, trurl said: Both disks 1,14 disabled/emulated, and both emulated disks are unmountable as you should see on Main.
14 minutes ago, Adrian said: Yes, disk 1 and 14 show disabled on Main.
What I really wanted you to notice is that they were also unmountable. You can't access the files on unmounted disks.
trurl Posted July 19, 2022
8 minutes ago, trurl said: they were also unmountable. You can't access the files on unmounted disks.
If the emulated/disabled disks were mountable, you could access their files even though Unraid won't use a disabled disk until it is rebuilt. Its contents are emulated from the parity calculation by reading all other disks. Emulated disks can even be written to, by updating parity as if the disk had been written. The initial failed write that disabled the disk, and any subsequent writes to the emulated disk, can be recovered by rebuilding.
11 minutes ago, trurl said: next step, check filesystem on both disabled/emulated, unmountable disks.
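A toy illustration of what "emulated from the parity calculation" means, using single XOR parity only (Unraid's second parity drive uses a different calculation) and one byte standing in for each whole disk:

```shell
# One data byte per "disk"; the parity byte is the XOR of all data disks.
d1=$((0x01)); d2=$((0x10)); d3=$((0xAA))
parity=$(( d1 ^ d2 ^ d3 ))

# "Disk 2" fails: its contents can be emulated by XOR-ing parity with
# every surviving disk, which is what reading an emulated disk amounts to.
emulated=$(( parity ^ d1 ^ d3 ))
printf 'emulated: 0x%02X original: 0x%02X\n' "$emulated" "$d2"
# prints: emulated: 0x10 original: 0x10
```

Writing to an emulated disk works the same way in reverse: parity is updated as if the write had landed on the missing disk, so a later rebuild reproduces it.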
Adrian Posted July 20, 2022
I performed the check filesystem on both disks and this is what it displayed for both drives:

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.
ChatNoir Posted July 20, 2022
4 hours ago, Adrian said: ... Cannot proceed further in no_modify mode. Exiting now.
Try again without the -n flag (no_modify).
Adrian Posted July 20, 2022

Disk 1

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

Disk 14

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
itimpi Posted July 20, 2022
You need to rerun without -n but adding -L.
trurl Posted July 20, 2022
2 hours ago, Adrian said: please attempt a mount of the filesystem before doing this
Unraid has already told you the disk is unmountable, so you have to
2 hours ago, itimpi said: rerun without -n but adding -L.
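For later readers, the whole sequence looks roughly like this from the console (a sketch for disk 1; disk 14 is the same with slot 14). Run it with the array started in maintenance mode, and always against the md device rather than the raw sdX partition so parity stays in sync; the xfs_repair lines are commented out here:

```shell
# Build the device path for array slot 1. Newer Unraid releases may use
# /dev/md1p1 instead; check what the GUI's Check Filesystem option uses.
slot=1
dev="/dev/md${slot}"
echo "$dev"    # prints: /dev/md1

# xfs_repair -n "$dev"    # 1. read-only check: report problems, change nothing
# xfs_repair    "$dev"    # 2. actual repair; may stop and ask for a log replay
# xfs_repair -L "$dev"    # 3. only if step 2 cannot replay the log: zero it,
#                         #    which can lose the most recent metadata changes
```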
Adrian Posted July 20, 2022
Ran it with the -L option.

Disk 1

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
- scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
sb_icount 0, counted 63776
sb_ifree 0, counted 179
sb_fdblocks 1952984865, counted 929448093
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 2
- agno = 4
- agno = 3
- agno = 5
- agno = 6
- agno = 7
- agno = 1
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:141778) is ahead of log (1:2).
Format log to cycle 4.
done

Disk 14

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
- scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
sb_icount 0, counted 14784
sb_ifree 0, counted 254
sb_fdblocks 1952984865, counted 936669596
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 7
- agno = 4
- agno = 1
- agno = 3
- agno = 6
- agno = 5
- agno = 2
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected dir inode 11307331946, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 191 nlinks from 2 to 3
Maximum metadata LSN (1:93159) is ahead of log (1:2).
Format log to cycle 4.
done
JonathanM Posted July 20, 2022
Do the emulated drives mount normally now?
Adrian Posted July 20, 2022
10 minutes ago, JonathanM said: Do the emulated drives mount normally now?
I think so. They still show disabled/emulated, but I can access them through their direct shares. One of the disks has a lost+found folder, which I assume is from the repair?
So would I next set these aside and rebuild onto new drives, and then I can compare the rebuilt drives to the repaired ones?
trurl Posted July 20, 2022
10 minutes ago, Adrian said: One of the disks has a lost+found folder which I assume is from the repair?
How much is in there? Post new diagnostics
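A quick way to answer "how much is in there" from the console. This is a sketch: the path assumes the lost+found folder appeared on disk 14 (the drive whose repair output mentioned a disconnected inode), and the inspection commands are commented out:

```shell
# Path to the folder created by xfs_repair; adjust the disk number to match
# wherever the folder actually appeared.
lf="/mnt/disk14/lost+found"
echo "inspecting $lf"

# du -sh "$lf"                                  # total size recovered
# find "$lf" -type f | wc -l                    # how many files landed there
# find "$lf" -type f -newermt 2022-07-13 -ls    # list recently modified ones
```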
trurl Posted July 20, 2022
12 minutes ago, Adrian said: So would I next set these aside and rebuild onto new drives and then I can compare the rebuilt drives to the repaired ones?
A little confusing, I know. What you have repaired is the emulated drives, and that is what would be rebuilt onto the new drives. It remains to be seen whether the original drives need repair or not before you can compare them to the rebuilds.
itimpi Posted July 20, 2022
12 minutes ago, Adrian said: So would I next set these aside and rebuild onto new drives and then I can compare the rebuilt drives to the repaired ones?
Since the drives were disabled, it is the EMULATED drives that got repaired, not the physical drives. All the rebuild process does is make the physical drive being rebuilt match the emulated one.
Adrian Posted July 20, 2022
15 minutes ago, trurl said: How much is in there? Post new diagnostics
Just one folder with 3 recent files from 7/13/2022. So what's next?
mediaserver-diagnostics-20220720-1214.zip