2 drives red


Solved by trurl

So something happened and 2 drives are marked red. I think one is actually dead and the other is fine. I'm running an extended SMART test on one of them, and then I'll do the other. I have dual parity.

 

What's the process to "recover" from the two red drives if one or both are bad? Do I simply replace the bad drive(s) and rebuild? I've never done this with more than one drive.

Link to comment

Both disks look fine; neither has completed the extended test yet. You will probably have to disable spindown on the disks to get that to complete.

15 minutes ago, Adrian said:

captured after I rebooted

Didn't notice anything in the current syslog; can't say what happened earlier, of course.

 

And can't tell whether any drives are unmountable since you haven't started the array. Do that and post new diagnostics.

Link to comment
7 minutes ago, trurl said:

Both disks look fine; neither has completed the extended test yet. You will probably have to disable spindown on the disks to get that to complete.

Didn't notice anything in the current syslog; can't say what happened earlier, of course.

 

And can't tell whether any drives are unmountable since you haven't started the array. Do that and post new diagnostics.

 

Yeah, that's what I'm hoping. I'm currently running an extended test on one of the drives; it'll probably take another 6-8 hours to complete. When that's done, I'll try the other drive. And yup, I disabled the spindown.

ok, so if the extended test passes for both, just try to start the array and then upload new diagnostics. Got it.

Link to comment
9 hours ago, Adrian said:

ok, so if the extended test passes for both, just try to start the array and then upload new diagnostics.

This is what I would recommend. There's no reason not to run the extended test on both drives in parallel, as the test is completely internal to the drive.
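For reference, if you'd rather work from the console than the GUI, smartctl can start and monitor the tests; /dev/sdX and /dev/sdY below are just placeholders for the two drives' actual device names:

smartctl -t long /dev/sdX      # start the extended self-test on the first drive
smartctl -t long /dev/sdY      # start it on the second drive at the same time
smartctl -l selftest /dev/sdX  # show the self-test log; look for "Completed without error"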
 

The process for rebuilding the drives is covered in the online documentation, accessible via the 'Manual' link at the bottom of the GUI, but it would be a good idea to wait until you have gotten feedback on the diagnostics after the extended tests before going ahead with that.

Link to comment
11 hours ago, itimpi said:

This is what I would recommend. There's no reason not to run the extended test on both drives in parallel, as the test is completely internal to the drive.

 

Good to know for next time, if it ever happens again.

 

Both extended tests completed and it looks like no errors were reported.

 

With both tests completed, I started the array. Attached is the diagnostics file generated after I started the array.

mediaserver-diagnostics-20220719-1417.zip

Link to comment

Both disks 1 and 14 are disabled/emulated, and both emulated disks are unmountable, as you should see on Main.

 

There are disks 16, 17, and 18, but nothing assigned as disk 15. Is that as it should be?

 

We always recommend repairing the emulated filesystems and checking the results of the repair before rebuilding on top of the same disk. Even better would be to rebuild to spares after repairing the emulated filesystems so you keep the originals as they are as another possible way to recover files.

 

Do you have any spares?

Link to comment
1 hour ago, trurl said:

Both disks 1 and 14 are disabled/emulated, and both emulated disks are unmountable, as you should see on Main.

 

There are disks 16, 17, and 18, but nothing assigned as disk 15. Is that as it should be?

 

We always recommend repairing the emulated filesystems and checking the results of the repair before rebuilding on top of the same disk. Even better would be to rebuild to spares after repairing the emulated filesystems so you keep the originals as they are as another possible way to recover files.

 

Do you have any spares?

 

Yes, disks 1 and 14 show as disabled on Main.

 

Disk 15 isn't used. I do have a physical disk, but I just never added it to the array. I think I precleared it and then left it there/forgot about it :)

 

I do have spares. Would I replace both Disk 1 and Disk 14 with the spares at the same time and then rebuild?

 

Link to comment
1 hour ago, trurl said:

We always recommend repairing the emulated filesystems and checking the results of the repair before rebuilding

You want to rebuild a mountable filesystem whether you are rebuilding on top of the same disk or to spare disks. When you rebuild, you get exactly what the emulated disk has, which currently is unmountable. You could rebuild that unmountable disk, but then you would have to repair the resulting rebuild.

 

When rebuilding to spares, if you repair first, you can check the results of the repair against the contents of the original disks, mounted as Unassigned Devices (after repairing them if necessary). If the original disks are better than the emulated disks, you could put them back into the array and rebuild parity instead.
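As a rough sketch of that comparison, once both the rebuilt disks and the repaired originals are mounted (the /mnt/disks/... paths below are just examples of where Unassigned Devices would mount the originals; adjust to your actual mount names):

diff -rq /mnt/disks/old_disk1 /mnt/disk1      # report files and directories that differ for disk 1
diff -rq /mnt/disks/old_disk14 /mnt/disk14    # same comparison for disk 14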

 

And you get two different versions of the disks that you can copy somewhere off the array if you don't have backups for those files.

 

So, next step, check filesystem on both disabled/emulated disks.
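The usual way is to start the array in Maintenance mode and run the check from each disk's page in the GUI; the console equivalent would be roughly the following, assuming disks 1 and 14 map to /dev/md1 and /dev/md14 on this Unraid version:

xfs_repair -n /dev/md1     # -n = no modify, report problems only
xfs_repair -n /dev/md14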

Link to comment
1 hour ago, trurl said:

Both disks 1 and 14 are disabled/emulated, and both emulated disks are unmountable, as you should see on Main.

14 minutes ago, Adrian said:

Yes, disks 1 and 14 show as disabled on Main.

What I really wanted you to notice is that they were also unmountable. You can't access the files on unmounted disks.

Link to comment
8 minutes ago, trurl said:

they were also unmountable. You can't access the files on unmounted disks.

If the emulated/disabled disks were mountable, you could access their files even though Unraid won't use a disabled disk until it is rebuilt. Its contents are emulated from the parity calculation by reading all other disks. Emulated disks can even be written to, by updating parity as if the disk had been written. The initial failed write that disabled the disk, and any subsequent writes to the emulated disk, can be recovered by rebuilding.

 

11 minutes ago, trurl said:

next step, check filesystem on both disabled/emulated

, unmountable disks.

Link to comment

I ran the filesystem check on both disks, and this is what it displayed for both drives:

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

Link to comment
4 hours ago, Adrian said:

I ran the filesystem check on both disks, and this is what it displayed for both drives:

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

Try again without the -n option (no_modify).
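In other words, clear -n from the options box if you're using the GUI check, or from the console something like (same /dev/mdX assumption as above):

xfs_repair /dev/md1
xfs_repair /dev/md14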

Link to comment

Disk 1

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

 

 

 

Disk 14

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

Link to comment

Ran it with the -L option.

Disk 1

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
sb_icount 0, counted 63776
sb_ifree 0, counted 179
sb_fdblocks 1952984865, counted 929448093
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 4
        - agno = 3
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 1
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:141778) is ahead of log (1:2).
Format log to cycle 4.
done

 

Disk 14

 

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
sb_icount 0, counted 14784
sb_ifree 0, counted 254
sb_fdblocks 1952984865, counted 936669596
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 7
        - agno = 4
        - agno = 1
        - agno = 3
        - agno = 6
        - agno = 5
        - agno = 2
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 11307331946, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 191 nlinks from 2 to 3
Maximum metadata LSN (1:93159) is ahead of log (1:2).
Format log to cycle 4.
done

Link to comment
10 minutes ago, JonathanM said:

Do the emulated drives mount normally now?

 

I think so. They still show as disabled/emulated, but I can access them through their direct shares.

One of the disks has a lost+found folder, which I assume is from the repair?
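Judging by the repair output, that would be disk 14. A quick way to see what ended up there, assuming the standard disk share path:

ls -la /mnt/disk14/lost+found   # recovered entries are named by inode number, e.g. the directory inode 11307331946 from the repair log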

 

So would I next set these aside and rebuild onto new drives and then I can compare the rebuilt drives to the repaired ones?

Edited by Adrian
Link to comment
12 minutes ago, Adrian said:

So would I next set these aside and rebuild onto new drives and then I can compare the rebuilt drives to the repaired ones?

A little confusing, I know. What you have repaired are the emulated drives, and that is what would be rebuilt onto new drives.

 

It remains to be seen whether the original drives need repair or not before you can compare them to the rebuilds.
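Once the originals are attached and visible to Unassigned Devices, they can be checked the same way, for example (replace sdX1 with the original disk's actual partition, and make sure it isn't mounted when running this):

xfs_repair -n /dev/sdX1    # check-only pass against the original physical disk's partition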

Link to comment
12 minutes ago, Adrian said:

So would I next set these aside and rebuild onto new drives and then I can compare the rebuilt drives to the repaired ones?

Since the drives were disabled, it is the EMULATED drives that got repaired, not the physical drives. All the rebuild process does is make the physical drive being rebuilt match the emulated one.

Link to comment
