2 drives red


Solved by trurl

So something happened and 2 drives are marked red. I think one is actually dead and the other is fine. I'm running an extended SMART test on one of them, and then I'll do the other. I have dual parity.

 

What's the process to "recover" from the two red drives if one or both are bad? Do I simply replace the bad drive(s) and rebuild? I've never done this with more than one drive.

Link to comment

Both disks look fine; neither has completed the extended test yet. You will probably have to disable spindown on the disks to get that to complete.

15 minutes ago, Adrian said:

captured after I rebooted

Didn't notice anything in the current syslog; can't say what happened earlier, of course.

 

And can't tell whether any drives are unmountable since you haven't started the array. Do that and post new diagnostics.

Link to comment
7 minutes ago, trurl said:

Both disks look fine; neither has completed the extended test yet. You will probably have to disable spindown on the disks to get that to complete.

Didn't notice anything in the current syslog; can't say what happened earlier, of course.

 

And can't tell whether any drives are unmountable since you haven't started the array. Do that and post new diagnostics.

 

Yeah, that's what I'm hoping. I'm currently running an extended test on one of the drives; it'll probably take another 6-8 hours to complete. When that's done, I'll try the other drive. And yup, I disabled the spindown.

ok, so if the extended test passes for both, just try to start the array and then upload new diagnostics. Got it.

Link to comment
9 hours ago, Adrian said:

ok, so if the extended test passes for both, just try to start the array and then upload new diagnostics.

This is what I would recommend. There's no reason not to run the extended test on both drives in parallel, as the test is completely internal to the drive.
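For reference, if you'd rather work from the console than the GUI, smartctl can start and monitor the tests; /dev/sdX and /dev/sdY below are just placeholders for the two drives' actual device names:

smartctl -t long /dev/sdX      # start the extended self-test on the first drive
smartctl -t long /dev/sdY      # start it on the second drive at the same time
smartctl -l selftest /dev/sdX  # show the self-test log; look for "Completed without error"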
 

The process for rebuilding the drives is covered in the online documentation, accessible via the 'Manual' link at the bottom of the GUI, but it would be a good idea to wait until you have gotten feedback on the diagnostics after the extended tests before going ahead with that.

Link to comment
11 hours ago, itimpi said:

This is what I would recommend. There's no reason not to run the extended test on both drives in parallel, as the test is completely internal to the drive.

 

Good to know for next time, if it ever happens again.

 

Both extended tests completed and it looks like no errors were reported.

 

With both tests completed, I started the array. Attached is the diagnostics file generated after I started the array.

mediaserver-diagnostics-20220719-1417.zip

Link to comment

Both disks 1 and 14 are disabled/emulated, and both emulated disks are unmountable, as you should see on Main.

 

There are disks 16, 17, and 18, but nothing assigned as disk 15. Is that as it should be?

 

We always recommend repairing the emulated filesystems and checking the results of the repair before rebuilding on top of the same disk. Even better would be to rebuild to spares after repairing the emulated filesystems so you keep the originals as they are as another possible way to recover files.

 

Do you have any spares?

Link to comment
1 hour ago, trurl said:

Both disks 1 and 14 are disabled/emulated, and both emulated disks are unmountable, as you should see on Main.

 

There are disks 16, 17, and 18, but nothing assigned as disk 15. Is that as it should be?

 

We always recommend repairing the emulated filesystems and checking the results of the repair before rebuilding on top of the same disk. Even better would be to rebuild to spares after repairing the emulated filesystems so you keep the originals as they are as another possible way to recover files.

 

Do you have any spares?

 

Yes, disks 1 and 14 show as disabled on Main.

 

Disk 15 isn't used. I do have a physical disk, but I just never added it to the array. I think I precleared it and then left it there/forgot about it :)

 

I do have spares. Would I replace both Disk 1 and Disk 14 with the spares at the same time and then rebuild?

 

Link to comment
1 hour ago, trurl said:

We always recommend repairing the emulated filesystems and checking the results of the repair before rebuilding

You want to rebuild a mountable filesystem whether you are rebuilding on top of the same disk or to spare disks. When you rebuild, you get exactly what the emulated disk has, which currently is unmountable. You could rebuild that unmountable disk, but then you would have to repair the resulting rebuild.

 

When rebuilding to spares, if you repair first, you can check the results of the repair against the contents of the original disks, mounted as Unassigned Devices (after repairing them if necessary). If the original disks are better than the emulated disks, you could put them back into the array and rebuild parity instead.
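As a rough sketch of that comparison, once both the rebuilt disks and the repaired originals are mounted (the /mnt/disks/... paths below are just examples of where Unassigned Devices would mount the originals; adjust to your actual mount names):

diff -rq /mnt/disks/old_disk1 /mnt/disk1      # report files and directories that differ for disk 1
diff -rq /mnt/disks/old_disk14 /mnt/disk14    # same comparison for disk 14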

 

And you get two different versions of the disks that you can copy somewhere off the array if you don't have backups for those files.

 

So, next step, check filesystem on both disabled/emulated disks.
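The usual way is to start the array in Maintenance mode and run the check from each disk's page in the GUI; the console equivalent would be roughly the following, assuming disks 1 and 14 map to /dev/md1 and /dev/md14 on this Unraid version:

xfs_repair -n /dev/md1     # -n = no modify, report problems only
xfs_repair -n /dev/md14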

Link to comment
1 hour ago, trurl said:

Both disks 1 and 14 are disabled/emulated, and both emulated disks are unmountable, as you should see on Main.

14 minutes ago, Adrian said:

Yes, disks 1 and 14 show as disabled on Main.

What I really wanted you to notice is that they were also unmountable. You can't access the files on unmounted disks.

Link to comment
8 minutes ago, trurl said:

they were also unmountable. You can't access the files on unmounted disks.

If the emulated/disabled disks were mountable, you could access their files even though Unraid won't use a disabled disk until it is rebuilt. Its contents are emulated from the parity calculation by reading all other disks. Emulated disks can even be written to, by updating parity as if the disk had been written. The initial failed write that disabled the disk, and any subsequent writes to the emulated disk, can be recovered by rebuilding.

 

11 minutes ago, trurl said:

next step, check filesystem on both disabled/emulated

, unmountable disks.

Link to comment

I ran the filesystem check on both disks, and this is what it displayed for both drives:

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

Link to comment
4 hours ago, Adrian said:

I ran the filesystem check on both disks, and this is what it displayed for both drives:

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

Try again without the -n option (no_modify).
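In other words, clear -n from the options box if you're using the GUI check, or from the console something like (same /dev/mdX assumption as above):

xfs_repair /dev/md1
xfs_repair /dev/md14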

Link to comment

Disk 1

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

 

 

 

Disk 14

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

Link to comment

Ran it with the -L option.

Disk 1

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
sb_icount 0, counted 63776
sb_ifree 0, counted 179
sb_fdblocks 1952984865, counted 929448093
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 4
        - agno = 3
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 1
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:141778) is ahead of log (1:2).
Format log to cycle 4.
done

 

Disk 14

 

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
sb_icount 0, counted 14784
sb_ifree 0, counted 254
sb_fdblocks 1952984865, counted 936669596
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 7
        - agno = 4
        - agno = 1
        - agno = 3
        - agno = 6
        - agno = 5
        - agno = 2
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 11307331946, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 191 nlinks from 2 to 3
Maximum metadata LSN (1:93159) is ahead of log (1:2).
Format log to cycle 4.
done

Link to comment
10 minutes ago, JonathanM said:

Do the emulated drives mount normally now?

 

I think so. They still show as disabled/emulated, but I can access them through their direct shares.

One of the disks has a lost+found folder, which I assume is from the repair?
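Judging by the repair output, that would be disk 14. A quick way to see what ended up there, assuming the standard disk share path:

ls -la /mnt/disk14/lost+found   # recovered entries are named by inode number, e.g. the directory inode 11307331946 from the repair log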

 

So would I next set these aside and rebuild onto new drives and then I can compare the rebuilt drives to the repaired ones?

Edited by Adrian
Link to comment
12 minutes ago, Adrian said:

So would I next set these aside and rebuild onto new drives and then I can compare the rebuilt drives to the repaired ones?

A little confusing, I know. What you have repaired are the emulated drives, and that is what would be rebuilt onto new drives.

 

It remains to be seen whether the original drives need repair or not before you can compare them to the rebuilds.
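Once the originals are attached and visible to Unassigned Devices, they can be checked the same way, for example (replace sdX1 with the original disk's actual partition, and make sure it isn't mounted when running this):

xfs_repair -n /dev/sdX1    # check-only pass against the original physical disk's partition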

Link to comment
12 minutes ago, Adrian said:

So would I next set these aside and rebuild onto new drives and then I can compare the rebuilt drives to the repaired ones?

Since the drives were disabled, it is the EMULATED drives that got repaired, not the physical drives. All the rebuild process does is make the physical drive being rebuilt match the emulated one.

Link to comment
