FastAd Posted February 19

I've just had my first failed drive in Unraid. I have 2 10TB parity drives and 7 data drives. I've been looking to upgrade my parity drives to a larger size recently as I'm nearly at max capacity, and now one of my data drives has just failed. I was planning to replace the 2 parity drives with larger drives and then move the current 10TB parity drives to data drives.

What's the best way to upgrade my parity drives and replace the failed data drive with one of the old parity drives? If I were to get 3 new larger drives to replace the 2 parity drives and 1 data drive, can it rebuild the data drive even though it will be larger than the current parity drives? What's the best way to fix this safely?
trurl Posted February 19

Parity swap might be what you need. But please attach diagnostics to your NEXT post in this thread and wait for further advice.
FastAd Posted February 19

Thanks for your reply trurl. I've attached the diagnostics to this post.

chaos-diagnostics-20240219-1523.zip
trurl Posted February 19

If some of those Unassigned Devices are permanent, you might consider making them pools instead.
trurl Posted February 19

Looks like disk5 disconnected and is now an Unassigned Device. SMART for that disk P86N looks OK. Emulated disk5 is mounted and shows plenty of data. I assume that is the disk you are referring to as failed, but this looks like a connection problem and not a disk problem.

Check connections on disk5, SATA and power, both ends, including splitters. Then post new diagnostics.
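As well as the dashboard, the SMART report can be read from a terminal with smartctl (part of smartmontools, which ships with Unraid). A minimal sketch; the device name sdX below is a placeholder you would replace with the letter the disk currently has:

```shell
# Full SMART report: identity, overall health, and attribute table
# (replace sdX with the actual device for disk5)
smartctl -a /dev/sdX

# Quick overall pass/fail health assessment only
smartctl -H /dev/sdX
```

Pending or reallocated sector counts in the attribute table are the usual signs of a genuinely failing disk; a disk that simply dropped off the controller will often show a clean report like yours.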
FastAd Posted February 20

Yes it is disk5. I've checked all cables and even tried swapping the cables with another drive, and the drive is still disabled. I noticed on boot up it was pausing on the BIOS screen with "Please backup your data and replace your hard disk drive. A failure may be imminent and cause unpredictable fail.". The drive is showing up in the BIOS also.

After starting the array, disk5 now says "Unmountable: Unsupported or no file system".

chaos-diagnostics-20240220-1046.zip
JorgeB Posted February 20

SMART for disk5 looks fine; the warning during boot is likely from a different disk, so check the SMART status for all devices on the dashboard. Also check the filesystem on the emulated disk5, and run it without -n.
trurl Posted February 20

3 hours ago, FastAd said:
"Unmountable: Unsupported or no file system"

The way you fix that is

1 hour ago, JorgeB said:
check filesystem on the emulated disk5

Capture the output and post it.

3 hours ago, FastAd said:
the drive is still disabled

The way you fix that is by rebuilding, but you don't want to rebuild until you have fixed the filesystem.
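For reference, the filesystem check on the emulated disk can also be run from a terminal with the array started in Maintenance mode. A sketch, with one assumption flagged: on releases of this vintage the emulated disk5 is typically /dev/md5 (Unraid 6.12+ uses /dev/md5p1 instead); running xfs_repair against the md device rather than the raw sd device is what keeps parity in sync with the repairs:

```shell
# Array must be started in Maintenance mode first (Main tab)

# Dry run: -n reports problems but modifies nothing
xfs_repair -n /dev/md5

# Actual repair: run again without -n to apply the fixes
xfs_repair /dev/md5

# Last resort if it refuses to run because of a dirty log:
# -L zeroes the log and can lose the most recent metadata changes
# xfs_repair -L /dev/md5
```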
FastAd Posted February 20

File system check is running:

Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!
attempting to find secondary superblock...

I guess it runs through the whole disk like a parity check?
trurl Posted February 21

Usually it produces results pretty quickly. If it has to go looking for superblocks it might take a while. Any progress?
FastAd Posted February 21

Reading of the disks has stopped. Under disk5 it has the following:

Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!
attempting to find secondary superblock...
....<Lots of dots here>...........Sorry, could not find valid secondary superblock
Exiting now.

I'm guessing it's dead dead. If SMART looked OK, do I reformat the drive and rebuild it, or is it best to replace it? I've just had 2 new 18TB drives arrive for my parity drive upgrade.
JorgeB Posted February 21

See if the actual disk mounts with UD: stop the array, unassign disk5, then see if it mounts.
FastAd Posted February 21

Tried to mount in UD. Get the error "Device '/dev/sdo' failed to mount. Check the syslog for details." The end of my system log has the following:

Feb 21 15:15:00 Chaos unassigned.devices: Mounting partition 'sdo1' at mountpoint '/mnt/disks/2SG9P86N'...
Feb 21 15:15:00 Chaos unassigned.devices: Mount cmd: /sbin/mount -t 'xfs' -o rw,relatime '/dev/sdo1' '/mnt/disks/2SG9P86N'
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Mounting V5 Filesystem
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Corruption warning: Metadata has LSN (2:2970628) ahead of current LSN (2:2967213). Please unmount and run xfs_repair (>= v4.3) to resolve.
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Metadata corruption detected at xfs_agf_verify+0x64/0x1e2 [xfs], xfs_agf block 0xfffffff1
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Unmount and run xfs_repair
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): First 128 bytes of corrupted metadata buffer:
Feb 21 15:15:00 Chaos kernel: 00000000: 58 41 47 46 00 00 00 01 00 00 00 02 0f ff ff ff  XAGF............
Feb 21 15:15:00 Chaos kernel: 00000010: 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 01  ................
Feb 21 15:15:00 Chaos kernel: 00000020: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03  ................
Feb 21 15:15:00 Chaos kernel: 00000030: 00 00 00 04 04 6c 14 8e 02 d3 73 5f 00 00 00 00  .....l....s_....
Feb 21 15:15:00 Chaos kernel: 00000040: 43 33 50 50 19 85 4e 40 b4 d9 d3 a4 1a 0e 71 82  [email protected].
Feb 21 15:15:00 Chaos kernel: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Feb 21 15:15:00 Chaos kernel: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Feb 21 15:15:00 Chaos kernel: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): metadata I/O error in "xfs_read_agf+0xd7/0x116 [xfs]" at daddr 0xfffffff1 len 1 error 117
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Error -117 reserving per-AG metadata reserve pool.
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Corruption of in-memory data (0x8) detected at xfs_fs_reserve_ag_blocks+0xa7/0xb8 [xfs] (fs/xfs/xfs_fsops.c:575). Shutting down filesystem.
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Please unmount the filesystem and rectify the problem(s)
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Ending clean mount
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Error -5 reserving per-AG metadata reserve pool.
Feb 21 15:15:01 Chaos unassigned.devices: Mount of 'sdo1' failed: 'mount: /mnt/disks/2SG9P86N: can't read superblock on /dev/sdo1. dmesg(1) may have more information after failed mount system call. '
Feb 21 15:15:01 Chaos unassigned.devices: Partition '2SG9P86N' cannot be mounted.
JorgeB Posted February 21

Check filesystem on that disk using the CLI:

xfs_repair -v /dev/sdo1
FastAd Posted February 21

It did quite a few things, including a lot of the following:

imap claims a free inode 3524047782 is in use, correcting imap and clearing inode
cleared inode 3524047782

and:

clearing inode number in entry at offset 784...

XFS_REPAIR Summary    Wed Feb 21 16:15:50 2024

Phase     Start           End             Duration
Phase 1:  02/21 16:07:45  02/21 16:07:46  1 second
Phase 2:  02/21 16:07:46  02/21 16:07:48  2 seconds
Phase 3:  02/21 16:07:48  02/21 16:12:04  4 minutes, 16 seconds
Phase 4:  02/21 16:12:04  02/21 16:12:04
Phase 5:  02/21 16:12:04  02/21 16:12:04
Phase 6:  02/21 16:12:04  02/21 16:15:22  3 minutes, 18 seconds
Phase 7:  02/21 16:15:22  02/21 16:15:22

Total run time: 7 minutes, 37 seconds
done

I tried to mount the drive in UD and it did mount. Should I try and add the drive back into the array?
JorgeB Posted February 21

8 minutes ago, FastAd said:
Should I try and add the drive back into the array?

If you add it, it will be rebuilt on top or just cleared (erased). You can do a new config and re-sync parity, or rebuild to a spare disk and then copy the data back. Any doubts, please ask.
FastAd Posted February 21

How do I add the drive back and rebuild it from parity? Do I need to format it first?
JorgeB Posted February 21

11 minutes ago, FastAd said:
How do I add the drive back and rebuild it from parity.

Only do that if you have a spare. If you rebuild from parity on top of the old disk, the result will be an unmountable disk, and it will delete all the data on the actual disk.
FastAd Posted February 21

Can I not use that drive in the array again?
JorgeB Posted February 21

1 hour ago, JorgeB said:
you can do a new config and re-sync parity

If all drives are healthy, you can do a new config with the old disk (Tools - New Config) and then, instead of rebuilding the disk, you use its data, since it's now mounting with UD, and re-sync parity instead.
trurl Posted February 21

1 hour ago, FastAd said:
Do I need to format it first?

Format is NEVER part of rebuild. Never format a disk that has data you want to keep.

https://docs.unraid.net/unraid-os/manual/storage-management/#reset-the-array-configuration
FastAd Posted February 21

If I did a new config, it would write new parity data. If the same data drive fails again while the new parity is being written, won't the data on that drive be lost? Would it be safer to wipe the data disk that failed and is now working, and rebuild from the current parity to that disk? Sorry for all the questions, but I want to make sure I'm doing it right.
trurl Posted February 21

2 hours ago, FastAd said:
Would it be safer to wipe the data disk that failed and is now working and rebuild from the current parity to that disk?

Your talk of format and wipe is very scary. Why do you want to wipe anything? Rebuild will completely overwrite the entire disk with the data of the emulated disk, which comes from the parity calculation by reading all the other disks.

https://docs.unraid.net/unraid-os/manual/what-is-unraid/#parity-protected-array
trurl Posted February 21

4 hours ago, JorgeB said:
if you rebuild from parity on top of the old disk, the result will be an unmountable disk
trurl Posted February 21

We have already determined that rebuilding the disk from parity will result in an unmountable disk, because the emulated disk is unmountable, and the emulated disk is exactly what will be written during a rebuild. So the current contents of parity can't help get your data back.

But we have also determined that the physical disk is mountable and has its data. What we are proposing is to New Config that physical disk back into the array, so its data is back in the array and it can be used to rebuild parity.
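Before committing to the New Config, it's worth sanity-checking the repaired disk's contents while it is still mounted with UD. A minimal sketch, assuming the mountpoint /mnt/disks/2SG9P86N from the syslog earlier in the thread (adjust the path if UD mounted it elsewhere):

```shell
# Where Unassigned Devices mounted the old disk5 (assumed path, adjust as needed)
MNT="${MNT:-/mnt/disks/2SG9P86N}"

# Total space in use on the disk, to compare against what you expect
du -sh "$MNT"

# File count, another quick plausibility check after xfs_repair
find "$MNT" -type f | wc -l

# Spot-check that a handful of files actually read back cleanly
find "$MNT" -type f | head -n 5 | while read -r f; do
    head -c 4096 "$f" > /dev/null && echo "OK: $f"
done
```

xfs_repair sometimes moves orphaned files into a lost+found directory at the top of the filesystem, so that is also worth a look before re-syncing parity.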