Failed drive and parity upgrade

I've just had my first failed drive in Unraid. I have two 10TB parity drives and seven data drives. I've been looking to upgrade my parity drives to a larger size recently, as I'm nearly at max capacity, and now one of my data drives has failed.

 

I was planning to just replace the two parity drives with larger drives and then use the current 10TB parity drives as data drives. What's the best way to upgrade my parity drives and replace the failed data drive with one of the old parity drives?

 

If I were to get three new larger drives to replace the two parity drives and one data drive, can it rebuild the data drive, given that it will be larger than the current parity drives?

 

What's the best way to fix this safely?

Link to comment

Looks like disk5 disconnected and is now an Unassigned Device.

 

SMART for that disk P86N looks OK. Emulated disk5 is mounted and shows plenty of data.

 

I assume that is the disk you are referring to as failed, but it looks like a connection problem rather than a disk problem.

 

Check connections on disk5, SATA and power, both ends, including splitters.

 

Then post new diagnostics.
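
A minimal sketch of generating those, assuming console or SSH access to the server (the webGUI route is Tools > Diagnostics):

# Creates a fresh diagnostics zip; it should land on the flash drive under /boot/logs
diagnostics
ls -lh /boot/logs/*diagnostics*.zip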

Link to comment

Yes, it is disk5. I've checked all the cables and even tried swapping cables with another drive, and the drive is still disabled.

 

I noticed on boot-up it was pausing on the BIOS screen with "Please backup your data and replace your hard disk drive. A failure may be imminent and cause unpredictable fail."

 

The drive is also showing up in the BIOS.
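
For reference, a quick sketch of double-checking that BIOS warning against the drive's own SMART data, assuming the disk is currently at /dev/sdX (it appears as sdo later in this thread):

# Key attributes to watch: reallocated, pending and uncorrectable sectors, plus the overall-health verdict
smartctl -a /dev/sdX | grep -iE 'reallocated|pending|uncorrect|overall-health'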

 

After starting the array, on disk5 it now says "Unmountable: Unsupported or no file system"

chaos-diagnostics-20240220-1046.zip

Link to comment
3 hours ago, FastAd said:

"Unmountable: Unsupported or no file system"

The way you fix that is

1 hour ago, JorgeB said:

check filesystem on the emulated disk5

Capture the output and post it.

 

3 hours ago, FastAd said:

the drive is still disabled

The way you fix that is by rebuilding, but you don't want to rebuild until you have fixed the filesystem.
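
A rough sketch of that check, assuming the array is started in Maintenance mode (the Check button on disk5's settings page runs the same thing); the md device name depends on the Unraid release, /dev/md5 on older builds and /dev/md5p1 on newer ones:

# -n means "no modify": report problems on the emulated disk5 without writing anything
xfs_repair -n /dev/md5p1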

Link to comment

File system check is running :-

 

Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!
attempting to find secondary superblock...

 

I guess it runs through the whole disk like a parity check?

Link to comment

Reading of the disks has stopped.

 

Under disk5 it has the following :-

 

Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!

attempting to find secondary superblock...
....<Lots of dots here>...........Sorry, could not find valid secondary superblock
Exiting now.

 

I'm guessing it's dead dead. If SMART looked OK, do I reformat the drive and rebuild it, or is it best to replace it?

 

I've just had 2 new 18TB drives arrive for my parity drive upgrade.

 

Link to comment

I tried to mount it in UD. I get the error "Device '/dev/sdo' failed to mount. Check the syslog for details."

 

The end of my system log has the following :-

 

Feb 21 15:15:00 Chaos unassigned.devices: Mounting partition 'sdo1' at mountpoint '/mnt/disks/2SG9P86N'...
Feb 21 15:15:00 Chaos unassigned.devices: Mount cmd: /sbin/mount -t 'xfs' -o rw,relatime '/dev/sdo1' '/mnt/disks/2SG9P86N'
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Mounting V5 Filesystem
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Corruption warning: Metadata has LSN (2:2970628) ahead of current LSN (2:2967213). Please unmount and run xfs_repair (>= v4.3) to resolve.
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Metadata corruption detected at xfs_agf_verify+0x64/0x1e2 [xfs], xfs_agf block 0xfffffff1
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Unmount and run xfs_repair
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): First 128 bytes of corrupted metadata buffer:
Feb 21 15:15:00 Chaos kernel: 00000000: 58 41 47 46 00 00 00 01 00 00 00 02 0f ff ff ff  XAGF............
Feb 21 15:15:00 Chaos kernel: 00000010: 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 01  ................
Feb 21 15:15:00 Chaos kernel: 00000020: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03  ................
Feb 21 15:15:00 Chaos kernel: 00000030: 00 00 00 04 04 6c 14 8e 02 d3 73 5f 00 00 00 00  .....l....s_....
Feb 21 15:15:00 Chaos kernel: 00000040: 43 33 50 50 19 85 4e 40 b4 d9 d3 a4 1a 0e 71 82  C3PP..N@......q.
Feb 21 15:15:00 Chaos kernel: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Feb 21 15:15:00 Chaos kernel: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Feb 21 15:15:00 Chaos kernel: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): metadata I/O error in "xfs_read_agf+0xd7/0x116 [xfs]" at daddr 0xfffffff1 len 1 error 117
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Error -117 reserving per-AG metadata reserve pool.
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Corruption of in-memory data (0x8) detected at xfs_fs_reserve_ag_blocks+0xa7/0xb8 [xfs] (fs/xfs/xfs_fsops.c:575). Shutting down filesystem.
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Please unmount the filesystem and rectify the problem(s)
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Ending clean mount
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Error -5 reserving per-AG metadata reserve pool.
Feb 21 15:15:01 Chaos unassigned.devices: Mount of 'sdo1' failed: 'mount: /mnt/disks/2SG9P86N: can't read superblock on /dev/sdo1. dmesg(1) may have more information after failed mount system call.'
Feb 21 15:15:01 Chaos unassigned.devices: Partition '2SG9P86N' cannot be mounted.
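
For reference, a sketch of the repair that log is asking for, run against the Unassigned Devices partition; the sdo letter can change between boots, so confirm it still belongs to serial 2SG9P86N first:

# The partition must be unmounted before xfs_repair will touch it
umount /dev/sdo1 2>/dev/null
# -v gives verbose output; add -L only if it complains about a dirty log and refuses to run
xfs_repair -v /dev/sdo1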

 

Link to comment

The repair has done some things, quite a few of the following:-

 

imap claims a free inode 3524047782 is in use, correcting imap and clearing inode
cleared inode 3524047782

 

and:-

 

clearing inode number in entry at offset 784...

 

XFS_REPAIR Summary    Wed Feb 21 16:15:50 2024

Phase           Start           End             Duration
Phase 1:        02/21 16:07:45  02/21 16:07:46  1 second
Phase 2:        02/21 16:07:46  02/21 16:07:48  2 seconds
Phase 3:        02/21 16:07:48  02/21 16:12:04  4 minutes, 16 seconds
Phase 4:        02/21 16:12:04  02/21 16:12:04
Phase 5:        02/21 16:12:04  02/21 16:12:04
Phase 6:        02/21 16:12:04  02/21 16:15:22  3 minutes, 18 seconds
Phase 7:        02/21 16:15:22  02/21 16:15:22

Total run time: 7 minutes, 37 seconds
done

 

I tried to mount the drive in UD and it did mount. Should I try to add the drive back into the array?
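
A small sketch of sanity-checking the mounted disk before deciding, using the mountpoint from the earlier syslog:

# Confirm the filesystem mounted, the used space looks right, and the top-level folders are there
df -h /mnt/disks/2SG9P86N
ls /mnt/disks/2SG9P86N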

Link to comment
1 hour ago, JorgeB said:

you can do a new config and re-sync parity

 

If all drives are healthy you can do a new config with the old disk (Tools - New Config) and then, instead of rebuilding the disk, use its data, since it's now mounting with UD, and resync parity instead.

Link to comment

If I did a new config, it would write new parity data.

 

If the same data drive fails again while writing the new parity data, won't the data on that drive be lost?

 

Would it be safer to wipe the data disk that failed and is now working and rebuild from the current parity to that disk?

 

Sorry for all the questions, but I want to make sure I'm doing it right.

Link to comment
2 hours ago, FastAd said:

Would it be safer to wipe the data disk that failed and is now working and rebuild from the current parity to that disk?

Your talk of format and wipe is very scary.

 

Why do you want to wipe anything? Rebuild will completely overwrite the entire disk with the data of the emulated disk, which comes from the parity calculation by reading all other disks.

 

https://docs.unraid.net/unraid-os/manual/what-is-unraid/#parity-protected-array
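
A rough illustration of that parity calculation with made-up bytes, assuming single (XOR) parity; the second parity drive uses a different formula, but the principle is the same:

# Hypothetical bytes at the same position on three data disks
d1=0xA5; d2=0x3C; d3=0x0F
parity=$(( d1 ^ d2 ^ d3 ))            # what the parity disk stores for this position
rebuilt_d2=$(( parity ^ d1 ^ d3 ))    # what "emulating" or rebuilding a missing disk2 produces
printf 'd2=0x%02X  rebuilt=0x%02X\n' "$d2" "$rebuilt_d2"   # both print 3C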

Link to comment

We have already determined that rebuilding the disk from parity will result in an unmountable disk, since the emulated disk is unmountable.

 

The fact that the emulated disk is unmountable means parity can only rebuild an unmountable disk, since the emulated disk is exactly what would be written during a rebuild. So the current contents of parity can't help get your data back.

 

But, we have also determined that the physical disk is mountable and has its data.

 

What we are proposing is to New Config that physical disk back into the array so its data is back in the array and it can be used to rebuild parity.

 

Link to comment
