FastAd Posted February 19

I've just had my first failed drive in Unraid. I have 2 10TB parity drives and 7 data drives. I've been looking to upgrade my parity drives to a larger size recently as I'm nearly at max capacity, and now one of my data drives has just failed. I was planning to replace the 2 parity drives with larger drives and then move the current 10TB parity drives to data drives.

What's the best way to upgrade my parity drives and replace the failed data drive with one of the old parity drives? If I were to get 3 new larger drives to replace the 2 parity drives and 1 data drive, can it rebuild the data drive even though it will be larger than the current parity drives? What's the best way to fix this safely?
trurl Posted February 19

Parity swap might be what you need. But please attach diagnostics to your NEXT post in this thread and wait for further advice.
FastAd Posted February 19

Thanks for your reply trurl. I've attached the diagnostics to this post.

chaos-diagnostics-20240219-1523.zip
trurl Posted February 19

If some of those Unassigned Devices are permanent, you might consider making them pools instead.
trurl Posted February 19

Looks like disk5 disconnected and is now an Unassigned Device. SMART for that disk P86N looks OK. Emulated disk5 is mounted and shows plenty of data. I assume that is the disk you are referring to as failed, but this looks like a connection problem and not a disk problem.

Check connections on disk5, SATA and power, both ends, including splitters. Then post new diagnostics.
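As well as the dashboard, the SMART report can be read from a terminal with smartctl (part of smartmontools, which ships with Unraid). A minimal sketch; the device name sdX below is a placeholder you would replace with the letter the disk currently has:

```shell
# Full SMART report: identity, overall health, and attribute table
# (replace sdX with the actual device for disk5)
smartctl -a /dev/sdX

# Quick overall pass/fail health assessment only
smartctl -H /dev/sdX
```

Pending or reallocated sector counts in the attribute table are the usual signs of a genuinely failing disk; a disk that simply dropped off the controller will often show a clean report like yours.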
FastAd Posted February 20

Yes it is disk5. I've checked all cables and even tried swapping the cables with another drive, and the drive is still disabled. I noticed on boot up it was pausing on the BIOS screen with "Please backup your data and replace your hard disk drive. A failure may be imminent and cause unpredictable fail.". The drive is showing up in the BIOS also.

After starting the array, disk5 now says "Unmountable: Unsupported or no file system".

chaos-diagnostics-20240220-1046.zip
JorgeB Posted February 20

SMART for disk5 looks fine; the warning during boot is likely from a different disk, so check the SMART status for all devices on the dashboard. Also check the filesystem on the emulated disk5, and run it without -n.
trurl Posted February 20

3 hours ago, FastAd said:
"Unmountable: Unsupported or no file system"

The way you fix that is

1 hour ago, JorgeB said:
check filesystem on the emulated disk5

Capture the output and post it.

3 hours ago, FastAd said:
the drive is still disabled

The way you fix that is by rebuilding, but you don't want to rebuild until you have fixed the filesystem.
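For reference, the filesystem check on the emulated disk can also be run from a terminal with the array started in Maintenance mode. A sketch, with one assumption flagged: on releases of this vintage the emulated disk5 is typically /dev/md5 (Unraid 6.12+ uses /dev/md5p1 instead); running xfs_repair against the md device rather than the raw sd device is what keeps parity in sync with the repairs:

```shell
# Array must be started in Maintenance mode first (Main tab)

# Dry run: -n reports problems but modifies nothing
xfs_repair -n /dev/md5

# Actual repair: run again without -n to apply the fixes
xfs_repair /dev/md5

# Last resort if it refuses to run because of a dirty log:
# -L zeroes the log and can lose the most recent metadata changes
# xfs_repair -L /dev/md5
```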
FastAd Posted February 20

File system check is running:

Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!
attempting to find secondary superblock...

I guess it runs through the whole disk like a parity check?
trurl Posted February 21

Usually it produces results pretty quickly. If it has to go looking for superblocks it might take a while. Any progress?
FastAd Posted February 21

Reading of the disks has stopped. Under disk5 it has the following:

Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!
attempting to find secondary superblock...
....<Lots of dots here>...........Sorry, could not find valid secondary superblock
Exiting now.

I'm guessing it's dead dead. If SMART looked OK, do I reformat the drive and rebuild it, or is it best to replace it? I've just had 2 new 18TB drives arrive for my parity drive upgrade.
JorgeB Posted February 21

See if the actual disk mounts with UD: stop the array, unassign disk5, then see if it mounts.
FastAd Posted February 21

Tried to mount in UD. Get the error "Device '/dev/sdo' failed to mount. Check the syslog for details." The end of my system log has the following:

Feb 21 15:15:00 Chaos unassigned.devices: Mounting partition 'sdo1' at mountpoint '/mnt/disks/2SG9P86N'...
Feb 21 15:15:00 Chaos unassigned.devices: Mount cmd: /sbin/mount -t 'xfs' -o rw,relatime '/dev/sdo1' '/mnt/disks/2SG9P86N'
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Mounting V5 Filesystem
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Corruption warning: Metadata has LSN (2:2970628) ahead of current LSN (2:2967213). Please unmount and run xfs_repair (>= v4.3) to resolve.
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Metadata corruption detected at xfs_agf_verify+0x64/0x1e2 [xfs], xfs_agf block 0xfffffff1
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Unmount and run xfs_repair
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): First 128 bytes of corrupted metadata buffer:
Feb 21 15:15:00 Chaos kernel: 00000000: 58 41 47 46 00 00 00 01 00 00 00 02 0f ff ff ff  XAGF............
Feb 21 15:15:00 Chaos kernel: 00000010: 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 01  ................
Feb 21 15:15:00 Chaos kernel: 00000020: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03  ................
Feb 21 15:15:00 Chaos kernel: 00000030: 00 00 00 04 04 6c 14 8e 02 d3 73 5f 00 00 00 00  .....l....s_....
Feb 21 15:15:00 Chaos kernel: 00000040: 43 33 50 50 19 85 4e 40 b4 d9 d3 a4 1a 0e 71 82  [email protected].
Feb 21 15:15:00 Chaos kernel: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Feb 21 15:15:00 Chaos kernel: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Feb 21 15:15:00 Chaos kernel: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): metadata I/O error in "xfs_read_agf+0xd7/0x116 [xfs]" at daddr 0xfffffff1 len 1 error 117
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Error -117 reserving per-AG metadata reserve pool.
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Corruption of in-memory data (0x8) detected at xfs_fs_reserve_ag_blocks+0xa7/0xb8 [xfs] (fs/xfs/xfs_fsops.c:575). Shutting down filesystem.
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Please unmount the filesystem and rectify the problem(s)
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Ending clean mount
Feb 21 15:15:00 Chaos kernel: XFS (sdo1): Error -5 reserving per-AG metadata reserve pool.
Feb 21 15:15:01 Chaos unassigned.devices: Mount of 'sdo1' failed: 'mount: /mnt/disks/2SG9P86N: can't read superblock on /dev/sdo1. dmesg(1) may have more information after failed mount system call. '
Feb 21 15:15:01 Chaos unassigned.devices: Partition '2SG9P86N' cannot be mounted.
JorgeB Posted February 21

Check filesystem on that disk using the CLI:

xfs_repair -v /dev/sdo1
FastAd Posted February 21

It did quite a few things, including a lot of the following:

imap claims a free inode 3524047782 is in use, correcting imap and clearing inode
cleared inode 3524047782

and:

clearing inode number in entry at offset 784...

XFS_REPAIR Summary    Wed Feb 21 16:15:50 2024

Phase     Start           End             Duration
Phase 1:  02/21 16:07:45  02/21 16:07:46  1 second
Phase 2:  02/21 16:07:46  02/21 16:07:48  2 seconds
Phase 3:  02/21 16:07:48  02/21 16:12:04  4 minutes, 16 seconds
Phase 4:  02/21 16:12:04  02/21 16:12:04
Phase 5:  02/21 16:12:04  02/21 16:12:04
Phase 6:  02/21 16:12:04  02/21 16:15:22  3 minutes, 18 seconds
Phase 7:  02/21 16:15:22  02/21 16:15:22

Total run time: 7 minutes, 37 seconds
done

I tried to mount the drive in UD and it did mount. Should I try and add the drive back into the array?
JorgeB Posted February 21

8 minutes ago, FastAd said:
Should I try and add the drive back into the array?

If you add it, it will be rebuilt on top or just cleared (erased). You can do a new config and re-sync parity, or rebuild to a spare disk and then copy the data back. Any doubts, please ask.
FastAd Posted February 21

How do I add the drive back and rebuild it from parity? Do I need to format it first?
JorgeB Posted February 21

11 minutes ago, FastAd said:
How do I add the drive back and rebuild it from parity.

Only do that if you have a spare. If you rebuild from parity on top of the old disk, the result will be an unmountable disk, and it will delete all the data on the actual disk.
FastAd Posted February 21

Can I not use that drive in the array again?
JorgeB Posted February 21

1 hour ago, JorgeB said:
you can do a new config and re-sync parity

If all drives are healthy, you can do a new config with the old disk (Tools - New Config) and then, instead of rebuilding the disk, you use its data, since it's now mounting with UD, and re-sync parity instead.
trurl Posted February 21

1 hour ago, FastAd said:
Do I need to format it first?

Format is NEVER part of rebuild. Never format a disk that has data you want to keep.

https://docs.unraid.net/unraid-os/manual/storage-management/#reset-the-array-configuration
FastAd Posted February 21

If I did a new config, it would write new parity data. If the same data drive fails again while the new parity is being written, won't the data on that drive be lost? Would it be safer to wipe the data disk that failed and is now working, and rebuild from the current parity to that disk? Sorry for all the questions, but I want to make sure I'm doing it right.
trurl Posted February 21

2 hours ago, FastAd said:
Would it be safer to wipe the data disk that failed and is now working and rebuild from the current parity to that disk?

Your talk of format and wipe is very scary. Why do you want to wipe anything? Rebuild will completely overwrite the entire disk with the data of the emulated disk, which comes from the parity calculation by reading all the other disks.

https://docs.unraid.net/unraid-os/manual/what-is-unraid/#parity-protected-array
trurl Posted February 21

4 hours ago, JorgeB said:
if you rebuild from parity on top of the old disk, the result will be an unmountable disk
trurl Posted February 21

We have already determined that rebuilding the disk from parity will result in an unmountable disk, because the emulated disk is unmountable, and the emulated disk is exactly what will be written during a rebuild. So the current contents of parity can't help get your data back.

But we have also determined that the physical disk is mountable and has its data. What we are proposing is to New Config that physical disk back into the array, so its data is back in the array and it can be used to rebuild parity.
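Before committing to the New Config, it's worth sanity-checking the repaired disk's contents while it is still mounted with UD. A minimal sketch, assuming the mountpoint /mnt/disks/2SG9P86N from the syslog earlier in the thread (adjust the path if UD mounted it elsewhere):

```shell
# Where Unassigned Devices mounted the old disk5 (assumed path, adjust as needed)
MNT="${MNT:-/mnt/disks/2SG9P86N}"

# Total space in use on the disk, to compare against what you expect
du -sh "$MNT"

# File count, another quick plausibility check after xfs_repair
find "$MNT" -type f | wc -l

# Spot-check that a handful of files actually read back cleanly
find "$MNT" -type f | head -n 5 | while read -r f; do
    head -c 4096 "$f" > /dev/null && echo "OK: $f"
done
```

xfs_repair sometimes moves orphaned files into a lost+found directory at the top of the filesystem, so that is also worth a look before re-syncing parity.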