Made some mistakes upgrading a drive (SOLVED)



In the beginning, I precleared an 8TB drive that I was replacing in my main server, so that I could put it in my secondary server to replace a 2TB drive I was getting errors on.  After the preclear finished I shut the secondary server down and swapped the drives, 2TB out, 8TB in.  I assigned the new drive to slot 1 and started the array.  It then asked to format the drive, and this is where I figure I did something wrong.  I think I said format during the parity rebuild, then I got spooked, so I stopped and tried the parity rebuild again without formatting, which was probably also wrong.  The rebuild finished and the "new" drive was still listed as unformatted.

 

After that, I put the old 2TB drive back in slot 1 and tried a new configuration.  Nothing had been written to the array, but I was going to resync parity from the data drives just in case; better that than losing 2TB.  When trying that, it showed disk 9 as unformatted/no filesystem.  Disk 9 is 8TB, so if I had to choose, I'd rather lose the 2TB.  I double-checked that the drive I removed was the "new", previously precleared drive from my main server: the serial in slot 8 is from a different brand, and the one I removed matches the serial on a spreadsheet I keep for locating drives in my main array.

 

Hopefully I'm still in a place where I can keep all my data, but after stumbling around I could really use some help on how to do that, please.  Diagnostics are attached; they were generated from the boot in which I had tried the new config and disk 9 was showing as not formatted.

 

Extra stuff that might not matter: I think some of my confusion comes from the secondary server being encrypted, which my main one isn't; maybe replacing drives is different in that case.  I was also going to move SAS cards around, but the 16e card I bought for my new server started smoking when I powered it up, so I couldn't use one of the 8i cards from that server that I was going to put in my secondary server.  Moving things around does seem to have at least temporarily eased the errors, though.

 

 

tower-diagnostics-20191007-1806.zip

Edited by bobobeastie
marking solved
Link to comment

Assuming that means that I can't change/fix the UUID, should I be able to put sdi in a spare bay in my main server and mount it with Unassigned Devices?  That drive is encrypted XFS; will Unassigned Devices be okay with this and prompt me for the key?
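
If Unassigned Devices doesn't prompt for the key, I'm assuming I could open and mount it manually from the console, something like this (the sdi1 device name and the mount point are just placeholders on my part):

cryptsetup luksOpen /dev/sdi1 old2tb        # should prompt for the LUKS passphrase
mkdir -p /mnt/disks/old2tb
mount /dev/mapper/old2tb /mnt/disks/old2tb
# and when done:
umount /mnt/disks/old2tb && cryptsetup luksClose old2tb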

 

If the above works, I can then replace sdi with the drive I was originally trying to replace it with, assuming it has a different UUID, and build parity, then when that's done add back the data from the 2TB sdi.

 

...just found a post from you about duplicate UUIDs, where you said that "xfs_admin -U generate /dev/sdi1" can be run.  Should I try that?

Link to comment
6 hours ago, bobobeastie said:

Assuming that means that I can't change/fix the UUID,

You can change it, but the UUIDs being identical can't be a coincidence; the disks will be clones of each other, likely from some confusion during the upgrade.

 

You can change it using the md device, so that parity is maintained:

 

xfs_admin -U generate /dev/mdX

 

replace X with the disk number.

Link to comment

Thanks.  So it looks like I have two options: fix the UUID issue and then try to rebuild the parity drive (I'm OK with not trusting it), or take the 2TB drive I want to replace out, mount it with Unassigned Devices on another Unraid system, put a precleared 8TB drive in the system in question, use New Config and build parity, then copy the contents of the 2TB drive to the array once things have settled.

 

If there is no danger in changing UUIDs, then I might as well try that first while I wait for the 8TB drive to preclear.  I also plan to check whether I can mount the 2TB encrypted XFS drive before doing anything risky.

 

Is that a safe plan?

Link to comment
1 minute ago, bobobeastie said:

This is what I get when I try that:

 

root@Tower:~# xfs_admin -U generate /dev/md9
xfs_admin: /dev/md9 is not a valid XFS filesystem (unexpected SB magic number 0x4c554b53)
Use -F to force a read attempt.

Forgot you have an encrypted array; it should then be:

 

xfs_admin -U generate /dev/mapper/md9
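
If you want to confirm the change took, you can compare the filesystem UUIDs on the decrypted devices before and after, something like this (md3 here is just a stand-in for whichever other disk shows the duplicate):

blkid /dev/mapper/md9 /dev/mapper/md3     # check which UUIDs currently match
xfs_admin -U generate /dev/mapper/md9     # write a new random UUID
blkid /dev/mapper/md9                     # should now differ from the other disk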

 

Link to comment
  • 2 weeks later...

Ugh.  Everything was fine since the 8th, then today I noticed a bunch of subfolders were missing on one of the shares.  Then I saw errors on 3-4 drives and rebooted, after which there was an error on one of the drives.  I ran blkid, and two of the drives have the same UUIDs again, so I ran the generate command; the array wasn't fixed after that, or after a reboot.

tower-diagnostics-20191019-0107.zip

Link to comment

Thanks, -nv resulted in this:


Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

 

 

and -v resulted in this:

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
        - block cache size set to 722176 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 517937 tail block 517933
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
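
If I'm reading that right, it wants me to either mount the filesystem so the log gets replayed, or destroy the log and repair.  So I'm guessing that if the disk won't mount, the next step would be something like this (assuming disk 9 on the encrypted array; I understand destroying the log can lose the pending metadata changes):

xfs_repair -L /dev/mapper/md9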

Link to comment

That worked.  I changed the UUID again and was able to start a rebuild.  I'm concerned that the free space on the drive being rebuilt is showing as way more than it should be, and there is a lost+found directory with half a terabyte of files in it, which I understand is a result of checking the disk, but I thought all of that would just make the drive ready to be mounted and overwritten with the data from a parity sync?

Link to comment
18 hours ago, bobobeastie said:

I changed the UUID again

The problem wasn't UUID-related; it was a corrupt filesystem.

 

18 hours ago, bobobeastie said:

I'm concerned that the free space on the drive being rebuilt is showing as way more than it should be, and there is a lost+found directory with half a terabyte of files in it

As mentioned, you should never have started rebuilding on top of the old disk before fixing the filesystem and confirming it was indeed fixed with no data loss, so that you'd still have the option to resync parity instead.

Link to comment

I thought I had checked the filesystem of a couple of drives when I had these issues earlier this month.  The UUIDs had become duplicated again, so it was my understanding that it had to be fixed.

 

Is my data lost?  I stopped the parity sync to the data drive at 11%, which probably doesn't matter.  Does fixing filesystem errors change parity in maintenance mode?  If so, I guess that explains my situation.

Link to comment
18 hours ago, bobobeastie said:

The UUIDs had become duplicated again

UUID can't get duplicated on its own, or by coincidence.

 

18 hours ago, bobobeastie said:

Is my data lost?  I stopped the parity sync to the data drive at 11%, which probably doesn't matter.  Does fixing filesystem errors change parity in maintenance mode?  If so, I guess that explains my situation.

Probably yes; fixing the filesystem updates parity.  You should never rebuild an unmountable filesystem on top of the old disk.

Link to comment

I waited to continue with this issue until I had a replacement for the 4-port PCIe SATA card and 2-port SATA card that I was using on top of my 5 onboard SATA ports.  I received a 16i LSI SAS controller for my main server so that I could take one of the 8i SAS controllers, which I haven't had a single issue with, for this computer.

 

So, using one of the 8i controllers (I've actually tried both), I get a yellow exclamation mark saying "device contents emulated" on Disk 8/sdh, which is the drive that was having issues in my last round of posts, and Disk 9/sdk has a red X and says "Device is disabled, contents emulated".  The xfs_repair status for both is listed as "Not Available".  Even though it's not available, just in case it helps, here's what happens when I run a -n check:

 

Disk 8:

 

Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1
fatal error -- Input/output error

Disk 9:

 

Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1
fatal error -- Input/output error

 

I'm guessing that doesn't mean anything because the drive is disabled.  This was after moving to SAS-to-SATA breakout cables instead of individual SATA cables on a SATA card.  I tried both SAS cards and moved to different breakout cables, and the issue did not move with the cables.

 

Just in case it helps, here's the blkid output; sdj1 and sdk1 have duplicate UUIDs:

/dev/loop0: TYPE="squashfs"
/dev/loop1: TYPE="squashfs"
/dev/sda1: LABEL_FATBOOT="UNRAID" LABEL="UNRAID" UUID="2732-64F5" TYPE="vfat" PARTUUID="a3760dfe-01"
/dev/nvme0n1p1: UUID="f69130dd-7800-43c1-8fe6-1409cc4d3060" TYPE="crypto_LUKS"
/dev/sdb1: UUID="59de781d-b037-441a-b7eb-c918e2ed2d49" TYPE="crypto_LUKS" PARTUUID="ce24456f-79a7-425b-926b-908c829c8719"
/dev/sdc1: UUID="ffb9c825-16f5-49bb-9225-58b349c15524" TYPE="crypto_LUKS" PARTUUID="62e353df-95ac-41a2-98e6-aaec2b37913d"
/dev/sdd1: UUID="74cc9054-0ad6-4c5a-b17e-ffa174b8816a" TYPE="crypto_LUKS" PARTUUID="b6ae2bc8-aad2-489f-bb4f-b354371d9511"
/dev/sde1: UUID="36fb8f2a-7833-4dfc-8838-af76fb89a733" TYPE="crypto_LUKS" PARTUUID="2f100d44-a1b4-4e39-94ca-388105318d81"
/dev/sdf1: UUID="f6804684-df06-42ab-9afc-ec4277c848f2" TYPE="crypto_LUKS" PARTUUID="51acfce5-61fb-4788-961a-2b34b6115fa6"
/dev/sdh1: UUID="afc0186b-5d48-4888-bdcc-99e3c17af950" TYPE="crypto_LUKS" PARTUUID="f407d887-e020-47bf-bba4-3d024c26844d"
/dev/sdi1: UUID="a5241765-df2b-4966-ba7f-38fed7ae6d58" TYPE="crypto_LUKS" PARTUUID="a74c7aaa-ff16-4885-bad4-1aab9a3b39ce"
/dev/sdj1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS" PARTUUID="a45371be-e202-4a81-9604-ffc1d7591bc5"
/dev/sdk1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS" PARTUUID="e4acd90a-fcfd-4b45-a68a-1bc496acd051"
/dev/sdl1: UUID="d41db265-f644-4ad9-9c8e-78e38673af04" TYPE="crypto_LUKS" PARTUUID="31a34a62-a0f0-45d1-97f2-3d103dab2d76"
/dev/md1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS"
/dev/md2: UUID="d41db265-f644-4ad9-9c8e-78e38673af04" TYPE="crypto_LUKS"
/dev/md3: UUID="a5241765-df2b-4966-ba7f-38fed7ae6d58" TYPE="crypto_LUKS"
/dev/md4: UUID="59de781d-b037-441a-b7eb-c918e2ed2d49" TYPE="crypto_LUKS"
/dev/md5: UUID="ffb9c825-16f5-49bb-9225-58b349c15524" TYPE="crypto_LUKS"
/dev/md6: UUID="36fb8f2a-7833-4dfc-8838-af76fb89a733" TYPE="crypto_LUKS"
/dev/md7: UUID="f6804684-df06-42ab-9afc-ec4277c848f2" TYPE="crypto_LUKS"
/dev/mapper/md1: UUID="d1c0645c-cf5b-4589-bd2f-6dccc0f99467" TYPE="xfs"
/dev/mapper/md2: UUID="b83db605-8817-4174-9db9-b7e43e533179" TYPE="xfs"
/dev/mapper/md3: UUID="db2b3d1c-513a-4b32-bb55-5ca4df663303" TYPE="xfs"
/dev/mapper/md4: UUID="f17d514e-699f-4939-b22e-83ee770c67d7" TYPE="xfs"
/dev/mapper/md5: UUID="0a7c834d-88fc-4318-85c5-a69a7449f1dc" TYPE="xfs"
/dev/mapper/md6: UUID="3aea003c-7173-4efb-bfec-a775d9ebe4cf" TYPE="xfs"
/dev/mapper/md7: UUID="af81136a-8131-4341-b705-f6c50638961f" TYPE="xfs"
/dev/mapper/nvme0n1p1: UUID="0ee7ecd1-bff0-43c7-b1e7-def11ff953c3" UUID_SUB="229084e1-41a2-4fbc-ab3e-e7a73d2c48d4" TYPE="btrfs"
/dev/nvme0n1: PTTYPE="dos"
/dev/sdg1: UUID="b88n:m?f-7ldi-4;>5-c:nl-o=6j?n<ccec0" TYPE="crypto_LUKS" PARTUUID="14cef639-350a-4daf-bfc0-ee5239c0ec62"
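
For what it's worth, a one-liner like this will also flag any UUID that shows up more than once (just standard shell tools, nothing Unraid-specific):

blkid -s UUID -o value | sort | uniq -d     # prints each UUID that appears on more than one device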

 

What should my next step be?  Both disks passed 4 rounds of preclears before I used them in this server, so I'm thinking this issue was caused by some SATA card or cable problem from before my SAS card upgrade.

tower-diagnostics-20191024-1925.zip

Link to comment

I know I have single parity and what that entails, so I was not expecting a parity sync to work as-is; I forgot to state that.  It is confusing to me that two drives would be listed as emulated in a single parity array.  What I'm asking is, is there anything I can do to fix either drive, preferably 9, as 8 was 11% into a parity sync, such that I can minimize lost data?  I think Disk 9 might have shown up as normal after installing the "new" SAS card, but I think I ran an xfs_repair, which maybe caused this issue?  Based on that, I'm guessing this isn't a hardware issue, but if needed I could temporarily go back to the 4+2 SATA cards.

Link to comment
5 minutes ago, bobobeastie said:

It is confusing to me that two drives would be listed as emulated in a single parity array.

This can happen if one drive fails while another is being rebuilt.  I don't know if this could be improved by @limetech, to behave like it does when multiple drives fail at the same time; in that case Unraid will only disable one drive with single parity, and two at most with dual parity.

 

13 minutes ago, bobobeastie said:

What I'm asking is, is there anything I can do to fix either drive, preferably 9, as 8 was 11% into a parity sync, such that I can minimize lost data?

You can try the invalid slot command, I'll post instructions in a few minutes.

Link to comment

-Tools -> New Config -> Retain current configuration: All -> Apply
-Assign any missing disk(s).  I would recommend, if possible, using a new disk for disk8, since the old one can still have some of the data if needed, though likely not all.
-Important - After checking the assignments, leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):

mdcmd set invalidslot 8 29

-Back on the GUI, and without refreshing the page, just start the array.  Do not check the "parity is already valid" box (the GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalid slot command, but parity won't be overwritten as long as the procedure was done correctly).  Disk8 will start rebuilding.  The disk should mount immediately, but if it's unmountable don't format it; wait for the rebuild to finish and then run a filesystem check.

 

P.S. Since the array is encrypted you'll be asked for a new key; just enter the old one.
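
For that filesystem check, with the array started in maintenance mode, something along these lines should do (disk8 shown; -n makes it a read-only check, drop it to actually repair):

xfs_repair -n /dev/mapper/md8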

Link to comment
