Made some mistakes upgrading a drive (SOLVED)



In the beginning, I precleared an 8TB drive that I was replacing in my main server, so that I could put it in my secondary server to replace a 2TB drive I was getting errors on.  After the preclear finished I shut the secondary server down and swapped the drives, 2TB out, 8TB in.  I assigned the new drive to slot 1 and started the array.  It then asked to format the drive, and this is where I figure I did something wrong.  I think I said format during the parity rebuild, then I got spooked, so I stopped and tried the parity rebuild again without formatting, which was probably also wrong.  The rebuild finished and the "new" drive was still listed as unformatted.

 

After that, I put the old 2TB drive back in slot 1 and tried a new configuration.  Nothing had been written to the array, but I was going to resync parity from the data drives just in case; better that than losing 2TB.  When trying that, it showed disk 9 as unformatted/no filesystem.  Disk 9 is 8TB, so if I had to choose, I'd rather lose the 2TB.  I double-checked that the drive I removed was the "new", previously precleared drive from my main server: the serial in slot 8 is from a different brand, and the one I removed matches the serial on a spreadsheet I keep for locating drives in my main array.

 

Hopefully I'm still in a place where I can keep all my data, but after stumbling around I could really use some help on how to do that, please.  Diagnostics are attached; they were generated from the boot in which I had tried the new config and disk 9 was showing as not formatted.

 

Extra stuff that might not matter: I think some of my confusion comes from the secondary server being encrypted, which my main one isn't; maybe replacing drives is different in that case.  I was also going to move SAS cards around, but the 16e card I bought for my new server started smoking when I powered it up, so I couldn't use one of the 8i cards from that server that I was going to put in my secondary server.  Moving things around does seem to have at least temporarily eased the errors, though.

 

 

tower-diagnostics-20191007-1806.zip

Edited by bobobeastie
marking solved
Link to comment

Assuming that means that I can't change/fix the UUID, should I be able to put sdi in a spare bay in my main server and mount it with Unassigned Devices?  That drive is encrypted XFS; will Unassigned Devices be okay with this and prompt me for the key?
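
If Unassigned Devices doesn't prompt for the key, I'm assuming I could open and mount it manually from the console, something like this (the sdi1 device name and the mount point are just placeholders on my part):

cryptsetup luksOpen /dev/sdi1 old2tb        # should prompt for the LUKS passphrase
mkdir -p /mnt/disks/old2tb
mount /dev/mapper/old2tb /mnt/disks/old2tb
# and when done:
umount /mnt/disks/old2tb && cryptsetup luksClose old2tb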

 

If the above works, I can then replace sdi with the drive I was originally trying to replace it with, assuming it has a different UUID, and build parity, then when that's done add back the data from the 2TB sdi.

 

...just found a post from you about duplicate UUIDs, where you said that "xfs_admin -U generate /dev/sdi1" can be run.  Should I try that?

Link to comment
6 hours ago, bobobeastie said:

Assuming that means that I can't change/fix the UUID,

You can change it, but the UUIDs being identical can't be a coincidence; the disks will be clones of each other, likely from some confusion during the upgrade.

 

You can change it using the md device, so that parity is maintained:

 

xfs_admin -U generate /dev/mdX

 

replace X with the disk number.

Link to comment

Thanks.  So it looks like I have two options: fix the UUID issue and then try to rebuild the parity drive (I'm OK with not trusting it), or take the 2TB drive I want to replace out, mount it with Unassigned Devices on another Unraid system, put a precleared 8TB drive in the system in question, use New Config and build parity, then copy the contents of the 2TB drive to the array once things have settled.

 

If there is no danger in changing UUIDs, then I might as well try that first while I wait for the 8TB drive to preclear.  I also plan to check whether I can mount the 2TB encrypted XFS drive before doing anything risky.

 

Is that a safe plan?

Link to comment
1 minute ago, bobobeastie said:

This is what I get when I try that:

 

root@Tower:~# xfs_admin -U generate /dev/md9
xfs_admin: /dev/md9 is not a valid XFS filesystem (unexpected SB magic number 0x4c554b53)
Use -F to force a read attempt.

Forgot you have an encrypted array; it should then be:

 

xfs_admin -U generate /dev/mapper/md9
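
If you want to confirm the change took, you can compare the filesystem UUIDs on the decrypted devices before and after, something like this (md3 here is just a stand-in for whichever other disk shows the duplicate):

blkid /dev/mapper/md9 /dev/mapper/md3     # check which UUIDs currently match
xfs_admin -U generate /dev/mapper/md9     # write a new random UUID
blkid /dev/mapper/md9                     # should now differ from the other disk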

 

Link to comment
  • 2 weeks later...

Ugh.  Everything was fine since the 8th, then today I noticed a bunch of subfolders were missing on one of the shares.  Then I saw errors on 3-4 drives and rebooted, after which there was an error on one of the drives.  I ran blkid, and two of the drives have the same UUIDs again, so I ran the generate command; the array wasn't fixed after that, or after a reboot.

tower-diagnostics-20191019-0107.zip

Link to comment

Thanks, -nv resulted in this:


Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

 

 

and -v resulted in this:

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
        - block cache size set to 722176 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 517937 tail block 517933
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
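
If I'm reading that right, it wants me to either mount the filesystem so the log gets replayed, or destroy the log and repair.  So I'm guessing that if the disk won't mount, the next step would be something like this (assuming disk 9 on the encrypted array; I understand destroying the log can lose the pending metadata changes):

xfs_repair -L /dev/mapper/md9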

Link to comment

That worked.  I changed the UUID again and was able to start a rebuild.  I'm concerned that the free space on the drive being rebuilt is showing as way more than it should be, and there is a lost+found directory with half a terabyte of files in it, which I understand is a result of checking the disk, but I thought all of that would just make the drive ready to be mounted and overwritten with the data from a parity sync?

Link to comment
18 hours ago, bobobeastie said:

I changed the UUID again

The problem wasn't UUID-related; it was a corrupt filesystem.

 

18 hours ago, bobobeastie said:

I'm concerned that the free space on the drive being rebuilt is showing as way more than it should be, and there is a lost+found directory with half a terabyte of files in it

As mentioned, you should never have started rebuilding on top of the old disk before fixing the filesystem and confirming it was indeed fixed with no data loss, so that you'd still have the option to resync parity instead.

Link to comment

I thought I had checked the filesystem of a couple of drives when I had these issues earlier this month.  The UUIDs had become duplicated again, so it was my understanding that it had to be fixed.

 

Is my data lost?  I stopped the parity sync to the data drive at 11%, which probably doesn't matter.  Does fixing filesystem errors change parity in maintenance mode?  If so, I guess that explains my situation.

Link to comment
18 hours ago, bobobeastie said:

The UUIDs had become duplicated again

UUID can't get duplicated on its own, or by coincidence.

 

18 hours ago, bobobeastie said:

Is my data lost?  I stopped the parity sync to the data drive at 11%, which probably doesn't matter.  Does fixing filesystem errors change parity in maintenance mode?  If so, I guess that explains my situation.

Probably yes; fixing the filesystem updates parity.  You should never rebuild an unmountable filesystem on top of the old disk.

Link to comment

I waited to continue with this issue until I had a replacement for the 4-port PCIe SATA card and 2-port SATA card that I was using on top of my 5 onboard SATA ports.  I received a 16i LSI SAS controller for my main server so that I could take one of the 8i SAS controllers, which I haven't had a single issue with, for this computer.

 

So, using one of the 8i controllers (I've actually tried both), I get a yellow exclamation mark saying "device contents emulated" on Disk 8/sdh, which is the drive that was having issues in my last round of posts, and Disk 9/sdk has a red X and says "Device is disabled, contents emulated".  The xfs_repair status for both is listed as "Not Available".  Even though it's not available, just in case it helps, here's what happens when I run a -n check:

 

Disk 8:

 

Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1
fatal error -- Input/output error

Disk 9:

 

Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1
fatal error -- Input/output error

 

I'm guessing that doesn't mean anything because the drive is disabled.  This was after moving to SAS-to-SATA breakout cables instead of individual SATA cables on a SATA card.  I tried both SAS cards and moved to different breakout cables, and the issue did not move with the cables.

 

Just in case it helps, here's the blkid output; sdj1 and sdk1 have duplicate UUIDs:

/dev/loop0: TYPE="squashfs"
/dev/loop1: TYPE="squashfs"
/dev/sda1: LABEL_FATBOOT="UNRAID" LABEL="UNRAID" UUID="2732-64F5" TYPE="vfat" PARTUUID="a3760dfe-01"
/dev/nvme0n1p1: UUID="f69130dd-7800-43c1-8fe6-1409cc4d3060" TYPE="crypto_LUKS"
/dev/sdb1: UUID="59de781d-b037-441a-b7eb-c918e2ed2d49" TYPE="crypto_LUKS" PARTUUID="ce24456f-79a7-425b-926b-908c829c8719"
/dev/sdc1: UUID="ffb9c825-16f5-49bb-9225-58b349c15524" TYPE="crypto_LUKS" PARTUUID="62e353df-95ac-41a2-98e6-aaec2b37913d"
/dev/sdd1: UUID="74cc9054-0ad6-4c5a-b17e-ffa174b8816a" TYPE="crypto_LUKS" PARTUUID="b6ae2bc8-aad2-489f-bb4f-b354371d9511"
/dev/sde1: UUID="36fb8f2a-7833-4dfc-8838-af76fb89a733" TYPE="crypto_LUKS" PARTUUID="2f100d44-a1b4-4e39-94ca-388105318d81"
/dev/sdf1: UUID="f6804684-df06-42ab-9afc-ec4277c848f2" TYPE="crypto_LUKS" PARTUUID="51acfce5-61fb-4788-961a-2b34b6115fa6"
/dev/sdh1: UUID="afc0186b-5d48-4888-bdcc-99e3c17af950" TYPE="crypto_LUKS" PARTUUID="f407d887-e020-47bf-bba4-3d024c26844d"
/dev/sdi1: UUID="a5241765-df2b-4966-ba7f-38fed7ae6d58" TYPE="crypto_LUKS" PARTUUID="a74c7aaa-ff16-4885-bad4-1aab9a3b39ce"
/dev/sdj1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS" PARTUUID="a45371be-e202-4a81-9604-ffc1d7591bc5"
/dev/sdk1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS" PARTUUID="e4acd90a-fcfd-4b45-a68a-1bc496acd051"
/dev/sdl1: UUID="d41db265-f644-4ad9-9c8e-78e38673af04" TYPE="crypto_LUKS" PARTUUID="31a34a62-a0f0-45d1-97f2-3d103dab2d76"
/dev/md1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS"
/dev/md2: UUID="d41db265-f644-4ad9-9c8e-78e38673af04" TYPE="crypto_LUKS"
/dev/md3: UUID="a5241765-df2b-4966-ba7f-38fed7ae6d58" TYPE="crypto_LUKS"
/dev/md4: UUID="59de781d-b037-441a-b7eb-c918e2ed2d49" TYPE="crypto_LUKS"
/dev/md5: UUID="ffb9c825-16f5-49bb-9225-58b349c15524" TYPE="crypto_LUKS"
/dev/md6: UUID="36fb8f2a-7833-4dfc-8838-af76fb89a733" TYPE="crypto_LUKS"
/dev/md7: UUID="f6804684-df06-42ab-9afc-ec4277c848f2" TYPE="crypto_LUKS"
/dev/mapper/md1: UUID="d1c0645c-cf5b-4589-bd2f-6dccc0f99467" TYPE="xfs"
/dev/mapper/md2: UUID="b83db605-8817-4174-9db9-b7e43e533179" TYPE="xfs"
/dev/mapper/md3: UUID="db2b3d1c-513a-4b32-bb55-5ca4df663303" TYPE="xfs"
/dev/mapper/md4: UUID="f17d514e-699f-4939-b22e-83ee770c67d7" TYPE="xfs"
/dev/mapper/md5: UUID="0a7c834d-88fc-4318-85c5-a69a7449f1dc" TYPE="xfs"
/dev/mapper/md6: UUID="3aea003c-7173-4efb-bfec-a775d9ebe4cf" TYPE="xfs"
/dev/mapper/md7: UUID="af81136a-8131-4341-b705-f6c50638961f" TYPE="xfs"
/dev/mapper/nvme0n1p1: UUID="0ee7ecd1-bff0-43c7-b1e7-def11ff953c3" UUID_SUB="229084e1-41a2-4fbc-ab3e-e7a73d2c48d4" TYPE="btrfs"
/dev/nvme0n1: PTTYPE="dos"
/dev/sdg1: UUID="b88n:m?f-7ldi-4;>5-c:nl-o=6j?n<ccec0" TYPE="crypto_LUKS" PARTUUID="14cef639-350a-4daf-bfc0-ee5239c0ec62"
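
For what it's worth, a one-liner like this will also flag any UUID that shows up more than once (just standard shell tools, nothing Unraid-specific):

blkid -s UUID -o value | sort | uniq -d     # prints each UUID that appears on more than one device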

 

What should my next step be?  Both disks passed 4 rounds of preclears before I used them in this server, so I'm thinking this issue was caused by some SATA card or cable problem from before my SAS card upgrade.

tower-diagnostics-20191024-1925.zip

Link to comment

I know I have single parity and what that entails, so I was not expecting a parity sync to work as-is; I forgot to state that.  It is confusing to me that two drives would be listed as emulated in a single parity array.  What I'm asking is, is there anything I can do to fix either drive, preferably 9, as 8 was 11% into a parity sync, such that I can minimize lost data?  I think Disk 9 might have shown up as normal after installing the "new" SAS card, but I think I ran an xfs_repair, which maybe caused this issue?  Based on that, I'm guessing this isn't a hardware issue, but if needed I could temporarily go back to the 4+2 SATA cards.

Link to comment
5 minutes ago, bobobeastie said:

It is confusing to me that two drives would be listed as emulated in a single parity array.

This can happen if one drive fails while another is being rebuilt.  I don't know if this could be improved by @limetech, to behave like it does when multiple drives fail at the same time; in that case Unraid will only disable one drive with single parity, and two at most with dual parity.

 

13 minutes ago, bobobeastie said:

What I'm asking is, is there anything I can do to fix either drive, preferably 9, as 8 was 11% into a parity sync, such that I can minimize lost data?

You can try the invalid slot command, I'll post instructions in a few minutes.

Link to comment

-Tools -> New Config -> Retain current configuration: All -> Apply
-Assign any missing disk(s).  I would recommend, if possible, using a new disk for disk8, since the old one can still have some of the data if needed, though likely not all.
-Important - After checking the assignments, leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):

mdcmd set invalidslot 8 29

-Back on the GUI, and without refreshing the page, just start the array.  Do not check the "parity is already valid" box (the GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalid slot command, but parity won't be overwritten as long as the procedure was done correctly).  Disk8 will start rebuilding.  The disk should mount immediately, but if it's unmountable don't format it; wait for the rebuild to finish and then run a filesystem check.

 

P.S. Since the array is encrypted you'll be asked for a new key; just enter the old one.
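
For that filesystem check, with the array started in maintenance mode, something along these lines should do (disk8 shown; -n makes it a read-only check, drop it to actually repair):

xfs_repair -n /dev/mapper/md8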

Link to comment
