Made some mistakes upgrading a drive (SOLVED)



Great, thank you very much.  I'm going to play it safe and take your advice about using a new drive.  Went to bestbuy.com, and it let me know a 10TB Easystore in my cart had gone down in price, so I'm going to take that as a sign: put that in my main server, replace an 8TB drive there, and use that 8TB drive here.

 

I take it this fixes Disk9, and 8 will be rebuilt from parity?

 

Also, is it likely that bad data controllers had anything to do with causing any of these issues?

9 hours ago, bobobeastie said:

I take it this fixes Disk9, and 8 will be rebuilt from parity?

It will re-enable disk9 and start rebuilding disk8; success depends on whether parity is still valid.

 

9 hours ago, bobobeastie said:

Also, is it likely that bad data controllers had anything to do with causing any of these issues?

Very possibly, especially with the Marvell controller you were using; they are known to sometimes drop disks for no reason.


Excellent, thank you.  I put the different/"new" drive in as disk8 and followed the instructions exactly, and the "new" disk8 is listed as "Unmountable: No file system", but thanks to your detailed instructions I did not stop the array or choose to format the drive.  Once the rebuild is done and I run a file system check, what will the outcome be?  I assume what is really being checked is the emulated contents based on the other drives and parity, so if that emulated drive is fixed, am I then able to rebuild the drive's contents again, or does it magically become mountable both in emulation and on the physical drive, and everything goes back to normal?  Or is disk8 lost, and is it good that I kept the old disk to the side so I can try to use the 89% that did not get reformatted?


It finished.  I reloaded the page, stopped the array, then started in maintenance mode, and on the page for disk8 it shows as "Unmountable: No file system", with fs type set to auto.

 

blkid:

/dev/loop0: TYPE="squashfs"
/dev/loop1: TYPE="squashfs"
/dev/sda1: LABEL_FATBOOT="UNRAID" LABEL="UNRAID" UUID="2732-64F5" TYPE="vfat" PARTUUID="a3760dfe-01"
/dev/nvme0n1p1: UUID="f69130dd-7800-43c1-8fe6-1409cc4d3060" TYPE="crypto_LUKS"
/dev/sdb1: UUID="59de781d-b037-441a-b7eb-c918e2ed2d49" TYPE="crypto_LUKS" PARTUUID="ce24456f-79a7-425b-926b-908c829c8719"
/dev/sdc1: UUID="ffb9c825-16f5-49bb-9225-58b349c15524" TYPE="crypto_LUKS" PARTUUID="62e353df-95ac-41a2-98e6-aaec2b37913d"
/dev/sdd1: UUID="74cc9054-0ad6-4c5a-b17e-ffa174b8816a" TYPE="crypto_LUKS" PARTUUID="b6ae2bc8-aad2-489f-bb4f-b354371d9511"
/dev/sde1: UUID="36fb8f2a-7833-4dfc-8838-af76fb89a733" TYPE="crypto_LUKS" PARTUUID="2f100d44-a1b4-4e39-94ca-388105318d81"
/dev/sdf1: UUID="f6804684-df06-42ab-9afc-ec4277c848f2" TYPE="crypto_LUKS" PARTUUID="51acfce5-61fb-4788-961a-2b34b6115fa6"
/dev/sdh1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS" PARTUUID="a45371be-e202-4a81-9604-ffc1d7591bc5"
/dev/sdi1: UUID="a5241765-df2b-4966-ba7f-38fed7ae6d58" TYPE="crypto_LUKS" PARTUUID="a74c7aaa-ff16-4885-bad4-1aab9a3b39ce"
/dev/sdj1: UUID="afc0186b-5d48-4888-bdcc-99e3c17af950" TYPE="crypto_LUKS" PARTUUID="57111893-5e4f-428e-a927-6d96fdef8fd2"
/dev/sdk1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS" PARTUUID="e4acd90a-fcfd-4b45-a68a-1bc496acd051"
/dev/sdl1: UUID="d41db265-f644-4ad9-9c8e-78e38673af04" TYPE="crypto_LUKS" PARTUUID="31a34a62-a0f0-45d1-97f2-3d103dab2d76"
/dev/md1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS"
/dev/md2: UUID="d41db265-f644-4ad9-9c8e-78e38673af04" TYPE="crypto_LUKS"
/dev/md3: UUID="a5241765-df2b-4966-ba7f-38fed7ae6d58" TYPE="crypto_LUKS"
/dev/md4: UUID="59de781d-b037-441a-b7eb-c918e2ed2d49" TYPE="crypto_LUKS"
/dev/md5: UUID="ffb9c825-16f5-49bb-9225-58b349c15524" TYPE="crypto_LUKS"
/dev/md6: UUID="36fb8f2a-7833-4dfc-8838-af76fb89a733" TYPE="crypto_LUKS"
/dev/md7: UUID="f6804684-df06-42ab-9afc-ec4277c848f2" TYPE="crypto_LUKS"
/dev/md8: UUID="afc0186b-5d48-4888-bdcc-99e3c17af950" TYPE="crypto_LUKS"
/dev/md9: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS"
/dev/mapper/md1: UUID="d1c0645c-cf5b-4589-bd2f-6dccc0f99467" TYPE="xfs"
/dev/mapper/md2: UUID="b83db605-8817-4174-9db9-b7e43e533179" TYPE="xfs"
/dev/mapper/md3: UUID="db2b3d1c-513a-4b32-bb55-5ca4df663303" TYPE="xfs"
/dev/mapper/md4: UUID="f17d514e-699f-4939-b22e-83ee770c67d7" TYPE="xfs"
/dev/mapper/md5: UUID="0a7c834d-88fc-4318-85c5-a69a7449f1dc" TYPE="xfs"
/dev/mapper/md6: UUID="3aea003c-7173-4efb-bfec-a775d9ebe4cf" TYPE="xfs"
/dev/mapper/md7: UUID="af81136a-8131-4341-b705-f6c50638961f" TYPE="xfs"
/dev/mapper/md8: UUID="196ad532-7693-46cf-ad40-13bdccc057cf" TYPE="xfs"
/dev/mapper/md9: UUID="291b9458-9fa2-4d95-a68f-2c31eecf5d57" TYPE="xfs"
/dev/mapper/nvme0n1p1: UUID="0ee7ecd1-bff0-43c7-b1e7-def11ff953c3" UUID_SUB="229084e1-41a2-4fbc-ab3e-e7a73d2c48d4" TYPE="btrfs"
/dev/nvme0n1: PTTYPE="dos"
/dev/sdg1: UUID="b88n:m?f-7ldi-4;>5-c:nl-o=6j?n<ccec0" TYPE="crypto_LUKS" PARTUUID="14cef639-350a-4daf-bfc0-ee5239c0ec62"

 

 

sdh1 and sdk1 have the same UUID. What should my next step be?

 

edit: The drive is not showing up in the mapper section, so I'm guessing that means I can't run the command xfs_admin -U generate /dev/mapper/mdX, because there is no corresponding md device.

edit2: The disk log has this error:

Oct 29 14:40:03 Tower kernel: print_req_error: I/O error, dev sdj, sector 15628052928
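For what it's worth, a duplicate like the sdh1/sdk1 pair above can be spotted mechanically from blkid output. A minimal sketch (the sample lines below are stand-ins for the real output, so this runs anywhere; on a live system use `blkid_output=$(blkid)` instead):

```shell
# Pull the UUID field out of blkid-style lines, then print any value
# that appears on more than one device. The sample input below stands
# in for real `blkid` output.
blkid_output='/dev/sdh1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS"
/dev/sdk1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS"
/dev/sdl1: UUID="d41db265-f644-4ad9-9c8e-78e38673af04" TYPE="crypto_LUKS"'

# -d prints only duplicated values; note the pattern matches ` UUID="` with
# a leading space, so PARTUUID and UUID_SUB fields are not picked up.
printf '%s\n' "$blkid_output" \
  | sed -n 's/.* UUID="\([^"]*\)".*/\1/p' \
  | sort | uniq -d
```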


The problem isn't the duplicate UUID. The duplicate UUIDs are just on the LUKS devices, and although strange, it might be normal for LUKS devices; I don't use encryption, so I'm not sure. In any case, the xfs filesystems have different UUIDs:

 

/dev/mapper/md1: UUID="d1c0645c-cf5b-4589-bd2f-6dccc0f99467" TYPE="xfs"
/dev/mapper/md8: UUID="196ad532-7693-46cf-ad40-13bdccc057cf" TYPE="xfs"

 

And the problem is just standard filesystem corruption:

 

Oct 30 02:55:55 Tower kernel: XFS (dm-7): Metadata CRC error detected at xfs_sb_read_verify+0x111/0x15f [xfs], xfs_sb_quiet block 0xffffffffffffffff
Oct 30 02:55:55 Tower kernel: XFS (dm-7): Unmount and run xfs_repair

 

Run xfs_repair on disk8.
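For the record, xfs_repair also has a check-only mode that's worth running first. A sketch (the commands are only printed here, since the real thing needs the array in maintenance mode and the actual device; remove the `echo` to run them):

```shell
# Sketch: -n reports problems without writing anything; drop it for the
# actual repair. md8 is the decrypted mapper device for disk8 on this
# encrypted array.
echo "xfs_repair -n /dev/mapper/md8"   # dry run, read-only check
echo "xfs_repair /dev/mapper/md8"      # real repair
```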

 

 


Looks like yes, in maintenance mode:

 

root@Tower:~# xfs_repair -v /dev/mapper/md8
Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
        - block cache size set to 722176 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 18 tail block 18
        - scan filesystem freespace and inode maps...
Metadata CRC error detected at 0x439356, xfs_agf block 0x15d50b4c1/0x200
Metadata CRC error detected at 0x439356, xfs_agf block 0x246312d41/0x200
Metadata CRC error detected at 0x463086, xfs_agi block 0x246312d42/0x200
Metadata CRC error detected at 0x463086, xfs_agi block 0x15d50b4c2/0x200
bad uuid 196ad532-7693-46cf-5887-4b8d0df5f997 for agi 6
reset bad agi for ag 6
bad uuid 196ad532-7693-46cf-38ff-0f39a7ed4fe1 for agi 10
reset bad agi for ag 10
Metadata CRC error detected at 0x438f94, xfs_agfl block 0x246312d43/0x200
agfl has bad CRC for ag 10
bad agbno 1156485499 in agfl, agno 10
bad agbno 1703632283 in agfl, agno 10
bad agbno 4274230279 in agfl, agno 10
bad agbno 3276528919 in agfl, agno 10
bad agbno 1321478119 for btbno root, agno 10
bad agbno 1891151798 for btbcnt root, agno 10
agf_freeblks 60595936, counted 0 in ag 10
agf_longest 13360461, counted 0 in ag 10
bad agbno 3307702038 for finobt root, agno 6
bad agbno 3783116261 for finobt root, agno 10
agi_freecount 363, counted 0 in ag 10 finobt
sb_icount 18560, counted 2816
sb_ifree 221, counted 465
sb_fdblocks 4653142, counted 1265621349
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 10
        - agno = 4
        - agno = 6
        - agno = 5
        - agno = 7
        - agno = 8
        - agno = 1
        - agno = 9
        - agno = 11
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Note - stripe unit (0) and width (0) were copied from a backup superblock.
Please reset with mount -o sunit=<value>,swidth=<value> if necessary

        XFS_REPAIR Summary    Wed Oct 30 03:32:19 2019

Phase           Start           End             Duration
Phase 1:        10/30 03:32:18  10/30 03:32:18
Phase 2:        10/30 03:32:18  10/30 03:32:18
Phase 3:        10/30 03:32:18  10/30 03:32:19  1 second
Phase 4:        10/30 03:32:19  10/30 03:32:19
Phase 5:        10/30 03:32:19  10/30 03:32:19
Phase 6:        10/30 03:32:19  10/30 03:32:19
Phase 7:        10/30 03:32:19  10/30 03:32:19

Total run time: 1 second
done

 

After that I can start the array!!  Thank you very much, time to try to recover some things.


Everything is good.  I found some files on a disk that had been replaced and added them back to the array.  Now I'm trying to mount the disk8 that had been part of the array but that I kept out while fixing it; it won't mount in Unassigned Devices, and I noticed this in the log:

 

Oct 30 15:31:17 Tower unassigned.devices: Adding disk '/dev/mapper/HGST_HDN726060ALE614_K1H90MAD'...
Oct 30 15:31:17 Tower unassigned.devices: luksOpen error: Device HGST_HDN726060ALE614_K1H90MAD already exists.
Oct 30 15:31:17 Tower unassigned.devices: Partition 'HGST_HDN726060ALE614_K1H90MAD' could not be mounted...

 

Can anything be done to get it mounted?

tower-diagnostics-20191030-2349.zip
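The "luksOpen error: Device ... already exists" line suggests a stale device-mapper entry is blocking the unlock. One possible way to clear it (an assumption on my part, not something confirmed in this thread; the commands are only printed below because they need root and the live system, so remove the `echo` to actually run them, and substitute the drive's real partition for the hypothetical sdX1):

```shell
# The mapping name taken from the unassigned.devices log above.
name='HGST_HDN726060ALE614_K1H90MAD'

# Sketch: drop the stale device-mapper entry, then retry the unlock.
# Printed rather than executed; /dev/sdX1 is a hypothetical placeholder.
echo "dmsetup remove $name"
echo "cryptsetup luksOpen /dev/sdX1 $name"
```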


It won't mount when the server is freshly booted with the array off, though there's probably no mechanism to read an entered key at that point.  It won't mount after the array has been started and stopped either.

 

Nov 3 03:00:05 Tower unassigned.devices: Adding disk '/dev/mapper/HGST_HDN726060ALE614_K1H90MAD'...
Nov 3 03:00:05 Tower unassigned.devices: luksOpen error: Device HGST_HDN726060ALE614_K1H90MAD already exists.
Nov 3 03:00:05 Tower unassigned.devices: Partition 'HGST_HDN726060ALE614_K1H90MAD' could not be mounted...

 

I'm ready to give up on the drive if there's nothing I can do; I just want to check this one last time.  I'm hoping that after a preclear or two I can add it back as a new disk.

tower-diagnostics-20191103-1101.zip

