
Posts posted by bobobeastie

  1. 50 minutes ago, dlandon said:

    UD has this built in.  Click the + sign by the serial number, then click on the four squares icon.

    I did that, that's what this part is from:

     

    FS: crypto_LUKS

    /sbin/fsck /dev/mapper/HGST_HDN726060ALE614_K1H90MAD 2>&1

    fsck from util-linux 2.33.2
    If you wish to check the consistency of an XFS filesystem or
    repair a damaged filesystem, see xfs_repair(8).

     

    I'm guessing it didn't work.
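
    Since that output says fsck just hands XFS off to xfs_repair, I'm guessing the equivalent check from a terminal would be something like this (only a sketch on my end; the mapper name is the one from the output above, and -n should only report problems without changing anything):

    # read-only check of the opened LUKS mapping (run with the disk not mounted)
    xfs_repair -n /dev/mapper/HGST_HDN726060ALE614_K1H90MAD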

  2. Nov 3 14:48:45 Tower unassigned.devices: Adding disk '/dev/mapper/HGST_HDN726060ALE614_K1H90MAD'...
    Nov 3 14:48:46 Tower unassigned.devices: luksOpen error: Device HGST_HDN726060ALE614_K1H90MAD already exists.
    Nov 3 14:48:46 Tower unassigned.devices: Partition 'HGST_HDN726060ALE614_K1H90MAD' could not be mounted...

     

    I was able to run some sort of filesystem check from the terminal, I think with -L, and it fixed some issues. Both after and, I think, before fixing it, I get this when I use the web UI, and I'm not sure what it means:

    FS: crypto_LUKS

    /sbin/fsck /dev/mapper/HGST_HDN726060ALE614_K1H90MAD 2>&1

    fsck from util-linux 2.33.2
    If you wish to check the consistency of an XFS filesystem or
    repair a damaged filesystem, see xfs_repair(8).

     

    I used "/sbin/fsck /dev/mapper/HGST_HDN726060ALE614_K1H90MAD" from above to figure out which disk to tell it top check/fix.

  3. 42 minutes ago, dlandon said:

    I don't see anything that would keep the UD disk from mounting.  Reboot your server and start the array with the passphrase.  Don't do anything with the UD disk.  Let it mount on its own.  Once the array has started, wait about 5 minutes and post diagnostics.

    Diagnostics attached, post reboot without touching UD, and it isn't set to auto mount.

    tower-diagnostics-20191103-2036.zip

  4. 2 minutes ago, dlandon said:

    Go to a Unraid terminal and type:

    
    cat /root/keyfile

    This will show you the keyfile contents.

     

    If the keyfile exists, UD should mount the encrypted drive.

    That works; it shows my passphrase.  My assumption was that this issue was caused by this drive having previously been in the array, based on this bit of the log from when I try to mount:

     

    Nov 3 03:00:05 Tower unassigned.devices: Adding disk '/dev/mapper/HGST_HDN726060ALE614_K1H90MAD'...
    Nov 3 03:00:05 Tower unassigned.devices: luksOpen error: Device HGST_HDN726060ALE614_K1H90MAD already exists.
    Nov 3 03:00:05 Tower unassigned.devices: Partition 'HGST_HDN726060ALE614_K1H90MAD' could not be mounted...

     

    Specifically the "already exists" part.  Can I either make it ignore this safely, or make it realize that there are no disks in the array with the same serial?  I could of course be wrong; I'm not an expert on any of this.
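
    If it's just a leftover device-mapper entry from an earlier attempt, I'm wondering whether something like this would clear it so UD can open the drive again (purely a guess on my part, and only if nothing is actually using the mapping):

    # list current device-mapper entries to see if the name is still registered
    dmsetup ls
    # close the leftover LUKS mapping so it can be opened fresh
    cryptsetup luksClose HGST_HDN726060ALE614_K1H90MAD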

  5. 2 hours ago, dlandon said:

    According to the log, the array is not starting because the keyfile is missing, so I don't see how you can say the array is working without issues.  The array has to be started before UD can see the keyfile.

     

    You can create a keyfile with only the passphrase text.  Be sure there are no new lines in the file.

    I can't figure out how to get a keyfile: Krusader won't allow me to place a file in /root, and I can't copy a keyfile that I created in appdata.  Figuring it needed escalated privileges, I tried mc from the terminal, but I can't figure out mc; command-line file editors/browsers are beyond me.  I'm assuming I could copy the appdata keyfile to /root on the command line if I had the right command.  I do see a keyfile in /root in mc, which thankfully defaults to root, but I can't figure out how to view it.

     

    I'm not sure any of that matters, because I can most definitely start my array and access shares and files on them; Docker starts fine.

     

    edit: Is the missing-key bit in the syslog maybe just from after a reboot, when it is waiting for me to provide the key?  I think there are two different methods to unlock encryption, using a file and using a passphrase, and I'm using a passphrase because I thought it would be more secure.
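
    If copying the file I made in appdata is the way to go, this is roughly what I had in mind from the terminal (guessing at both the command and at where my file actually ended up, so the source path may well be wrong):

    # copy the keyfile I created in appdata to where UD expects it (source path is a guess)
    cp /mnt/user/appdata/keyfile /root/keyfile
    # print it back to confirm it holds the passphrase
    cat /root/keyfile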

  6. 1 minute ago, dlandon said:

    You have a keyfile problem.

    
    Nov  1 06:37:43 Tower root: Failed to open key file.
    Nov  1 06:37:43 Tower emhttpd: shcmd (47): exit status: 1
    Nov  1 06:37:43 Tower emhttpd: shcmd (49): /usr/sbin/cryptsetup luksOpen /dev/md2 md2 --allow-discards --key-file /root/keyfile
    Nov  1 06:37:43 Tower root: Failed to open key file.
    Nov  1 06:37:43 Tower emhttpd: shcmd (49): exit status: 1
    Nov  1 06:37:43 Tower emhttpd: shcmd (51): /usr/sbin/cryptsetup luksOpen /dev/md3 md3 --allow-discards --key-file /root/keyfile
    Nov  1 06:37:43 Tower root: Failed to open key file.
    Nov  1 06:37:43 Tower emhttpd: shcmd (51): exit status: 1
    Nov  1 06:37:43 Tower emhttpd: shcmd (53): /usr/sbin/cryptsetup luksOpen /dev/md4 md4 --allow-discards --key-file /root/keyfile
    Nov  1 06:37:43 Tower root: Failed to open key file.
    Nov  1 06:37:43 Tower emhttpd: shcmd (53): exit status: 1
    Nov  1 06:37:43 Tower emhttpd: shcmd (55): /usr/sbin/cryptsetup luksOpen /dev/md5 md5 --allow-discards --key-file /root/keyfile
    Nov  1 06:37:43 Tower root: Failed to open key file.
    Nov  1 06:37:43 Tower emhttpd: shcmd (55): exit status: 1
    Nov  1 06:37:43 Tower emhttpd: shcmd (57): /usr/sbin/cryptsetup luksOpen /dev/md6 md6 --allow-discards --key-file /root/keyfile
    Nov  1 06:37:43 Tower root: Failed to open key file.
    Nov  1 06:37:43 Tower emhttpd: shcmd (57): exit status: 1
    Nov  1 06:37:43 Tower emhttpd: shcmd (59): /usr/sbin/cryptsetup luksOpen /dev/md7 md7 --allow-discards --key-file /root/keyfile
    Nov  1 06:37:43 Tower root: Failed to open key file.
    Nov  1 06:37:43 Tower emhttpd: shcmd (59): exit status: 1
    Nov  1 06:37:43 Tower emhttpd: shcmd (61): /usr/sbin/cryptsetup luksOpen /dev/md8 md8 --allow-discards --key-file /root/keyfile
    Nov  1 06:37:43 Tower root: Failed to open key file.
    Nov  1 06:37:43 Tower emhttpd: shcmd (61): exit status: 1
    Nov  1 06:37:43 Tower emhttpd: shcmd (63): /usr/sbin/cryptsetup luksOpen /dev/md9 md9 --allow-discards --key-file /root/keyfile
    Nov  1 06:37:43 Tower root: Failed to open key file.
    Nov  1 06:37:43 Tower emhttpd: shcmd (63): exit status: 1
    Nov  1 06:37:43 Tower emhttpd: shcmd (65): /usr/sbin/cryptsetup luksOpen /dev/nvme0n1p1 nvme0n1p1 --allow-discards --key-file /root/keyfile
    Nov  1 06:37:43 Tower root: Failed to open key file.
    Nov  1 06:37:43 Tower emhttpd: shcmd (65): exit status: 1
    Nov  1 06:37:43 Tower emhttpd: Missing encryption key
    Nov  1 06:37:43 Tower kernel: mdcmd (46): stop 
    Nov  1 06:37:43 Tower kernel: md1: stopping
    Nov  1 06:37:43 Tower kernel: md2: stopping
    Nov  1 06:37:43 Tower kernel: md3: stopping
    Nov  1 06:37:43 Tower kernel: md4: stopping
    Nov  1 06:37:43 Tower kernel: md5: stopping
    Nov  1 06:37:43 Tower kernel: md6: stopping
    Nov  1 06:37:43 Tower kernel: md7: stopping
    Nov  1 06:37:43 Tower kernel: md8: stopping
    Nov  1 06:37:43 Tower kernel: md9: stopping
    Nov  1 06:37:43 Tower root: error: /plugins/unassigned.devices/UnassignedDevices.php: wrong csrf_token
    Nov  1 06:37:44 Tower avahi-daemon[6579]: Server startup complete. Host name is Tower.local. Local service cookie is 3274231656.
    Nov  1 06:37:44 Tower root: error: /plugins/unassigned.devices/UnassignedDevices.php: wrong csrf_token
    Nov  1 06:37:44 Tower avahi-daemon[6579]: Service "Tower" (/services/ssh.service) successfully established.
    Nov  1 06:37:44 Tower avahi-daemon[6579]: Service "Tower" (/services/smb.service) successfully established.
    Nov  1 06:37:44 Tower avahi-daemon[6579]: Service "Tower" (/services/sftp-ssh.service) successfully established.
    Nov  1 06:37:45 Tower root: error: /plugins/unassigned.devices/UnassignedDevices.php: wrong csrf_token
    ### [PREVIOUS LINE REPEATED 89 TIMES] ###
    Nov  1 06:39:20 Tower unassigned.devices: Adding disk '/dev/mapper/HGST_HDN726060ALE614_K1H90MAD'...
    Nov  1 06:39:20 Tower unassigned.devices: luksOpen: key file not found.

    You need to start the array and enter the passphrase so it is available to UD, or provide the key file.  The keyfile is 'keyfile' and is stored at /root/.  It contains the passphrase that UD will use.

    I don't have a keyfile as far as I know, and I don't see it in /root using Krusader.  I enter my key into a text field before starting my array, which works with no issues that I am aware of.  This is the same arrangement where I mounted another encrypted drive.  If it would help I could temporarily use a keyfile; do I simply create the file and then enter my encryption password as the only text in the file?
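
    If creating it myself is the right approach, I assume something along these lines would do it, including a check that there's no trailing newline as you mentioned (just my guess at the commands):

    # write the passphrase with no trailing newline (MY_PASSPHRASE stands in for the real one)
    printf '%s' 'MY_PASSPHRASE' > /root/keyfile
    # dump each character so I can confirm there is no \n at the end
    od -c /root/keyfile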

    I'm trying to mount an encrypted XFS drive that had been part of my array.  I was having issues with it, maybe due to a bad SATA card/cable, so I ended up replacing the drive after installing a SAS card.  I suspect there might be files on it that are not on the array, so I have been trying to mount it in Unassigned Devices to find out and, if so, recover them.  I was able to mount another encrypted drive in a similar situation and recover files.  The drive I'm having issues with won't mount when the server is freshly booted and the array is off, but there's probably no mechanism to read an entered key at that point, and it won't mount when the array has been started and stopped either.  I see this in the logs when I try to mount:

     

    Nov 3 03:00:05 Tower unassigned.devices: Adding disk '/dev/mapper/HGST_HDN726060ALE614_K1H90MAD'...
    Nov 3 03:00:05 Tower unassigned.devices: luksOpen error: Device HGST_HDN726060ALE614_K1H90MAD already exists.
    Nov 3 03:00:05 Tower unassigned.devices: Partition 'HGST_HDN726060ALE614_K1H90MAD' could not be mounted...

     

    I'm ready to give up on the drive if there's nothing I can do; I just want to check.  Maybe unrelated, but after attempting to mount the disk, I started my array back up and a parity check started, despite one running successfully to completion yesterday.  Attached are fresh diagnostics from after the newest parity check was started and canceled.  If it helps, I have been getting some excellent help from ti-ti jorge regarding the lead-up to this situation on this thread: 

     

    tower-diagnostics-20191103-1117.zip

    It won't mount when the server is freshly booted and the array is off, but there's probably no mechanism to read an entered key at that point, and it won't mount when the array has been started and stopped either.

     

    Nov 3 03:00:05 Tower unassigned.devices: Adding disk '/dev/mapper/HGST_HDN726060ALE614_K1H90MAD'...
    Nov 3 03:00:05 Tower unassigned.devices: luksOpen error: Device HGST_HDN726060ALE614_K1H90MAD already exists.
    Nov 3 03:00:05 Tower unassigned.devices: Partition 'HGST_HDN726060ALE614_K1H90MAD' could not be mounted...

     

    I'm ready to give up on the drive if there's nothing I can do; I just want to check one last time.  I'm hoping that after a pre-clear or two I can add it as a new disk.

    tower-diagnostics-20191103-1101.zip

    Everything is good; I found some files on a disk that had been replaced and added them back to the array.  Now I'm trying to mount the disk 8 that had been part of the array but that I kept out while fixing it.  It won't mount in Unassigned Devices, and I noticed this in the log:

     

    Oct 30 15:31:17 Tower unassigned.devices: Adding disk '/dev/mapper/HGST_HDN726060ALE614_K1H90MAD'...
    Oct 30 15:31:17 Tower unassigned.devices: luksOpen error: Device HGST_HDN726060ALE614_K1H90MAD already exists.
    Oct 30 15:31:17 Tower unassigned.devices: Partition 'HGST_HDN726060ALE614_K1H90MAD' could not be mounted...

     

    Can anything be done to get it mounted?

    tower-diagnostics-20191030-2349.zip

    Looks like yes, in maintenance mode:

     

    root@Tower:~# xfs_repair -v /dev/mapper/md8
    Phase 1 - find and verify superblock...
    bad primary superblock - bad CRC in superblock !!!

    attempting to find secondary superblock...
    .found candidate secondary superblock...
    verified secondary superblock...
    writing modified primary superblock
            - block cache size set to 722176 entries
    Phase 2 - using internal log
            - zero log...
    zero_log: head block 18 tail block 18
            - scan filesystem freespace and inode maps...
    Metadata CRC error detected at 0x439356, xfs_agf block 0x15d50b4c1/0x200
    Metadata CRC error detected at 0x439356, xfs_agf block 0x246312d41/0x200

    Metadata CRC error detected at 0x463086, xfs_agi block 0x246312d42/0x200
    Metadata CRC error detected at 0x463086, xfs_agi block 0x15d50b4c2/0x200
    bad uuid 196ad532-7693-46cf-5887-4b8d0df5f997 for agi 6
    reset bad agi for ag 6
    bad uuid 196ad532-7693-46cf-38ff-0f39a7ed4fe1 for agi 10
    reset bad agi for ag 10
    Metadata CRC error detected at 0x438f94, xfs_agfl block 0x246312d43/0x200
    agfl has bad CRC for ag 10
    bad agbno 1156485499 in agfl, agno 10
    bad agbno 1703632283 in agfl, agno 10
    bad agbno 4274230279 in agfl, agno 10
    bad agbno 3276528919 in agfl, agno 10
    bad agbno 1321478119 for btbno root, agno 10
    bad agbno 1891151798 for btbcnt root, agno 10
    agf_freeblks 60595936, counted 0 in ag 10
    agf_longest 13360461, counted 0 in ag 10
    bad agbno 3307702038 for finobt root, agno 6
    bad agbno 3783116261 for finobt root, agno 10
    agi_freecount 363, counted 0 in ag 10 finobt
    sb_icount 18560, counted 2816
    sb_ifree 221, counted 465
    sb_fdblocks 4653142, counted 1265621349
            - found root inode chunk
    Phase 3 - for each AG...
            - scan and clear agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - agno = 8
            - agno = 9
            - agno = 10
            - agno = 11
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - check for inodes claiming duplicate blocks...
            - agno = 0
            - agno = 2
            - agno = 10
            - agno = 4
            - agno = 6
            - agno = 5
            - agno = 7
            - agno = 8
            - agno = 1
            - agno = 9
            - agno = 11
            - agno = 3
    Phase 5 - rebuild AG headers and trees...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - agno = 8
            - agno = 9
            - agno = 10
            - agno = 11
            - reset superblock...
    Phase 6 - check inode connectivity...
            - resetting contents of realtime bitmap and summary inodes
            - traversing filesystem ...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - agno = 8
            - agno = 9
            - agno = 10
            - agno = 11
            - traversal finished ...
            - moving disconnected inodes to lost+found ...
    Phase 7 - verify and correct link counts...
    Note - stripe unit (0) and width (0) were copied from a backup superblock.
    Please reset with mount -o sunit=<value>,swidth=<value> if necessary

            XFS_REPAIR Summary    Wed Oct 30 03:32:19 2019

    Phase           Start           End             Duration
    Phase 1:        10/30 03:32:18  10/30 03:32:18
    Phase 2:        10/30 03:32:18  10/30 03:32:18
    Phase 3:        10/30 03:32:18  10/30 03:32:19  1 second
    Phase 4:        10/30 03:32:19  10/30 03:32:19
    Phase 5:        10/30 03:32:19  10/30 03:32:19
    Phase 6:        10/30 03:32:19  10/30 03:32:19
    Phase 7:        10/30 03:32:19  10/30 03:32:19

    Total run time: 1 second
    done

     

    After that I can start the array!!  Thank you very much, time to try to recover some things.

    It finished.  I reloaded the page, stopped the array, then started in maintenance mode, and on the page for disk 8 it shows as "Unmountable: No file system" and the fs type is set to auto.

     

    blkid:

    /dev/loop0: TYPE="squashfs"
    /dev/loop1: TYPE="squashfs"
    /dev/sda1: LABEL_FATBOOT="UNRAID" LABEL="UNRAID" UUID="2732-64F5" TYPE="vfat" PARTUUID="a3760dfe-01"
    /dev/nvme0n1p1: UUID="f69130dd-7800-43c1-8fe6-1409cc4d3060" TYPE="crypto_LUKS"
    /dev/sdb1: UUID="59de781d-b037-441a-b7eb-c918e2ed2d49" TYPE="crypto_LUKS" PARTUUID="ce24456f-79a7-425b-926b-908c829c8719"
    /dev/sdc1: UUID="ffb9c825-16f5-49bb-9225-58b349c15524" TYPE="crypto_LUKS" PARTUUID="62e353df-95ac-41a2-98e6-aaec2b37913d"
    /dev/sdd1: UUID="74cc9054-0ad6-4c5a-b17e-ffa174b8816a" TYPE="crypto_LUKS" PARTUUID="b6ae2bc8-aad2-489f-bb4f-b354371d9511"
    /dev/sde1: UUID="36fb8f2a-7833-4dfc-8838-af76fb89a733" TYPE="crypto_LUKS" PARTUUID="2f100d44-a1b4-4e39-94ca-388105318d81"
    /dev/sdf1: UUID="f6804684-df06-42ab-9afc-ec4277c848f2" TYPE="crypto_LUKS" PARTUUID="51acfce5-61fb-4788-961a-2b34b6115fa6"
    /dev/sdh1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS" PARTUUID="a45371be-e202-4a81-9604-ffc1d7591bc5"
    /dev/sdi1: UUID="a5241765-df2b-4966-ba7f-38fed7ae6d58" TYPE="crypto_LUKS" PARTUUID="a74c7aaa-ff16-4885-bad4-1aab9a3b39ce"
    /dev/sdj1: UUID="afc0186b-5d48-4888-bdcc-99e3c17af950" TYPE="crypto_LUKS" PARTUUID="57111893-5e4f-428e-a927-6d96fdef8fd2"
    /dev/sdk1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS" PARTUUID="e4acd90a-fcfd-4b45-a68a-1bc496acd051"
    /dev/sdl1: UUID="d41db265-f644-4ad9-9c8e-78e38673af04" TYPE="crypto_LUKS" PARTUUID="31a34a62-a0f0-45d1-97f2-3d103dab2d76"
    /dev/md1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS"
    /dev/md2: UUID="d41db265-f644-4ad9-9c8e-78e38673af04" TYPE="crypto_LUKS"
    /dev/md3: UUID="a5241765-df2b-4966-ba7f-38fed7ae6d58" TYPE="crypto_LUKS"
    /dev/md4: UUID="59de781d-b037-441a-b7eb-c918e2ed2d49" TYPE="crypto_LUKS"
    /dev/md5: UUID="ffb9c825-16f5-49bb-9225-58b349c15524" TYPE="crypto_LUKS"
    /dev/md6: UUID="36fb8f2a-7833-4dfc-8838-af76fb89a733" TYPE="crypto_LUKS"
    /dev/md7: UUID="f6804684-df06-42ab-9afc-ec4277c848f2" TYPE="crypto_LUKS"
    /dev/md8: UUID="afc0186b-5d48-4888-bdcc-99e3c17af950" TYPE="crypto_LUKS"
    /dev/md9: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS"
    /dev/mapper/md1: UUID="d1c0645c-cf5b-4589-bd2f-6dccc0f99467" TYPE="xfs"
    /dev/mapper/md2: UUID="b83db605-8817-4174-9db9-b7e43e533179" TYPE="xfs"
    /dev/mapper/md3: UUID="db2b3d1c-513a-4b32-bb55-5ca4df663303" TYPE="xfs"
    /dev/mapper/md4: UUID="f17d514e-699f-4939-b22e-83ee770c67d7" TYPE="xfs"
    /dev/mapper/md5: UUID="0a7c834d-88fc-4318-85c5-a69a7449f1dc" TYPE="xfs"
    /dev/mapper/md6: UUID="3aea003c-7173-4efb-bfec-a775d9ebe4cf" TYPE="xfs"
    /dev/mapper/md7: UUID="af81136a-8131-4341-b705-f6c50638961f" TYPE="xfs"
    /dev/mapper/md8: UUID="196ad532-7693-46cf-ad40-13bdccc057cf" TYPE="xfs"
    /dev/mapper/md9: UUID="291b9458-9fa2-4d95-a68f-2c31eecf5d57" TYPE="xfs"
    /dev/mapper/nvme0n1p1: UUID="0ee7ecd1-bff0-43c7-b1e7-def11ff953c3" UUID_SUB="229084e1-41a2-4fbc-ab3e-e7a73d2c48d4" TYPE="btrfs"
    /dev/nvme0n1: PTTYPE="dos"
    /dev/sdg1: UUID="b88n:m?f-7ldi-4;>5-c:nl-o=6j?n<ccec0" TYPE="crypto_LUKS" PARTUUID="14cef639-350a-4daf-bfc0-ee5239c0ec62"

     

     

    sdk1 and sdh1 have the same UUID; what should my next step be?

     

    edit: The drive is not showing up in the mapper section, so I'm guessing that means I can't run the command xfs_admin -U generate /dev/mapper/mdX, because there is no corresponding md value.

    edit2: The disk log has this error:

    Oct 29 14:40:03 Tower kernel: print_req_error: I/O error, dev sdj, sector 15628052928
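
    Just to keep it on hand, the duplicate-UUID command from my first edit would look roughly like this once the disk is back in a slot and has an md device again (mdX is a placeholder, and the filesystem would need to be unmounted):

    # assign a fresh UUID to the XFS filesystem behind the opened md device (replace mdX)
    xfs_admin -U generate /dev/mapper/mdX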

    Excellent, thank you.  I put the different/"new" drive in as disk 8 and followed the instructions exactly, and the "new" disk 8 is listed as "Unmountable: No file system", but thanks to your detailed instructions I did not stop the array or select to format the drive.  Once the rebuild is done and I run a file system check, what will the outcome be?  I assume what is really being checked is the emulated contents based on the other drives and parity, so if that emulated drive is fixed, am I then able to rebuild the drive's contents again, or does it magically become mountable in emulation and on the physical drive, and everything goes back to normal?  Or is disk 8 lost, and is it good that I kept the old disk to the side so I can try to use the 89% that did not get reformatted?

    Great, thank you very much.  I'm going to play it safe and take your advice about using a new drive.  I went to bestbuy.com and it let me know a 10TB Easystore in my cart had gone down in price, so I'm going to take that as a sign, put that in my main server, replace an 8TB drive, and use that 8TB drive here.

     

    I take it this fixes Disk9, and 8 will be rebuilt from parity?

     

    Also, is it likely that bad data controllers had anything to do with causing any of these issues?

    I know I have single parity and what that entails, so I was not expecting a parity sync to work as is; I forgot to state that.  It is confusing to me that two drives would be listed as emulated in a single-parity array.  What I'm asking is: is there anything I can do to fix either drive, preferably 9, as 8 was 11% into a parity sync, such that I can minimize lost data?  I think disk 9 might have shown up as normal after installing the "new" SAS card, but I think I ran an xfs_repair, which maybe caused this issue?  Based on that I'm guessing this isn't a hardware issue, but if needed I could temporarily go back to the 4+2 SATA cards.

    I waited to continue with this issue until I had a replacement for the 4-port PCIe SATA and 2-port SATA cards that I was using on top of my 5 onboard SATA ports.  I received a 16i LSI SAS controller for my main server, so that I could take one of the 8i SAS controllers for this computer, which I haven't had a single issue with.

     

    So, using one of the 8i controllers (actually I've tried both), I get a yellow exclamation mark saying "device contents emulated" on Disk8/sdh, which is the drive that was having issues in my last round of posts, and Disk9/sdk has a red X and says "Device is disabled, contents emulated".  The xfs_repair status for both is listed as "Not Available".  Even though it's not available, just in case it helps, here's what happens when I run a -n check:

     

    Disk 8:

     

    Phase 1 - find and verify superblock...
    superblock read failed, offset 0, size 524288, ag 0, rval -1

    fatal error -- Input/output error

    Disk 9:

     

    Phase 1 - find and verify superblock...
    superblock read failed, offset 0, size 524288, ag 0, rval -1

    fatal error -- Input/output error

     

    I'm guessing that doesn't mean anything because it's disabled.  This was after moving to SAS-to-SATA breakout cables vs. individual SATA cables on a SATA card.  I tried both SAS cards and moved to different breakout SATA cables, and the issue did not move with the cables.

     

    Just in case it helps, here's the blkid output; sdj1 and sdk1 have duplicate UUIDs:

    /dev/loop0: TYPE="squashfs"
    /dev/loop1: TYPE="squashfs"
    /dev/sda1: LABEL_FATBOOT="UNRAID" LABEL="UNRAID" UUID="2732-64F5" TYPE="vfat" PARTUUID="a3760dfe-01"
    /dev/nvme0n1p1: UUID="f69130dd-7800-43c1-8fe6-1409cc4d3060" TYPE="crypto_LUKS"
    /dev/sdb1: UUID="59de781d-b037-441a-b7eb-c918e2ed2d49" TYPE="crypto_LUKS" PARTUUID="ce24456f-79a7-425b-926b-908c829c8719"
    /dev/sdc1: UUID="ffb9c825-16f5-49bb-9225-58b349c15524" TYPE="crypto_LUKS" PARTUUID="62e353df-95ac-41a2-98e6-aaec2b37913d"
    /dev/sdd1: UUID="74cc9054-0ad6-4c5a-b17e-ffa174b8816a" TYPE="crypto_LUKS" PARTUUID="b6ae2bc8-aad2-489f-bb4f-b354371d9511"
    /dev/sde1: UUID="36fb8f2a-7833-4dfc-8838-af76fb89a733" TYPE="crypto_LUKS" PARTUUID="2f100d44-a1b4-4e39-94ca-388105318d81"
    /dev/sdf1: UUID="f6804684-df06-42ab-9afc-ec4277c848f2" TYPE="crypto_LUKS" PARTUUID="51acfce5-61fb-4788-961a-2b34b6115fa6"
    /dev/sdh1: UUID="afc0186b-5d48-4888-bdcc-99e3c17af950" TYPE="crypto_LUKS" PARTUUID="f407d887-e020-47bf-bba4-3d024c26844d"
    /dev/sdi1: UUID="a5241765-df2b-4966-ba7f-38fed7ae6d58" TYPE="crypto_LUKS" PARTUUID="a74c7aaa-ff16-4885-bad4-1aab9a3b39ce"
    /dev/sdj1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS" PARTUUID="a45371be-e202-4a81-9604-ffc1d7591bc5"
    /dev/sdk1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS" PARTUUID="e4acd90a-fcfd-4b45-a68a-1bc496acd051"
    /dev/sdl1: UUID="d41db265-f644-4ad9-9c8e-78e38673af04" TYPE="crypto_LUKS" PARTUUID="31a34a62-a0f0-45d1-97f2-3d103dab2d76"
    /dev/md1: UUID="023c43b4-cff7-45b8-bc7c-df5e85630455" TYPE="crypto_LUKS"
    /dev/md2: UUID="d41db265-f644-4ad9-9c8e-78e38673af04" TYPE="crypto_LUKS"
    /dev/md3: UUID="a5241765-df2b-4966-ba7f-38fed7ae6d58" TYPE="crypto_LUKS"
    /dev/md4: UUID="59de781d-b037-441a-b7eb-c918e2ed2d49" TYPE="crypto_LUKS"
    /dev/md5: UUID="ffb9c825-16f5-49bb-9225-58b349c15524" TYPE="crypto_LUKS"
    /dev/md6: UUID="36fb8f2a-7833-4dfc-8838-af76fb89a733" TYPE="crypto_LUKS"
    /dev/md7: UUID="f6804684-df06-42ab-9afc-ec4277c848f2" TYPE="crypto_LUKS"
    /dev/mapper/md1: UUID="d1c0645c-cf5b-4589-bd2f-6dccc0f99467" TYPE="xfs"
    /dev/mapper/md2: UUID="b83db605-8817-4174-9db9-b7e43e533179" TYPE="xfs"
    /dev/mapper/md3: UUID="db2b3d1c-513a-4b32-bb55-5ca4df663303" TYPE="xfs"
    /dev/mapper/md4: UUID="f17d514e-699f-4939-b22e-83ee770c67d7" TYPE="xfs"
    /dev/mapper/md5: UUID="0a7c834d-88fc-4318-85c5-a69a7449f1dc" TYPE="xfs"
    /dev/mapper/md6: UUID="3aea003c-7173-4efb-bfec-a775d9ebe4cf" TYPE="xfs"
    /dev/mapper/md7: UUID="af81136a-8131-4341-b705-f6c50638961f" TYPE="xfs"
    /dev/mapper/nvme0n1p1: UUID="0ee7ecd1-bff0-43c7-b1e7-def11ff953c3" UUID_SUB="229084e1-41a2-4fbc-ab3e-e7a73d2c48d4" TYPE="btrfs"
    /dev/nvme0n1: PTTYPE="dos"
    /dev/sdg1: UUID="b88n:m?f-7ldi-4;>5-c:nl-o=6j?n<ccec0" TYPE="crypto_LUKS" PARTUUID="14cef639-350a-4daf-bfc0-ee5239c0ec62"

     

    What should my next step be?  Both disks passed four rounds of pre-clears before I used them in this server, so I'm thinking this issue was caused by some SATA card or cable problem from before my SAS card upgrade.

    tower-diagnostics-20191024-1925.zip
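
    In case it helps, a quick way to confirm which devices share that UUID should be something like this (just searching by the value from the output above; there may be a cleaner way):

    # list every device reporting the UUID that sdj1 and sdk1 share
    blkid -t UUID="023c43b4-cff7-45b8-bc7c-df5e85630455"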

    I thought I checked the file system of a couple of drives when I had these issues earlier this month.  The UUIDs had become duplicated again, so it was my understanding that they had to be fixed.

     

    Is my data lost? I stopped the parity sync to the data drive at 11%, which probably doesn't matter. Does fixing file system errors change parity in maintenance mode? If so I guess that explains my situation.

    That worked; I changed the UUID again and was able to start a rebuild.  I'm concerned that the free space on the drive being rebuilt is showing as way more than it should be, and there is a lost+found directory with half a terabyte of files in it, which I understand is a result of checking the disk, but I thought all of that would just make the drive ready to be mounted and then overwritten with the data from a parity sync?
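
    For what it's worth, to size up lost+found I was planning on something like this (I'm guessing the rebuilt disk shows up at /mnt/disk8, so the path may be off):

    # total size of what the repair moved into lost+found (mount point assumed)
    du -sh /mnt/disk8/lost+found
    # glance at the first few recovered entries
    ls /mnt/disk8/lost+found | head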

  18. Thanks, -nv resulted in this:


    Phase 1 - find and verify superblock...
    bad primary superblock - bad CRC in superblock !!!

    attempting to find secondary superblock...
    .found candidate secondary superblock...
    verified secondary superblock...
    would write modified primary superblock
    Primary superblock would have been modified.
    Cannot proceed further in no_modify mode.
    Exiting now.

     

     

    and -v resulted in this:

     

    Phase 1 - find and verify superblock...
    bad primary superblock - bad CRC in superblock !!!

    attempting to find secondary superblock...
    .found candidate secondary superblock...
    verified secondary superblock...
    writing modified primary superblock
            - block cache size set to 722176 entries
    Phase 2 - using internal log
            - zero log...
    zero_log: head block 517937 tail block 517933
    ERROR: The filesystem has valuable metadata changes in a log which needs to
    be replayed.  Mount the filesystem to replay the log, and unmount it before
    re-running xfs_repair.  If you are unable to mount the filesystem, then use
    the -L option to destroy the log and attempt a repair.
    Note that destroying the log may cause corruption -- please attempt a mount
    of the filesystem before doing this.
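
    If I'm reading that message right, the next step would be something like the following (mdX stands for the device I was checking, and the mount point is just an example; I understand -L is a last resort):

    # 1) try mounting so XFS can replay its log, then unmount and run the repair normally
    mkdir -p /mnt/tempmount
    mount /dev/mapper/mdX /mnt/tempmount && umount /mnt/tempmount
    xfs_repair -v /dev/mapper/mdX
    # 2) only if the mount itself fails: zero the log (can lose the unreplayed changes)
    xfs_repair -L /dev/mapper/mdX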
