September 6, 2019

These NVMe drives were mounted as unassigned devices in RAID1 and were running my Win10 VMs. I'm not sure whether this was due to a hardware or a filesystem issue. The directory on the drive that contained my VMs (which were running at the time and crashed after the errors) now shows up as empty (all VM images missing). I'm not sure whether it is safe to disconnect/reconnect the drives, or what the best way to proceed is. Would appreciate any advice.

[edit] I force-stopped my VM and disabled virtualization in the GUI. I don't know if this caused my VM images to go missing on disk. My highest priority is recovering these images.

Sep 5 14:33:22 Tower kernel: nvme nvme0: I/O 408 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 409 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 410 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 411 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 737 QID 7 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 738 QID 7 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 739 QID 7 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 198 QID 8 timeout, aborting
Sep 5 14:33:52 Tower kernel: nvme nvme0: I/O 408 QID 9 timeout, reset controller
Sep 5 14:34:23 Tower kernel: nvme nvme0: I/O 11 QID 0 timeout, reset controller
Sep 5 14:35:23 Tower kernel: nvme nvme0: Device not ready; aborting reset
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: failed to renew DHCP, rebinding
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (180) from 192.168.10.10
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (180) from 192.168.10.10
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (152) from 192.168.10.10
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (152) from 192.168.10.10
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (132) from 192.168.10.12
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (132) from 192.168.10.12
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (24) from 192.168.10.12
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (24) from 192.168.10.12
Sep 5 14:35:54 Tower kernel: nvme nvme0: Device not ready; aborting reset
Sep 5 14:35:54 Tower kernel: nvme nvme0: Removing after probe failure status: -19
Sep 5 14:36:24 Tower kernel: nvme nvme0: Device not ready; aborting reset
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1419705920
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1426329664
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1313322472
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 726592960
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1416326976
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1426329600
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1385254080
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1416326896
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1419705792
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1355724792
Sep 5 14:36:24 Tower kernel: nvme nvme0: failed to set APST feature (-19)
Sep 5 14:36:24 Tower kernel: BTRFS: error (device dm-14) in btrfs_run_delayed_refs:2935: errno=-5 IO failure
Sep 5 14:36:24 Tower kernel: BTRFS info (device dm-14): forced readonly
Sep 5 14:36:24 Tower kernel: BTRFS: error (device dm-14) in __btrfs_free_extent:6803: errno=-5 IO failure
Sep 5 14:36:24 Tower kernel: BTRFS: error (device dm-14) in btrfs_run_delayed_refs:2935: errno=-5 IO failure
Sep 5 14:36:24 Tower kernel: BTRFS warning (device dm-14): Skipping commit of aborted transaction.
Sep 5 14:36:24 Tower kernel: BTRFS: error (device dm-14) in cleanup_transaction:1846: errno=-5 IO failure
Sep 5 14:36:24 Tower kernel: BTRFS info (device dm-14): delayed_refs has NO entry
Sep 5 14:36:32 Tower kernel: btrfs_dev_stat_print_on_error: 641 callbacks suppressed
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 268, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 269, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 270, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 271, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 272, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 273, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 274, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 275, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 276, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 277, flush 0, corrupt 0, gen 0

Edited December 25, 2020 by golli53
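For anyone reading a similar log, the order of events is the useful part: NVMe command timeouts come first, then block-layer I/O errors, then BTRFS device errors and the forced read-only remount. A minimal sketch for tallying those message types from a syslog excerpt is below. It assumes the message formats shown in the log above, and `summarize_nvme_log` is a hypothetical helper name, not an Unraid tool.

```shell
#!/bin/sh
# Hypothetical helper: read syslog text on stdin and count the message
# types seen in the log above. Patterns are based only on that excerpt.
summarize_nvme_log() {
    awk '
        # NVMe command timeouts (both "aborting" and "reset controller" lines)
        /nvme nvme[0-9]+: I\/O [0-9]+ QID [0-9]+ timeout/ { timeouts++ }
        # Block-layer I/O errors reported against the device
        /print_req_error: I\/O error/                      { io_errors++ }
        # Per-device BTRFS error counter lines
        /BTRFS error .*bdev/                               { btrfs_errors++ }
        # Filesystem was remounted read-only
        /forced readonly/                                  { ro = 1 }
        END {
            printf "nvme_timeouts=%d io_errors=%d btrfs_dev_errors=%d forced_readonly=%d\n",
                   timeouts, io_errors, btrfs_errors, ro
        }
    '
}
```

Usage, assuming the syslog lives at /var/log/syslog: `summarize_nvme_log < /var/log/syslog`. Timeouts that precede every filesystem error point toward the device or controller rather than the filesystem.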
September 6, 2019 Community Expert

4 hours ago, golli53 said:
Sep 5 14:33:22 Tower kernel: nvme nvme0: I/O 408 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 409 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 410 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 411 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 737 QID 7 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 738 QID 7 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 739 QID 7 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 198 QID 8 timeout, aborting
Sep 5 14:33:52 Tower kernel: nvme nvme0: I/O 408 QID 9 timeout, reset controller
Sep 5 14:34:23 Tower kernel: nvme nvme0: I/O 11 QID 0 timeout, reset controller
Sep 5 14:35:23 Tower kernel: nvme nvme0: Device not ready; aborting reset

This is a hardware problem; rebooting/power cycling should bring the device back online.

Edited September 6, 2019 by johnnie.black
September 6, 2019 Author

6 hours ago, johnnie.black said:
This is a hardware problem; rebooting/power cycling should bring the device back online.

Thanks. I rebooted and the drive no longer shows up under Unassigned Devices. I guess the drive (or mobo controller) failed. What would you recommend as the safest way to recover the data from the other (hopefully) still good drive in the RAID1 btrfs array?
September 6, 2019 Community Expert

There are some recovery options in the FAQ, though the pool should mount with just the single device unless it was created on v6.7+, because of a bug. If that's the case, the recovery options in the FAQ won't help much either.

Edited September 6, 2019 by johnnie.black
September 6, 2019 Author

29 minutes ago, johnnie.black said:
There are some recovery options in the FAQ, though the pool should mount with just the single device unless it was created on v6.7+, because of a bug. If that's the case, the recovery options in the FAQ won't help much either.

Oh shoot, I didn't see this bug. I do think it was created on 6.7+. On the second reboot, the bad drive (nvme0n1, ssd) re-appeared in unRAID, but I can't mount the pool (it hangs). I then tried mounting degraded using the good drive (nvme1n1, ssd2); see below. Am I toast?

login as: root
Linux 4.19.56-Unraid.
root@Tower:~# /usr/sbin/cryptsetup luksOpen /dev/nvme1n1p1 ssd2 --allow-discards --key-file /root/keyfile
root@Tower:~# mkdir /mnt/disks/ssd
root@Tower:~# /sbin/mount -o usebackuproot,ro '/dev/mapper/ssd' '/mnt/disks/ssd'
^C
root@Tower:~# /sbin/mount -o degraded,usebackuproot,ro '/dev/mapper/ssd2' '/mnt/disks/ssd'
mount: /mnt/disks/ssd: wrong fs type, bad option, bad superblock on /dev/mapper/ssd2, missing codepage or helper program, or other error.

Edited September 6, 2019 by golli53
September 6, 2019 Community Expert

If both drives are accessible you can try the recovery options above. You just need to try mounting any one device and the other will mount together with it (if the corruption isn't very serious). Failing that, try btrfs restore. If neither option works, I'm afraid there's not much more help I can give.
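As a rough illustration of the `btrfs restore` fallback mentioned above, the sketch below prints the commands before running anything. The device path is taken from the thread; the destination path, the `run` wrapper, the `DRY_RUN` switch and the `restore_plan` function are assumptions made for this example. `btrfs restore` reads from an unmounted filesystem and copies files to a separate, healthy location, so the destination must not be on the damaged pool.

```shell
#!/bin/sh
# Sketch only: both paths are assumptions, adjust them before use.
DEV="${DEV:-/dev/mapper/ssd2}"       # opened LUKS mapping of the surviving drive
DEST="${DEST:-/mnt/user/recovery}"   # hypothetical target on a different filesystem
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands, 0 = execute them

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restore_plan() {
    # -D lists what would be restored without writing any files.
    run btrfs restore -D -v "$DEV" "$DEST"
    # Actual copy; -i keeps going when individual files hit errors.
    run btrfs restore -v -i "$DEV" "$DEST"
}
```

Calling `restore_plan` prints the two commands for review; `DRY_RUN=0 restore_plan` would execute them. Whether restore can read anything depends on how much metadata survived on the remaining device.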
September 6, 2019 Author

1 minute ago, johnnie.black said:
If both drives are accessible you can try the recovery options above, you just need to try and mount any one device and the other will mount together (if corruption isn't very serious), failing that try btrfs restore, if neither option works I'm afraid not much help I can give.

Got it. Is the error above when trying degraded mode ("wrong fs type, bad option, bad superblock on /dev/mapper/ssd2, missing codepage or helper program, or other error.") due to the bug that you referenced? Mounting as a pool unfortunately hangs forever.
September 6, 2019 Community Expert

2 minutes ago, golli53 said:
wrong fs type, bad option, bad superblock on /dev/mapper/ssd2

This means no superblock is detected, but you might be using the wrong device. Where did you get ssd2 from? Device should be /dev/mapper/nvme1n1p1
September 6, 2019 Author

1 minute ago, johnnie.black said:
This means no superblock is detected, but you might be using the wrong device. Where did you get ssd2 from? Device should be /dev/mapper/nvme1n1p1

The ssd2 is from running:
/usr/sbin/cryptsetup luksOpen /dev/nvme1n1p1 ssd2 --allow-discards --key-file /root/keyfile
September 6, 2019 Community Expert

Yeah, I saw that after posting. That means the superblock was destroyed on the other device; this would happen if you tried starting the array with just the other device.
September 6, 2019 Community Expert

You can try this to see if you can recover from a backup superblock: https://www.mankier.com/8/btrfs-select-super
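A hedged sketch of how the backup-superblock attempt might look is below. It first inspects all superblock copies read-only with `btrfs inspect-internal dump-super -a`, then uses `btrfs-select-super -s 1` to overwrite the primary superblock with backup copy 1. The second step writes to the device, so imaging the drive first is prudent. The device path comes from the thread; the `run` wrapper, `DRY_RUN` switch and `superblock_plan` function are assumptions for this example, and `btrfs-select-super` may not be installed on every system.

```shell
#!/bin/sh
# Sketch only: the device path is an assumption taken from the thread.
DEV="${DEV:-/dev/mapper/ssd2}"   # opened LUKS mapping of the surviving drive
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands, 0 = execute them

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

superblock_plan() {
    # Read-only: dump every superblock copy to see whether a backup is intact.
    run btrfs inspect-internal dump-super -a "$DEV"
    # Destructive: overwrite the primary superblock with backup copy 1.
    # Valid copy numbers are 0, 1 or 2 if that offset fits on the device.
    run btrfs-select-super -s 1 "$DEV"
}
```

Calling `superblock_plan` prints both commands; running the first one by hand and checking its output before going further is the safer order.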