golli53 Posted September 6, 2019 (edited)
These NVMe drives were mounted as unassigned devices in RAID1 and were running my Win10 VMs. I'm not sure whether this was due to a hardware or a filesystem issue. The directory on the drive that contained my VMs (which were running at the time and crashed after the errors) now shows up as empty (all VM images missing). I'm not sure whether it is safe to disconnect/reconnect the drives, or what the best way to proceed is. I would appreciate any advice.
[edit] I force-stopped my VM and disabled virtualization in the GUI. I don't know if this caused my VM images to go missing on disk. My highest priority is recovering these images.
Sep 5 14:33:22 Tower kernel: nvme nvme0: I/O 408 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 409 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 410 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 411 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 737 QID 7 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 738 QID 7 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 739 QID 7 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 198 QID 8 timeout, aborting
Sep 5 14:33:52 Tower kernel: nvme nvme0: I/O 408 QID 9 timeout, reset controller
Sep 5 14:34:23 Tower kernel: nvme nvme0: I/O 11 QID 0 timeout, reset controller
Sep 5 14:35:23 Tower kernel: nvme nvme0: Device not ready; aborting reset
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:23 Tower kernel: nvme nvme0: Abort status: 0x7
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: failed to renew DHCP, rebinding
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (180) from 192.168.10.10
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (180) from 192.168.10.10
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (152) from 192.168.10.10
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (152) from 192.168.10.10
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (132) from 192.168.10.12
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (132) from 192.168.10.12
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (24) from 192.168.10.12
Sep 5 14:35:42 Tower dhcpcd[2016]: br0: truncated packet (24) from 192.168.10.12
Sep 5 14:35:54 Tower kernel: nvme nvme0: Device not ready; aborting reset
Sep 5 14:35:54 Tower kernel: nvme nvme0: Removing after probe failure status: -19
Sep 5 14:36:24 Tower kernel: nvme nvme0: Device not ready; aborting reset
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1419705920
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1426329664
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1313322472
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 726592960
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1416326976
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1426329600
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1385254080
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1416326896
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1419705792
Sep 5 14:36:24 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 1355724792
Sep 5 14:36:24 Tower kernel: nvme nvme0: failed to set APST feature (-19)
Sep 5 14:36:24 Tower kernel: BTRFS: error (device dm-14) in btrfs_run_delayed_refs:2935: errno=-5 IO failure
Sep 5 14:36:24 Tower kernel: BTRFS info (device dm-14): forced readonly
Sep 5 14:36:24 Tower kernel: BTRFS: error (device dm-14) in __btrfs_free_extent:6803: errno=-5 IO failure
Sep 5 14:36:24 Tower kernel: BTRFS: error (device dm-14) in btrfs_run_delayed_refs:2935: errno=-5 IO failure
Sep 5 14:36:24 Tower kernel: BTRFS warning (device dm-14): Skipping commit of aborted transaction.
Sep 5 14:36:24 Tower kernel: BTRFS: error (device dm-14) in cleanup_transaction:1846: errno=-5 IO failure
Sep 5 14:36:24 Tower kernel: BTRFS info (device dm-14): delayed_refs has NO entry
Sep 5 14:36:32 Tower kernel: btrfs_dev_stat_print_on_error: 641 callbacks suppressed
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 268, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 269, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 270, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 271, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 272, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 273, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 274, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 275, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 276, flush 0, corrupt 0, gen 0
Sep 5 14:36:32 Tower kernel: BTRFS error (device dm-14): bdev /dev/mapper/ssd errs: wr 385, rd 277, flush 0, corrupt 0, gen 0
Edited December 25, 2020 by golli53
JorgeB Posted September 6, 2019 (edited)
4 hours ago, golli53 said:
Sep 5 14:33:22 Tower kernel: nvme nvme0: I/O 408 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 409 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 410 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 411 QID 9 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 737 QID 7 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 738 QID 7 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 739 QID 7 timeout, aborting
Sep 5 14:33:23 Tower kernel: nvme nvme0: I/O 198 QID 8 timeout, aborting
Sep 5 14:33:52 Tower kernel: nvme nvme0: I/O 408 QID 9 timeout, reset controller
Sep 5 14:34:23 Tower kernel: nvme nvme0: I/O 11 QID 0 timeout, reset controller
Sep 5 14:35:23 Tower kernel: nvme nvme0: Device not ready; aborting reset
This is a hardware problem; rebooting or power cycling should bring the device back online.
Edited September 6, 2019 by johnnie.black
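For anyone triaging a similar log, here is a minimal sketch that counts the NVMe I/O timeout messages per controller in a saved syslog file. The function name nvme_timeouts and the file path in the usage line are made up for illustration; the only thing taken from this thread is the message format shown in the quoted log.

```shell
# Count "I/O ... timeout" kernel messages per NVMe controller in a syslog file.
# nvme_timeouts is a hypothetical helper name, not an existing tool.
nvme_timeouts() {
    awk '/kernel: nvme nvme[0-9]+: .*timeout/ {
        # Pull out the controller name (e.g. "nvme0") without the trailing colon.
        if (match($0, /nvme[0-9]+:/)) {
            dev = substr($0, RSTART, RLENGTH - 1)
            count[dev]++
        }
    }
    END { for (dev in count) print dev, count[dev] }' "$1"
}
```

Usage would look like `nvme_timeouts /var/log/syslog` (path assumed). A burst of timeouts on a single controller followed by "reset controller" is consistent with the device dropping off the bus, but the count alone does not tell you whether the drive or the slot is at fault.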
golli53 Posted September 6, 2019
6 hours ago, johnnie.black said:
This is a hardware problem, rebooting/power cycling should bring the device back online.
Thanks. I rebooted and the drive no longer shows up under Unassigned Devices. I guess the drive (or the motherboard controller) failed. What would you recommend as the safest way to recover the data from the other (hopefully still good) drive in the RAID1 btrfs array?
JorgeB Posted September 6, 2019 (edited)
There are some recovery options in the FAQ, though the pool should mount with just the single device unless it was created on v6.7+, because of a bug. If that's the case, the recovery options in the FAQ won't help much either.
Edited September 6, 2019 by johnnie.black
golli53 Posted September 6, 2019 (edited)
29 minutes ago, johnnie.black said:
There are some recovery options in the FAQ, though the pool should mount with just the single device, unless it was created on v6.7+, because of a bug, and if that's the case recovery options on the FAQ won't help much either.
Oh shoot, I didn't see this bug. I do think it was created on 6.7+. On the second reboot, the bad drive (nvme0n1, ssd) reappeared in unRAID, but I can't mount the pool (it hangs). I then tried mounting degraded using the good drive (nvme1n1, ssd2); see below. Am I toast?
login as: root
Linux 4.19.56-Unraid.
root@Tower:~# /usr/sbin/cryptsetup luksOpen /dev/nvme1n1p1 ssd2 --allow-discards --key-file /root/keyfile
root@Tower:~# mkdir /mnt/disks/ssd
root@Tower:~# /sbin/mount -o usebackuproot,ro '/dev/mapper/ssd' '/mnt/disks/ssd'
^C
root@Tower:~# /sbin/mount -o degraded,usebackuproot,ro '/dev/mapper/ssd2' '/mnt/disks/ssd'
mount: /mnt/disks/ssd: wrong fs type, bad option, bad superblock on /dev/mapper/ssd2, missing codepage or helper program, or other error.
Edited September 6, 2019 by golli53
JorgeB Posted September 6, 2019
If both drives are accessible you can try the recovery options above. You just need to mount any one device and the other will mount with it (if the corruption isn't very serious). Failing that, try btrfs restore. If neither option works, I'm afraid there's not much more help I can give.
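A minimal sketch of those two steps, not the FAQ's exact procedure. It assumes the LUKS mapping /dev/mapper/ssd2 and the mount point /mnt/disks/ssd used earlier in this thread; the destination /mnt/disk1/restore, the function names and the RUN variable are hypothetical. By default the functions only print the commands; run with RUN= (set but empty) as root to actually execute them.

```shell
# All names below are assumptions for illustration; adjust them to your system.
DEV="${DEV:-/dev/mapper/ssd2}"        # opened LUKS mapping of one pool member
MNT="${MNT:-/mnt/disks/ssd}"          # mount point used earlier in the thread
DEST="${DEST:-/mnt/disk1/restore}"    # hypothetical target on a different, healthy disk
RUN="${RUN-echo}"                     # default: print commands; RUN= executes them

# Step 1: try a read-only degraded mount of a single pool member.
try_degraded_mount() {
    $RUN mkdir -p "$MNT"
    $RUN mount -o degraded,ro "$DEV" "$MNT"
}

# Step 2a: if mounting fails, list what btrfs restore would copy (-D is a dry run).
list_restorable() {
    $RUN btrfs restore -D -v "$DEV" "$DEST"
}

# Step 2b: copy files off the unmounted device, continuing past errors (-i).
copy_out() {
    $RUN mkdir -p "$DEST"
    $RUN btrfs restore -v -i "$DEV" "$DEST"
}
```

btrfs restore reads from the unmounted device and writes only to the destination path, so it is a reasonable thing to try before any repair tool that modifies the pool. The destination needs enough free space for the VM images.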
golli53 Posted September 6, 2019
1 minute ago, johnnie.black said:
If both drives are accessible you can try the recovery options above, you just need to try and mount any one device and the other will mount together (if corruption isn't very serious), failing that try btrfs restore, if neither option works I'm afraid not much help I can give.
Got it. Is the error above when trying degraded mode ("wrong fs type, bad option, bad superblock on /dev/mapper/ssd2, missing codepage or helper program, or other error.") due to the bug that you referenced? Mounting as a pool unfortunately hangs forever.
JorgeB Posted September 6, 2019
2 minutes ago, golli53 said:
wrong fs type, bad option, bad superblock on /dev/mapper/ssd2
This means no superblock is detected, but you might be using the wrong device. Where did you get ssd2 from? The device should be /dev/mapper/nvme1n1p1
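One way to check whether a given mapping is the right device and whether a btrfs signature is still visible on it is sketched below. The mapping name ssd2 comes from this thread; the function name and the RUN variable are hypothetical. As written the function only prints the commands; run with RUN= (set but empty) as root to execute them. All three commands are read-only.

```shell
NAME="${NAME:-ssd2}"              # LUKS mapping name used in this thread
DEV="/dev/mapper/$NAME"
RUN="${RUN-echo}"                 # default: print commands; RUN= executes them

check_mapping() {
    # Show which underlying partition the mapping points at.
    $RUN cryptsetup status "$NAME"
    # Low-level probe; a healthy member should report TYPE="btrfs".
    $RUN blkid -p "$DEV"
    # Ask btrfs itself whether it recognises a filesystem on the device.
    $RUN btrfs filesystem show "$DEV"
}
```

If blkid reports no filesystem type on the opened mapping, that matches the "no superblock is detected" reading of the mount error.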
golli53 Posted September 6, 2019
1 minute ago, johnnie.black said:
This means no superblock is detected, but you might be using the wrong device, where did you get ssd2 from? Device should be /dev/mapper/nvme1n1p1
The ssd2 is from running:
/usr/sbin/cryptsetup luksOpen /dev/nvme1n1p1 ssd2 --allow-discards --key-file /root/keyfile
JorgeB Posted September 6, 2019
Yeah, I saw that after posting. That means the superblock was destroyed on the other device; this would happen if you tried starting the array with just the other device.
JorgeB Posted September 6, 2019
You can try this to see if you can recover from a backup superblock: https://www.mankier.com/8/btrfs-select-super
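A hedged sketch of how that could look, assuming the mapping /dev/mapper/ssd2 from this thread. The function names, the RUN variable and the choice of copy number 1 are assumptions for illustration; pick the copy only after inspecting the dump output. By default the functions only print the commands; run with RUN= (set but empty) as root to execute them. Note that btrfs-select-super is not installed by every btrfs-progs package.

```shell
DEV="${DEV:-/dev/mapper/ssd2}"    # mapping name taken from this thread
COPY="${COPY:-1}"                 # assumed backup copy number (valid values: 0, 1, 2)
RUN="${RUN-echo}"                 # default: print commands; RUN= executes them

# Read-only: print every superblock copy that can still be found on the device.
show_superblocks() {
    $RUN btrfs inspect-internal dump-super -a "$DEV"
}

# WRITES to the device: overwrites the primary superblock with the chosen copy.
# Image the device first if you can, since this cannot be undone.
select_backup_superblock() {
    $RUN btrfs-select-super -s "$COPY" "$DEV"
}
```

Running show_superblocks first costs nothing and tells you whether any backup copy survived; if none did, select_backup_superblock has nothing valid to restore from.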