realies Posted July 23, 2019
Disk 2 went to "Device is disabled, contents emulated", and although it lists the btrfs filesystem, it says 'Unmountable: No file system'. Checking the filesystem status results in 'checksum verify failed on 1312205111296 found 7723BE85 wanted 5C832EA0' and 'bad tree block 1312205111296, bytenr mismatch, want=1312205111296, have=15350558693598887749'. It's a very old drive and it might be failing. The error occurred while trying to run a VM: the VM failed and the system locked up. libvirt is located on the same drive. What would be the best steps forward for recovering the drive or the data on it?
JorgeB Posted July 24, 2019
First check whether the disabled disk has really failed; it might have been a connection problem, and the diagnostics might show some clues. If the disk really failed, see here for some recovery options to try.
realies (Author) Posted July 24, 2019
Reconnecting the drive does not help. Attaching diagnostics. helion-diagnostics-20190724-1141.zip
JorgeB Posted July 24, 2019
Just reconnecting won't change a thing, but the disk looks healthy. There are just a lot of CRC errors, which usually indicate a cable problem. Unassign the disk and see if it mounts with UD (Unassigned Devices).
realies (Author) Posted July 24, 2019
Jul 24 15:06:30 helion unassigned.devices: Adding disk '/dev/sdd1'...
Jul 24 15:06:30 helion unassigned.devices: Mount drive command: /sbin/mount -t btrfs -o auto,async,noatime,nodiratime '/dev/sdd1' '/mnt/disks/WDC_WD1001FALS-XXX_WD-XXX'
Jul 24 15:06:30 helion kernel: BTRFS info (device sdd1): disk space caching is enabled
Jul 24 15:06:30 helion kernel: BTRFS info (device sdd1): has skinny extents
Jul 24 15:06:33 helion kernel: BTRFS error (device sdd1): bad tree block start, want 1312205111296 have 1312204832768
Jul 24 15:06:33 helion kernel: BTRFS warning (device sdd1): failed to read log tree
Jul 24 15:06:33 helion kernel: BTRFS error (device sdd1): open_ctree failed
Jul 24 15:06:33 helion unassigned.devices: Mount of '/dev/sdd1' failed. Error message: mount: /mnt/disks/WDC_WD1001FALS-XXX_WD-XXX: can't read superblock on /dev/sdd1.
Jul 24 15:06:33 helion unassigned.devices: Partition 'WDC_WD1001FALS-XXX_WD-XXX' could not be mounted...
Edited July 24, 2019 by realies
JorgeB Posted July 24, 2019
So the actual disk also has filesystem corruption; in that case, see the recovery options linked above.
realies (Author) Posted July 24, 2019
Will try to recover the data to another disk in the array. It says that data from the failed disk is emulated, but I'm not seeing the shares. Is that to be expected?
realies (Author) Posted July 24, 2019
I've also received the suggestion below, and I'm not sure whether it would interfere with what unRAID expects. @limetech might have an idea?
Quote: "since you can reconstruct the data I'd just mkfs the fs and reconstruct it, and convert metadata profile to DUP on those discs with "btrfs balance start -mconvert=dup <mountpoint>". so space usage for metadata on the devices will double, but since it's 5 GiB, it'll become 10 GiB. but btrfs will be much more robust to such failures and will be able to fix it automatically"
JorgeB Posted July 24, 2019
The metadata should be using the DUP profile. I've been requesting that for some time and it was finally added in v6.7, but for new filesystems only; older ones need to be converted manually. You should do that for any existing btrfs filesystem with single-profile metadata, though I can't see how that helps with the current problem, only with future ones.
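Editor's note: a minimal sketch of the manual conversion mentioned above, assuming the filesystem is mounted and healthy. The `btrfs balance start -mconvert=dup` command is the one quoted earlier in the thread; the helper names and the example path `/mnt/disk2` are hypothetical. The first helper reads the metadata profile out of `btrfs filesystem df` output so the balance only runs when the profile is still single.

```shell
#!/bin/sh
# Sketch only: helper names and the example mount point are assumptions.

# Print the metadata profile (e.g. "single" or "DUP") found in
# `btrfs filesystem df` output supplied on stdin.
metadata_profile() {
    awk -F'[,:]' '/^Metadata,/ { gsub(/ /, "", $2); print $2; exit }'
}

# Convert metadata to DUP only when the current profile is "single".
convert_metadata_to_dup() {
    mnt="$1"
    profile=$(btrfs filesystem df "$mnt" | metadata_profile)
    if [ "$profile" = "single" ]; then
        btrfs balance start -mconvert=dup "$mnt"
    else
        echo "metadata profile on $mnt is '$profile'; nothing to convert"
    fi
}
```

It would be called as `convert_metadata_to_dup /mnt/disk2` (run as root), once per array disk that still has single-profile metadata.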
realies (Author) Posted July 24, 2019
Good to know, will convert them to DUP once everything becomes operational. Is it normal that 'data is emulated' and not visible? Can't see any shares or data that was on the problematic drive.
Edited July 24, 2019 by realies
JorgeB Posted July 24, 2019
Data on the emulated disk isn't visible because the emulated filesystem is corrupt. Since the filesystem is also corrupt on the actual disk, that's not surprising.
realies (Author) Posted July 24, 2019
Sounds like recovery is not as straightforward as replacing the drive with a new one and rebuilding it from parity.
Edited July 24, 2019 by realies
JorgeB Posted July 24, 2019
Parity helps with a failed disk; it can't help with filesystem corruption. The original disk doesn't appear to have failed. The problem was likely caused by a bad connection, and that's likely what also corrupted the filesystem.
realies (Author) Posted July 24, 2019
Only shares from the failed disk are not visible, which makes me think the filesystem is not corrupted. I remember that I killed a wget process that was downloading the virtio ISO drivers, and that caused cd /mnt/user to return 'Transport endpoint is not connected'. Rebooting the system made everything work, but later the disk that wget was downloading the virtio drivers to died. I'm getting other suggestions from the #btrfs channel that it could also be a kernel bug, considering the above.
Edited July 24, 2019 by realies
JorgeB Posted July 24, 2019
The disk doesn't mount because the filesystem is corrupted; there's no doubt about that. Existing data on the other disks can always be accessed at /mnt/diskX if it's not accessible at /mnt/user.
realies (Author) Posted July 24, 2019
Is it worth recovering the failed disk's filesystem instead of wiping the drive and restoring the data via parity? Or, because the filesystem is corrupted, is parity also corrupted?
Edited July 24, 2019 by realies
JorgeB Posted July 24, 2019
What you get with the emulated disk is the same as what you'll get after a rebuild.
realies (Author) Posted July 24, 2019
Sounds like the data is lost. Is 'btrfs restore' the only hope? Why is the emulated disk not showing the missing data? Is parity corrupted, e.g. two-device corruption?
Edited July 24, 2019 by realies
JorgeB Posted July 24, 2019
realies said: "Is 'btrfs restore' the only hope?"
Most likely. There could be more advanced recovery options, but you'd need to ask for help on the btrfs mailing list.
realies said: "Why is the emulated disk not showing the missing data?"
As already mentioned, the emulated disk emulates the missing disk, and this includes emulating any existing filesystem corruption. Parity can't help with that.
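Editor's note: a hedged sketch of what a `btrfs restore` run could look like, assuming the unmountable device is `/dev/sdX1` and the destination is a directory on a different, healthy disk. Both paths are placeholders, and the `run` and `restore_from_unmountable` helpers are hypothetical. `btrfs restore` reads from the damaged filesystem without writing to it, and its `-D` option lists what would be recovered without copying anything, so it makes a reasonable first step.

```shell
#!/bin/sh
# Sketch only: device and destination are placeholders, not real paths.

# Execute a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy files off an unmountable btrfs device onto another filesystem.
restore_from_unmountable() {
    dev="$1"
    dest="$2"
    # The destination must live on a different, healthy filesystem.
    run mkdir -p "$dest" || return 1
    # First pass: -D only lists the files that would be recovered.
    run btrfs restore -D -v "$dev" "$dest" || return 1
    # Second pass: actually copy the files out.
    run btrfs restore -v "$dev" "$dest"
}
```

Setting `DRY_RUN=1` before calling `restore_from_unmountable /dev/sdX1 /mnt/disk2/restore` prints the three commands instead of running them, which is a cheap way to double-check the device name before touching anything.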
realies (Author) Posted July 24, 2019
Stopping the array, un-assigning the failed disk and starting the array a few times made unRAID think that re-assigning the same disk is a replacement disk... 🤦
JorgeB Posted July 24, 2019
That's normal behavior; it's what you do to re-enable a disk.
realies (Author) Posted July 25, 2019
While 'btrfs restore -v /dev/sdX1 /mnt/disk2/restore' managed to recover most of the data, it seems 'btrfs check --repair /dev/sdX1' managed to restore everything. I have yet to check for data corruption, but so far it all looks good. Many thanks @johnnie.black!
JorgeB Posted July 25, 2019
realies said: "it seems 'btrfs check --repair /dev/sdX1' managed to restore everything."
That's good to hear. btrfs fsck is constantly being improved, but it's still good to try other tools first, as it can still make things worse. As for data integrity, you can just run a scrub, and don't forget to convert metadata to DUP.
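Editor's note: a small sketch of the scrub step, assuming the repaired filesystem is mounted again (for example at `/mnt/disk2`, a placeholder). `btrfs scrub start -B` runs the scrub in the foreground, and the btrfs-scrub manual documents exit status 3 as "uncorrectable errors found". The `scrub_and_report` wrapper and the `BTRFS` override variable are hypothetical conveniences, not part of btrfs.

```shell
#!/bin/sh
# Sketch only: the wrapper name, BTRFS variable and mount point are assumptions.

# Run a foreground scrub and translate its exit status into a short message.
scrub_and_report() {
    mnt="$1"
    # BTRFS can point at a different binary; it defaults to plain `btrfs`.
    ${BTRFS:-btrfs} scrub start -B "$mnt"
    status=$?
    case "$status" in
        0) echo "scrub finished without uncorrectable errors" ;;
        3) echo "scrub found uncorrectable errors" ;;
        *) echo "scrub could not be performed (exit $status)" ;;
    esac
    return "$status"
}
```

Running `scrub_and_report /mnt/disk2` reads every block and verifies it against its checksum, so a clean result is reasonable evidence that the restored data matches what was originally written.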