tuxbass Posted August 7, 2020 (edited)

Think I'm finally paying the karma debt of going through life with no drive errors. After replacing a failed data drive, the parity drive started showing errors. Now I'm a bit puzzled as to what state the array is in.

A read error notification popped up at one point, but the drive rebuild seemed to continue. The rebuild should now be complete, but the rebuilt drive is unmountable ("Unmountable: No file system").

I tried running an extended SMART test on the parity drive, but it stopped rather quickly after hitting the read error; it certainly didn't run the 5+ hours expected for a slow 4TB drive.

As a side note, the "SMART error log" option in Main -> <parity disk> -> Self-Test reports "No Errors Logged", while "Last SMART test result" at the bottom of the same page states "Errors occurred - Check SMART report". Is this possibly a UI issue? I'm plenty of versions behind though, so it's hard to verify whether it has been fixed.

By the looks of it the parity drive needs replacing as well. But as mentioned, I'm unclear how to proceed, as the state of the array is rather questionable.

Diagnostics and the SMART report of the parity drive are attached. Excerpt from the SMART report:

    Error 1 [0] occurred at disk power-on lifetime: 14243 hours (593 days + 11 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER -- ST COUNT  LBA_48  LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      40 -- 51 05 40 00 00 00 b3 8e e8 e0 00  Error: UNC 1344 sectors at LBA = 0x00b38ee8 = 11767528

      Commands leading to the command that caused the error were:
      CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
      25 00 00 05 40 00 00 00 b3 8b c8 e0 08  1d+20:00:11.763  READ DMA EXT
      25 00 00 01 80 00 00 00 b3 8a 48 e0 08  1d+20:00:11.754  READ DMA EXT
      25 00 00 05 40 00 00 00 b3 85 08 e0 08  1d+20:00:11.740  READ DMA EXT
      25 00 00 05 40 00 00 00 b3 7f c8 e0 08  1d+20:00:11.346  READ DMA EXT
      25 00 00 05 40 00 00 00 b3 7a 88 e0 08  1d+20:00:11.344  READ DMA EXT

    SMART Extended Self-test Log Version: 1 (1 sectors)
    Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed: read failure        90%            14258  11767528

View of array drives: here

tower-diagnostics-20200807-1529.zip
tower-smart-20200807-0930.zip

Edited August 7, 2020 by tuxbass
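For anyone reading the SMART excerpt above, the register values can be sanity-checked with plain shell arithmetic. This is only a sketch: the 512-byte logical sector size is an assumption (it is not stated in the excerpt), and the variable names are made up for illustration.

```shell
#!/bin/sh
# Decode the failing read from the SMART error log excerpt.
# ASSUMPTION: 512-byte logical sectors; the excerpt does not state the size.
sector_size=512

lba=$((0x00b38ee8))    # LBA reported in the "Error: UNC" line
count=$((0x0540))      # COUNT register bytes "05 40" of the failed command
offset=$((lba * sector_size))

echo "first bad LBA : $lba"
echo "sectors in cmd: $count"
echo "byte offset   : $offset"
echo "offset in MiB : $((offset / 1024 / 1024))"
```

Under that assumption the unreadable region starts only a few GiB into a 4TB disk, which would be consistent with the extended self-test aborting long before the expected 5+ hours.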
JorgeB Posted August 7, 2020

41 minutes ago, tuxbass said: but drive rebuild seemed to continue

It does, but any read error on another device during a rebuild will result in a corrupt rebuilt disk. You can still run xfs_repair on the disk; depending on where the corruption is and how much of it there is, the disk might still have some (most) of its data.
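To illustrate why a read error on another device corrupts the rebuild, here is a toy single-parity model in shell arithmetic. It is a sketch, not Unraid's actual code: the byte values are invented, one byte stands in for a whole sector, and treating the unreadable parity block as zeros is an assumption made only for the demonstration.

```shell
#!/bin/sh
# Toy single-parity model: one byte per disk, parity = XOR of all data bytes.
# The values below are made up for illustration.
d1=$((0x5a)); d2=$((0xc3)); d3=$((0x0f))
parity=$((d1 ^ d2 ^ d3))

# Rebuilding disk 2 while every other device is readable recovers it exactly.
rebuilt_ok=$((d1 ^ d3 ^ parity))

# If the parity block cannot be read, some other value ends up in its place.
# ASSUMPTION: zeros are used here; any wrong value gives a wrong result.
bad_parity=0
rebuilt_bad=$((d1 ^ d3 ^ bad_parity))

printf 'original d2 : 0x%02x\n' "$d2"
printf 'good rebuild: 0x%02x\n' "$rebuilt_ok"
printf 'bad rebuild : 0x%02x\n' "$rebuilt_bad"
```

The rebuilt disk is only as good as every block read from the remaining devices, so each unreadable parity sector turns into a damaged sector on the rebuilt disk.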
tuxbass Posted August 7, 2020

1 minute ago, johnnie.black said: you can still run xfs_repair on the disk

The filesystem is btrfs. Is there an alternative for that?
JorgeB Posted August 7, 2020

Yes, though btrfs is not as forgiving. There are some recovery options here: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=543490

On the plus side, if you can get it to mount you can find out which files are corrupt by running a scrub. You can also try btrfs rescue (also in the link), but that won't check for corruption.
tuxbass Posted August 7, 2020 (edited)

Did you mean trying to mount the rebuilt data drive? If so, then no dice:

    root@Tower:/tmp# mount -o usebackuproot,ro /dev/sde1 /tmp/x
    mount: /tmp/x: wrong fs type, bad option, bad superblock on /dev/sde1, missing codepage or helper program, or other error.

If you meant the parity drive (i.e. the one with read errors), then I'm not sure how that could even work, as it contains the XOR'd bits, not files per se.

Given that the drive doesn't mount, it's likely safe to say the rebuild failed, right? Any idea where that would be stated? No notification showed it, and I can't find a statement about it in the logs either.

Edited August 7, 2020 by tuxbass
JorgeB Posted August 7, 2020

6 minutes ago, tuxbass said: Did you mean trying to mount the rebuilt data-drive?

Yes, and you can still try btrfs restore.

6 minutes ago, tuxbass said: it's likely safe to say the rebuild failed right?

The rebuild is corrupt, no doubt about that; the only question is by how much. But note that the disk was already unmountable before the read errors on parity, so parity already wasn't 100% valid, possibly due to previous errors.
tuxbass Posted August 7, 2020

Cheers. First restore attempt (without the -i flag):

    root@Tower:/dev# btrfs restore -v /dev/md2 /mnt/disk1/disk2-restore/
    No mapping for 847315304448-847315320832
    Couldn't map the block 847315304448
    Couldn't map the block 847315304448
    bad tree block 847315304448, bytenr mismatch, want=847315304448, have=0
    Couldn't read tree root
    Could not open root, trying backup super
    No mapping for 847315304448-847315320832
    Couldn't map the block 847315304448
    Couldn't map the block 847315304448
    bad tree block 847315304448, bytenr mismatch, want=847315304448, have=0
    Couldn't read tree root
    Could not open root, trying backup super
    No mapping for 847315304448-847315320832
    Couldn't map the block 847315304448
    Couldn't map the block 847315304448
    bad tree block 847315304448, bytenr mismatch, want=847315304448, have=0
    Couldn't read tree root
    Could not open root, trying backup super

...followed by an attempt including the -i flag:

    root@Tower:/mnt/disk1# btrfs restore -vi /dev/md2 /mnt/disk1/disk2-restore/
    No mapping for 847315304448-847315320832
    Couldn't map the block 847315304448
    Couldn't map the block 847315304448
    bad tree block 847315304448, bytenr mismatch, want=847315304448, have=0
    Couldn't read tree root
    Could not open root, trying backup super
    No mapping for 847315304448-847315320832
    Couldn't map the block 847315304448
    Couldn't map the block 847315304448
    bad tree block 847315304448, bytenr mismatch, want=847315304448, have=0
    Couldn't read tree root
    Could not open root, trying backup super
    No mapping for 847315304448-847315320832
    Couldn't map the block 847315304448
    Couldn't map the block 847315304448
    bad tree block 847315304448, bytenr mismatch, want=847315304448, have=0
    Couldn't read tree root
    Could not open root, trying backup super

Guess it's time to throw the parity drive away and try to restore the contents of drive 2 from backups, as closely as possible?
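A quick check of the numbers in the restore output above, as a hedged sketch. The two byte addresses come straight from the "No mapping for" line; the variable names are illustrative only.

```shell
#!/bin/sh
# Range taken from the "No mapping for" line in the btrfs restore output.
start=847315304448
end=847315320832

size=$((end - start))
echo "unmappable range: $size bytes ($((size / 1024)) KiB)"
echo "logical address : about $((start / 1024 / 1024 / 1024)) GiB"
```

The range is 16 KiB long, which matches the default btrfs node size, so it plausibly corresponds to a single metadata node: the tree root that the output says could not be read. Every attempt, including the ones using the backup superblocks, fails at the same address, which fits the conclusion that restore has nothing to start from.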
JorgeB Posted August 7, 2020

Most likely your best bet.