Read error on parity while rebuilding data drive



I think I'm finally paying off the karma debt of going through life with no drive errors.

 

After I replaced a failed data drive, the parity drive started showing errors. Now I'm a bit puzzled about the state of the array.
A read error notification popped up at one point, but drive rebuild seemed to continue. Now the rebuild should be complete, but the rebuilt drive is in an unmountable state ("Unmountable: No file system").

I tried running an extended SMART test on the parity drive, but it stopped rather quickly after hitting the read error; it certainly didn't run for the 5+ hours expected for a slow 4TB drive.
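For a rough sense of why 5+ hours is the expectation: an extended self-test scans the whole surface, so its duration is roughly capacity divided by average read rate. The 200 MB/s below is a hypothetical average for a slow 4TB drive, not a measured value; `smartctl -c` reports the drive's own "Extended self-test routine recommended polling time", which is the better number to use.

```python
def estimated_extended_test_hours(capacity_bytes: float, read_mb_per_s: float) -> float:
    """Rough estimate, assuming the test is dominated by one full read pass."""
    seconds = capacity_bytes / (read_mb_per_s * 1_000_000)
    return seconds / 3600

# Hypothetical average sequential read rate (MB/s) for a slow 4TB drive.
hours = estimated_extended_test_hours(4e12, 200)
print(f"{hours:.1f} h")  # 5.6 h
```

A test that ends within minutes with "read failure" therefore aborted at the first unreadable sector instead of completing the scan.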

 

As a side note, the "SMART error log" option under Main -> <parity disk> -> Self-Test reports "No Errors Logged", while "Last SMART test result" at the bottom of the same page says "Errors occurred - Check SMART report". Is this possibly a UI issue? I'm quite a few versions behind, though, so it's hard to verify whether it has already been fixed.

 

By the looks of it, the parity drive needs replacing as well. But as mentioned, I'm unclear how to proceed, as the state of the array is rather questionable.

 

Diagnostics and the SMART report of the parity drive are attached.

Excerpt from the SMART report:

Error 1 [0] occurred at disk power-on lifetime: 14243 hours (593 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 05 40 00 00 00 b3 8e e8 e0 00  Error: UNC 1344 sectors at LBA = 0x00b38ee8 = 11767528

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 05 40 00 00 00 b3 8b c8 e0 08  1d+20:00:11.763  READ DMA EXT
  25 00 00 01 80 00 00 00 b3 8a 48 e0 08  1d+20:00:11.754  READ DMA EXT
  25 00 00 05 40 00 00 00 b3 85 08 e0 08  1d+20:00:11.740  READ DMA EXT
  25 00 00 05 40 00 00 00 b3 7f c8 e0 08  1d+20:00:11.346  READ DMA EXT
  25 00 00 05 40 00 00 00 b3 7a 88 e0 08  1d+20:00:11.344  READ DMA EXT

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     14258         11767528
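The register dump can be cross-checked by hand. A minimal sketch, assuming smartctl's column order for the extended error log, where COUNT is two bytes (high, low) and the three LBA_48 bytes are the high-order halves of LH, LM and LL. The helper names are hypothetical. It decodes the error row and the last READ DMA EXT before it, and checks that the failing LBA falls inside that read.

```python
def decode_count(tokens: list[str]) -> int:
    """Two COUNT bytes as printed (high byte first)."""
    hi, lo = (int(t, 16) for t in tokens)
    return (hi << 8) | lo

def decode_lba48(tokens: list[str]) -> int:
    """Six bytes from the LBA_48, LH, LM, LL columns, in printed order.

    Assumed ordering: LBA_48 holds bits 47:40, 39:32, 31:24;
    LH, LM, LL hold bits 23:16, 15:8, 7:0.
    """
    b47, b39, b31, b23, b15, b7 = (int(t, 16) for t in tokens)
    return (b47 << 40) | (b39 << 32) | (b31 << 24) | (b23 << 16) | (b15 << 8) | b7

# Error row columns: ER, --, ST, COUNT(2), LBA_48(3), LH, LM, LL, DV, DC
err = "40 -- 51 05 40 00 00 00 b3 8e e8 e0 00".split()
err_count = decode_count(err[3:5])
err_lba = decode_lba48(err[5:11])

# Command row columns: CR, FEATR(2), COUNT(2), LBA_48(3), LH, LM, LL, DV, DC
cmd = "25 00 00 05 40 00 00 00 b3 8b c8 e0 08".split()
cmd_count = decode_count(cmd[3:5])
cmd_lba = decode_lba48(cmd[5:11])

print(err_count, err_lba)                          # 1344 11767528
print(cmd_lba <= err_lba < cmd_lba + cmd_count)    # True
print(err_lba - cmd_lba)                           # 800
```

The decoded LBA matches both the "UNC ... at LBA = 11767528" line and the self-test's LBA_of_first_error, so the error log and the aborted extended test point at the same spot.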

View of array drives: here

 

 

tower-diagnostics-20200807-1529.zip tower-smart-20200807-0930.zip

Edited by tuxbass
41 minutes ago, tuxbass said:

but drive rebuild seemed to continue

It does, but any read error on another device during a rebuild will result in a corrupt rebuilt disk. You can still run xfs_repair on the disk; depending on where the corruption is and how much of it there is, it might still have some (or most) of its data.
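Why one bad read is enough: with single parity, the missing disk's data is recomputed as the XOR of parity and every surviving data disk, so a wrong byte on any of them yields a wrong rebuilt byte. A toy sketch with hypothetical 4-byte blocks; treating the unreadable parity sector as zeros is an assumption made only for illustration.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical blocks at the same offset on three data disks.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\xaa\xbb\xcc\xdd"  # the disk that failed and is being rebuilt
disk3 = b"\x01\x02\x03\x04"
parity = xor_blocks([disk1, disk2, disk3])

# Normal rebuild: parity XOR surviving disks gives back disk2.
rebuilt = xor_blocks([parity, disk1, disk3])
print(rebuilt == disk2)  # True

# Parity read fails (modeled here as zeros): the rebuilt block is wrong.
bad_parity = bytes(len(parity))
corrupt = xor_blocks([bad_parity, disk1, disk3])
print(corrupt == disk2)  # False
```

The same arithmetic means parity on its own holds no files, only the XOR of all data disks at each offset.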

 

 


Did you mean trying to mount the rebuilt data-drive? If so, then no dice:

root@Tower:/tmp# mount -o usebackuproot,ro /dev/sde1 /tmp/x
mount: /tmp/x: wrong fs type, bad option, bad superblock on /dev/sde1, missing codepage or helper program, or other error.

If you meant the parity drive (i.e. the one with read errors), then I'm not sure how that could even work, as it contains XOR'd bits, not files per se.

 

Given the drive doesn't mount, it's likely safe to say the rebuild failed right? Any idea where that would be stated? No notification mentioned it, and I can't find anything about it in the logs either.

Edited by tuxbass
6 minutes ago, tuxbass said:

Did you mean trying to mount the rebuilt data-drive?

Yes, you can still try btrfs restore.

 

6 minutes ago, tuxbass said:

it's likely safe to say the rebuild failed right?

The rebuild is corrupt, no doubt about that; the only question is by how much. Note, though, that the disk was already unmountable before the read errors on parity, so parity was already not 100% valid, possibly due to earlier errors.


Cheers.

First restore attempt (without the -i flag):

root@Tower:/dev# btrfs restore -v /dev/md2 /mnt/disk1/disk2-restore/
No mapping for 847315304448-847315320832
Couldn't map the block 847315304448
Couldn't map the block 847315304448
bad tree block 847315304448, bytenr mismatch, want=847315304448, have=0
Couldn't read tree root
Could not open root, trying backup super
No mapping for 847315304448-847315320832
Couldn't map the block 847315304448
Couldn't map the block 847315304448
bad tree block 847315304448, bytenr mismatch, want=847315304448, have=0
Couldn't read tree root
Could not open root, trying backup super
No mapping for 847315304448-847315320832
Couldn't map the block 847315304448
Couldn't map the block 847315304448
bad tree block 847315304448, bytenr mismatch, want=847315304448, have=0
Couldn't read tree root
Could not open root, trying backup super

...followed by one with the -i flag:

root@Tower:/mnt/disk1# btrfs restore -vi /dev/md2 /mnt/disk1/disk2-restore/
No mapping for 847315304448-847315320832
Couldn't map the block 847315304448
Couldn't map the block 847315304448
bad tree block 847315304448, bytenr mismatch, want=847315304448, have=0
Couldn't read tree root
Could not open root, trying backup super
No mapping for 847315304448-847315320832
Couldn't map the block 847315304448
Couldn't map the block 847315304448
bad tree block 847315304448, bytenr mismatch, want=847315304448, have=0
Couldn't read tree root
Could not open root, trying backup super
No mapping for 847315304448-847315320832
Couldn't map the block 847315304448
Couldn't map the block 847315304448
bad tree block 847315304448, bytenr mismatch, want=847315304448, have=0
Couldn't read tree root
Could not open root, trying backup super
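Two numbers in that output can be checked with simple arithmetic. btrfs keeps its primary superblock at 64 KiB and mirror copies at 64 MiB and 256 GiB, so three "trying backup super" rounds are consistent with all three copies being tried, and all three appear to point at the same tree root. The unmapped range is exactly 16 KiB, which matches btrfs's default node size, i.e. a single tree node. A sketch; the 4 TB size of disk2 is a hypothetical assumption, and the helper names are made up for illustration.

```python
KiB, MiB, GiB = 1024, 1024**2, 1024**3

# Primary btrfs superblock and its two mirror copies (byte offsets).
SUPERBLOCK_OFFSETS = [64 * KiB, 64 * MiB, 256 * GiB]
SUPERBLOCK_SIZE = 4096

def superblock_copies(device_bytes: int) -> list[int]:
    """Superblock offsets that fit on a device of the given size."""
    return [off for off in SUPERBLOCK_OFFSETS if off + SUPERBLOCK_SIZE <= device_bytes]

def unmapped_length(log_line: str) -> int:
    """Length of the range in a 'No mapping for START-END' message."""
    start, end = log_line.rsplit(" ", 1)[1].split("-")
    return int(end) - int(start)

device = 4 * 10**12  # hypothetical: assumes disk2 is a 4 TB drive
print(len(superblock_copies(device)))                                # 3
print(unmapped_length("No mapping for 847315304448-847315320832"))  # 16384
```

If every superblock copy references the same unreadable root node, falling back to a backup superblock alone can't get past it, which fits the identical output of the three rounds.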

I guess it's time to throw the parity drive away and restore disk 2's contents from backups, as closely as possible?
