(6.9.0-beta25) Corrupt BTRFS, rebuild now "Unmountable: No file system"



I originally had Disk 1 with BTRFS uncorrectable errors from kernel panics and forced shutdowns. Found and fixed the offending software with updates, but the damage was done. (One parity, three data, all 14TB WD DC HC530s.) I got an extra drive to back the data up to first (foreshadowing here). Backed up the data from the command line, with errors on some temp files and a couple of corrupt files of no value. Confirmed the files on the drive with an ls in a few main dirs, and all looked good.
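Spot-checking a backup with ls only shows that the names are there, not that the contents survived the copy. A minimal sketch of a stricter check follows, comparing SHA-256 digests of every file under the source against the copy. The function names and any mount points you pass in are hypothetical, not something from this thread, and on a 14TB disk this will take many hours.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source: Path, backup: Path) -> list:
    """Return relative paths that are missing, unreadable, or different in the backup."""
    problems = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = backup / rel
        try:
            same = dst.is_file() and file_digest(src) == file_digest(dst)
        except OSError:
            # A read error on either side (for example a btrfs checksum
            # failure surfacing as an I/O error) counts as a problem.
            same = False
        if not same:
            problems.append(str(rel))
    return problems
```

Calling `compare_trees(Path("/mnt/disk1"), Path("/mnt/disks/backup"))` (both paths are placeholders) returns an empty list only if every source file exists in the backup with identical contents.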

Removed the drive from the array config, shut down, and now my mistake (I know, RTFM, I should have read it again): formatted the drive in unassigned devices, added back to the array and hit start, had an oh *expletive deleted* moment, shut it down. Precleared the drive, then added again. I think in hindsight the damage was already done. That's why I made a backup, but that drive won't mount! I'm not sure how, but the copy process (cp -r) to the brand new drive corrupted the BTRFS on that drive also. Now the original source drive is gone and the backup is corrupt, nice! It's damaged beyond the BTRFS repair tools' capability, but recovery looks possible as a last resort with UFS Explorer, though it looks like it will lose all the dir structure. It looks to be still emulating that disk; is there any way I can pull just the data off that emulated disk? At this point it almost looks like I've got to bite the bullet and buy two 14TB hard drives! I do have seven 4TB hard drives with an older backup on them. The new data set has outgrown those disks' size, and even then I don't feel comfortable overwriting it with everything else that's happened. I got archival and cloud backup of everything really critical.

I guess my question is what are my options moving forward for data recovery at this point? Thank you.

Link to comment

Wish you had asked for advice earlier, probably would have been better if you had asked before doing anything at all. It's not entirely clear from your description what you had or what you did.

 

A mistake some make is formatting a disk in the array, then expecting parity to rebuild the data that was there before the format. Since parity is updated by the format (and any other write operation in the array) rebuild will just result in a formatted disk.

 

But that isn't what you say you did. Formatting or clearing a disk that is not in the array will not affect parity, but it is pointless since rebuild is just going to completely overwrite it anyway. Rebuilding in that situation would simply result in the same data that was on the disk before. Probably you already had a corrupt filesystem, and so a corrupt filesystem is the expected result of the rebuild.
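To make the two cases above concrete, here is a toy sketch that treats each disk as a few bytes and single parity as a plain XOR across the data disks. The byte strings and names are made up for illustration; a real array works on whole devices, but the bookkeeping is the same idea.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy array: three data "disks" of 8 bytes each, plus single parity.
disk1 = b"CORRUPT!"  # stand-in for a disk holding a damaged filesystem
disk2 = b"movies__"
disk3 = b"music___"
parity = xor_blocks(disk1, disk2, disk3)

# Case 1: disk1 is formatted or cleared OUTSIDE the array. Parity is not
# touched, so a rebuild writes back exactly what was there before,
# corruption included.
rebuilt_outside = xor_blocks(parity, disk2, disk3)

# Case 2: disk1 is formatted INSIDE the array. The format is a write like
# any other, so parity is updated to describe the freshly formatted disk.
formatted = b"\x00" * 8  # stand-in for an empty filesystem
parity_after_format = xor_blocks(formatted, disk2, disk3)
rebuilt_inside = xor_blocks(parity_after_format, disk2, disk3)

print(rebuilt_outside)  # same bytes disk1 held before
print(rebuilt_inside)   # the formatted contents; the old data is gone
```

Either way the rebuild only reproduces whatever parity currently describes, which is why it cannot undo a format done in the array or repair a filesystem that was already corrupt.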

 

On the other hand, you don't explicitly mention rebuilding anything (except in the title). Instead you say 

47 minutes ago, FryGuy said:

Removed the drive from the array config... formatted the drive in unassigned devices, added back to the array.... Precleared the drive, then added again.

 Did you ever do New Config during any of this?

 

Since you

47 minutes ago, FryGuy said:

got archival and cloud backup of everything really critical

I don't really have anything else to add unless you have something more to add. Maybe UFS Explorer will help, I've never used it.

 

 

 

Link to comment
19 minutes ago, trurl said:
Quote

Wish you had asked for advice earlier, probably would have been better if you had asked before doing anything at all.

I really thought it was going to be a straightforward remove-from-array, reformat, re-add, start, and allow-to-rebuild kind of deal.

 


Quote

it's not entirely clear from your description what you had or what you did.

I backed up data to New Disk from the corrupt BTRFS Disk 1 in the server, removed Disk 1 from the config, shut down, formatted Disk 1 in unassigned devices, added it to the array, and started the array. It didn't say it was building the disk, so I immediately shut down the array; maybe I panicked that something was wrong when nothing was. Then I ran a preclear on Disk 1, started the array and proceeded to rebuild the drive. The rebuild ended with "Unmountable: No file system"; I repeated it twice with the same results.

Quote

 

A mistake some make is formatting a disk in the array, then expecting parity to rebuild the data that was there before the format. Since parity is updated by the format (and any other write operation in the array) rebuild will just result in a formatted disk.

 

But that isn't what you say you did. Formatting or clearing a disk that is not in the array will not affect parity, but it is pointless since rebuild is just going to completely overwrite it anyway. Rebuilding in that situation would simply result in the same data that was on the disk before. Probably you already had a corrupt filesystem, and so a corrupt filesystem is the expected result of the rebuild.

 

Then there should be no way that what I did corrupted the parity; the parity was valid before with no errors.

Quote

 

On the other hand, you don't explicitly mention rebuilding anything (except in the title). Instead you say 

 Did you ever do New Config during any of this?

 

I used the wording "added it back"; it was rebuilding the drive. Sorry for the vague description above, I was trying to keep it short since it already seemed so long-winded. No New Config, and I didn't use the format-the-drive check box either; I know that would definitely mess things up. So would New Config, especially if you get the drive order wrong, if I'm not mistaken.

 

Quote

Maybe UFS Explorer will help, I've never used it.

It does show all the files but it won't preserve any dir structure. It's my last resort if I must.

 

I guess there is no way to back up just the data that is emulated? It doesn't show up as /mnt/disk1 since it's not a mounted disk. Is there a location that the virtual disk mounts to for the pool?

If not, then I need to do a full tar backup of the whole array to external drives to be sure I have everything on the array?

 

Link to comment
17 hours ago, FryGuy said:

I guess there is no way to back up just the data that is emulated?

What do you mean by the data that is emulated?

 

If you have a missing or disabled data disk, and all other disks including parity are good and parity is in sync, then the data for the missing or disabled disk is emulated from the parity calculation by reading parity plus all other disks. If you have single parity then a single disk can be emulated; if dual parity, then 2 disks can be emulated.
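A small sketch of why single parity can stand in for exactly one disk, assuming single parity is a plain XOR across the data disks (dual parity uses a second, different calculation that is not modeled here). The two-byte "disks" are made-up values.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


disk1, disk2, disk3 = b"\x11\x22", b"\x0f\xf0", b"\xaa\x55"
parity = xor_bytes(xor_bytes(disk1, disk2), disk3)

# One disk missing (disk1): parity plus all the other disks gives it back.
emulated_disk1 = xor_bytes(xor_bytes(parity, disk2), disk3)

# Two disks missing (disk1 and disk2): all that can be computed is the
# XOR of the two missing disks, which identifies neither of them.
ambiguous = xor_bytes(parity, disk3)

print(emulated_disk1 == disk1)                 # True
print(ambiguous == xor_bytes(disk1, disk2))    # True
```

Note that emulation only happens when a disk is actually missing or disabled; a disk that is present but unmountable is read directly, not emulated.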

 

But your use of the word emulated seems a bit more vague than all that.

 

17 hours ago, FryGuy said:

If not, then I need to do a full tar backup of the whole array to external drives to be sure I have everything on the array?

Each data disk in Unraid is an independent filesystem, so each individual disk can be read all by itself on any linux, including Unraid. Unless you think there is a problem with each and every one of your disks, then I don't see any point to backing up the whole array, and if there is a problem with each and every one of your disks, then backing them up may not be possible.

 

You should have backups of anything important and irreplaceable, of course, but you indicated you did. 

 

Really everything you say you did seemed to be wrong or pointless. For example, as mentioned, when rebuilding a disk, the entire disk is going to be overwritten anyway. So preclearing the disk and / or formatting the disk actually is completely pointless.

 

Maybe rebuilding was pointless too. If you had a corrupt filesystem, then rebuilding isn't going to help.

Link to comment

  

1 hour ago, trurl said:
Quote

 

What do you mean by the data that is emulated?

 

If you have a missing or disabled data disk, and all other disks including parity are good and parity is in sync, then the data for the missing or disabled disk is emulated from the parity calculation by reading parity plus all other disks. If you have single parity then a single disk can be emulated; if dual parity, then 2 disks can be emulated.

 

But your use of the word emulated seems a bit more vague than all that.

 

If you read my post with my disk layout you wouldn't be debating the semantics of what emulated means. I have one parity disk with three data, all 14 TB.

 

Quote

Each data disk in Unraid is an independent filesystem, so each individual disk can be read all by itself on any linux, including Unraid. Unless you think there is a problem with each and every one of your disks, then I don't see any point to backing up the whole array, and if there is a problem with each and every one of your disks, then backing them up may not be possible.

Every other disk is fine; it just can't rebuild from the parity, but it can start the array and emulate the missing disk. I don't understand how or why this happened, that's why I'm here. Originally I was thinking that I did something horribly wrong in the procedure, but now I'm reassured that I didn't. The only thing I don't understand is how my new drive backup became corrupt with the cp -r command too. I'm sure I didn't get the drives mixed up with one on a USB bus.

Quote

You should have backups of anything important and irreplaceable, of course, but you indicated you did. 

 

Quote

Really everything you say you did seemed to be wrong or pointless. For example, as mentioned, when rebuilding a disk, the entire disk is going to be overwritten anyway. So preclearing the disk and / or formatting the disk actually is completely pointless.

I think this is the point of confusion. It was the original disk 1 that had the corrupt BTRFS; it was still R/W with 5 unrecoverable errors, and it stayed in the same slot. The new disk was used to back up its data. I perhaps wrongly thought it was safer if a rebuild went wrong.

At the time I didn't know how else to get the original disk to appear as a new disk to rebuild.

 

Quote

Maybe rebuilding was pointless too. If you had a corrupt filesystem, then rebuilding isn't going to help.

If a corrupt FS can affect the parity to cause it to not rebuild then I'm *expletive deleted* either way. Nothing I could have done from the start would have solved the problem. I'm an experienced user of Linux and seldom reach out for help; I usually find it so condescending and toxic on forums. This is a sh*t hit the fan moment and I'm not sure what even happened myself.

Link to comment
1 hour ago, FryGuy said:

At the time I didn't know how else to get the original disk to appear as a new disk to rebuild.

For future reference, rebuilding to the same disk is simple

  1. Stop array
  2. Unassign disk to be rebuilt
  3. Start array with disk unassigned
  4. Stop array
  5. Reassign disk to be rebuilt
  6. Start array to begin rebuild

Not sure what happened to your filesystem. Perhaps if you had asked for help sooner someone would have had some better idea how to recover. I have seen mention that btrfs recovery tools are perhaps not as well developed as XFS, for example. I personally don't use btrfs in the parity array, but I know @johnnie.black has worked with it.

 

As mentioned, rebuild seldom fixes filesystem corruption. Since parity updates happen at a very low level, parity usually will be in sync with the low level bits, even if they are a corrupt filesystem. One thing you can try if you find yourself in a similar situation is to start the array with the problem disk unassigned. That will make Unraid emulate the disk from parity, and whatever the result of that emulation is will be the result of the rebuild. So if the emulated disk is still corrupt, rebuild will not help.

 

Often in situations where people actually have a disabled disk, and the emulated disk is corrupt, we will have them repair the filesystem of the emulated disk before actually doing the rebuild.

 

I would have asked for diagnostics at the very beginning so I might have had a clearer idea about your configuration and situation, but it seemed like it was maybe too late for any advice I might have provided.

Link to comment
4 minutes ago, trurl said:

 

Quote

 

For future reference, rebuilding to the same disk is simple

  1. Stop array
  2. Unassign disk to be rebuilt
  3. Start array with disk unassigned
  4. Stop array
  5. Reassign disk to be rebuilt
  6. Start array to begin rebuild

 

Other than the unnecessary step of clearing the disk (format first, then preclear on the next try), this is what I did.

 

 

4 minutes ago, trurl said:

Not sure what happened to your filesystem. Perhaps if you had asked for help sooner someone would have had some better idea how to recover. I have seen mention that btrfs recovery tools are perhaps not as well developed as XFS, for example. I personally don't use btrfs in the parity array, but I know @johnnie.black has worked with it.

 

As mentioned, rebuild seldom fixes filesystem corruption. Since parity updates happen at a very low level, parity usually will be in sync with the low level bits, even if they are a corrupt filesystem. One thing you can try if you find yourself in a similar situation is to start the array with the problem disk unassigned. That will make Unraid emulate the disk from parity, and whatever the result of that emulation is will be the result of the rebuild. So if the emulated disk is still corrupt, rebuild will not help.

The array will start with the emulated disk and everything is green. I haven't dared to start VMs or Dockers to see if everything is there, since that would risk disk writes. Without running things like Plex I'm not sure if anything is missing.

 

4 minutes ago, trurl said:
Quote

Often in situations where people actually have a disabled disk, and the emulated disk is corrupt, we will have them repair the filesystem of the emulated disk before actually doing the rebuild.

The disk isn't showing as disabled, it's marked as green.

 

4 minutes ago, trurl said:

I would have asked for diagnostics at the very beginning so I might have had a clearer idea about your configuration and situation, but it seemed like it was maybe too late for any advice I might have provided.

The original issue with the server that was causing the kernel panics was solved with updates. I don't think I have the logs of the issue anymore as it was resolved. Ran scrub / repair on the disks, but by the time updates fixed the issue the damage was done. It should have been a trivial fix to just back up and rebuild a drive. It turned out to escalate into a situation.

 

Now how do I figure out if my emulated disk is corrupt? I'll get two more disks (assuming I overwrite the corrupt backup I got of the emulated disk; would rather not, but $$$) to pull the whole array's data off. I should have a current backup set on the shelf anyway. This is the only way I see to proceed with recovery if the emulated disk is good.

Link to comment
4 minutes ago, trurl said:

This is a bit confusing and makes me question your use of "emulated" again. Maybe a screenshot would clarify.

I don't know what else you would call it when the array is running with the parity "emulating" the missing disk as part of the array. That's clear English usage of the word to me.

Screenshot_20200801_174846.png

Link to comment
4 minutes ago, FryGuy said:

I don't know what else you would call it when the array is running with the parity "emulating" the missing disk as part of the array.

I don't see a missing disk in that screenshot, nor do I see a disabled disk. A missing or disabled disk would not be green, it would have a red X instead.

 

So, parity is not emulating anything. You just have an unmountable disk, which usually means a corrupted filesystem.

Link to comment
1 minute ago, trurl said:

I don't see a missing disk in that screenshot, nor do I see a disabled disk. A missing or disabled disk would not be green, it would have a red X instead.

 

So, parity is not emulating anything. You just have an unmountable disk, which usually means a corrupted filesystem.

Ok, that confirms my worst fears. It's hard to visually check and confirm every file/dir with such a large pool. That's exactly why I didn't want to start anything up and mess up my cache and database. Looks like it's going to be disaster recovery after all. I tested with UFS Explorer and it looks like it will recover everything. Do you know of any other free BTRFS recovery tools (or I'll just web search)? It doesn't preserve dir structure but at least I'll get my files back from that corrupt backup. Thank you, terminology confusion and all.

Link to comment
7 hours ago, johnnie.black said:

I didn't read the complete thread but I see disk1 is now formatted xfs, so you won't be able to use any btrfs recovery tools; you might still be able to get something with UFS Explorer.

It's another disk I have in there now. The corrupt drive is out in a USB dock for recovery. I have more disks arriving for future backup, and I'm going to use them to move the array to xfs disks. The disk with the damaged file system is only for disaster recovery now; worst case I re-rip source material as I find out what has been lost. I may be able to at least use UFS Explorer to find out what has been lost, as it is listing all the file names and sizes. I'm hopeful that it will recover the files too, but it doesn't appear to preserve the dir structure, oh well. Critical stuff I have backups of on disks and cloud. I now think I have errors on my other disks causing the system to freeze during parity check. I ran a scrub last night to check the disks and forgot to turn off the scheduled weekly parity check, and that froze the system up. Now running scrub again with the parity check schedule disabled; by morning I'll know if it worked.

Link to comment
7 minutes ago, FryGuy said:

The other disks are mounting

I'm sorry, so what is the problem exactly? If you want to attempt to recover data from an unmountable btrfs disk I need diags after trying to mount it.

 

8 minutes ago, FryGuy said:

I just don't know why it's crashing still

What is crashing, the server? I thought this was about an unmountable disk, again I'm sorry I didn't read the entire thread, so please do a quick summary if there are other issues.

 

Also this is not a good sign:

 

Aug  3 13:26:08 unRAID kernel: BTRFS info (device md2): bdev /dev/md2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
...
Aug  3 13:26:21 unRAID kernel: BTRFS info (device md3): bdev /dev/md3 errs: wr 0, rd 0, flush 0, corrupt 435370, gen 0

Both disks 2 and 3 are showing corruption errors, this is usually data corruption caused by a hardware problem, like bad RAM, it would be a good idea to run memtest.
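For anyone wanting to pull these counters out of a syslog rather than reading it by eye, here is a minimal sketch that parses lines in the format quoted above. The regular expression is written against those two lines only, so treat it as an assumption that other kernel versions print the same layout; the function name is hypothetical.

```python
import re

# Matches the "bdev ... errs:" lines as they appear in the log excerpt above.
ERRS_RE = re.compile(
    r"BTRFS info \(device (?P<dev>\w+)\): bdev \S+ errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(log_text: str) -> dict:
    """Return the last seen error counters per device, keyed by device name."""
    counters = {}
    for line in log_text.splitlines():
        match = ERRS_RE.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            counters[dev] = {name: int(value) for name, value in fields.items()}
    return counters


sample = """\
Aug  3 13:26:08 unRAID kernel: BTRFS info (device md2): bdev /dev/md2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Aug  3 13:26:21 unRAID kernel: BTRFS info (device md3): bdev /dev/md3 errs: wr 0, rd 0, flush 0, corrupt 435370, gen 0
"""

for dev, errs in parse_btrfs_errs(sample).items():
    if any(errs.values()):
        print(f"{dev}: non-zero error counters {errs}")
```

Any non-zero value in the corrupt column means btrfs found blocks whose checksums did not match, which is what points toward a hardware problem here.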
 

Link to comment
3 minutes ago, johnnie.black said:

I'm sorry, so what is the problem exactly? If you want to attempt to recover data from an unmountable btrfs disk I need diags after trying to mount it.

 

What is crashing, the server? I thought this was about an unmountable disk, again I'm sorry I didn't read the entire thread, so please do a quick summary if there are other issues.

 

Also this is not a good sign:

 


Aug  3 13:26:08 unRAID kernel: BTRFS info (device md2): bdev /dev/md2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
...
Aug  3 13:26:21 unRAID kernel: BTRFS info (device md3): bdev /dev/md3 errs: wr 0, rd 0, flush 0, corrupt 435370, gen 0

Both disks 2 and 3 are showing corruption errors, this is usually data corruption caused by a hardware problem, like bad RAM, it would be a good idea to run memtest.
 

No, it's not good. I missed that. Shutting down now and doing a memtest.

Link to comment
20 hours ago, FryGuy said:

No, it's not good. I missed that. Shutting down now and doing a memtest.

Ran two cycles of the latest version of memtest, just to be sure. Nearly 20 hrs of testing on the 32GB of RAM returned no errors. Any other ideas? I hope you don't say SATA controller, it's built into the mobo.  😬

Link to comment
1 hour ago, johnnie.black said:

See here on how to reset current error count and monitor for new ones (you'll need to adjust the path), then post new diags if/when there are more errors (before rebooting).

Set the script to run hourly as described above, reset the error count, and am now running scrub on the two BTRFS disks. I'll keep an eye out for errors, fingers crossed.
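The linked script is not reproduced in this thread, so the following is not it. It is only a rough sketch of the same idea, assuming the counters come from `btrfs device stats <mountpoint>`, which prints one `[device].counter value` line per counter. The sample values and the function name are made up for illustration.

```python
import re

# One line of `btrfs device stats` output, e.g. "[/dev/md2].corruption_errs  3"
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<count>\d+)$")


def nonzero_stats(stats_output: str) -> list:
    """Return (device, counter, value) tuples for every non-zero counter."""
    hits = []
    for line in stats_output.splitlines():
        match = STAT_RE.match(line.strip())
        if match and int(match["count"]) > 0:
            hits.append((match["dev"], match["name"], int(match["count"])))
    return hits


# Sample text in the format printed by `btrfs device stats`;
# the numbers are invented for this example.
sample_output = """\
[/dev/md2].write_io_errs    0
[/dev/md2].read_io_errs     0
[/dev/md2].flush_io_errs    0
[/dev/md2].corruption_errs  3
[/dev/md2].generation_errs  0
"""

for dev, name, count in nonzero_stats(sample_output):
    print(f"{dev}: {name} = {count}")
```

In a real hourly job the text would come from running the command against each mounted btrfs disk (for example with `subprocess.run`), and the counters can be reset beforehand with the `-z` option of `btrfs device stats` so that only new errors show up.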
