Need help: rebuilding Array after controller failure - two array disks seem to have a broken BTRFS file system

Hello community,

 

I need some help or suggestions how to rebuild/repair an Array with two disks seemingly having a corrupted BTRFS file system.

 

Setup

  • Unraid Array (BTRFS)
  • 2 x Parity Disk 18TB
  • 6 x Array Disk 8TB

 

Current State

  • Parity: OK
  • Disks 2, 4, 5, 6: OK
  • Disk 1 and Disk 3: not mountable, BTRFS Error: superblock checksum mismatch...
  • Array: unsafe, missing disks are emulated

 

What happened

For some time, different disks repeatedly went missing. When that happened I managed to repair the array, either by getting the missing disk running again or by clearing the disk and rebuilding the array.

When two Array disks were missing at the same time I had enough and stopped all containers to search for the root cause.

Because the missing disks were emulated I made a full backup with rsync first, which went well after the second try.

Then I figured out that the main issue was the 10 Port PCIe SATA controller. So I switched to a proper SAS controller.

The two missing disks are back, boot log doesn't throw any errors. So far so good.

 

Then I assigned the missing disks back to the array, started the array and started the rebuild.

Only after a couple of minutes did I see the warning that the disks are unmountable. My mistake...

grafik.thumb.png.baf0c6b16eebaa647f6b3b9a0a2b9989.png

 

So I stopped the rebuild/sync.

 

Issue

[...]
Jul  2 12:22:13 GrayBigBerta emhttpd: Mounting disks...
Jul  2 12:22:13 GrayBigBerta emhttpd: mounting /mnt/disk1
Jul  2 12:22:13 GrayBigBerta emhttpd: shcmd (1960): mkdir -p /mnt/disk1
Jul  2 12:22:13 GrayBigBerta emhttpd: /mnt/disk1: no btrfs or device /dev/md1p1 is not single
Jul  2 12:22:13 GrayBigBerta emhttpd: /mnt/disk1 mount error: Unsupported or no file system
Jul  2 12:22:13 GrayBigBerta emhttpd: shcmd (1961): rmdir /mnt/disk1
Jul  2 12:22:13 GrayBigBerta emhttpd: mounting /mnt/disk2
Jul  2 12:22:13 GrayBigBerta emhttpd: shcmd (1962): mkdir -p /mnt/disk2
Jul  2 12:22:14 GrayBigBerta emhttpd: shcmd (1963): mount -t btrfs -o noatime,space_cache=v2 /dev/md2p1 /mnt/disk2
Jul  2 12:22:14 GrayBigBerta kernel: BTRFS info (device md2p1): using crc32c (crc32c-intel) checksum algorithm
Jul  2 12:22:14 GrayBigBerta kernel: BTRFS info (device md2p1): using free space tree
Jul  2 12:22:15 GrayBigBerta kernel: BTRFS info (device md2p1): bdev /dev/md2p1 errs: wr 0, rd 0, flush 0, corrupt 22, gen 0
Jul  2 12:22:25 GrayBigBerta emhttpd: shcmd (1964): btrfs filesystem resize 1:max /mnt/disk2
Jul  2 12:22:25 GrayBigBerta root: Resize device id 1 (/dev/md2p1) from 7.28TiB to max
Jul  2 12:22:25 GrayBigBerta kernel: BTRFS info (device md2p1): resizing devid 1
Jul  2 12:22:25 GrayBigBerta emhttpd: mounting /mnt/disk3
Jul  2 12:22:25 GrayBigBerta emhttpd: shcmd (1965): mkdir -p /mnt/disk3
Jul  2 12:22:26 GrayBigBerta emhttpd: /mnt/disk3: no btrfs or device /dev/md3p1 is not single
Jul  2 12:22:26 GrayBigBerta emhttpd: /mnt/disk3 mount error: Unsupported or no file system
Jul  2 12:22:26 GrayBigBerta emhttpd: shcmd (1966): rmdir /mnt/disk3
[...]

 

Trying to mount the disks as unassigned devices:

Jul  2 12:38:30 GrayBigBerta unassigned.devices: Mounting partition 'sdg1' at mountpoint '/mnt/disks/VRJW879K'...
Jul  2 12:38:30 GrayBigBerta unassigned.devices: Mount cmd: /sbin/mount -t 'btrfs' -o rw,relatime,space_cache=v2 '/dev/sdg1' '/mnt/disks/VRJW879K'
Jul  2 12:38:30 GrayBigBerta kernel: BTRFS: device fsid d92a06ea-1eb0-4fd1-8aa3-47e0d921bdd8 devid 1 transid 29956 /dev/sdg1 scanned by mount (4220)
Jul  2 12:38:30 GrayBigBerta kernel: BTRFS info (device sdg1): using crc32c (crc32c-intel) checksum algorithm
Jul  2 12:38:30 GrayBigBerta kernel: BTRFS error (device sdg1): superblock checksum mismatch
Jul  2 12:38:30 GrayBigBerta kernel: BTRFS error (device sdg1): open_ctree failed
Jul  2 12:38:32 GrayBigBerta unassigned.devices: Mount of 'sdg1' failed: 'mount: /mnt/disks/VRJW879K: wrong fs type, bad option, bad superblock on /dev/sdg1, missing codepage or helper program, or other error.        dmesg(1) may have more information after failed mount system call. '
Jul  2 12:38:32 GrayBigBerta unassigned.devices: Partition 'VRJW879K' cannot be mounted.

 

My own idea

Because the array is still functional due to the emulated disks I had the idea to simply format the two disks and assign them as "new" disks to the array.

I did that with single disks before but not with two at the same time, so I'm not sure if that is a good idea.

 

 

Is there a way to repair the filesystem? If so, I'd kindly ask for help.

Note: the rebuild was running for ten minutes or so before I canceled it, don't know if this makes any difference.

 

Or should I go with my idea?

 

 

Thank you very much in advance!

 

 

 

graybigberta-diagnostics-20240702-1251.zip

24 minutes ago, fusselnerd said:

Because the array is still functional due to the emulated disks I had the idea to simply format the two disks and assign them as "new" disks to the array.

Don't do this, formatting disks is never a solution when trying to recover data.

 

Post the output of

btrfs fi show

 

  • Author

Thank you for your fast response.

 

3 minutes ago, JorgeB said:

Don't do this, formatting disks is never a solution when trying to recover data.

OK, I'll keep that in mind.

 

 

Here's the requested output:

root@GrayBigBerta:~# btrfs fi show
ERROR: superblock checksum mismatch
ERROR: cannot scan /dev/sdb1: Input/output error
ERROR: superblock checksum mismatch
ERROR: cannot scan /dev/sdg1: Input/output error
Label: none  uuid: af3b39b5-c791-4ea4-880d-fc1ad26cfc2d
        Total devices 1 FS bytes used 3.64TiB
        devid    1 size 7.28TiB used 3.72TiB path /dev/sdf1

Label: none  uuid: 39a4d42e-8ae1-436c-ae74-488cb24183bb
        Total devices 1 FS bytes used 685.45MiB
        devid    1 size 465.76GiB used 4.02GiB path /dev/nvme0n1p1

Label: none  uuid: ffae4078-e89d-4329-b2b9-bdd13773a8ec
        Total devices 1 FS bytes used 3.64TiB
        devid    1 size 7.28TiB used 3.71TiB path /dev/sdd1

Label: none  uuid: c40c0298-85ed-4130-aa76-cdacac9ccfa5
        Total devices 1 FS bytes used 120.40GiB
        devid    1 size 465.76GiB used 177.02GiB path /dev/sdm1

Label: none  uuid: 5f5f56e8-f435-4b81-9042-8cccd1fb7f8e
        Total devices 1 FS bytes used 76.42GiB
        devid    2 size 223.58GiB used 78.03GiB path /dev/sdi1

Label: none  uuid: d326d8d7-9da5-4d55-b3ad-43541260b369
        Total devices 1 FS bytes used 144.00KiB
        devid    1 size 931.51GiB used 3.02GiB path /dev/nvme2n1p1

Label: none  uuid: cf55c94a-4fd3-4030-a415-1d96a475aa3c
        Total devices 1 FS bytes used 5.40TiB
        devid    1 size 7.28TiB used 5.47TiB path /dev/sde1

Label: none  uuid: a93250e3-43bc-41c9-adbc-76ac0b3b0b16
        Total devices 1 FS bytes used 46.88MiB
        devid    1 size 111.79GiB used 3.02GiB path /dev/sdl1

Label: none  uuid: 2e238485-e144-4d1f-aa1a-13097d3a3e99
        Total devices 1 FS bytes used 66.90GiB
        devid    1 size 232.88GiB used 83.02GiB path /dev/nvme1n1p1

Label: none  uuid: 05612964-8ba0-475a-b544-e716f5a03167
        Total devices 1 FS bytes used 196.00KiB
        devid    1 size 465.76GiB used 5.02GiB path /dev/sdj1

Label: none  uuid: b43c9020-8d0b-4e0e-a3c3-39ec11f9e096
        Total devices 1 FS bytes used 3.64TiB
        devid    1 size 7.28TiB used 3.70TiB path /dev/sdh1

 

With the array stopped, type in the CLI:

 

echo 1 > /sys/block/sdb/device/delete

 

Wait 5 seconds, then post the output of this again:

 

btrfs fi show

 

  • Author

Here's the output:

 

root@GrayBigBerta:~# echo 1 > /sys/block/sdb/device/delete
root@GrayBigBerta:~# btrfs fi show
ERROR: superblock checksum mismatch
ERROR: cannot scan /dev/sdg1: Input/output error
Label: none  uuid: af3b39b5-c791-4ea4-880d-fc1ad26cfc2d
        Total devices 1 FS bytes used 3.64TiB
        devid    1 size 7.28TiB used 3.72TiB path /dev/sdf1

Label: none  uuid: 39a4d42e-8ae1-436c-ae74-488cb24183bb
        Total devices 1 FS bytes used 685.45MiB
        devid    1 size 465.76GiB used 4.02GiB path /dev/nvme0n1p1

Label: none  uuid: ffae4078-e89d-4329-b2b9-bdd13773a8ec
        Total devices 1 FS bytes used 3.64TiB
        devid    1 size 7.28TiB used 3.71TiB path /dev/sdd1

Label: none  uuid: c40c0298-85ed-4130-aa76-cdacac9ccfa5
        Total devices 1 FS bytes used 120.40GiB
        devid    1 size 465.76GiB used 177.02GiB path /dev/sdm1

Label: none  uuid: 5f5f56e8-f435-4b81-9042-8cccd1fb7f8e
        Total devices 1 FS bytes used 76.42GiB
        devid    2 size 223.58GiB used 78.03GiB path /dev/sdi1

Label: none  uuid: d326d8d7-9da5-4d55-b3ad-43541260b369
        Total devices 1 FS bytes used 144.00KiB
        devid    1 size 931.51GiB used 3.02GiB path /dev/nvme2n1p1

Label: none  uuid: cf55c94a-4fd3-4030-a415-1d96a475aa3c
        Total devices 1 FS bytes used 5.40TiB
        devid    1 size 7.28TiB used 5.47TiB path /dev/sde1

Label: none  uuid: a93250e3-43bc-41c9-adbc-76ac0b3b0b16
        Total devices 1 FS bytes used 46.88MiB
        devid    1 size 111.79GiB used 3.02GiB path /dev/sdl1

Label: none  uuid: 2e238485-e144-4d1f-aa1a-13097d3a3e99
        Total devices 1 FS bytes used 66.90GiB
        devid    1 size 232.88GiB used 83.02GiB path /dev/nvme1n1p1

Label: none  uuid: 05612964-8ba0-475a-b544-e716f5a03167
        Total devices 1 FS bytes used 196.00KiB
        devid    1 size 465.76GiB used 5.02GiB path /dev/sdj1

Label: none  uuid: b43c9020-8d0b-4e0e-a3c3-39ec11f9e096
        Total devices 1 FS bytes used 3.64TiB
        devid    1 size 7.28TiB used 3.70TiB path /dev/sdh1

 

Sorry, I missed your last post. Reboot to bring the other device back online and post new diags before starting the array.

  • Author

No problem :)

 

diags attached and here's the output from btrfs fi show (I skipped the other drives):

root@GrayBigBerta:~# btrfs fi show
ERROR: superblock checksum mismatch
ERROR: cannot scan /dev/sdb1: Input/output error
ERROR: superblock checksum mismatch
ERROR: cannot scan /dev/sdg1: Input/output error
[...]

 

graybigberta-diagnostics-20240703-1405.zip

Start the array with both disks unassigned and post new diags.

 

If I understood correctly you have a backup of both disks?

  • Author

Diags after starting array attached.

 

10 minutes ago, JorgeB said:

If I understood correctly you have a backup of both disks?

Kind of... I have a backup of most Unraid shares and their contents. So the content of both disks is included.

 

Note: I made the backup from the emulated fs (luckily there are two parity disks...).

Note 2: Shares are split automatically on directory level (High-water, standard configuration). So

 

Sadly, I don't have a copy or clone of the disks themselves, if that's what you mean.

graybigberta-diagnostics-20240703-1458.zip

  • Author

On a second look, the missing disks don't appear as locations in the shares anymore. Seems like the data is "lost" from the array.

I guess that happened when Unraid started to rebuild the array automatically a couple of days ago...

Both emulated disks are not mounting. Are you sure they were mounting when you did the backup? If they weren't mounting at the time, which would be the most likely case, no data would have been copied from them.

 

This error is kind of strange, and I think it may not be recoverable. I also see data corruption being detected on multiple disks, so you may have bad RAM, which may or may not be related to the current problem:

 

Jul  3 14:58:54 GrayBigBerta kernel: BTRFS info (device md2p1): bdev /dev/md2p1 errs: wr 0, rd 0, flush 0, corrupt 22, gen 0

Jul  3 14:59:05 GrayBigBerta kernel: BTRFS info (device md4p1): bdev /dev/md4p1 errs: wr 359, rd 1, flush 0, corrupt 71, gen 0

Jul  3 14:59:09 GrayBigBerta kernel: BTRFS info (device md5p1): bdev /dev/md5p1 errs: wr 0, rd 0, flush 0, corrupt 287, gen 0

Jul  3 14:59:12 GrayBigBerta kernel: BTRFS info (device md6p1): bdev /dev/md6p1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0

 

The first thing I would recommend is to run memtest for at least a couple of passes. You will also need to scrub all those disks, but that can wait. Run memtest now and post back the results, though keep in mind that memtest is only definitive if errors are found.
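For anyone wanting to pull these counters out of a syslog themselves, here is a minimal sketch. It only parses log text in the format of the `errs:` lines quoted above and does not touch the disks. The function name `btrfs_err_summary` is made up for this example.

```shell
# Hypothetical helper: list every btrfs device whose "errs:" log line
# shows at least one non-zero counter. Reads files given as arguments,
# or standard input if none are given.
btrfs_err_summary() {
    awk '
    /BTRFS info/ && /errs:/ {
        dev = ""; wr = rd = fl = co = ge = 0
        for (i = 1; i <= NF; i++) {
            # Each counter name is followed by its value, e.g. "corrupt 22,".
            # Adding 0 converts "22," to the number 22.
            if ($i == "bdev")    dev = $(i + 1)
            if ($i == "wr")      wr = $(i + 1) + 0
            if ($i == "rd")      rd = $(i + 1) + 0
            if ($i == "flush")   fl = $(i + 1) + 0
            if ($i == "corrupt") co = $(i + 1) + 0
            if ($i == "gen")     ge = $(i + 1) + 0
        }
        if (wr + rd + fl + co + ge > 0)
            printf "%s wr=%d rd=%d flush=%d corrupt=%d gen=%d\n", dev, wr, rd, fl, co, ge
    }' "$@"
}
```

Assuming the log lives at /var/log/syslog, it could be called as `btrfs_err_summary /var/log/syslog`. A device that shows up here has logged errors at some point; the counters persist until they are reset, so they do not say when the errors occurred.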

 

 

 

 

 

  • Author

Thank you, memtest v7 is running now. I'll post the results after a couple of passes.

  • Author
38 minutes ago, JorgeB said:

If they weren't mounting at the time, which would be the most likely case, no data would have been copied from them.

No, they weren't.

 

But this confuses me. Maybe you could help me understand in the meantime...

 

From my understanding the Unraid array parity can buffer disk failures, similar to a raid parity:

If I have an array with one parity disk and one array disk fails for whatever reason, then the data on the failed disk is calculated from the parity and the remaining disks.

So as long as there is not a second disk failing the data should be available (emulated disk).

 

The same should apply to two parity disks and a maximum of two failing array disks (which is the case in my setup).

 

Do I fundamentally misunderstand something here?

54 minutes ago, fusselnerd said:

From my understanding the Unraid array parity can buffer disk failures, similar to a raid parity:

 

Parity helps if a disk fails, but if there is also filesystem corruption, it cannot help with that part.
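A simplified sketch of why: the first parity disk is a bit-wise XOR across the data disks, so a rebuild reproduces whatever bytes were last written to the failed disk, whether or not they form a valid filesystem. The helper name `xor_all` and the byte values below are made up for illustration, and the second parity disk uses a more complex calculation that is not modelled here.

```shell
# Simplified model: parity is the XOR of one byte from each data disk.
# xor_all is a hypothetical helper, not an Unraid tool.
xor_all() {
    acc=0
    for b in "$@"; do
        acc=$(( acc ^ b ))
    done
    echo "$acc"
}

# One byte from each of three data disks. Pretend the byte on disk1 is
# part of a superblock that was already corrupted when parity was updated.
disk1=173   # corrupted value
disk2=42
disk3=7

parity=$(xor_all "$disk1" "$disk2" "$disk3")

# disk1 "fails": rebuild its byte from parity plus the surviving disks.
rebuilt=$(xor_all "$parity" "$disk2" "$disk3")

echo "rebuilt disk1 byte: $rebuilt"   # prints the same corrupted value, 173
```

Parity has no notion of which value was "correct"; it only guarantees that the rebuilt disk matches the state parity was last synced with.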

  • Author

So the whole btrfs filesystem of the array has a problem... ok, then that makes sense.

Thank you for clarifying!

 

btw. 2 memtest passes so far, no errors...

  • Author

Hi JorgeB,

 

memtest86 results attached. No errors after 4 passes.

IMG_20240704_025729.jpg

Keep in mind that memtest is only definitive if it finds errors, but for now, let's make a final attempt to recover the data. Finish rebuilding both disks; we can then try to use a backup superblock to see if that works, though I don't have much hope. If that doesn't work, you can then try a file recovery app like UFS Explorer on the rebuilt disks.

 

You will also need to scrub all the disks that have corruption detected, then reset the errors and monitor to see if new ones come up, but this is for later.
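As a rough sketch of that later step, assuming the affected disks are mounted by the array at /mnt/disk2, /mnt/disk4, /mnt/disk5 and /mnt/disk6 (the disks with non-zero counters in the log lines above): the helper below only prints the commands so they can be reviewed first. The name `scrub_plan` is made up for this example.

```shell
# Hypothetical helper: print (do not run) the scrub and stats commands
# for the given array disk numbers.
scrub_plan() {
    for n in "$@"; do
        mnt="/mnt/disk$n"
        echo "btrfs scrub start -B $mnt"    # -B: stay in the foreground until the scrub finishes
        echo "btrfs device stats $mnt"      # show the error counters after the scrub
        echo "btrfs device stats -z $mnt"   # print the counters, then reset them to zero
    done
}

scrub_plan 2 4 5 6
```

After the reset, running `btrfs device stats` on the same mount points from time to time shows whether new errors come up.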

  • Author

Ah, now I understand what you meant with "definitive" - letting memtest run until it finds an error.

Sorry, English is not my first language 😅

 

Ok, I assigned both disks and started the array. Rebuild started automatically.

But both disks are labeled as "Unmountable: Unsupported or no file system".

I read in another post that in this case, the rebuild will not actually write anything on the disks.

I paused the rebuild for now.

 

Shall I proceed?

 

grafik.thumb.png.2ba165f1b6d0b74a2a5951e2e36359c8.png

Edited by fusselnerd

It will still write to the disks. BTW, I forgot to ask: those disks are the original disks 1 and 3, right? Or are they new and you still have the old ones?

  • Author
Just now, JorgeB said:

It will still write to the disks

Ok, thx. Rebuild is resuming.

 

Just now, JorgeB said:

those disks are the original disks 1 and 3 right? Or are they new and you still have the old ones?

They are the original ones in the original order.

OK. If this happens in the future, never start rebuilding on top of the old disk if the emulated disk doesn't mount. If you hadn't tried that, the original disks could still be OK; rebuilding an unmountable disk will always result in an unmountable disk. But now there's no other option. Once they are rebuilt, we can see if the backup superblock helps. If it doesn't, you can run UFS Explorer on them, which cannot be run on emulated disks.

  • Author

Hi @JorgeB

Rebuild is complete.

Next step is

On 7/4/2024 at 12:49 PM, JorgeB said:

try to use a backup superblock

right?

Could you please guide me through the process?

 

 

On 7/4/2024 at 1:27 PM, JorgeB said:

If this happens in the future, never start rebuilding on top of the old disk if the emulated disk doesn't mount. If you hadn't tried that, the original disks could still be OK; rebuilding an unmountable disk will always result in an unmountable disk

Got it, and I'll keep it in mind.

I went through the Unraid docs again, it's mentioned there several times.

Lesson learned the hard way...

The only issue I had is that I couldn't see whether the drives were mountable before starting the array. And starting the array will automatically trigger the rebuild (though I might remember it wrong).

So in the future, if such a situation ever happens again, I will test a temporarily failed disk first, e.g. by mounting it separately, before starting the array.

And of course, keep an eye open for fs errors.

 

But maybe this is a topic for another discussion.

Edited by fusselnerd

1 hour ago, fusselnerd said:

The only issue I have is, that I couldn't see if the drives are mountable before starting the array. And starting the array will automatically trigger the rebuild

After the disks get disabled they won't start rebuilding automatically. If they were unassigned, you can start the array without the disks assigned to see if the emulated disks are working.

 

Start the array in maintenance mode and post the output of:

 

btrfs-select-super -s 1 /dev/md1p1

and

btrfs-select-super -s 1 /dev/md3p1
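For context on what `-s 1` refers to: btrfs keeps its primary superblock 64 KiB into the device, with mirror copies at 64 MiB and 256 GiB, and copy 1 is the one at 64 MiB. The read-only sketch below checks whether the btrfs magic string is present at each of those locations. The helper name `sb_magic` is made up for this example, and it does not verify the checksum, so a hit does not prove that a copy is usable.

```shell
# Hypothetical read-only helper: report whether the 8-byte btrfs magic
# "_BHRfS_M" is present at each superblock location of a device or image.
sb_magic() {
    dev=$1
    # Superblock copies sit at 64 KiB, 64 MiB and 256 GiB.
    for off in 65536 67108864 274877906944; do
        # The magic is stored 64 bytes into the superblock.
        magic=$(dd if="$dev" bs=1 skip=$(( off + 64 )) count=8 2>/dev/null | LC_ALL=C tr -d '\0')
        if [ "$magic" = "_BHRfS_M" ]; then
            echo "offset $off: magic found"
        else
            echo "offset $off: no magic"
        fi
    done
}
```

With the array in maintenance mode this could be run as `sb_magic /dev/md1p1`. The actual tool for inspecting the copies in detail is `btrfs inspect-internal dump-super -a`.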

 

 

  • Author

Here we go:

root@GrayBigBerta:~# btrfs-select-super -s 1 /dev/md1p1
ERROR: superblock checksum mismatch
ERROR: superblock checksum mismatch
No valid Btrfs found on /dev/md1p1
ERROR: open ctree failed
root@GrayBigBerta:~# btrfs-select-super -s 1 /dev/md3p1
ERROR: superblock checksum mismatch
ERROR: superblock checksum mismatch
No valid Btrfs found on /dev/md3p1
ERROR: open ctree failed
  • Author
6 minutes ago, JorgeB said:

After the disks get disabled they won't start rebuilding automatically. If they were unassigned, you can start the array without the disks assigned to see if the emulated disks are working.

I see, thank you!

Can you recommend some reading about btrfs and the Unraid array? I'm missing the fundamentals, obviously, so I'd like to dig into it a bit.
