[SOLVED] Cache pool JBOD

November 22, 2020

I'm trying to create a JBOD cache pool in 6.9beta35.

I don't know if this is a bug or I'm just doing it wrong so...

From what I understand from the below post I have to set it to single mode

When I run "convert to single mode" (the pool is a 14TB and an 8TB disk), the GUI says it's 16TB.

I also see the same write speeds to both disks, giving the impression it's RAID 1
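
As a sanity check on the numbers, the expected usable space of a two-device btrfs pool can be worked out by hand. The sketch below is a rough estimate only: `pool_capacity_tb` is a hypothetical helper, sizes are in whole TB, and metadata overhead is ignored.

```shell
# Rough usable capacity (whole TB) of a two-device btrfs pool.
# Hypothetical helper; ignores metadata and chunk-allocation overhead.
pool_capacity_tb() {
    local profile=$1 a=$2 b=$3
    case $profile in
        single) echo $(( a + b )) ;;          # JBOD-style: the sizes add up
        raid1)  echo $(( a < b ? a : b )) ;;  # two devices: limited by the smaller one
        *)      echo "unknown profile: $profile" >&2; return 1 ;;
    esac
}

pool_capacity_tb single 14 8   # prints 22
pool_capacity_tb raid1 14 8    # prints 8
```

Neither figure is 16TB, so the GUI total alone probably isn't a reliable indicator of which profile is active; the balance status is the better thing to look at.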

Balance status:

Data, RAID1: total=4.00GiB, used=2.97GiB
Data, single: total=1.00GiB, used=0.00B
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=3.94MiB
GlobalReserve, single: total=3.78MiB, used=16.00KiB

If I execute "perform full balance", it just reverts to RAID 1 status.

Can anyone tell me what I'm doing wrong or should I post this as a bug in beta?

Maybe I have to jump through a few hoops like removing one disk -> single mode -> add disk?

thanks!

November 22, 2020

Community Expert

Due to a current btrfs bug you need to run the balance to single (or any other profile) twice.
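
A minimal sketch of checking whether the second pass is still needed. It assumes the pool is mounted at `/mnt/cache` and that the GUI's single-mode conversion corresponds to `-dconvert=single -mconvert=raid1`; neither is confirmed in this thread. `data_profiles` and `data_is_mixed` are hypothetical helper names that only parse `btrfs filesystem df` output, which uses the same format as the balance status posted above.

```shell
# List the distinct profiles used by Data chunks.
# Reads `btrfs filesystem df <mountpoint>` output on stdin.
data_profiles() {
    awk -F'[,:] *' '$1 == "Data" { print $2 }' | sort -u
}

# Exit status 0 when Data chunks still use more than one profile,
# i.e. the conversion is incomplete and the balance should be run again.
data_is_mixed() {
    awk -F'[,:] *' '$1 == "Data" { seen[$2] = 1 }
        END { n = 0; for (p in seen) n++; exit !(n > 1) }'
}

# On a live system (as root; /mnt/cache is an assumed mount point):
#   btrfs balance start -dconvert=single -mconvert=raid1 /mnt/cache
#   btrfs filesystem df /mnt/cache | data_is_mixed && echo "run the balance again"
```

A status that lists both `Data, RAID1` and `Data, single` is the mixed case this check looks for.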

November 22, 2020

Author

1 hour ago, JorgeB said:

Due to a current btrfs bug you need to run the balance to single (or any other profile) twice.

That works, thanks!

November 24, 2020

Author

On 11/22/2020 at 10:01 AM, JorgeB said:

Due to a current btrfs bug you need to run the balance to single (or any other profile) twice.

Could you just confirm to me if converting from single to raid 1 does not lose data? (not stated in faq nor unraid gui)

I just added a disk to a cache pool from 1 to 2 and unraid made it single. (I believe this is the default according to the faq)

So this is the current state (2 states, related to the btrfs bug?):

Data, RAID1: total=42.00GiB, used=24.68GiB
Data, single: total=1.18TiB, used=1.16TiB
System, DUP: total=8.00MiB, used=176.00KiB
Metadata, DUP: total=2.00GiB, used=1.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

I have enough space available so nothing will happen to my data right?

What would happen if there was not enough space?

November 24, 2020

Community Expert

27 minutes ago, WEHA said:

Could you just confirm to me if converting from single to raid 1 does not lose data?

It doesn't, unless something goes wrong.

November 24, 2020

Author

25 minutes ago, JorgeB said:

It doesn't, unless something goes wrong.

Tried converting twice, remains the same state as posted earlier.

It starts and after about 30 seconds or so it goes back to no balance.

November 24, 2020

Community Expert

Please post diags after a balance attempt.

P.S. "No balance" is normal after the balance ends/stops.

November 24, 2020

Author

Attached

Seems like this is the culprit?

Nov 24 09:22:05 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 283 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1

tower-diagnostics-20201124-1149.zip

November 24, 2020

Community Expert

1 hour ago, WEHA said:

Seems like this is the culprit?

Yes, that means there's corrupt data, and the balance will abort. You can run a scrub to find the corrupt file(s), then delete them or restore from backup. It's also a good idea to run memtest.

November 24, 2020

Author

14 minutes ago, JorgeB said:

Yes, that means there's corrupt data, and the balance will abort. You can run a scrub to find the corrupt file(s), then delete them or restore from backup. It's also a good idea to run memtest.

*sigh* ... how do I get a list of files?

I'm running scrub and this is the status already:

Error summary:    csum=35
Corrected:      4
Uncorrectable: 31
Unverified:     0

These are software errors, correct?

Smart does not indicate a problem, this is also a new disk.

November 24, 2020

Community Expert

11 minutes ago, WEHA said:

These are software errors, correct?

No, this is data corruption, usually caused by bad RAM.

11 minutes ago, WEHA said:

how do I get a list of files?

In the syslog.

November 24, 2020

Author

Syslog does not show files:

Nov 24 13:01:51 Tower kernel: BTRFS info (device sde1): scrub: started on devid 1
Nov 24 13:01:51 Tower kernel: BTRFS info (device sde1): scrub: started on devid 2
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413978710016 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913239552 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913341952 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913444352 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915201536 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915303936 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915406336 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413916004352 on dev /dev/sdk1
Nov 24 13:03:23 Tower kernel: BTRFS error (device sde1): fixed up error at logical 1413978824704 on dev /dev/sdk1
Nov 24 13:03:23 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413979930624 on dev /dev/sdk1

November 24, 2020

Community Expert

It should. Maybe something changed in the log level in the new beta; I need to test.

November 24, 2020

Community Expert

It's working for me. Are you sure you're seeing the full syslog? The snippet you posted only shows "BTRFS error" lines; it's missing the "BTRFS warning" lines, which are the ones that show the file, e.g.:

Nov 24 05:35:17 Test2 kernel: BTRFS warning (device md1): checksum error at logical 1479372800 on dev /dev/md1, physical 2561503232, root 5, inode 257, offset 375222272, length 4096, links 1 (path: 1.iso)
Nov 24 05:35:17 Test2 kernel: BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Nov 24 05:35:17 Test2 kernel: BTRFS error (device md1): unable to fixup (regular) error at logical 1479372800 on dev /dev/md1
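
To pull just the file names out of a long syslog, something like the sketch below may help. It assumes the warning lines end with `(path: ...)` as in the example above; `corrupt_paths` is a hypothetical helper name.

```shell
# Print the unique file paths named in btrfs "checksum error" warnings.
# Reads syslog or dmesg text on stdin.
corrupt_paths() {
    grep 'BTRFS warning' | sed -n 's/.*(path: \(.*\))[[:space:]]*$/\1/p' | sort -u
}

# Usage on a live system:
#   corrupt_paths < /var/log/syslog
```

If this prints nothing after a scrub that reported errors, the warning lines are missing from the log rather than merely hard to spot.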

November 24, 2020

Author

It's copied from the syslog file in nano, so I would think that is the full syslog?

There are warnings from before the scrub though:

root@Tower:/var/log# cat syslog |grep "BTRFS warning"
Nov 23 03:59:25 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 1765621760 csum 0xd488241c expected csum 0xdbe78a4e mirror 1
Nov 23 03:59:25 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 1765621760 csum 0xd488241c expected csum 0xdbe78a4e mirror 1
Nov 23 20:40:23 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 281 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 23 20:40:23 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 281 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 04:03:17 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 4379881472 csum 0x1616fb61 expected csum 0xcbd3dbb1 mirror 2
Nov 24 09:19:19 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 282 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 09:19:19 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 282 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 09:22:05 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 283 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 09:22:06 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 283 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1

November 24, 2020

Community Expert

Please post diags again just to make sure.

November 24, 2020

Author

tower-diagnostics-20201124-1452.zip

November 24, 2020

Community Expert

Sorry, no idea why it's not showing the warnings; I've never seen that before. It's unlikely to show different info, but check dmesg by typing:

dmesg

on the console.

November 24, 2020

Author

Same story, I see callbacks suppressed though

[203355.213783] BTRFS error (device sde1): unable to fixup (regular) error at logical 1342354677760 on dev /dev/sde1
[203436.360164] scrub_handle_errored_block: 8 callbacks suppressed
[203436.360209] btrfs_dev_stat_print_on_error: 8 callbacks suppressed
[203436.360212] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 93, gen 0
[203436.360214] scrub_handle_errored_block: 8 callbacks suppressed
[203436.360215] BTRFS error (device sde1): unable to fixup (regular) error at logical 1348826648576 on dev /dev/sde1
[203439.353192] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 94, gen 0
[203439.353195] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349298642944 on dev /dev/sde1
[203440.426170] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 95, gen 0
[203440.426174] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349556105216 on dev /dev/sde1
[203441.204687] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 96, gen 0
[203441.204690] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349681184768 on dev /dev/sde1
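
When the log gives only logical addresses, `btrfs inspect-internal logical-resolve` can sometimes map them back to file paths. This is a sketch, not something tried in this thread: the mount point `/mnt/cache` and the helper names are assumptions, and the lookup may fail for addresses that belong to metadata or to files that no longer exist. The "callbacks suppressed" lines mean the kernel rate-limited some messages, so the list of addresses may be incomplete.

```shell
# Print the unique logical addresses from "unable to fixup" scrub errors.
# Reads syslog or dmesg text on stdin.
uncorrectable_logicals() {
    sed -n 's/.*unable to fixup (regular) error at logical \([0-9][0-9]*\) on dev.*/\1/p' | sort -un
}

# Resolve each address read from stdin to the file(s) that use it.
# Needs root and the pool mounted at "$1".
resolve_logicals() {
    local mnt=$1 addr
    while read -r addr; do
        btrfs inspect-internal logical-resolve "$addr" "$mnt"
    done
}

# Usage on a live system (mount point is an assumption):
#   dmesg | uncorrectable_logicals | resolve_logicals /mnt/cache
```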

November 24, 2020

Community Expert

The only option I see is doing it the manual way, i.e., copy/move all the data somewhere else; any files that can't be copied due to an I/O error are corrupt. Note that some of the corruption might not be in files but in metadata, though it still should show that info in the log.
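
The manual check can also be scripted so that nothing has to be copied first, since reading a file end to end should hit the same I/O error on corrupt data. A minimal sketch, assuming a POSIX shell and `find`; `unreadable_files` is a hypothetical helper name and the mount point is an assumption. Files without data checksums (NOCOW) will read without errors even if damaged, so they are not covered by this check.

```shell
# Print every regular file under "$1" that cannot be read to the end.
unreadable_files() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            cat -- "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
        done' sh {} +
}

# Usage on a live system (mount point is an assumption):
#   unreadable_files /mnt/cache
```

Anything it prints is a candidate for deletion or restore from backup.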

November 25, 2020

Author

On 11/24/2020 at 3:42 PM, JorgeB said:

The only option I see is doing it the manual way, i.e., copy/move all the data somewhere else; any files that can't be copied due to an I/O error are corrupt. Note that some of the corruption might not be in files but in metadata, though it still should show that info in the log.

I moved everything off, 2 files remained, 1 vdisk file and docker img.

The docker image was unable to be moved due to an i/o error, so I removed it and recreated it on another pool.

I reran scrub and now no errors are detected.

Is this related to docker image being set as xfs on a btrfs pool?

I set this to xfs to make sure the bug that causes excessive disk I/O is gone.

Smart does not show any errors on the disk so I can be sure this was a software corruption and not caused by a hardware (hdd) defect?

Edited November 25, 2020 by WEHA

November 25, 2020

Community Expert

7 minutes ago, WEHA said:

The docker image was unable to be moved due to an i/o error

This means it was corrupt, btrfs won't let you copy/read a corrupt file.

8 minutes ago, WEHA said:

Is this related to docker image being set as xfs on a btrfs pool?

That shouldn't be a problem.

8 minutes ago, WEHA said:

Smart does not show any errors on the disk so I can be sure this was a software corruption and not caused by a hardware (hdd) defect?

Unlikely to be a device problem, if it's hardware related it's likely RAM.

November 25, 2020

Author

1 hour ago, JorgeB said:

That shouldn't be a problem.

It's strange that it's only the docker file and not the vm file... could it be related to NOCOW / COW?

I enabled this for the system share and thus the docker image, the vdisk has NOCOW.

Thank you for assisting

November 25, 2020

Community Expert

Enabling NOCOW turns off data checksums, so those files can't be checked or fixed.
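
To see which files are affected, the `C` (No_COW) attribute can be listed with `lsattr`. A minimal sketch; `nocow_files` is a hypothetical helper that filters `lsattr` output, and the path in the usage comment is an assumption.

```shell
# Print the paths whose attribute field contains "C" (No_COW).
# Reads `lsattr` output on stdin: "<flags> <path>" per line.
# Directory headers and blank lines from `lsattr -R` are skipped.
nocow_files() {
    awk '$1 ~ /^[-A-Za-z]+$/ && $1 ~ /C/ { sub(/^[^ ]+ +/, ""); print }'
}

# Usage on a live system (path is an assumption):
#   lsattr -R /mnt/cache/domains 2>/dev/null | nocow_files
```

A file listed here has no data checksums, so a clean scrub says nothing about its contents.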

November 25, 2020

Author

4 hours ago, JorgeB said:

Enabling NOCOW turns off data checksums, so those files can't be checked or fixed.

By "enabling" I meant COW.

So the system share had COW and the vdisk had NOCOW.

But the docker image was corrupt and the vdisk image was not.
