[SOLVED] Cache pool JBOD


WEHA

I'm trying to create a JBOD cache pool in 6.9beta35.

I don't know if this is a bug or I'm just doing it wrong so...

From what I understand from the post below, I have to set it to single mode.

 

When I use "convert to single mode" (the pool is a 14TB and an 8TB disk), the GUI says it's 16TB.

I also see the same write speeds to both disks, giving the impression it's RAID 1.

 

Balance status:

Data, RAID1: total=4.00GiB, used=2.97GiB

Data, single: total=1.00GiB, used=0.00B

System, RAID1: total=32.00MiB, used=16.00KiB

Metadata, RAID1: total=1.00GiB, used=3.94MiB

GlobalReserve, single: total=3.78MiB, used=16.00KiB
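
The status above lists Data under both RAID1 and single, so the convert has not fully applied yet. A minimal sketch for spotting that from output in the same shape (`mixed_profiles` is a made-up helper name, and it only parses `Type, PROFILE: total=..., used=...` lines like the ones above):

```shell
# Hypothetical helper: reads "Type, PROFILE: total=..., used=..." lines on
# stdin and prints every block group type that shows up with more than one
# profile, e.g. "Data: RAID1 single".
mixed_profiles() {
    awk -F'[,:]' '
        NF >= 3 {
            type = $1
            profile = $2
            gsub(/^ +| +$/, "", profile)          # trim spaces around the profile
            if (!((type, profile) in seen)) {
                seen[type, profile] = 1
                count[type]++
                if (type in list)
                    list[type] = list[type] " " profile
                else
                    list[type] = profile
            }
        }
        END {
            for (t in count)
                if (count[t] > 1)
                    print t ": " list[t]
        }
    '
}

# Usage (the mount point /mnt/cache is an assumption):
#   btrfs filesystem df /mnt/cache | mixed_profiles
```

No output would mean every type has a single profile.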

 

If I execute "perform full balance", it just reverts to RAID 1 status.

 

Can anyone tell me what I'm doing wrong or should I post this as a bug in beta?

Maybe I have to jump through a few hoops like removing one disk -> single mode -> add disk?

 

thanks!

Link to comment
  • JorgeB changed the title to [SOLVED] Cache pool JBOD
On 11/22/2020 at 10:01 AM, JorgeB said:

Due to a current btrfs bug you need to run the balance to single (or any other profile) twice.

Could you confirm that converting from single to RAID 1 does not lose data? (This is not stated in the FAQ or in the Unraid GUI.)

 

I just added a second disk to a single-disk cache pool and Unraid made it single. (I believe this is the default according to the FAQ.)

So this is the current state (2 states, related to the btrfs bug?):

Data, RAID1: total=42.00GiB, used=24.68GiB
Data, single: total=1.18TiB, used=1.16TiB
System, DUP: total=8.00MiB, used=176.00KiB
Metadata, DUP: total=2.00GiB, used=1.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

I have enough space available, so nothing will happen to my data, right?

What would happen if there was not enough space?

Link to comment
14 minutes ago, JorgeB said:

Yes, that means there's corrupt data and the balance will abort. You can run a scrub to find the corrupt file(s), then delete them or restore from backup. It's also a good idea to run memtest.

*sigh* ... how do I get a list of files?

I'm running scrub and this is the status already:

Error summary:    csum=35
  Corrected:      4
  Uncorrectable:  31
  Unverified:     0

 

These are software errors, correct?

SMART does not indicate a problem, and this is also a new disk.

Link to comment

Syslog does not show files:

Nov 24 13:01:51 Tower kernel: BTRFS info (device sde1): scrub: started on devid 1
Nov 24 13:01:51 Tower kernel: BTRFS info (device sde1): scrub: started on devid 2
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413978710016 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913239552 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913341952 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913444352 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915201536 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915303936 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915406336 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413916004352 on dev /dev/sdk1
Nov 24 13:03:23 Tower kernel: BTRFS error (device sde1): fixed up error at logical 1413978824704 on dev /dev/sdk1
Nov 24 13:03:23 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413979930624 on dev /dev/sdk1

 

Link to comment

It's working for me. Are you sure you're seeing the full syslog? The snippet you posted only shows "BTRFS error" lines; it's missing the "BTRFS warning" lines, which are the ones that show the file, e.g.:

 

Nov 24 05:35:17 Test2 kernel: BTRFS warning (device md1): checksum error at logical 1479372800 on dev /dev/md1, physical 2561503232, root 5, inode 257, offset 375222272, length 4096, links 1 (path: 1.iso)
Nov 24 05:35:17 Test2 kernel: BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Nov 24 05:35:17 Test2 kernel: BTRFS error (device md1): unable to fixup (regular) error at logical 1479372800 on dev /dev/md1
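
For anyone else looking for the file list, a minimal sketch that pulls the names out of warning lines like the one above (`corrupt_paths` is a made-up helper name; it assumes the `(path: ...)` suffix is the last thing on the line, as in this example):

```shell
# Hypothetical helper: reads syslog lines on stdin and prints the unique
# file names from the "(path: ...)" suffix of BTRFS checksum warnings.
corrupt_paths() {
    sed -n 's/.*BTRFS warning.*checksum error at logical.*(path: \(.*\))$/\1/p' |
        LC_ALL=C sort -u
}

# Usage:
#   corrupt_paths < /var/log/syslog
```

If this prints nothing, the warning lines with paths are simply not in the log.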

 

 

Link to comment

It's copied from the syslog file in nano, so I would think that is the full syslog?

 

There are warnings from before the scrub though:

root@Tower:/var/log# cat syslog |grep "BTRFS warning"
Nov 23 03:59:25 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 1765621760 csum 0xd488241c expected csum 0xdbe78a4e mirror 1
Nov 23 03:59:25 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 1765621760 csum 0xd488241c expected csum 0xdbe78a4e mirror 1
Nov 23 20:40:23 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 281 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 23 20:40:23 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 281 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 04:03:17 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 4379881472 csum 0x1616fb61 expected csum 0xcbd3dbb1 mirror 2
Nov 24 09:19:19 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 282 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 09:19:19 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 282 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 09:22:05 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 283 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 09:22:06 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 283 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
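
These "csum failed" lines carry no path, only a root and an inode number. A rough sketch for collecting the unique root/inode pairs (`csum_failed_inodes` is a made-up helper name). For root 5 the inode could then be looked up with `btrfs inspect-internal inode-resolve`. As far as I know, negative roots such as -9 are internal btrfs trees (the data relocation tree used by balance) rather than a user subvolume, so there would be no file path to resolve for those.

```shell
# Hypothetical helper: reads syslog lines on stdin and prints the unique
# "root inode" pairs from BTRFS "csum failed" warnings, one pair per line.
csum_failed_inodes() {
    sed -n 's/.*csum failed root \(-\{0,1\}[0-9][0-9]*\) ino \([0-9][0-9]*\) .*/\1 \2/p' |
        LC_ALL=C sort -u
}

# Usage (the mount point /mnt/cache is an assumption):
#   csum_failed_inodes < /var/log/syslog
#   btrfs inspect-internal inode-resolve 182291 /mnt/cache
```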

 

Link to comment

Same story, though I do see "callbacks suppressed" messages:

 

[203355.213783] BTRFS error (device sde1): unable to fixup (regular) error at logical 1342354677760 on dev /dev/sde1
[203436.360164] scrub_handle_errored_block: 8 callbacks suppressed
[203436.360209] btrfs_dev_stat_print_on_error: 8 callbacks suppressed
[203436.360212] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 93, gen 0
[203436.360214] scrub_handle_errored_block: 8 callbacks suppressed
[203436.360215] BTRFS error (device sde1): unable to fixup (regular) error at logical 1348826648576 on dev /dev/sde1
[203439.353192] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 94, gen 0
[203439.353195] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349298642944 on dev /dev/sde1
[203440.426170] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 95, gen 0
[203440.426174] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349556105216 on dev /dev/sde1
[203441.204687] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 96, gen 0
[203441.204690] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349681184768 on dev /dev/sde1
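
Since the kernel is suppressing some messages here, another angle is to work from the logical addresses in the "unable to fixup" lines. A minimal sketch (`unfixable_logicals` is a made-up helper name) that extracts them so each can be passed to `btrfs inspect-internal logical-resolve`; that lookup may well return nothing if the address belongs to metadata or to an extent that no longer exists.

```shell
# Hypothetical helper: reads kernel log lines on stdin and prints the unique
# logical addresses from "unable to fixup (regular) error" messages.
unfixable_logicals() {
    sed -n 's/.*unable to fixup (regular) error at logical \([0-9][0-9]*\) on dev.*/\1/p' |
        LC_ALL=C sort -u
}

# Usage (the mount point /mnt/cache is an assumption):
#   dmesg | unfixable_logicals | while read -r addr; do
#       btrfs inspect-internal logical-resolve "$addr" /mnt/cache
#   done
```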

 

Link to comment
On 11/24/2020 at 3:42 PM, JorgeB said:

The only option I see is doing it the manual way, i.e., copy/move all the data somewhere else; any files that can't be copied due to an I/O error are corrupt. Note that some of the corruption might not be in files but in the metadata, but it still should show that info in the log.
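
A rough sketch of that manual check, for anyone who wants to test readability before moving anything: read every file once and print the ones that fail. `list_unreadable` is a made-up helper name, and the sketch treats any read failure as suspect, so it does not distinguish I/O errors from, say, permission problems.

```shell
# Hypothetical helper: tries to read every regular file under the given
# directory and prints the path of each file that cannot be read in full.
list_unreadable() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            cat -- "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}

# Usage (the mount point /mnt/cache is an assumption):
#   list_unreadable /mnt/cache
```

This reads all the data, so on a large pool it takes about as long as copying it.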

I moved everything off, 2 files remained, 1 vdisk file and docker img.

The docker image was unable to be moved due to an i/o error, so I removed it and recreated it on another pool.

I reran scrub and now no errors are detected.

 

Is this related to docker image being set as xfs on a btrfs pool?

I set it to xfs to make sure the bug that causes excessive disk I/O is gone.

 

SMART does not show any errors on the disk, so can I be sure this was software corruption and not caused by a hardware (HDD) defect?

Edited by WEHA
Link to comment
7 minutes ago, WEHA said:

The docker image was unable to be moved due to an i/o error

This means it was corrupt; btrfs won't let you copy/read a corrupt file.

 

8 minutes ago, WEHA said:

Is this related to docker image being set as xfs on a btrfs pool?

That shouldn't be a problem.

 

8 minutes ago, WEHA said:

SMART does not show any errors on the disk, so can I be sure this was software corruption and not caused by a hardware (HDD) defect?

Unlikely to be a device problem; if it's hardware related, it's likely RAM.

Link to comment
1 hour ago, JorgeB said:

That shouldn't be a problem.

It's strange that it's only the docker file and not the vm file... could it be related to NOCOW / COW?

I enabled COW for the system share and thus for the docker image; the vdisk has NOCOW.
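
One possibly relevant detail, offered as my understanding rather than a confirmed diagnosis: btrfs does not checksum data in NOCOW files, so a scrub can only report checksum errors for COW files. That could explain why only the docker image was flagged. A small sketch for checking which files carry the NOCOW (`C`) attribute, by filtering `lsattr` output (`nocow_files` is a made-up helper name and the paths in the usage line are hypothetical):

```shell
# Hypothetical helper: reads lsattr output on stdin and prints the paths of
# entries whose attribute field contains the C (NOCOW) flag.
nocow_files() {
    awk 'NF >= 2 && $1 ~ /^[-A-Za-z]+$/ && $1 ~ /C/ {
        sub(/^[^ ]+ +/, "")     # drop the attribute field, keep the path
        print
    }'
}

# Usage (hypothetical paths):
#   lsattr /mnt/cache/domains/vm/vdisk1.img /mnt/cache/system/docker/docker.img | nocow_files
```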

 

Thank you for assisting

Link to comment
