WEHA, posted November 22, 2020:

I'm trying to create a JBOD cache pool in 6.9beta35. I don't know if this is a bug or I'm just doing it wrong, so...

From what I understand from the below post, I have to set it to single mode. When I do this ("convert to single mode"; it's a 14TB and an 8TB disk), the GUI says it's 16TB. I also see the same write speeds to both disks, giving the impression it's RAID 1.

Balance status:
Data, RAID1: total=4.00GiB, used=2.97GiB
Data, single: total=1.00GiB, used=0.00B
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=3.94MiB
GlobalReserve, single: total=3.78MiB, used=16.00KiB

If I execute "perform full balance", it just reverts to RAID 1 status. Can anyone tell me what I'm doing wrong, or should I post this as a bug in the beta? Maybe I have to jump through a few hoops, like removing one disk -> single mode -> add disk?

Thanks!
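One way to tell whether a conversion actually finished is to check that each block-group type (Data, Metadata, System) reports only one profile; the status above still lists Data as both RAID1 and single. Below is a minimal Python sketch, assuming you paste in the balance status text from the GUI (the same "Type, PROFILE: total=..., used=..." lines that `btrfs filesystem df` prints). The function names are hypothetical helpers, not part of Unraid or btrfs-progs.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Matches entries such as "Data, RAID1: total=4.00GiB, used=2.97GiB",
# whether they are one per line or run together on a single line.
ENTRY_RE = re.compile(
    r"(Data|Metadata|System|GlobalReserve),\s*(\w+):\s*"
    r"total=([^,\s]+),\s*used=([^,\s]+)"
)


def profiles_by_type(report: str) -> Dict[str, Set[str]]:
    """Map each block-group type (Data, Metadata, ...) to the profiles seen."""
    profiles: Dict[str, Set[str]] = defaultdict(set)
    for kind, profile, _total, _used in ENTRY_RE.findall(report):
        profiles[kind].add(profile)
    return dict(profiles)


def unconverted_types(report: str) -> List[str]:
    """Return block-group types that still carry more than one profile.

    GlobalReserve is skipped: it is a space reservation that is always
    reported as "single", not a chunk type that a balance converts.
    """
    return sorted(
        kind
        for kind, profiles in profiles_by_type(report).items()
        if kind != "GlobalReserve" and len(profiles) > 1
    )
```

For the status in the post above, `unconverted_types` returns `["Data"]`: data chunks exist in both RAID1 and single, so the conversion has not completed.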
JorgeB, posted November 22, 2020:

Due to a current btrfs bug, you need to run the balance to single (or any other profile) twice.
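On the command line, a data conversion like this corresponds to `btrfs balance start` with a `-dconvert` filter. A minimal Python sketch of the run-it-twice workaround, assuming the pool is mounted at `/mnt/cache` (Unraid's usual cache mount point; adjust for your pool) and the script runs as root. Only data chunks are converted here; metadata keeps its current profile. The helper names are hypothetical.

```python
import subprocess
from typing import List


def convert_data_command(mountpoint: str, profile: str) -> List[str]:
    """Build a btrfs balance that converts data chunks to `profile`."""
    return ["btrfs", "balance", "start", f"-dconvert={profile}", mountpoint]


def convert_data_twice(mountpoint: str, profile: str) -> None:
    """Run the same conversion twice, per the workaround described above.

    check=True raises CalledProcessError if btrfs exits non-zero (for
    example when the balance aborts), so a failed first pass is not
    silently followed by a second one.
    """
    cmd = convert_data_command(mountpoint, profile)
    for _ in range(2):
        subprocess.run(cmd, check=True)
```

Usage would be `convert_data_twice("/mnt/cache", "single")`; checking the resulting status afterwards confirms whether only one data profile remains.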
WEHA, posted November 22, 2020:

1 hour ago, JorgeB said: "Due to a current btrfs bug you need to run the balance to single (or any other profile) twice."

That works, thanks!
WEHA, posted November 24, 2020:

On 11/22/2020 at 10:01 AM, JorgeB said: "Due to a current btrfs bug you need to run the balance to single (or any other profile) twice."

Could you just confirm whether converting from single to RAID 1 can lose data? (It's not stated in the FAQ or in the Unraid GUI.)

I just added a disk to a cache pool, going from 1 to 2 disks, and Unraid made it single (I believe this is the default according to the FAQ). This is the current state (two data profiles; is this related to the btrfs bug?):

Data, RAID1: total=42.00GiB, used=24.68GiB
Data, single: total=1.18TiB, used=1.16TiB
System, DUP: total=8.00MiB, used=176.00KiB
Metadata, DUP: total=2.00GiB, used=1.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

I have enough space available, so nothing will happen to my data, right? What would happen if there was not enough space?
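A rough way to answer the space question: btrfs RAID1 stores every chunk twice, on two different devices, so its capacity is bounded both by half the raw total and by how much the other devices can mirror against the largest one. A minimal Python sketch under that assumption, ignoring metadata and allocation slack; the device sizes are inputs you supply, since they are not given here, and the helper names are hypothetical. As a general expectation (not a guarantee), a balance that runs out of unallocated space stops with an ENOSPC ("No space left on device") error and leaves chunks in mixed profiles rather than deleting data.

```python
from typing import Sequence


def raid1_capacity(device_sizes: Sequence[int]) -> int:
    """Approximate how much data a btrfs RAID1 profile can hold.

    Each RAID1 chunk is written to two different devices, so capacity is
    the smaller of half the raw total and the space available on all
    devices other than the largest one.
    """
    if len(device_sizes) < 2:
        return 0  # RAID1 needs at least two devices
    total = sum(device_sizes)
    return min(total // 2, total - max(device_sizes))


def fits_in_raid1(device_sizes: Sequence[int], data_used: int) -> bool:
    """True if `data_used` bytes of data should fit once converted to RAID1."""
    return data_used <= raid1_capacity(device_sizes)
```

For example, with the 14TB and 8TB disks from the first post, `raid1_capacity([14 * 10**12, 8 * 10**12])` is `8 * 10**12`: the smaller disk limits RAID1 to 8TB of data.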
JorgeB, posted November 24, 2020:

27 minutes ago, WEHA said: "Could you just confirm to me if converting from single to raid 1 does not lose data?"

It doesn't, unless something goes wrong.
WEHA, posted November 24, 2020:

25 minutes ago, JorgeB said: "It doesn't, unless something goes wrong."

I tried converting twice, and it remains in the same state as posted earlier. The balance starts, and after about 30 seconds it goes back to "No balance".
JorgeB, posted November 24, 2020:

Please post diags after a balance attempt.

P.S. "No balance" is normal after the balance ends/stops.
WEHA, posted November 24, 2020:

Attached: tower-diagnostics-20201124-1149.zip

Seems like this is the culprit?
Nov 24 09:22:05 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 283 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
JorgeB, posted November 24, 2020:

1 hour ago, WEHA said: "Seems like this is the culprit?"

Yes, that means there's corrupt data, and the balance will abort. You can run a scrub to find the corrupt file(s), then delete them or restore them from backup. It's also a good idea to run memtest.
WEHA, posted November 24, 2020:

14 minutes ago, JorgeB said: "Yes, that means that there's corrupt data and the balance will abort, you can run a scrub to find out the corrupt file(s), then delete them or restore from backup, also good idea to run memtest."

*sigh* ... how do I get a list of files? I'm running a scrub, and this is the status already:

Error summary: csum=35
  Corrected: 4
  Uncorrectable: 31
  Unverified: 0

These are software errors, correct? SMART does not indicate a problem, and this is also a new disk.
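Those counters come from the "Error summary" section of `btrfs scrub status`. A checksum error can only be corrected when a good second copy exists (RAID1 or DUP), so errors in data stored with the single profile end up as uncorrectable. A minimal Python sketch that pulls the counters out of that text, assuming the "name=value" / "Name: value" layout shown above; the function name is hypothetical.

```python
import re
from typing import Dict

# "csum=35" or "Uncorrectable: 31"
COUNTER_RE = re.compile(r"([A-Za-z_]+)\s*[=:]\s*(\d+)")


def scrub_error_summary(status_text: str) -> Dict[str, int]:
    """Parse the counters after "Error summary:" in scrub status output.

    Keys are lower-cased. An empty dict means no counters were found,
    for example when the summary reads "no errors found".
    """
    _, marker, tail = status_text.partition("Error summary:")
    if not marker:
        return {}
    return {name.lower(): int(value) for name, value in COUNTER_RE.findall(tail)}
```

For the status above, the result's `"uncorrectable"` entry is 31, which is the number of bad blocks that scrub had no good copy to repair from.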
JorgeB, posted November 24, 2020:

11 minutes ago, WEHA said: "These are software errors, correct?"

No, this is data corruption, usually caused by bad RAM.

11 minutes ago, WEHA said: "how do I get a list of files?"

In the syslog.
WEHA, posted November 24, 2020:

The syslog does not show files:

Nov 24 13:01:51 Tower kernel: BTRFS info (device sde1): scrub: started on devid 1
Nov 24 13:01:51 Tower kernel: BTRFS info (device sde1): scrub: started on devid 2
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413978710016 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913239552 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913341952 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913444352 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915201536 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915303936 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915406336 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413916004352 on dev /dev/sdk1
Nov 24 13:03:23 Tower kernel: BTRFS error (device sde1): fixed up error at logical 1413978824704 on dev /dev/sdk1
Nov 24 13:03:23 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413979930624 on dev /dev/sdk1
JorgeB, posted November 24, 2020:

It should. Maybe something changed in the log level in the new beta; I need to test.
JorgeB, posted November 24, 2020:

It's working for me. Are you sure you're seeing the full syslog? The snippet you posted only shows "BTRFS error" lines; it's missing the "BTRFS warning" lines, which are the ones that show the file, e.g.:

Nov 24 05:35:17 Test2 kernel: BTRFS warning (device md1): checksum error at logical 1479372800 on dev /dev/md1, physical 2561503232, root 5, inode 257, offset 375222272, length 4096, links 1 (path: 1.iso)
Nov 24 05:35:17 Test2 kernel: BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Nov 24 05:35:17 Test2 kernel: BTRFS error (device md1): unable to fixup (regular) error at logical 1479372800 on dev /dev/md1
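Pulling the file names out of those warning lines can be scripted. A minimal Python sketch, assuming the warning format shown above; the paths appear to be relative to the subvolume given by `root` (root 5 is the top-level one), so the pool's mount point may need to be prepended. The function name is hypothetical.

```python
import re
from typing import Iterable, List

# "BTRFS warning (device X): checksum error at ... (path: <file>)"
PATH_RE = re.compile(
    r"BTRFS warning \(device [^)]+\): checksum error at .*\(path: (.+)\)\s*$"
)


def corrupt_paths(log_lines: Iterable[str]) -> List[str]:
    """Collect the distinct file paths named by scrub checksum warnings, in order."""
    paths: List[str] = []
    for line in log_lines:
        match = PATH_RE.search(line)
        if match and match.group(1) not in paths:
            paths.append(match.group(1))
    return paths
```

Feeding it the lines of `/var/log/syslog` (for example `corrupt_paths(open("/var/log/syslog"))`) returns each affected file once, even when scrub reports several bad blocks in it; the "BTRFS error" lines are ignored.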
WEHA, posted November 24, 2020:

It's copied from the syslog file in nano, so I would think that is the full syslog? There are warnings from before the scrub, though:

root@Tower:/var/log# cat syslog |grep "BTRFS warning"
Nov 23 03:59:25 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 1765621760 csum 0xd488241c expected csum 0xdbe78a4e mirror 1
Nov 23 03:59:25 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 1765621760 csum 0xd488241c expected csum 0xdbe78a4e mirror 1
Nov 23 20:40:23 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 281 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 23 20:40:23 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 281 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 04:03:17 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 4379881472 csum 0x1616fb61 expected csum 0xcbd3dbb1 mirror 2
Nov 24 09:19:19 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 282 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 09:19:19 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 282 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 09:22:05 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 283 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 09:22:06 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 283 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
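These "csum failed" warnings appear to come from ordinary reads rather than from the scrub, and they name an inode instead of a path. Inode numbers in the top-level subvolume (root 5) can be mapped to paths with `btrfs inspect-internal inode-resolve <ino> <path>`. Root -9 is btrfs's data relocation tree, which balance uses internally, so those inodes are not user files; the repeated root -9 lines with the same offset and expected checksum, but a new inode on each attempt, look consistent with the balance hitting the same bad extent every time. A minimal Python sketch, assuming the pool is mounted at `/mnt/cache`; the helper names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

CSUM_RE = re.compile(r"csum failed root (-?\d+) ino (\d+) off (\d+)")

FS_TREE = 5           # top-level subvolume
DATA_RELOC_TREE = -9  # btrfs data relocation tree, used internally by balance


def failed_inodes(log_lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Distinct (root, inode) pairs from "csum failed" warnings, in order,
    skipping the data relocation tree, whose inodes are not user files."""
    found: List[Tuple[int, int]] = []
    for line in log_lines:
        match = CSUM_RE.search(line)
        if not match:
            continue
        pair = (int(match.group(1)), int(match.group(2)))
        if pair[0] != DATA_RELOC_TREE and pair not in found:
            found.append(pair)
    return found


def inode_resolve_commands(
    pairs: Iterable[Tuple[int, int]], mountpoint: str
) -> List[List[str]]:
    """Build `btrfs inspect-internal inode-resolve` calls for top-level inodes.

    Inodes in other subvolumes would need that subvolume's path instead of
    the mount point, so they are left out here.
    """
    return [
        ["btrfs", "inspect-internal", "inode-resolve", str(ino), mountpoint]
        for root, ino in pairs
        if root == FS_TREE
    ]
```

For the log above, only inode 182291 in root 5 is left to resolve; each returned command can be passed to `subprocess.run` as root.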
JorgeB, posted November 24, 2020:

Please post diags again, just to make sure.
WEHA, posted November 24, 2020:

Attached: tower-diagnostics-20201124-1452.zip
JorgeB, posted November 24, 2020:

Sorry, no idea why it's not showing the warnings; I've never seen that before. It's unlikely to show different info, but check dmesg by typing dmesg on the console.
WEHA, posted November 24, 2020:

Same story, though I do see "callbacks suppressed":

[203355.213783] BTRFS error (device sde1): unable to fixup (regular) error at logical 1342354677760 on dev /dev/sde1
[203436.360164] scrub_handle_errored_block: 8 callbacks suppressed
[203436.360209] btrfs_dev_stat_print_on_error: 8 callbacks suppressed
[203436.360212] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 93, gen 0
[203436.360214] scrub_handle_errored_block: 8 callbacks suppressed
[203436.360215] BTRFS error (device sde1): unable to fixup (regular) error at logical 1348826648576 on dev /dev/sde1
[203439.353192] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 94, gen 0
[203439.353195] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349298642944 on dev /dev/sde1
[203440.426170] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 95, gen 0
[203440.426174] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349556105216 on dev /dev/sde1
[203441.204687] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 96, gen 0
[203441.204690] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349681184768 on dev /dev/sde1
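When the path-bearing warnings are missing, the "unable to fixup" error lines still carry a logical address, and `btrfs inspect-internal logical-resolve <logical> <path>` can map a logical address back to the files that reference it. This is an alternative that was not tried in this thread. The "callbacks suppressed" lines mean the kernel rate-limited some of these messages, so a list built from the log may be incomplete. A minimal Python sketch, assuming the pool is mounted at `/mnt/cache`; the helper names are hypothetical.

```python
import re
from typing import Iterable, List

# Only errors scrub could not repair; "fixed up error at logical" is skipped.
UNFIXED_RE = re.compile(r"unable to fixup \(\w+\) error at logical (\d+)")


def unfixed_logicals(log_lines: Iterable[str]) -> List[int]:
    """Distinct logical addresses scrub could not repair, in log order."""
    addresses: List[int] = []
    for line in log_lines:
        match = UNFIXED_RE.search(line)
        if match:
            address = int(match.group(1))
            if address not in addresses:
                addresses.append(address)
    return addresses


def logical_resolve_commands(
    addresses: Iterable[int], mountpoint: str
) -> List[List[str]]:
    """Build one `btrfs inspect-internal logical-resolve` call per address."""
    return [
        ["btrfs", "inspect-internal", "logical-resolve", str(addr), mountpoint]
        for addr in addresses
    ]
```

Because the same address can be reported more than once, deduplicating first keeps the number of resolve calls down; addresses that belong to metadata rather than file data may not resolve to any path.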
JorgeB, posted November 24, 2020:

The only option I see is doing it the manual way, i.e., copy/move all the data somewhere else; any files that can't be copied due to an I/O error are corrupt. Note that some of the corruption might not be in files but in the metadata, though it should still show that info in the log.
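The copy test can also be done as a read-only pass, since btrfs returns an I/O error (EIO) when a checksummed data block fails verification and there is no good copy to fall back on. A minimal Python sketch, assuming it is run as root against the pool's mount point (for example `/mnt/cache`); the function name is hypothetical. NOCOW files have no data checksums, so corruption in them will not surface this way, and corrupt metadata may show up as a directory that cannot be listed rather than as a file.

```python
import errno
import os
from typing import List


def unreadable_files(root: str, chunk_size: int = 1 << 20) -> List[str]:
    """Read every regular file under `root` and return paths that fail with EIO.

    Directories that cannot be listed because of EIO are included too.
    Other errors (for example permission denied) are re-raised so they are
    not mistaken for corruption.
    """
    bad: List[str] = []

    def on_walk_error(exc: OSError) -> None:
        # Called by os.walk when a directory cannot be listed.
        if exc.errno == errno.EIO and exc.filename:
            bad.append(exc.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # only regular files have data to read
            try:
                with open(path, "rb") as handle:
                    while handle.read(chunk_size):
                        pass  # reading is enough to trigger checksum verification
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise
                bad.append(path)
    return bad
```

Unlike a copy, this writes nothing, so it can be run before deciding where the data should go; it still reads every byte, so expect it to take about as long as a full copy.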
WEHA, posted November 25, 2020 (edited):

On 11/24/2020 at 3:42 PM, JorgeB said: "The only option I see is doing the manual way, i.e., copy/move all the data somewhere else, any files that can't be copied due to an i/o error are corrupt, note that some of the corruption might not be on files but it can be the metadata, but it still should show that info on the log."

I moved everything off, and 2 files remained: 1 vdisk file and the docker image. The docker image could not be moved due to an I/O error, so I removed it and recreated it on another pool. I reran the scrub, and now no errors are detected.

Is this related to the docker image being set to xfs on a btrfs pool? I set it to xfs to make sure the bug that causes excessive disk I/O would be avoided.

SMART does not show any errors on the disk, so can I be sure this was software corruption and not caused by a hardware (HDD) defect?
JorgeB, posted November 25, 2020:

7 minutes ago, WEHA said: "The docker image was unable to be moved due to an i/o error"

This means it was corrupt; btrfs won't let you copy/read a corrupt file.

8 minutes ago, WEHA said: "Is this related to docker image being set as xfs on a btrfs pool?"

That shouldn't be a problem.

8 minutes ago, WEHA said: "Smart does not show any errors on the disk so I can be sure this was a software corruption and not caused by a hardware (hdd) defect?"

It's unlikely to be a device problem; if it's hardware related, it's most likely RAM.
WEHA, posted November 25, 2020:

1 hour ago, JorgeB said: "That shouldn't be a problem."

It's strange that it's only the docker file and not the VM file... could it be related to NOCOW / COW? I enabled this for the system share, and thus the docker image; the vdisk has NOCOW.

Thank you for assisting.
JorgeB, posted November 25, 2020:

Enabling NOCOW turns off data checksums, so NOCOW files can't be checked or repaired.
WEHA, posted November 25, 2020:

4 hours ago, JorgeB said: "Enabling NOCOW turns off data checksum, so those can't be checked or fixed."

By "enabling" I meant COW. So the system share had COW and the vdisk had NOCOW, but the docker image was corrupt and the vdisk image was not.
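Which files are actually NOCOW can be checked with `lsattr`, where the `C` attribute marks No_COW; on btrfs, files created in a directory that has `C` set generally inherit it. Since NOCOW files carry no data checksums, neither a scrub nor a normal read can flag corruption in them, so a clean result for the vdisk may only mean it was never checked. A minimal Python sketch that filters `lsattr` output for the `C` flag; the function name and the example paths are hypothetical.

```python
import re
from typing import List

# An lsattr entry: an attribute field of letters and dashes, then the path.
# Header lines printed by "lsattr -R" (e.g. "/mnt/cache/domains:") do not match.
LSATTR_RE = re.compile(r"^([-A-Za-z]+)\s+(.+)$")


def nocow_paths(lsattr_output: str) -> List[str]:
    """Return the paths whose lsattr attribute field contains 'C' (No_COW)."""
    paths: List[str] = []
    for line in lsattr_output.splitlines():
        match = LSATTR_RE.match(line.strip())
        if match and "C" in match.group(1):
            paths.append(match.group(2))
    return paths
```

The check is for uppercase `C` only; lowercase `c` is the separate "compressed" attribute.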