WEHA

January 21

15 hours ago, Kilrah said:

Some SMR drives are able to recognise sequential writes and bypass the caching but not all do.

I guess this was one of those disks, I put in a non-smr and it's going 120-150MB/s...

I knew SMR were no good but THIS bad... wow

January 20

2 hours ago, Kilrah said:

So you're now copying parity data onto that 6TB drive?

Looks like it's an SMR drive so... you're just going to have to be very patient.

You might be able to gain some time by pausing the operation, waiting an hour or so then resuming, and repeating everytime it falls down to negligible speeds.

I mean ok it's an SMR drive, but this bad? Is this not a sequential write?

It was pretty much like this from the beginning but I can give that a try

January 20

I had a data disk bad and even disabled so I replaced it.

I only have a larger disk available so it wants to do a parity swap, I get it can take while but I'm seeing 5 / 6MB/s... for 3TB

Can someone point out how to get the ball rolling faster because at this rate it will take more than 5 days.

Diag attached.

unneptunus-diagnostics-20240120-1510.zip

September 15, 2023

I have write errors on a cache drive and now my VM's & Dockers are not responding (just opnsense).

mount|grep cache
/dev/nvme2n1p1 on /mnt/cache type btrfs (ro,noatime,ssd,discard=async,space_cache=v2,subvolid=5,subvol=/)

I'm on holiday so I don't want to stop the array to remove the disk.

Is there a way to tell unraid to stop using that disk?

I saw this but it does not seem to be right syntax for unraid: echo 1 > /sys/block/nvme0n1/device/delete

Sep 14 07:23:20 Tower kernel: BTRFS error (device nvme2n1p1): error writing primary super block to device 2

Label: none  uuid: 987c4458-3b7c-4bbe-af87-c2f8bdde7c60
        Total devices 2 FS bytes used 724.88GiB
        devid    1 size 931.51GiB used 903.54GiB path /dev/nvme2n1p1
        devid    2 size 0 used 0 path /dev/nvme0n1p1 MISSING

Sidenote: With errors like these, I would also think that it would give at least an error when I'm looking at the array itself?

Would it be a good idea to do the following?

btrfs device remove /dev/nvme0n1p1 /mnt/cache

How can I remount rw?

Thanks!

December 30, 2022

7 minutes ago, JorgeB said:

The trial is to be able to see and send the new GUID to support, you cannot use a trial with an existing config.

You shoud be able to start a trial on an unknown usb stick to get started with your backup config.

Now you are just hijacking peoples systems who are already getting a headache of a non-working server.

EDIT: for like a (few) day(s) or something, linked to an account so you can track abuse

December 30, 2022

1 minute ago, JorgeB said:

You should contact support.

Yes, helpful, thank you... like I already mentioned that I already requested it.

There seems to a flaw in the documentation saying you need to request a trial but that's not possible when you have a backup of your configuration.

December 30, 2022

Starting a Trial is also not a solution as the configuration backup refers to an activated license so trial is invalid...

I would like to use my purchased product please???!!!

December 30, 2022

Another stick bites the dust, not sure why as these are practically unused sticks...

New stick installed, no license key obviously, the very convenient 1 year block on requesting a new license is very helpful.

I requested it via e-mail but who knows how long that is going to take.

I click free trial but there is no procedure to get he trial license, only how to install a stick... yes thank you, already done that.

I'm assuming this should be available on the machine itself but it does not have internet because the firewall is a VM on the same machine...

Either way I only get "fix error" that just opens the messages with the options "purchase key" and "redeem activation code".

So... now what?

November 26, 2020

3 minutes ago, JorgeB said:

Not because it's COW, all my btrfs data is COW, but if it's just the docker image there might be another reason for the corruption, and not be hardware.

Very well, thanks for your input.

November 26, 2020

6 hours ago, JorgeB said:

The vdisk might be corrupt, no way of knowing.

Well yes, not via btrfs but I have no issues with the vm, no errors in eventlog and full backups are working.

That's why I believe the vdisk is fine.

It's just weird to me that only docker image is affected and it was on a COW share.

But if you're confident that there is no issue with this scenario then ok.

November 25, 2020

4 hours ago, JorgeB said:

Enabling NOCOW turns off data checksum, so those can't be checked or fixed.

I mean COW by enabling.

So system had COW and the vdisk had NOCOW

But docker image was corrupt and vdisk image was not.

November 25, 2020

1 hour ago, JorgeB said:

That shouldn't be a problem.

It's strange that it's only the docker file and not the vm file... could it be related to NOCOW / COW?

I enabled this for the system share and thus the docker image, the vdisk has NOCOW.

Thank you for assisting

November 25, 2020

On 11/24/2020 at 3:42 PM, JorgeB said:

The only option I see is doing the manual way, i.e., copy/move all the data somewhere else, any files that can't be copied due to an i/o error are corrupt, note that some of the corruption might not be on files but it can be the metadata, but it sill should show that info on the log.

I moved everything off, 2 files remained, 1 vdisk file and docker img.

The docker image was unable to be moved due to an i/o error, so I removed it and recreated it on another pool.

I reran scrub and now no errors are detected.

Is this related to docker image being set as xfs on a btrfs pool?

I set this to xfs to be sure the bug that causes much disk i/o to be gone.

Smart does not show any errors on the disk so I can be sure this was a software corruption and not caused by a hardware (hdd) defect?

November 24, 2020

Same story, I see callbacks suppressed though

[203355.213783] BTRFS error (device sde1): unable to fixup (regular) error at logical 1342354677760 on dev /dev/sde1
[203436.360164] scrub_handle_errored_block: 8 callbacks suppressed
[203436.360209] btrfs_dev_stat_print_on_error: 8 callbacks suppressed
[203436.360212] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 93, gen 0
[203436.360214] scrub_handle_errored_block: 8 callbacks suppressed
[203436.360215] BTRFS error (device sde1): unable to fixup (regular) error at logical 1348826648576 on dev /dev/sde1
[203439.353192] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 94, gen 0
[203439.353195] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349298642944 on dev /dev/sde1
[203440.426170] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 95, gen 0
[203440.426174] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349556105216 on dev /dev/sde1
[203441.204687] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 96, gen 0
[203441.204690] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349681184768 on dev /dev/sde1

November 24, 2020

tower-diagnostics-20201124-1452.zip

November 24, 2020

It's copied from the syslog file in nano, so I would think that is the full syslog?

There are warnings from before the scrub though:

root@Tower:/var/log# cat syslog |grep "BTRFS warning"
Nov 23 03:59:25 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 1765621760 csum 0xd488241c expected csum 0xdbe78a4e mirror 1
Nov 23 03:59:25 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 1765621760 csum 0xd488241c expected csum 0xdbe78a4e mirror 1
Nov 23 20:40:23 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 281 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 23 20:40:23 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 281 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 04:03:17 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 4379881472 csum 0x1616fb61 expected csum 0xcbd3dbb1 mirror 2
Nov 24 09:19:19 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 282 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 09:19:19 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 282 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 09:22:05 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 283 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
Nov 24 09:22:06 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 283 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1

November 24, 2020

Syslog does not show files:

Nov 24 13:01:51 Tower kernel: BTRFS info (device sde1): scrub: started on devid 1
Nov 24 13:01:51 Tower kernel: BTRFS info (device sde1): scrub: started on devid 2
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413978710016 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913239552 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913341952 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913444352 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915201536 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915303936 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915406336 on dev /dev/sdk1
Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413916004352 on dev /dev/sdk1
Nov 24 13:03:23 Tower kernel: BTRFS error (device sde1): fixed up error at logical 1413978824704 on dev /dev/sdk1
Nov 24 13:03:23 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413979930624 on dev /dev/sdk1

November 24, 2020

14 minutes ago, JorgeB said:

Yes, that means that there's corrupt data and the balance will abort, you can run a scrub to find out the corrupt files(s), then delete them or restore from backup, also good idea to run memtest.

*sigh* ... how do I get a list of files?

I'm running scrub and this is the status already:

Error summary:    csum=35
Corrected:      4
Uncorrectable: 31
Unverified:     0

These are software errors, correct?

Smart does not indicate a problem, this is also a new disk.

November 24, 2020

Attached

Seems like this is the curlprit?

Nov 24 09:22:05 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 283 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1

tower-diagnostics-20201124-1149.zip

November 24, 2020

25 minutes ago, JorgeB said:

It doesn't, unless something goes wrong.

Tried converting twice, remains the same state as posted earlier.

It starts and after about 30 seconds or so it goes back to no balance.

November 24, 2020

On 11/22/2020 at 10:01 AM, JorgeB said:

Due to a current btrfs bug you need to run the balace to single (or any other profile) twice.

Could you just confirm to me if converting from single to raid 1 does not lose data? (not stated in faq nor unraid gui)

I just added a disk to a cache pool from 1 to 2 and unraid made it single. (I believe this is the default according to the faq)

So this is the current state (2 states, related to the btrfs bug?):

Data, RAID1: total=42.00GiB, used=24.68GiB
Data, single: total=1.18TiB, used=1.16TiB
System, DUP: total=8.00MiB, used=176.00KiB
Metadata, DUP: total=2.00GiB, used=1.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

I have enough space available so nothing will happen to my data right?

What would happen if there was not enough space?

November 22, 2020

1 hour ago, JorgeB said:

Due to a current btrfs bug you need to run the balace to single (or any other profile) twice.

That works, thanks!

November 22, 2020

I'm trying to create a JBOD cache pool in 6.9beta35.

I don't know if this is a bug or I'm just doing it wrong so...

From what I understand from the below post I have to set it to single mode

When I do this "convert to single mode" (it's a 14TB and 8TB disk) the GUI says it's 16TB.

I also see the same write speeds to both disks, giving the impression it's RAID 1

Balance status:

Data, RAID1: total=4.00GiB, used=2.97GiB

Data, single: total=1.00GiB, used=0.00B

System, RAID1: total=32.00MiB, used=16.00KiB

Metadata, RAID1: total=1.00GiB, used=3.94MiB

GlobalReserve, single: total=3.78MiB, used=16.00KiB

If I execute "perform full balance", it just reverts to RAID 1 status.

Can anyone tell me what I'm doing wrong or should I post this as a bug in beta?

Maybe I have to jump through a few hoops like removing one disk -> single mode -> add disk?

thanks!

November 8, 2020

When making some changes it's sometimes necessary or preferred to start docker / vm manager without auto-start enabled.

That way you can just start whichever docker or vm you want.

So what I'm asking is: add a 3rd option to the enable docker / enable vms dropdown like Yes, no auto start

November 7, 2020

On 11/3/2020 at 1:33 PM, JorgeB said:

Fragmentation yes, increased write amplification not so good, since it can reduce the SSD life.

So I read about the "bug" that causes many writes to sdd's especially evo...

Mine have a 1200TBW and are around 1500TBW now (in 2 years time)

In the new beta there is a solution, but there also issues.

My thought is, can I upgrade to the new beta, recreate the cache (on new drives) with the new partition layout and revert back to 6.8.3 if the need arises?

WEHA

Posts

Joined

Last visited

Content Type

Profiles

Forums

Downloads

Store

Gallery

Bug Reports

Documentation

Landing

Posts posted by WEHA

Parity swap VERY slow (5MB/s)

Parity swap VERY slow (5MB/s)

Parity swap VERY slow (5MB/s)

Disable/Remove a drive from cache while the array is started

Stick faulty for second time in a year, license requested but can't start trial

Stick faulty for second time in a year, license requested but can't start trial

Stick faulty for second time in a year, license requested but can't start trial

Stick faulty for second time in a year, license requested but can't start trial

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

[SOLVED] Cache pool JBOD

Add "no auto start" to docker & vm manager enable setting

Cache seems corrupted, nothing to see in GUI