
Posts posted by WEHA

  1. 15 hours ago, Kilrah said:

    Some SMR drives are able to recognise sequential writes and bypass the caching but not all do.

     

     

    I guess this was one of those disks; I put in a non-SMR drive and it's going 120-150 MB/s...

    I knew SMR drives were no good, but THIS bad... wow.

  2. 2 hours ago, Kilrah said:

    So you're now copying parity data onto that 6TB drive?

    Looks like it's an SMR drive so... you're just going to have to be very patient. 

     

    You might be able to gain some time by pausing the operation, waiting an hour or so, then resuming, and repeating every time it falls down to negligible speeds.

    I mean, OK, it's an SMR drive, but this bad? Isn't this a sequential write?

    It was pretty much like this from the beginning, but I can give that a try.

  3. I have write errors on a cache drive and now my VMs & Dockers are not responding (just opnsense).

    mount|grep cache
    /dev/nvme2n1p1 on /mnt/cache type btrfs (ro,noatime,ssd,discard=async,space_cache=v2,subvolid=5,subvol=/)

     

    I'm on holiday so I don't want to stop the array to remove the disk.

    Is there a way to tell unraid to stop using that disk?

    I saw this, but it does not seem to be the right syntax for unraid: echo 1 > /sys/block/nvme0n1/device/delete

     

    Sep 14 07:23:20 Tower kernel: BTRFS error (device nvme2n1p1): error writing primary super block to device 2

     

    Label: none  uuid: 987c4458-3b7c-4bbe-af87-c2f8bdde7c60
            Total devices 2 FS bytes used 724.88GiB
            devid    1 size 931.51GiB used 903.54GiB path /dev/nvme2n1p1
            devid    2 size 0 used 0 path /dev/nvme0n1p1 MISSING

     

    Side note: with errors like these, I would also expect it to show at least an error when I'm looking at the array itself?

     

    Would it be a good idea to do the following?

    btrfs device remove /dev/nvme0n1p1 /mnt/cache

     

    How can I remount rw? (Rough sketch of what I have in mind below.)

     

    Thanks!
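
    This is roughly what I'm thinking of, completely untested, and I'm not even sure btrfs will allow a rw remount after it forced the filesystem read-only:

    # see what btrfs thinks of the pool and its devices
    btrfs filesystem show /mnt/cache
    btrfs device stats /mnt/cache
    # try to get the pool writable again (might need the degraded option since devid 2 is MISSING)
    mount -o remount,rw /mnt/cache
    # then drop the dead device; "missing" instead of the path, since it no longer shows up
    btrfs device remove missing /mnt/cache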


  4. 7 minutes ago, JorgeB said:

    The trial is to be able to see and send the new GUID to support, you cannot use a trial with an existing config.

    You should be able to start a trial on an unknown USB stick to get started with your backup config.

    As it is now, you are just hijacking the systems of people who are already dealing with the headache of a non-working server.

     

    EDIT: for a few days or something, linked to an account so you can track abuse.

     

  5. Another stick bites the dust; not sure why, as these are practically unused sticks...

    New stick installed, but no license key obviously; the very convenient one-year block on requesting a new license is very helpful.

    I requested it via e-mail but who knows how long that is going to take.

    I clicked free trial, but there is no procedure for getting the trial license, only how to install a stick... yes, thank you, I've already done that.

    I'm assuming this should be available on the machine itself, but it does not have internet access because the firewall is a VM on the same machine...

    Either way, I only get "fix error", which just opens the messages with the options "purchase key" and "redeem activation code".

     

    So... now what?

  6. 6 hours ago, JorgeB said:

    The vdisk might be corrupt, no way of knowing.

    Well yes, not via btrfs, but I have no issues with the VM, no errors in the event log, and full backups are working.

    That's why I believe the vdisk is fine.

    It's just weird to me that only the docker image is affected, and it was on a COW share.

    But if you're confident that there is no issue with this scenario, then OK.

  7. 1 hour ago, JorgeB said:

    That shouldn't be a problem.

    It's strange that it's only the docker file and not the VM file... could it be related to NOCOW / COW?

    I had COW enabled for the system share, and thus the docker image; the vdisk has NOCOW (see the sketch at the end of this post for what I mean by that).

     

    Thank you for assisting
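
    By NOCOW I mean the No_COW (C) attribute on the share's folder; roughly this, with the paths only as examples of how my shares are laid out:

    # check whether a folder has the No_COW attribute (a capital C in the flags)
    lsattr -d /mnt/cache/domains
    lsattr -d /mnt/cache/system
    # setting it only affects files created afterwards, so it has to be done while the folder is still empty
    chattr +C /mnt/cache/domains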

  8. On 11/24/2020 at 3:42 PM, JorgeB said:

    The only option I see is doing it the manual way, i.e., copy/move all the data somewhere else; any files that can't be copied due to an i/o error are corrupt. Note that some of the corruption might not be on files but in the metadata, but it should still show that info in the log.

    I moved everything off; 2 files remained: 1 vdisk file and the docker img.

    The docker image could not be moved due to an I/O error, so I removed it and recreated it on another pool.

    I re-ran the scrub and now no errors are detected.

     

    Is this related to the docker image being set as xfs on a btrfs pool?

    I set it to xfs to make sure the bug that causes excessive disk I/O was gone.

     

    SMART does not show any errors on the disk, so can I be sure this was software corruption and not caused by a hardware (HDD) defect?
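
    For anyone else doing the manual copy, that step could look roughly like this; the paths are made up and the flags may need tweaking:

    # copy everything off the pool, keeping stderr so files that fail with I/O errors end up in a log
    rsync -a --progress /mnt/cache/ /mnt/disk1/cache_backup/ 2> /boot/copy-errors.log
    # anything listed with "Input/output error" is one of the corrupt files to delete or restore from backup
    grep -i "input/output error" /boot/copy-errors.log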

  9. Same story; I do see "callbacks suppressed" though:

     

    [203355.213783] BTRFS error (device sde1): unable to fixup (regular) error at logical 1342354677760 on dev /dev/sde1
    [203436.360164] scrub_handle_errored_block: 8 callbacks suppressed
    [203436.360209] btrfs_dev_stat_print_on_error: 8 callbacks suppressed
    [203436.360212] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 93, gen 0
    [203436.360214] scrub_handle_errored_block: 8 callbacks suppressed
    [203436.360215] BTRFS error (device sde1): unable to fixup (regular) error at logical 1348826648576 on dev /dev/sde1
    [203439.353192] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 94, gen 0
    [203439.353195] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349298642944 on dev /dev/sde1
    [203440.426170] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 95, gen 0
    [203440.426174] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349556105216 on dev /dev/sde1
    [203441.204687] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 96, gen 0
    [203441.204690] BTRFS error (device sde1): unable to fixup (regular) error at logical 1349681184768 on dev /dev/sde1

     

  10. It's copied from the syslog file in nano, so I would think that is the full syslog?

     

    There are warnings from before the scrub though:

    root@Tower:/var/log# cat syslog |grep "BTRFS warning"
    Nov 23 03:59:25 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 1765621760 csum 0xd488241c expected csum 0xdbe78a4e mirror 1
    Nov 23 03:59:25 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 1765621760 csum 0xd488241c expected csum 0xdbe78a4e mirror 1
    Nov 23 20:40:23 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 281 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
    Nov 23 20:40:23 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 281 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
    Nov 24 04:03:17 Tower kernel: BTRFS warning (device sde1): csum failed root 5 ino 182291 off 4379881472 csum 0x1616fb61 expected csum 0xcbd3dbb1 mirror 2
    Nov 24 09:19:19 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 282 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
    Nov 24 09:19:19 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 282 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
    Nov 24 09:22:05 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 283 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1
    Nov 24 09:22:06 Tower kernel: BTRFS warning (device sde1): csum failed root -9 ino 283 off 951992320 csum 0x47d58bec expected csum 0x56997f79 mirror 1

     

  11. Syslog does not show file names (but see the note at the end of this post):

    Nov 24 13:01:51 Tower kernel: BTRFS info (device sde1): scrub: started on devid 1
    Nov 24 13:01:51 Tower kernel: BTRFS info (device sde1): scrub: started on devid 2
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413978710016 on dev /dev/sdk1
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913239552 on dev /dev/sdk1
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913341952 on dev /dev/sdk1
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413913444352 on dev /dev/sdk1
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915201536 on dev /dev/sdk1
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915303936 on dev /dev/sdk1
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413915406336 on dev /dev/sdk1
    Nov 24 13:03:22 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413916004352 on dev /dev/sdk1
    Nov 24 13:03:23 Tower kernel: BTRFS error (device sde1): fixed up error at logical 1413978824704 on dev /dev/sdk1
    Nov 24 13:03:23 Tower kernel: BTRFS error (device sde1): unable to fixup (regular) error at logical 1413979930624 on dev /dev/sdk1
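
    From what I've read, the logical byte offsets in those "unable to fixup ... at logical N" lines can be mapped back to file paths with btrfs inspect-internal; something like this, untested on my side, and the mount point obviously depends on which pool this is:

    # resolve a few of the logical addresses from the scrub errors above to file paths
    for addr in 1413978710016 1413913239552 1413915201536; do
        btrfs inspect-internal logical-resolve "$addr" /mnt/cache
    done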

     

  12. 14 minutes ago, JorgeB said:

    Yes, that means that there's corrupt data and the balance will abort; you can run a scrub to find the corrupt file(s), then delete them or restore from backup. It's also a good idea to run memtest.

    *sigh* ... how do I get a list of files?

    I'm running a scrub (commands at the end of this post) and this is the status already:

    Error summary:    csum=35
      Corrected:      4
      Uncorrectable:  31
      Unverified:     0

     

    These are software errors, correct?

    SMART does not indicate a problem, and this is also a new disk.
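
    For reference, the scrub itself is just the standard commands (assuming the pool is mounted at /mnt/cache):

    btrfs scrub start /mnt/cache      # kicks the scrub off in the background
    btrfs scrub status /mnt/cache     # prints the running totals / error summary quoted above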

  13. On 11/22/2020 at 10:01 AM, JorgeB said:

    Due to a current btrfs bug you need to run the balance to single (or any other profile) twice.

    Could you just confirm for me that converting from single to RAID 1 does not lose data? (This isn't stated in the FAQ nor in the unraid GUI.)

     

    I just added a disk to a cache pool, going from 1 to 2 disks, and unraid made it single. (I believe this is the default according to the FAQ.)

    So this is the current state (two data profiles at once; related to the btrfs bug?):

    Data, RAID1: total=42.00GiB, used=24.68GiB
    Data, single: total=1.18TiB, used=1.16TiB
    System, DUP: total=8.00MiB, used=176.00KiB
    Metadata, DUP: total=2.00GiB, used=1.69GiB
    GlobalReserve, single: total=512.00MiB, used=0.00B

    I have enough space available, so nothing will happen to my data, right?

    What would happen if there was not enough space?
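
    As far as I understand it, what unraid is doing under the hood boils down to a balance with convert filters, roughly like this (my understanding only; the GUI is driving it):

    # convert data and metadata to raid1; per the bug mentioned above, the balance apparently needs to be run twice
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache
    btrfs balance status /mnt/cache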

  14. I'm trying to create a JBOD cache pool in 6.9beta35.

    I don't know if this is a bug or if I'm just doing it wrong, so...

    From what I understand from the post below, I have to set it to single mode (my reading of what that means at the command line is at the end of this post).

     

    When I do this "convert to single mode" (it's a 14TB and an 8TB disk), the GUI says it's 16TB.

    I also see the same write speeds to both disks, giving the impression it's still RAID 1.

     

    Balance status:

    Data, RAID1: total=4.00GiB, used=2.97GiB
    Data, single: total=1.00GiB, used=0.00B
    System, RAID1: total=32.00MiB, used=16.00KiB
    Metadata, RAID1: total=1.00GiB, used=3.94MiB
    GlobalReserve, single: total=3.78MiB, used=16.00KiB

     

    If I execute "perform full balance", it just reverts to RAID 1 status.

     

    Can anyone tell me what I'm doing wrong, or should I post this as a bug in the beta?

    Maybe I have to jump through a few hoops like removing one disk -> single mode -> add disk?

     

    thanks!
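
    For completeness, this is what I understood "convert to single mode" to mean at the command line; I'm letting the GUI do it, so this is only my reading of the FAQ:

    # convert the data chunks to the single profile (JBOD-style pooling); metadata can stay raid1
    btrfs balance start -dconvert=single /mnt/cache
    # check which profiles the data actually ends up in afterwards
    btrfs filesystem df /mnt/cache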

  15. On 11/3/2020 at 1:33 PM, JorgeB said:

    Fragmentation yes, increased write amplification not so good, since it can reduce the SSD life.

    So I read about the "bug" that causes many writes to SSDs, especially EVOs...

    Mine are rated for 1200 TBW and are at around 1500 TBW now (in 2 years' time) :(

    In the new beta there is a solution, but there are also issues.

    My thought is: can I upgrade to the new beta, recreate the cache (on new drives) with the new partition layout, and revert back to 6.8.3 if the need arises?
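
    For anyone wondering where a figure like that ~1500 TBW comes from: SMART reports total writes, roughly like this, with the device names only as placeholders:

    # SATA EVOs: attribute 241 Total_LBAs_Written is counted in 512-byte sectors, so TB written ~= value * 512 / 1000^4
    smartctl -A /dev/sdX | grep -i total_lbas_written
    # NVMe drives report "Data Units Written" instead (units of 512,000 bytes)
    smartctl -A /dev/nvme0 | grep -i "data units written"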