• [6.9.0-Beta30] excessive writes on ssd pool


    caplam
    • Minor

    I have a problem that I haven't seen reported on the forum so far.

    In 6.8.3 I had an excessive write problem. It's a known issue related to the btrfs partition.

    I upgraded to beta30 directly from 6.8.3.

    Yesterday I received new SSDs, so I set up a new pool (2 SSDs) and made it my cache pool. It stores appdata, domains and system.

    I continue to see high write rates on the loop2 device.

    https://forums.unraid.net/topic/97902-getting-rid-of-high-ssd-write-rates/

    https://forums.unraid.net/bug-reports/prereleases/unraid-os-version-690-beta30-available-r1076/?do=findComment&comment=11066

     

    So I stopped the docker service and restarted it without a loop device, using the directory option. Now /var/lib/docker is mapped to a dedicated share.

    But I continue to see excessive writes.

    Here is the output of iotop -aoP on the Unraid server after 1 hour:

    Total DISK READ :       0.00 B/s | Total DISK WRITE :       4.19 M/s
    Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       7.88 M/s
      PID  PRIO  USER     DISK READ DISK WRITE>  SWAPIN      IO    COMMAND                                                                        
    28332 be/4 root         51.46 M      3.99 G  0.00 %  0.13 % qemu-system-x86_64 -name guest=wazo,debu~ny,resourcecontrol=deny -msg timestamp=on
    27768 be/4 root         80.00 K      2.96 G  0.00 %  0.07 % qemu-system-x86_64 -name guest=Hermes,de~ny,resourcecontrol=deny -msg timestamp=on
    24611 be/4 root          2.13 M    848.54 M  0.00 %  0.06 % shfs /mnt/user -disks 31 -o noatime,allow_other -o remember=330
    19819 be/4 root        100.00 K    507.45 M  0.00 %  0.03 % qemu-system-x86_64 -name guest=PiHole,de~ny,resourcecontrol=deny -msg timestamp=on
    21224 be/4 root          0.00 B    218.03 M  0.00 %  0.01 % [kworker/u65:10-btrfs-endio-write]
    28870 be/4 root          0.00 B    169.11 M  0.00 %  0.01 % qemu-system-x86_64 -name guest=Apollon,d~ny,resourcecontrol=deny -msg timestamp=on
    21422 be/4 root          0.00 B    159.13 M  0.00 %  0.00 % [kworker/u65:1-btrfs-endio-write]
    27287 be/4 root          0.00 B    139.56 M  0.00 %  1.22 % dockerd -p /var/run/dockerd.pid --log-op~ --log-level=error --storage-driver=btrfs
    15717 be/4 root          0.00 B    132.48 M  0.00 %  0.00 % [kworker/u65:2-btrfs-endio-write]
    25364 be/4 root          0.00 B    130.80 M  0.00 %  0.01 % [kworker/u65:7-events_unbound]
    10515 be/4 root          0.00 B    126.08 M  0.00 %  0.00 % [kworker/u65:9-btrfs-worker]
    10708 be/4 root          0.00 B     97.09 M  0.00 %  0.00 % [kworker/u65:4-btrfs-endio-write]
    10514 be/4 root          0.00 B     94.36 M  0.00 %  0.00 % [kworker/u65:0-btrfs-endio-write]
    26862 be/4 root          0.00 B     68.48 M  0.00 %  0.00 % [kworker/u65:3-btrfs-endio-write]
    22073 be/4 root          0.00 B     55.11 M  0.00 %  0.00 % [kworker/u66:7-btrfs-endio-write]
    13555 be/4 root          0.00 B     52.02 M  0.00 %  0.00 % [kworker/u66:0-btrfs-endio-write]
    13144 be/4 root          8.00 K     51.37 M  0.00 %  0.00 % [kworker/u66:14-btrfs-endio-write]
    10269 be/4 root          0.00 B     50.30 M  0.00 %  0.00 % [kworker/u66:2-btrfs-endio-write]
    25365 be/4 root          0.00 B     49.25 M  0.00 %  0.00 % [kworker/u66:5-btrfs-endio-write]
    16626 be/4 root          0.00 B     48.81 M  0.00 %  0.00 % [kworker/u66:4-btrfs-endio-write]
     3032 be/4 root          0.00 B     41.62 M  0.00 %  0.00 % [kworker/u66:3-btrfs-endio-write]
    10709 be/4 root          0.00 B     40.86 M  0.00 %  0.00 % [kworker/u65:11-btrfs-endio-write]
    10710 be/4 root          0.00 B     37.89 M  0.00 %  0.00 % [kworker/u65:12-btrfs-endio-write]
     8224 be/4 root          0.00 B     30.77 M  0.00 %  0.00 % [kworker/u66:6-btrfs-endio-write]
     2808 be/4 root          0.00 B     27.78 M  0.00 %  0.00 % [kworker/u66:1-btrfs-endio-write]
     8142 be/4 root          0.00 B     10.25 M  0.00 %  0.01 % [kworker/u64:1-bond0]
     3432 be/4 103           0.00 B      7.12 M  0.00 %  0.00 % postgres: 10/main: stats collector process
     8848 be/4 nobody        8.00 K      2.38 M  0.00 % 99.99 % mono --debug Sonarr.exe -nobrowser -data=/config
    26116 be/4 nobody       17.23 M      2.20 M  0.00 %  0.01 % mono --debug Radarr.exe -nobrowser -data=/config

    The first two lines are two VMs. I can't post the equivalent output from inside the guests, as it was captured in an SSH session in mRemoteNG (no copy available),

    but inside the guest Hermes the amount of data written was around 10 times less than what the host reports, and around 20 times less for the guest wazo.

    It doesn't involve the loop3 device. I don't know where it is writing.

    Each guest has a single raw-format vdisk stored in the domains share, using the virtio driver.

     

    I don't really know where or how to investigate.

     




    User Feedback

    Recommended Comments

    I read that. Does this mean there will be no solution?

    Your post and the ones following it discuss loopback and overhead. But here I don't see excessive writes on the loop3 device, so I guess the writes are on the vdisks.

     

    4 GB in one hour on a VM that does almost nothing seems huge to me, considering that iotop inside the VM shows 200 MB of disk activity. This VM is an IPBX with almost no traffic. It's my home line, with no calls today.

    The other guest is a home automation system (Jeedom) based on PHP and MySQL. Inside the VM the activity is 10 times less than what we see outside.

    I find the amplification massive. And it doesn't affect all the VMs. I have others with no problem.
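For reference, the amplification described above can be worked out with a few lines of Python. This is only a rough sketch: it assumes iotop's K/M/G suffixes are 1024-based, and the helper names `parse_size` and `amplification` are made up for illustration. The two inputs are the figures quoted in this thread (3.99 G written by the wazo qemu process on the host, about 200 M seen inside the guest).

```python
def parse_size(text: str) -> float:
    """Convert an iotop-style size such as '3.99 G' or '200 M' to bytes.

    Assumption: the suffixes are binary multiples (1 K = 1024 B).
    """
    units = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    value, unit = text.split()
    return float(value) * units[unit]


def amplification(host_written: str, guest_written: str) -> float:
    """Ratio of bytes written on the host to bytes written inside the guest."""
    return parse_size(host_written) / parse_size(guest_written)


# Figures from the thread: host-side iotop vs. iotop inside the wazo guest.
wazo_ratio = amplification("3.99 G", "200 M")
print(f"wazo: about {wazo_ratio:.1f}x more data written on the host than in the guest")
```

With these numbers the ratio comes out at roughly 20x, which matches the "20 times" estimate for wazo in the report.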

    I have had these 2 VMs for a long time (I rebuild them from time to time), and before Unraid they were running on Proxmox on an LVM pool (formatted in ext4, if I remember correctly).

    If I can't have VMs running on a pool, I'll probably consider another system.

    Perhaps raw is not the best choice for vdisks on a btrfs pool. I must admit I don't really understand all the implications.

    I don't see the point of supporting only btrfs for pools, especially as we don't even have a GUI for managing snapshots.

     

     

    Link to comment
    15 minutes ago, caplam said:

    But here I don't see excessive writes on the loop3 device, so I guess the writes are on the vdisks.

    So are mine, and 90% or more of them come from one Windows Server 2012 R2 VM. I also have Windows 8.1 and Windows 10 VMs, and those don't write much. Total writes have already been reduced 15x since -beta25. As mentioned, btrfs will also have some write amplification (there's a study about that). I can live with 200GB per day; I just couldn't with 3TB per day.

     

     

    Link to comment

    Wow, this study is astonishing. I will read it, but it talks about 30x write amplification.

    I don't even understand why btrfs is still being used. The performance is ridiculous.

    I hope Unraid will offer an alternative.

    Link to comment
    Just now, caplam said:

    I hope unraid will offer an alternative.

    Hopefully we'll get ZFS in the near future, and that will have some advantages over btrfs (also some disadvantages), but I fear that write amplification might also be an issue, since it's also a COW filesystem.

    Link to comment

    And what if we use qcow2 for vdisks on ext4? That was the case when I was using Proxmox, and I never had such a problem.

    Back then I had a small 256 GB SSD which I used for 4 years, and it was still OK when I shut down my cluster.

    Link to comment

    I fired up my Grafana docker and I can see that the average write rate on one of the SSDs is around 6.5 MB/s.

    So it will write more than 500 GB a day.

     

    Edit: this means my SSDs will last 13 months 🤬
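The arithmetic behind that estimate can be sketched as follows. The 6.5 MB/s rate is the figure from Grafana above; the rated endurance (`HYPOTHETICAL_RATED_TBW`) is NOT given anywhere in this thread and is only a placeholder, so substitute the TBW value from the SSD's datasheet. Decimal units (1 GB = 1000 MB) are assumed.

```python
SECONDS_PER_DAY = 86_400

# Placeholder only: the thread does not state the drive's rated endurance.
HYPOTHETICAL_RATED_TBW = 220.0


def daily_writes_gb(rate_mb_per_s: float) -> float:
    """Data written per day in GB for a sustained write rate in MB/s."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1000


def lifetime_days(rated_tbw: float, rate_mb_per_s: float) -> float:
    """Days until the rated endurance (in TB written) is used up."""
    return rated_tbw * 1000 / daily_writes_gb(rate_mb_per_s)


per_day = daily_writes_gb(6.5)
days = lifetime_days(HYPOTHETICAL_RATED_TBW, 6.5)
print(f"{per_day:.1f} GB written per day")
print(f"rated endurance reached after about {days:.0f} days")
```

At 6.5 MB/s the daily volume is about 561.6 GB, consistent with "more than 500 GB a day". The lifetime figure depends entirely on the endurance rating plugged in.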

    Edited by caplam
    Link to comment
    10 minutes ago, caplam said:

    And what if we use qcow2 for vdisks on ext4? That was the case when I was using Proxmox, and I never had such a problem.

    If you don't need a pool or the other btrfs features like snapshots xfs should give much better results.

    Link to comment
    7 hours ago, JorgeB said:

    If you don't need a pool or the other btrfs features like snapshots xfs should give much better results.

    Does unraid support a RAID1 pool using XFS?

    Link to comment

    If Limetech offers ZFS, do you know whether it will be a complete ZFS array replacing the entire array and cache pool, or the ability to make the cache pool a ZFS pool?

    If it's the former, we'll lose the ability to use disks of different sizes.

    For now I'm converting my cache pool to a single disk, and then I'll make 2 single-SSD pools formatted with XFS. I'll use one as cache for the array and docker, and the other one for the domains share.

     

    Edit: I remember having my Synology NAS formatted with btrfs and not having such issues.

    Edited by caplam
    Link to comment
    9 minutes ago, caplam said:

    if limetech is offering zfs, do you know if this will be a complete zfs array to replace the entire array

    If/when it happens I expect it to be the same as btrfs, i.e., it can be used as an independent filesystem for data devices and single/mirror/raid for pools, so all of Unraid's array flexibility options can be kept.

    Link to comment

    So my server is back online with 2 single-SSD pools formatted with XFS.

    I now use docker-xfs.img for the docker image.

    Based on the first hours of activity, I can say that I have no more excessive writes.

    I will stay away from btrfs in the future.

     

    I found one strange thing on my VM that explains the excessive writes on the VM pool.

    For an unknown reason, the memory assignment in the VM settings had changed from 2GB to 1GB. Normally that would be OK, but yesterday at midnight it started to write at a rate of 20MB/s. The VM was swapping. Normally the cache on this VM is mounted on tmpfs.

    So I shut down the VM and assigned 4GB of memory: no more writes.

    I've never seen that before.

    So I checked the other VMs, and the other VM which was writing a lot had its memory down to 1GB (normally 2GB).

    The btrfs problem aside, I would say there was a bug in the VM settings, as with correct memory settings I have no more excessive writes.
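For anyone wanting to check whether a Linux guest is swapping, one possible approach is to compare SwapTotal and SwapFree from /proc/meminfo inside the VM. The sketch below is an assumption about how one might do that, not what was actually run here. The function name `swap_used_kib` is invented, and the sample values are made up purely to show the parsing.

```python
def swap_used_kib(meminfo_text: str) -> int:
    """Return swap in use (KiB) from the contents of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # The first token after the colon is the numeric value.
            fields[key.strip()] = int(parts[0])
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


# Made-up sample in /proc/meminfo format (not taken from the VMs in this thread).
SAMPLE = """MemTotal:        1014572 kB
MemFree:           61232 kB
SwapTotal:       2097148 kB
SwapFree:        1572860 kB
"""

used = swap_used_kib(SAMPLE)
print(f"swap in use: {used / 1024:.0f} MiB")
```

Inside a real guest, the same function could be fed `open("/proc/meminfo").read()`. A value that keeps growing while the host shows heavy writes on the vdisk would point to swapping rather than to the filesystem.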

     

    Link to comment

