• (6.12.3) Constant writes to ZFS pools


    isvein
    • Annoyance

    Hello :)
    This seems to be a problem many have noticed, but I could not find a bug report on it yet (maybe I did not look deep enough).

    I first came across the problem on Reddit and then found out I had the same situation.
    It looks to be related to Docker, because if Docker is shut down, or no containers are running, nothing is written.
    I can't say for sure if this happened with ZFS before 6.12.3, but people on Reddit seem to have noticed it on 6.12.3 and not before.

    There is also the fact that ZFS drives in the array do not spin down if you have them on a timer, but that seems to be linked to the ZFS Master plugin if you have the "Main" tab open, as the plugin then reads the snapshot data, which wakes up the drives. That does not explain the writes, though.

    I have tried to isolate it to a specific Docker container, with no luck.
    The writes also happen on ZFS drives in the array, not only on ZFS pools.

    Anyone else know more about this?
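
    For anyone trying to reproduce this, one way to watch the activity directly is to let zpool report per-vdev I/O at an interval and see whether the writes keep ticking with Docker stopped (a sketch; replace "cache" with your own pool name):

    # read/write operations and bandwidth per vdev, refreshed every 5 seconds
    zpool iostat -v cache 5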


    oneroom-diagnostics-20230719-2154.zip




    User Feedback

    Recommended Comments



    The spin-up is happening without ZFS Master installed, so it's something about 6.12.x.

     

    I also removed all ZFS drives from the main array, so the only ZFS drives now are SSD caches, and one of those pools does spin down (standby for SSD). So it's something to do with 6.12.x and the main array.

     

    Possibly something built into the OS? It's not something that shows up in disk activity or open files.

     

    However, I also notice that my hard drive activity lights are not lining up with the reported data access, so perhaps the drives are not actually spun up but are just showing as spun up?

     

    I would expect the UPS load to be lower, though, if the array drives were spun down...

     

    It happens with & without the cache plugin installed.

     

    It also happens if plex is stopped.

     

    moulin-rouge-diagnostics-20230719-2158.zip

    Edited by dopeytree
    Link to comment

    My cache has constant writes because of my UUD, but my ZFS pool is also having constant writes and I could never figure out why. This was my first crack at ZFS, though. I was noticing it on 6.12.2 too. I don't know if I had this pool set up on 6.12 or 6.12.1.

     

    Looking at open files and file activity I can't find any reason for this. 

    vulcan-diagnostics-20230719-1602.zip

    Edited by GTvert90
    Link to comment

    I tested it now: I set the drive spin-down to 30 minutes (just so I would not have to wait for hours) and kept away from the "Main" page, and they did indeed spin down. 10 of the 11 drives in my array are ZFS.
    But once I clicked on "Main", they all spun up again, as expected per the info from @Iker (all the ZFS drives; the XFS one stays off, as also expected).

    Link to comment
    13 hours ago, dopeytree said:

    Possibly something built into the OS?

    Not saying that there isn't an issue, but it's not a general issue; I have multiple ZFS pools and they stay spun down, e.g.:

     

    image.png

    Link to comment

    I don't think it's a ZFS issue; I think it's something to do with the main array, regardless of format.

     

    Will run in safe mode tonight to double-check.

     

    Is there another way to find out which process is requesting data?

    • Upvote 1
    Link to comment

    Just to have it tested, I checked whether changing Docker from an image to a folder made any difference; it did not.

    Link to comment
    6 hours ago, JorgeB said:

    Not saying that there isn't an issue, but it's not a general issue; I have multiple ZFS pools and they stay spun down, e.g.:

     

    image.png

    But do you get the random reads and writes?

    Link to comment
    9 minutes ago, isvein said:

    But do you get the random reads and writes?

    Nope, they stay down for days if not used.

    Link to comment

    It turns out this seems to have nothing to do with Docker, at least.
    I cleared the R/W counters and waited around 20 minutes with the Docker service turned off, and as you can see, there is only R/W on the ZFS drives, not the XFS one.
    I'm starting to think this is not an error or bug, but just how ZFS may work; going to test some more.

    drives.JPG

    Link to comment

    OK, so I think I've got it, maybe.
    I turned off the ARC cache on all datasets and then restarted the server.
    When it started, the ARC used 7% of RAM and built up to 34%, and as it did, the reads and writes increased too.
    So I think this has something to do with ZFS reading and writing data to RAM "all the time".
    Clearly some data is cached to RAM even if you turn it off on the datasets, or else the ZFS RAM usage would not increase.

    But it would be good to get some info on this from someone who knows zfs better :)
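
    For anyone who wants to check or change this per dataset, the caching policy can be read and set from the console (just a sketch; "cache" and "cache/appdata" are placeholder pool/dataset names):

    # show what each dataset is allowed to keep in ARC (all | metadata | none)
    zfs get -r primarycache cache
    # cache only metadata, not file data, for a single dataset
    zfs set primarycache=metadata cache/appdata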

    Link to comment
    10 hours ago, isvein said:

    When it started, the ARC used 7% of RAM and built up to 34%, and as it did, the reads and writes increased too.

    The ARC should only start to get populated if you read/write from a ZFS filesystem; do you have something installed, like the cache dirs plugin, that would generate reads?
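
    One way to see whether the ARC is actually growing without any workload is to watch it for a while (just a sketch, assuming the arcstat helper that ships with OpenZFS is available on the system):

    # print ARC size and hit/miss rates every 5 seconds, 12 samples
    arcstat 5 12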

    Link to comment
    26 minutes ago, JorgeB said:

    The ARC should only start to get populated if you read/write from a ZFS filesystem; do you have something installed, like the cache dirs plugin, that would generate reads?

    I don't have that one installed, but maybe it is some other plugin; I'm going to check it out more :)
    It makes sense that the cache gets R/Ws, as I have Docker there.

    But so far it just started to fill up in roughly the first 5 minutes after the reboot.

    I do have Dynamix File Integrity installed, but if that was generating writes and reads, I guess it would also do so on the XFS drive, which was getting 0 R/W.

    Link to comment

    Same with me. 
    I converted my SSD pool and cache to ZFS, as well as 4 of my 8 HDDs in the array.

    Even with Docker and VMs disabled and no SMB activity, the ZFS drives keep spinning up. Very unfortunate...
    2023-07-22_14h07_40.thumb.png.154699c994d010de62e3bbcc07907c10.png

     

    There should be a warning about this issue, so people don't proceed with converting to ZFS until it is solved.

     



    W480M VISION W
    Intel® Xeon® W-1350 @ 3.30GHz
     32 GiB DDR4 Single-bit ECC

    Link to comment

    I have 2 ZFS NVMe cache pools, one for data transfer and one for appdata.

     

    The data-only pool does not have the constant writes, but the cache pool with appdata has the same issue with constant writes.

     

    Edit: turning Docker off will stop the writes.

    I tried stopping all the containers, but it was still writing.

    Edited by Alansari
    Link to comment

    Did a test in safe mode and it still spins up all disks in the array, so we KNOW it is NOT a plugin issue.

     

    Next I will go through each container 1 by 1.

    Link to comment

    The culprit seems to be the Dashdot container, which is spinning up the main array and stopping it from staying spun down.

    If you edit the container and turn 'Privileged' off, it should stop the array constantly spinning up.

    Edited by dopeytree
    Link to comment

    So, I suddenly have constant read access on all my disks formatted with ZFS (I converted all 14 of my disks from XFS to ZFS).

     

    I found the exact moment when it happens: as soon as ZFS in the Dashboard nears 100%, it jumps down to about 50% and the read accesses on the Unraid array start (I assume that's the ARC cache, and as it fills up it starts evicting old cache entries, hence re-caching the file tree). That's probably also why the cache dirs plugin does not work as it should.

     

    It seems like in my case the file tree occupies 1.10 GB of ARC before enabling Dockers/VMs, which is 30% of the default ARC size. I increased the ARC size to 16 GB (with 32 GB of RAM). Let's see if this changes anything.

     

    So after further investigation, increasing the ARC just reduces the likelihood of this unnecessary re-caching event. I have now found another setting that limits the purging of metadata from the ARC. I added the following to /boot/config/modprobe.d/zfs.conf and rebooted:

     

    options zfs zfs_arc_max=16000000000
    options zfs zfs_arc_meta_min=8000000000

     

    Since then, I have been able to fill my ARC to 100% and it has not triggered a file-tree re-caching yet.
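
    To confirm that the two values actually took effect after the reboot, they can be read back from the module parameters (a quick sketch, assuming the parameter names from the zfs.conf above are exposed on the ZFS version in use):

    # both values are reported in bytes
    cat /sys/module/zfs/parameters/zfs_arc_max
    cat /sys/module/zfs/parameters/zfs_arc_meta_min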

     

    Update: Not quite working to my liking yet. I have now also disabled ARC caching of data on my Unraid array disks (zfs set primarycache=metadata diskX).

    Spoiler

    grafik.png.d3d735e5cd0b9680279018840479b0b9.png

     

    Edited by madejackson
    Link to comment

    So I've still not found a complete solution. The drives do now seem to spin down, but they spin up again roughly twice an hour. Yes, I had a couple of file accesses from Dockers, but nowhere near that amount.

    Spoiler
    Sep  1 14:31:44 Tower emhttpd: read SMART /dev/sdo
    Sep  1 14:31:44 Tower emhttpd: read SMART /dev/sdl
    Sep  1 14:31:44 Tower emhttpd: read SMART /dev/sdi
    Sep  1 14:32:31 Tower emhttpd: read SMART /dev/sdm
    Sep  1 14:34:49 Tower emhttpd: read SMART /dev/sdk
    Sep  1 14:34:56 Tower emhttpd: read SMART /dev/sdh
    Sep  1 14:34:56 Tower emhttpd: read SMART /dev/sdf
    Sep  1 14:35:04 Tower emhttpd: read SMART /dev/sdg
    Sep  1 14:40:53 Tower emhttpd: read SMART /dev/sde
    Sep  1 14:45:14 Tower emhttpd: spinning down /dev/sda
    Sep  1 14:45:17 Tower emhttpd: read SMART /dev/sda
    Sep  1 14:50:01 Tower emhttpd: spinning down /dev/sdk
    Sep  1 14:50:17 Tower emhttpd: spinning down /dev/sdj
    Sep  1 14:50:17 Tower emhttpd: spinning down /dev/sdo
    Sep  1 14:50:20 Tower emhttpd: spinning down /dev/sds
    Sep  1 14:50:23 Tower emhttpd: spinning down /dev/sdm
    Sep  1 14:50:24 Tower emhttpd: spinning down /dev/sdr
    Sep  1 14:56:01 Tower emhttpd: spinning down /dev/sdh
    Sep  1 14:56:01 Tower emhttpd: spinning down /dev/sdg
    Sep  1 14:56:01 Tower emhttpd: spinning down /dev/sdf
    Sep  1 14:56:01 Tower emhttpd: spinning down /dev/sdn
    Sep  1 14:56:01 Tower emhttpd: spinning down /dev/sdq
    Sep  1 14:56:01 Tower emhttpd: spinning down /dev/sdl
    Sep  1 15:00:18 Tower emhttpd: spinning down /dev/sda
    Sep  1 15:00:21 Tower emhttpd: read SMART /dev/sda
    Sep  1 15:03:32 Tower emhttpd: spinning down /dev/sde
    Sep  1 15:03:32 Tower emhttpd: spinning down /dev/sdi
    Sep  1 15:05:53 Tower emhttpd: read SMART /dev/sdm
    Sep  1 15:05:53 Tower emhttpd: read SMART /dev/sdj
    Sep  1 15:05:53 Tower emhttpd: read SMART /dev/sdk
    Sep  1 15:05:53 Tower emhttpd: read SMART /dev/sdh
    Sep  1 15:05:54 Tower emhttpd: read SMART /dev/sdg
    Sep  1 15:05:54 Tower emhttpd: read SMART /dev/sde
    Sep  1 15:05:54 Tower emhttpd: read SMART /dev/sdr
    Sep  1 15:05:54 Tower emhttpd: read SMART /dev/sdf
    Sep  1 15:05:54 Tower emhttpd: read SMART /dev/sds
    Sep  1 15:05:54 Tower emhttpd: read SMART /dev/sdn
    Sep  1 15:05:54 Tower emhttpd: read SMART /dev/sdq
    Sep  1 15:05:54 Tower emhttpd: read SMART /dev/sdl
    Sep  1 15:05:54 Tower emhttpd: read SMART /dev/sdi
    Sep  1 15:06:25 Tower emhttpd: read SMART /dev/sdo
    Sep  1 15:15:23 Tower emhttpd: spinning down /dev/sda
    Sep  1 15:15:26 Tower emhttpd: read SMART /dev/sda
    Sep  1 15:24:52 Tower emhttpd: spinning down /dev/sdk
    Sep  1 15:25:19 Tower emhttpd: spinning down /dev/sdi
    Sep  1 15:25:21 Tower emhttpd: spinning down /dev/sdg
    Sep  1 15:25:21 Tower emhttpd: spinning down /dev/sde
    Sep  1 15:25:21 Tower emhttpd: spinning down /dev/sds
    Sep  1 15:25:21 Tower emhttpd: spinning down /dev/sdo
    Sep  1 15:25:23 Tower emhttpd: spinning down /dev/sdh
    Sep  1 15:25:23 Tower emhttpd: spinning down /dev/sdf
    Sep  1 15:25:23 Tower emhttpd: spinning down /dev/sdn
    Sep  1 15:25:23 Tower emhttpd: spinning down /dev/sdq
    Sep  1 15:25:23 Tower emhttpd: spinning down /dev/sdl
    Sep  1 15:25:25 Tower emhttpd: spinning down /dev/sdm
    Sep  1 15:25:25 Tower emhttpd: spinning down /dev/sdj
    Sep  1 15:25:27 Tower emhttpd: spinning down /dev/sdr
    Sep  1 15:30:28 Tower emhttpd: spinning down /dev/sda
    Sep  1 15:30:31 Tower emhttpd: read SMART /dev/sda
    Sep  1 15:33:21 Tower emhttpd: read SMART /dev/sdm
    Sep  1 15:33:21 Tower emhttpd: read SMART /dev/sdj
    Sep  1 15:33:21 Tower emhttpd: read SMART /dev/sdk
    Sep  1 15:33:21 Tower emhttpd: read SMART /dev/sdh
    Sep  1 15:33:21 Tower emhttpd: read SMART /dev/sde
    Sep  1 15:33:21 Tower emhttpd: read SMART /dev/sdr
    Sep  1 15:33:21 Tower emhttpd: read SMART /dev/sdf
    Sep  1 15:33:21 Tower emhttpd: read SMART /dev/sds
    Sep  1 15:33:21 Tower emhttpd: read SMART /dev/sdq
    Sep  1 15:33:21 Tower emhttpd: read SMART /dev/sdo
    Sep  1 15:33:21 Tower emhttpd: read SMART /dev/sdl
    Sep  1 15:33:21 Tower emhttpd: read SMART /dev/sdi
    Sep  1 15:33:43 Tower emhttpd: read SMART /dev/sdg
    Sep  1 15:33:43 Tower emhttpd: read SMART /dev/sdn
    Sep  1 15:45:31 Tower emhttpd: spinning down /dev/sda
    Sep  1 15:45:34 Tower emhttpd: read SMART /dev/sda
    Sep  1 16:00:34 Tower emhttpd: spinning down /dev/sda
    Sep  1 16:00:36 Tower emhttpd: read SMART /dev/sda
    Sep  1 16:04:53 Tower emhttpd: spinning down /dev/sdk
    Sep  1 16:05:17 Tower emhttpd: spinning down /dev/sdo
    Sep  1 16:05:19 Tower emhttpd: spinning down /dev/sdg
    Sep  1 16:05:19 Tower emhttpd: spinning down /dev/sds
    Sep  1 16:05:21 Tower emhttpd: spinning down /dev/sdh
    Sep  1 16:05:21 Tower emhttpd: spinning down /dev/sdl
    Sep  1 16:05:23 Tower emhttpd: spinning down /dev/sdm
    Sep  1 16:05:23 Tower emhttpd: spinning down /dev/sdj
    Sep  1 16:05:23 Tower emhttpd: spinning down /dev/sdr
    Sep  1 16:05:23 Tower emhttpd: spinning down /dev/sdf
    Sep  1 16:05:23 Tower emhttpd: spinning down /dev/sdn
    Sep  1 16:05:23 Tower emhttpd: spinning down /dev/sdq

     

     

    Edited by madejackson
    Link to comment

    Maybe list the Docker containers you are using; it is almost certainly one of those.

    I'm now running 3x ZFS pools and have fixed this issue...

     

    Quote

     

    The culprit seems to be the Dashdot container, which is spinning up the main array and stopping it from staying spun down.

    If you edit the container and turn 'Privileged' off, it should stop the array constantly spinning up.

     

     

    Link to comment

    Did anyone solve this? I get constant writes every 5 seconds. I had just assumed this was related to ZFS and how zfs_txg_timeout works. What's weird is that I have 2 ZFS pools, one for media and one for Dockers. The Docker one is constantly writing, and I attributed it to logs and other things running in those containers causing the writes every 5 seconds. I'm trying to determine whether setting zfs_txg_timeout to a longer value, or some other ZFS tuning, can help, as it's noisy in the living room.
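
    For reference, zfs_txg_timeout is the maximum interval, in seconds (default 5), at which ZFS commits an open transaction group to disk even if it isn't full, which would match a write burst every 5 seconds. It can be checked and raised at runtime, or persisted the same way as the ARC options mentioned earlier (a sketch only; a longer interval means more buffered data can be lost on a crash or power failure):

    # current commit interval in seconds (default 5)
    cat /sys/module/zfs/parameters/zfs_txg_timeout
    # try a 30-second interval on the running system
    echo 30 > /sys/module/zfs/parameters/zfs_txg_timeout
    # or make it persistent in /boot/config/modprobe.d/zfs.conf:
    # options zfs zfs_txg_timeout=30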

    Link to comment

    I guess I'm experiencing the same problem (I'm new to Unraid and ZFS). But I have some writes on disks that are not part of any share, which seems strange. All my disks are ZFS.

     

    709452827_Screenshot2023-12-17at00_37_17.thumb.png.42427df9a955d579a7b2648ddbceb328.png

     

    But if I check the zstats for those disks, it appears that no access has been made:

     281378459_Screenshot2023-12-17at00_57_42.png.33c95afcfe70fef6d5bf06b47e62c298.png

     

    What is doing this allocation? Of course the value grows continuously.

     

    Don't know if it can help, but if I check the leaf vdevs' I/O, I have some data:

    191026363_Screenshot2023-12-17at01_19_37.thumb.png.279c339a7a25a6e769dc75d2c59ea6bb.png

     

     

    Link to comment

    Has there been any update on this? Whenever I refresh Main, it first reads my 1 ZFS disk and then also writes to it, so the parity drive spins up as well. I only have 1 ZFS disk in the array; all the others are XFS.

     

     

    Link to comment

    This morning at 04:26 the random write activity on the disks stopped for me; this is the report for my disk12 (sdn):

    1189047501_Screenshot2023-12-18at09_55_58.thumb.png.38f53969903f3e03aba2b7ba829accc0.png

     

    Checking syslog I have this: 

    cat /var/log/syslog | grep "Dec 18 04:26"
    Dec 18 04:26:20 littleboy monitor: Stop running nchan processes

     

    Is there something else I could check? I guess nchan is used by the GUI as pub/sub for updates, but I can't understand how this could be related.

     

    Edited by skler
    Link to comment





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.