Hi @onno204,
Indeed, it looks like you did way more research into this issue than I did.
My problems started a few months back, around February I think, when I first upgraded to 6.12.2. At that point I had neither Tdarr (the only -arr ever installed on this server) nor Grafana or Prometheus; the latter two were only installed later to try to debug the slowness.
I agree with you that the -arr suite can cause this issue, but only because of the high I/O it requires.
I have personally never used SabNZBD, so my issue is definitely not coming from there.
Here's what I did in my testing:
- Took a full appdata/Docker/VM backup, deleted everything, and disabled Docker and VMs. Issues persisted when doing high I/O operations.
- After that, uninstalled ALL plugins, leaving the server as clean as it gets short of a fresh install. Issues persisted when doing high I/O operations.
- Disabled bond0 and removed the extra NIC (an old Intel PRO/1000 PT set up in Transmit Load Balancing), leaving the server on eth0 only (Realtek, but meh, consumer board). Issues persisted when doing high I/O operations.
- Since ZFS is a new implementation, replaced my three M.2 NVMe cache drives with a single XFS-formatted M.2 NVMe drive. Issues persisted when doing high I/O operations.
- Recreated the Unraid USB with a fresh 6.12.10 install using the USB Creator tool and only copied over the config file. Issues persisted when doing high I/O operations.
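For anyone wanting to reproduce the "high I/O operations" I kept hitting in the steps above, a plain sequential write/read with dd is roughly what triggered it for me. This is only a sketch: the TARGET path and sizes are placeholders, not what I actually ran; point TARGET at the share you want to stress.

```shell
#!/bin/bash
# Rough sketch of a "high I/O" load. TARGET is an assumption --
# on Unraid you'd point it at something like /mnt/user/<share>;
# it defaults to /tmp here so the script runs anywhere.
TARGET="${TARGET:-/tmp}"
SIZE_MB="${SIZE_MB:-64}"      # bump this up for a real stress test
TESTFILE="$TARGET/io-test.bin"

# Sequential write; conv=fdatasync forces the data to disk before dd
# exits, so the reported speed reflects storage, not the page cache.
dd if=/dev/zero of="$TESTFILE" bs=1M count="$SIZE_MB" conv=fdatasync

# Read it back.
dd if="$TESTFILE" of=/dev/null bs=1M

bytes=$(stat -c%s "$TESTFILE")
rm -f "$TESTFILE"
echo "wrote and read back $bytes bytes on $TARGET"
```

If the slowness shows up, it should show up while one of those dd runs is in flight.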
Considering all this, I decided to roll back to 6.11.5. I installed a fresh copy with the Unraid USB Creator and replaced the config folder with the old one. Everything went smoothly and the issue is gone, now with all previous plugins and Docker containers running. I don't have Community Apps working, but that's not an issue for me at this time.
The only thing that changed was the Unraid version, so it's clearly an issue in the OS. How and why not everyone is affected by it beats me, but it may have something to do with kernel drivers for different hardware platforms. I'm not that technical, unfortunately.
I agree with you that rolling back is not the best option, but I've personally decided not to continue with Unraid. I have a 730xd and some drives on order, which will run a normal raidz2 setup using another solution. This is not because I haven't loved Unraid; I absolutely have, and considering how few people are encountering this issue, I'm positive it will keep growing. It is, after all, a perfect solution for a home server. However, my use case has changed since I deployed this box, and data integrity and IOPS are now far more important. That's not a flaw of Unraid; it's just not built for that.
I am unable to keep testing this since, as I mentioned, I've already gone back to 6.11.5. What I did not test is high I/O directly on a disk share, bypassing the /mnt/user shares, as I didn't have the time. Might be worth looking into.
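If someone does want to try that comparison, something like the sketch below is what I had in mind: time the same write once through the /mnt/user (FUSE) path and once directly on the disk's own mount. The share name "Media" and disk number are made up; substitute your own, and make sure both paths actually land on the same disk so the comparison is fair.

```shell
#!/bin/bash
# Hedged sketch: time an identical sequential write on two paths.
# Paths in the comments below are hypothetical Unraid examples.
timed_write() {
    local path="$1" file start end
    file="$path/shfs-test.bin"
    start=$(date +%s%N)
    dd if=/dev/zero of="$file" bs=1M count=64 conv=fdatasync 2>/dev/null
    end=$(date +%s%N)
    rm -f "$file"
    echo "$path: $(( (end - start) / 1000000 )) ms for 64 MiB"
}

# On Unraid you'd compare, for example:
#   timed_write /mnt/user/Media    # through the user-share (FUSE) layer
#   timed_write /mnt/disk1/Media   # straight to the disk's filesystem
timed_write /tmp    # demo run on a path that exists everywhere
```

A big gap between the two numbers would point at the user-share layer rather than the disks themselves.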
I hope you will be able to find the issue!