Hello everyone,
I'm writing here because I have a problem I'm not able to solve alone.
First here is my Unraid server configuration :
- CPU : Intel 8400T
- 16 GB of RAM
- Array composed of 2 SATA SSDs (one of the two is the parity drive). Because I only have SSDs in my server, I don't have a cache drive.
- 20 Docker containers, many of them with dedicated IPv4 and IPv6 addresses on a custom network interface (neither host nor bridge, so they are directly accessible on my network). Docker is configured with a docker.img formatted as BTRFS.
- 2 VMs : my pfSense firewall, with PCIe passthrough for my NIC, and my HomeAssistant VM
Description of the problem :
For the past few days, my Unraid server has been very unresponsive.
Much of the time, I cannot access the web interfaces of my containers (Plex, Nextcloud, etc.). I also cannot access the Unraid Docker tab (or only with great difficulty), and every docker command (docker container stop, for example) is painfully slow (it takes minutes).
On the other hand, my VMs are working as expected, at normal speed.
So I tried to find out what caused this behavior.
Searching for the cause of the problem :
First I found that the load average (shown in htop) was very high, above 20, while CPU utilization in htop was low (5-10%). So I suspected IOWait.
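For anyone who wants to check the same thing without htop, here is a minimal Python sketch of how the IOWait share can be estimated from two samples of the aggregate "cpu" line in /proc/stat. The function names (parse_cpu_line, iowait_percent, measure_iowait) and the 5-second interval are my own choices for illustration, not something from an Unraid tool.

```python
import time


def parse_cpu_line(line):
    """Return the first eight counters (user .. steal) of the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Order: user, nice, system, idle, iowait, irq, softirq, steal.
    # The guest counters are skipped because they are already counted in user/nice.
    return [int(value) for value in fields[1:9]]


def iowait_percent(before_line, after_line):
    """Share of CPU time spent in iowait between two samples, in percent."""
    before = parse_cpu_line(before_line)
    after = parse_cpu_line(after_line)
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def measure_iowait(interval_s=5.0, path="/proc/stat"):
    """Sample /proc/stat twice (Linux only) and return the iowait percentage."""
    with open(path) as f:
        first = f.readline()  # the first line is the aggregate 'cpu' line
    time.sleep(interval_s)
    with open(path) as f:
        second = f.readline()
    return iowait_percent(first, second)
```

Calling print(measure_iowait()) on the server during a slowdown should give a number comparable to the IOWait value shown by monitoring tools, assuming they use the same counters.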
By running iotop I can see multiple processes using a lot of IO : "unraidd1", "docker ..." etc.
The IOWait was also confirmed by Netdata (here is an example, but IOWait can go above 90%) :
So first I tried to see whether this high IOWait was caused by a particular Docker container. But even after shutting down all containers, the IOWait stays high for a few minutes before coming back to a normal value (<3%), so it looks like the cause is not a container but Docker itself. When IOWait comes back to a low value, the Docker tab and docker commands become responsive again.
I also saw in Netdata that during high IOWait, disk utilization was at 100% on the parity drive but not on the other drive.
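The per-disk utilization can also be cross-checked by hand. Below is a rough Python sketch that compares two snapshots of /proc/diskstats and reports how busy each device was, based on the "milliseconds spent doing I/Os" counter. The function names and the interval are assumptions of mine for illustration; the device names depend on the system.

```python
import time


def io_ticks_ms(diskstats_text):
    """Map device name -> total milliseconds spent doing I/O so far."""
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 13:
            continue  # skip blank or truncated lines
        # fields: major, minor, name, then the per-device counters;
        # index 12 is the time spent doing I/Os, in milliseconds.
        ticks[fields[2]] = int(fields[12])
    return ticks


def utilization_percent(before_text, after_text, interval_s):
    """Percentage of the interval during which each device was busy."""
    before = io_ticks_ms(before_text)
    after = io_ticks_ms(after_text)
    interval_ms = interval_s * 1000.0
    result = {}
    for name, ticks in after.items():
        if name in before:
            busy = 100.0 * (ticks - before[name]) / interval_ms
            result[name] = min(100.0, busy)  # clamp small timing overshoots
    return result


def measure_utilization(interval_s=5.0, path="/proc/diskstats"):
    """Sample /proc/diskstats twice (Linux only) and return per-device utilization."""
    with open(path) as f:
        first = f.read()
    time.sleep(interval_s)
    with open(path) as f:
        second = f.read()
    return utilization_percent(first, second, interval_s)
```

A device that stays near 100 while the others stay low would match what Netdata showed for the parity drive.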
A temporary solution :
So after a lot of searching, I tried disabling parity in Unraid (by stopping the array and removing the parity disk), and I no longer have the high IOWait problem.
I need help :
Does anyone have an explanation and a solution for this problem?
I didn't have this problem a few weeks ago...
IOWait is so high that I'm not able to sync parity again: if Docker is enabled and my containers are started, the problem happens immediately, the parity sync is painfully slow (about 2 MB/s on SSD), and every container is unresponsive...
Do you think this is a hardware failure?
I replaced the parity SSD a few months ago, so it's almost new.
You can find my Unraid diagnostics below.
Thanks a lot