6.8.3 Disk writes causing high CPU


2 months later...

So at this point, I've suffered through this by scheduling tasks in the wee hours for nearly 2 years.

 

But I've begun using the server for more workloads than just Plex and infrastructure dockers again. (I had originally built this box to help with VMs for Crypto projects + Plex)

 

But at this point, the high I/O wait is actually bad enough to raise the fans and CPU / PCH temps for sustained periods. It's time to add more storage and change a few other things on top of it.

 

So here's my question: I started unRAID on this box back in 2019. A few hardware changes and however many versions later, I still have this problem. Would rebuilding it from scratch potentially help alleviate this problem?

 

And if it's worth a shot, what is a good general strategy? What other things should I consider?

 

@Squid @trurl 

 

Link to comment

Guys, any help would be much appreciated. I'm at a loss with this. Today I did a test...

 

  • Stopped and disabled VMs / Dockers
  • Moved everything off the cache
  • Reformatted that drive from XFS to BTRFS
  • Moved everything back...

Whamo... When doing a big download with Sabnzbd I got decent performance.

[screenshot: Sabnzbd download writing to the cache at decent speed]



Then later today, doing the same sort of workload, I'm stuck at roughly 80MiB/s on writes again... How is this possible? It can't be a hardware thing??

Link to comment

I have a different, new SSD that I'm thinking of trying this weekend. However, it seems to me it has to be some sort of OS or OS / hardware configuration thing, since I was able to get that freshly formatted drive to perform at 300MiB/s, and then on the next workload and every one beyond that I can't get any higher than 80MiB/s.

I'm down to do tests, I just don't know what to do...
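
One thing I could try, to take SMB and the network out of the picture, is a direct write test on the server itself. A rough sketch (assuming the cache pool is mounted at /mnt/cache, the unRAID default):

```
# Write 8 GiB straight to the cache pool; oflag=direct bypasses the RAM
# page cache, so this measures the drive's real sustained write speed
dd if=/dev/zero of=/mnt/cache/ddtest.bin bs=1M count=8192 oflag=direct status=progress

# Clean up the test file afterwards
rm /mnt/cache/ddtest.bin
```

If that also starts fast and then drops to ~80MiB/s, the network and SMB are off the hook.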

Link to comment

Like here is me copying an 85GB file across the LAN via SMB to a share which uses the cache drive. As you can see, the server is receiving at nearly a full 1Gbps on the network interface, and it's flushing the file to the cache drive in bursts of around 400MiB/s... until about halfway through the file copy, which was about 5 minutes in. Then the system falls back to a steady stream of 80MiB/s of file write speed... ¯\_(ツ)_/¯ So it doesn't seem to me to be a hardware problem.

I've also set the "dirty ratio" in Tips & Tweaks to 1% & 2%... at the defaults this is way worse.
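
For anyone following along, I believe those Tips & Tweaks sliders just set the standard Linux writeback sysctls, so you can check (or change) them by hand from the console. A sketch:

```
# Show the current writeback thresholds, as a percent of RAM
sysctl vm.dirty_background_ratio vm.dirty_ratio

# Match the 1% / 2% settings mentioned above (runtime only, not persistent)
sysctl -w vm.dirty_background_ratio=1
sysctl -w vm.dirty_ratio=2
```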

 

[screenshot: NIC receiving at ~1Gbps, cache writes bursting to ~400MiB/s then dropping to a steady 80MiB/s]

Link to comment
16 hours ago, CowboyRedBeard said:

those are hung at 100% so maybe that's why

The dashboard takes I/O wait into consideration on its graphs.  Depending upon how you look at it, it's either right or wrong.  It's wrong because the core isn't actually running at 100% (rather, it's idle waiting for the data transfer, and the processes can't continue without it), or right because the core isn't able to do anything else while it's waiting for the transfer to happen (so it's effectively at 100% from the user's point of view).
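
If you want to see the split for yourself, iostat (from the sysstat package, if you have it installed) reports iowait separately from real CPU time; a quick sketch:

```
# avg-cpu shows %iowait next to %user/%system, and the per-device
# lines show how busy each disk is; refresh every second, 5 samples
iostat -x 1 5
```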

 

Link to comment

Supermicro X9DRi-LN4+, using the onboard controller.

 

The cache drive is connected to a SATA 3 port on the motherboard. The other SSD that's not assigned to the array (VMs on this) is on a SATA 3 port also. I've done tests to/from those.

 

And I've even tried a PCIE SATA controller that I have in the machine.

 

The network is the onboard ethernet from the motherboard, which doesn't seem to be a bottleneck at all; I can copy at a full 1Gbps on that.
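
For what it's worth, one quick sanity check I can run is to confirm the SSD's port actually negotiated 6Gbps and didn't drop to 3 or 1.5. A sketch, with /dev/sdX standing in for the cache drive:

```
# The kernel logs the negotiated link speed per SATA port
dmesg | grep -i 'SATA link up'

# smartctl reports the drive's maximum vs. current link speed
smartctl -i /dev/sdX | grep -i 'sata version'
```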

 

Link to comment

First, I apologize if I walk you back through things you've already done, but I'd like to get an assessment of where you are so we can do some troubleshooting.

 

First, look at your SSD disks and be sure they are formatted like this:

[screenshot: SSD device view showing the partition format as 1 MiB-aligned]

 

For best operation, they should show 1 MiB-aligned.
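
If it's easier, the same thing can be checked from the command line (a sketch, with /dev/sdX standing in for your SSD):

```
# A partition start sector of 2048 on 512-byte sectors = 1 MiB alignment
fdisk -l /dev/sdX

# parted can verify alignment directly (the trailing 1 = first partition)
parted /dev/sdX align-check optimal 1
```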

 

Give me a screenshot of the Tips & Tweaks page so I can see what you've 'Tweaked'.

Link to comment

The posts on this page are probably a good depiction of the problem as it currently appears. But essentially, with any file-write process to the cache I end up with high I/O wait times. I will see the cache drive able to write at around 300MiB/s for just a minute or two, and then after that it will only give around 80MiB/s.

This shows up in netdata and on the unRAID dashboard, as in the following posts:

 

 

And in that second one you can even see the CPU temps rise, which, as was mentioned here, was thought to be odd since it's just "waiting"... but I monitor CPU temp / fan speed with IPMI and send that data to influxDB where I can trend it (that's the graph in the second post).


Happy to conduct any tests you think are meaningful and post the results here. But primarily I see this with cache drives only (spinning disks don't reach the same sort of speeds, so I guess the system can keep up with them). I also see this whether it's Sab downloading / unpacking a file, or transferring data to or from a non-array / non-cache SSD.

Also, earlier on in this thread, I had an Intel Optane NVMe drive in the box on a PCIe slot and was able to get crazy sustained write speeds to it without this issue occurring. I've since pulled that out, but could put it back in for testing if needed.

Link to comment
6 minutes ago, CowboyRedBeard said:

I will see the cache drive able to write at around 300MiB/s for just a minute or two and then after that it will only give around 80MiB/s after.

What you are seeing is the disk caching happening with Linux.  It will initially fill the RAM disk cache and then start writing to disk when the disk cache gets to a set threshold.  These are the Disk Cache settings in Tips & Tweaks.  If I recall, you have 128GB of RAM.  At 1% dirty background, that's 1.28GB.  A file transfer will fill that RAM cache before committing to disk.  That's why you are seeing high speeds on a reboot, because the disk cache is empty.
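
You can actually watch this happen while a transfer runs. In /proc/meminfo, Dirty is data parked in RAM waiting to be written and Writeback is data actively being flushed; something like:

```
# Dirty climbs during the fast initial burst, then the kernel throttles
# the transfer to the disk's real speed once the threshold is hit
watch -n 2 'grep -E "Dirty|Writeback" /proc/meminfo'
```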

 

Your theoretical max is about 100MB/s on your 1Gbps network (1Gbps ÷ 8 = 125MB/s raw, less protocol overhead).  Honestly, the 80MB/s is reasonable really.

 

I want you to do some adjustments in Tips and Tweaks to the network though and let's see if it helps:

  • Disable NIC Flow Control.
  • Disable NIC Offload.
  • Set Ethernet NIC Rx Buffer to 1024.
  • Set Ethernet NIC Tx Buffer to 1024.

These settings help on Intel NICs at times.  See if that improves your speed.
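
For reference, and assuming your interface is eth0, I believe those four settings map to roughly these ethtool commands under the hood:

```
# Disable flow control (pause frames)
ethtool -A eth0 rx off tx off

# Disable the common offloads
ethtool -K eth0 tso off gso off gro off

# Set the receive / transmit ring buffers to 1024
ethtool -G eth0 rx 1024 tx 1024
```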

 

As for the high CPU use and temps, I have no idea what's happening there.  I'm not an expert on iowait, but could it also be from NIC I/O?

Link to comment
