CowboyRedBeard (Author) - Posted November 8, 2021
Tried that (rebooted the server, even)... it doesn't seem to make a difference.
JorgeB - Posted November 8, 2021
Quote (CowboyRedBeard): "doesn't seem to make a difference"
Shame, it was worth a try.
JorgeB - Posted November 10, 2021
FYI, this is the post where a user reported this helped, and the symptoms looked similar to yours: https://forums.unraid.net/topic/114827-lockups-when-parity-is-enabled/?do=findComment&comment=1047923
CowboyRedBeard (Author) - Posted February 4, 2022
So at this point I've suffered through this for nearly two years by scheduling tasks in the wee hours. But I've begun using the server for more workloads than just Plex and infrastructure Dockers again (I had originally built this box to help with VMs for crypto projects plus Plex). At this point, the high IO wait is actually bad enough to raise the fans and CPU / PCM temps for sustained periods. It's time to add more storage and change a few other things on top of it.
So here's my question: I started unRAID on this box back in 2019, and a few hardware changes and however many versions later, I still have this problem. Would rebuilding it from scratch potentially help alleviate this problem? And if it's worth a shot, what is a good general strategy? What other things should I consider? @Squid @trurl
CowboyRedBeard (Author) - Posted February 10, 2022
Guys, any help would be much appreciated. I'm at a loss with this. Today I did a test:
- Stopped and disabled VMs / Dockers
- Moved everything off the cache
- Reformatted that drive from XFS to BTRFS
- Moved everything back
Whamo... when doing a big download with Sabnzbd I got decent performance. Then later today, doing the same sort of workload, I'm stuck at roughly 80MiB/s on writes again. How is this? It can't be a hardware thing??
Squid - Posted February 10, 2022
Quote (CowboyRedBeard): "But at this point, the high IO wait is actually bad enough to raise the fans and CPU / PCM temps for sustained periods"
That seems very strange, because during IO wait the processor is basically idle, waiting on the IO to complete.
CowboyRedBeard (Author) - Posted February 11, 2022
Yeah, but if you watch the threads, those are hung at 100%, so maybe that's why? And I track fans / CPUs via IPMI. You can see it here (mover kicks off at 5:00):
CowboyRedBeard (Author) - Posted February 11, 2022
I have a different, new SSD that I'm thinking of trying this weekend. However, it seems to me it has to be some sort of OS or OS/hardware configuration thing, since I was able to get that freshly formatted drive to perform at 300MiB/s, and then on the next workload and every one beyond it I can't get any higher than 80MiB/s. I'm down to do tests, I just don't know what to do...
CowboyRedBeard (Author) - Posted February 11, 2022
Like here is me copying an 85GB file across the LAN via SMB to a share which uses the cache drive. As you can see, the server is receiving at nearly a full 1Gbps on the network interface, and it's flushing the file to the cache drive in bursts of around 400MiB/s... until about halfway through the copy, which was about 5 minutes in. Then the system falls back to a steady stream of 80MiB/s write speed... ¯\_(ツ)_/¯
So it doesn't seem to me to be a hardware problem. I've also set the "dirty ratio" in Tips & Tweaks to 1% & 2%... at the default this is way worse.
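For anyone following along: the "dirty ratio" knobs that Tips & Tweaks exposes are the kernel's vm.dirty_* settings, and they can be inspected directly from the Unraid shell. A minimal sketch, assuming a stock Linux /proc layout:

```shell
# Read total RAM and the current writeback thresholds (what Tips & Tweaks adjusts)
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
bg_ratio=$(cat /proc/sys/vm/dirty_background_ratio)
hard_ratio=$(cat /proc/sys/vm/dirty_ratio)

# Bytes of dirty page cache allowed before background flushing to disk begins
bg_bytes=$(( mem_kb * 1024 * bg_ratio / 100 ))
echo "background writeback starts at ${bg_bytes} bytes (${bg_ratio}% of RAM, hard limit ${hard_ratio}%)"
```

With a lot of RAM, even a 1% background ratio is a sizeable buffer, which is consistent with the fast initial burst followed by a drop to sustained disk speed described above.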
Squid - Posted February 11, 2022
Quote (CowboyRedBeard): "those are hung at 100% so maybe that's why"
The dashboard takes I/O wait into consideration on its graphs. Depending on how you look at it, it's either right or wrong. It's wrong because the core isn't actually running at 100% (rather, it's idle waiting for the data transfer, and the processes can't continue without it), or right because the core isn't able to do anything else while it's waiting for the transfer to happen (so it's effectively at 100% from the user's point of view).
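The iowait figure the dashboard folds into its graphs can also be read raw from the kernel's counters. A rough sketch, Linux only; the values are cumulative jiffies since boot, so this gives a since-boot average rather than an instantaneous reading:

```shell
# Aggregate "cpu" line of /proc/stat: label user nice system idle iowait irq softirq ...
read -r _label user nice system idle iowait _rest < /proc/stat
total=$(( user + nice + system + idle + iowait ))
echo "iowait: ${iowait} jiffies, about $(( 100 * iowait / total ))% of CPU time since boot"
```

Comparing this percentage before and after a big transfer shows how much time the cores spent parked on I/O rather than doing work, which is exactly the "idle but unusable" state described above.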
Squid - Posted February 11, 2022
Quote (CowboyRedBeard): "(mover kicks off at 5:00)"
How long is mover taking to run? In one of your old diagnostics, the drives were doing a trim every day at 5:30. During that trim operation, all transfers to and from the SSDs are effectively paused due to I/O wait.
CowboyRedBeard (Author) - Posted February 11, 2022
I recently moved trim to 08:30 and have mover set to start at 05:05. Mover takes a different amount of time each day, based on how much SAB downloaded. Having recently upgraded netdata and other things, I don't have any recent history examples. I'll try to load up some files for mover tonight and post those results tomorrow.
CowboyRedBeard (Author) - Posted February 12, 2022
So this is last night; downloads start at 1:30. You can see it goes up to 430-ish MiB/s, then IO wait goes as high as 22% and write speeds level off at 80MiB/s. Then this is when the mover starts:
Squid - Posted February 12, 2022
Pause the downloads while you are post-processing / unpacking? Also, rearrange your mappings so that when the system moves a finished download to the cache-enabled share, it can do a simple rename instead of a copy/delete operation.
CowboyRedBeard (Author) - Posted February 13, 2022
I have Sab using a cache-enabled share. I guess pausing during post-processing may help, but that's really just another way to mask the issue without fixing it. I'd love to figure out why this is happening.
dlandon - Posted February 17, 2022
I'm a little late to this party and wasn't able to determine your current Unraid version. What Unraid version are you running?
CowboyRedBeard (Author) - Posted February 17, 2022
Hi, welcome to the club! Haha. I'm currently on 6.9.2 but have had this issue since 6.7.
dlandon - Posted February 18, 2022
And your network and disk controller hardware?
CowboyRedBeard (Author) - Posted February 18, 2022
Supermicro X9DRi-LN4+, using the onboard controller. The cache drive is connected to a SATA 3 port on the motherboard. The other SSD that's not assigned to the array (VMs live on this) is also on a SATA 3 port; I've done tests to/from those. And I've even tried a PCIe SATA controller that I have in the machine. The network is the motherboard's onboard Ethernet, which doesn't seem to be a bottleneck at all; I can copy at a full 1Gbps on it.
dlandon - Posted February 18, 2022
First, I apologize if I walk you back through things you've already done, but I'd like to get an assessment of where you are so we can do some troubleshooting. Start by looking at your SSD disks and be sure they are formatted like this: for best operation, they should show 1 MiB-aligned. Then give me a screenshot of the Tips & Tweaks page so I can see what you've 'tweaked'.
CowboyRedBeard (Author) - Posted February 18, 2022
No apologies needed, I appreciate the help!
dlandon - Posted February 18, 2022
Ok, I like all your settings. Now review the issues with me. Let's start with non-spinners and work our way up from there. As I understand it, you are not seeing good performance with an SSD device? Give me some details.
CowboyRedBeard (Author) - Posted February 18, 2022
The posts on this page are probably an accurate depiction of the problem as it currently appears. Essentially, with any file-write process to cache I end up with high I/O wait times. The cache drive will write at around 300MiB/s for just a minute or two, and after that it will only give around 80MiB/s. This shows up in netdata and on the Unraid dashboard, as in the posts above. In the second one you can even see the CPU temps rise, which, as was mentioned here, was thought to be odd since it's just "waiting"... but I monitor CPU temp / fan speed with IPMI and send that data to InfluxDB where I can trend it (which is that graph in the second post).
Happy to conduct any tests you think are meaningful and post the results here. But I primarily see this with cache drives (spinning disks don't reach the same sort of speeds, so I guess the system can keep up with them). I also see it whether it's Sab downloading / unpacking a file, or a transfer to or from a non-array / non-cache SSD. Earlier in this thread I had an Intel Optane NVMe drive in the box in a PCIe slot and was able to get crazy sustained write speeds to it without this issue occurring. I've since pulled it out, but could put it back in for testing if needed.
dlandon - Posted February 18, 2022
Quote (CowboyRedBeard): "I will see the cache drive able to write at around 300MiB/s for just a minute or two and then after that it will only give around 80MiB/s after."
What you are seeing is Linux disk caching. It will initially fill the RAM disk cache and then start writing to disk when the cache reaches a set threshold. These are the Disk Cache settings in Tips & Tweaks. If I recall, you have 128GB of RAM; at 1% dirty background, that's 1.28GB. A file transfer will fill that RAM cache before committing to disk, which is why you see high speeds after a reboot: the disk cache is empty. Your theoretical max is about 100MB/s over your 1Gb network, so honestly, the 80MB/s is reasonable.
I want you to do some adjustments to the network in Tips & Tweaks, though, and let's see if it helps:
- Disable NIC Flow Control.
- Disable NIC Offload.
- Set Ethernet NIC Rx Buffer to 1024.
- Set Ethernet NIC Tx Buffer to 1024.
These settings help on Intel NICs at times. See if that improves your speed. As for the high CPU use and temps, I have no idea what's happening there. I'm not an expert on iowaits, but could they also be from NIC I/O?
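For reference, the four Tips & Tweaks toggles above correspond roughly to plain ethtool calls. This is a sketch, not a verified mapping: the interface name eth0 is an assumption (check yours with `ip link`), and it defaults to printing the commands instead of applying them, since applying requires root:

```shell
IFACE=${IFACE:-eth0}   # assumed NIC name; verify with `ip link`
DRYRUN=${DRYRUN:-1}    # set DRYRUN=0 to actually apply the settings (requires root)

# Print the command in dry-run mode, otherwise execute it
run() { if [ "$DRYRUN" = "1" ]; then echo "$@"; else "$@"; fi; }

run ethtool -A "$IFACE" rx off tx off              # disable pause-frame flow control
run ethtool -K "$IFACE" tso off gso off gro off    # disable segmentation/receive offloads
run ethtool -G "$IFACE" rx 1024 tx 1024            # set rx/tx ring buffers to 1024 descriptors
```

Running it as-is prints the three ethtool invocations so they can be reviewed before being applied for real.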