6.8.3 Disk writes causing high CPU


  • 2 weeks later...
On 9/22/2020 at 1:44 PM, jonp said:

Any insights / information you can provide would be much appreciated. We really need to narrow down reproducible steps.

Sent from my Pixel 3 XL using Tapatalk
 

I saw that beta 29 is out; does it address this in any way?

Also, happy to provide any data that might be helpful... if you can tell me how to collect it.

  • 1 month later...

I'm having the same issue of high disk writes on the btrfs cache with dockers. This seems to be very well known... 

 

My SSDs are 35 days old, my server didn't do much (the mover barely moves anything) and yet, 25 TBW on both SSDs. They're 500 GB SSDs, so that's 50x their size in a month with a server idling 99.999% of the time. Insane.
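For anyone who wants to check their own drives: lifetime writes can be read from SMART data. A minimal sketch, assuming smartmontools is available (it ships with Unraid); /dev/sdb and /dev/nvme0 are placeholders for your cache devices, and the attribute names and units vary by vendor:

    # SATA SSD: lifetime host writes, commonly attribute 241
    # (units differ by vendor, e.g. 512-byte LBAs vs 32 MiB blocks)
    smartctl -A /dev/sdb | grep -i -E 'lbas_written|host_writes|wear'

    # NVMe SSD: "Data Units Written", where 1 unit = 512,000 bytes
    smartctl -a /dev/nvme0 | grep -i 'data units written'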

 

 

7 hours ago, dnLL said:

I'm having the same issue of high disk writes on the btrfs cache with dockers. This seems to be very well known... 

 

My SSDs are 35 days old, my server didn't do much (the mover barely moves anything) and yet, 25 TBW on both SSDs. They're 500 GB SSDs, so that's 50x their size in a month with a server idling 99.999% of the time. Insane.

 

 

There are fixes in the 6.9.0 beta series to address this.
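For context, one candidate in that series is the SSD partition-alignment fix jonp mentions later in the thread. A quick way to see how a cache device is currently partitioned, as a sketch (/dev/sdb is a placeholder for your cache SSD):

    # Check the partition start sector: pre-6.9 Unraid reportedly started
    # pool partitions at sector 64, while 6.9 aligns new ones to 1 MiB
    # (start sector 2048)
    fdisk -l /dev/sdb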

  • 2 weeks later...

Still a problem. The screenshot below shows server temps, fan speeds, etc. This is from when I have it set to download and unpack files at 02:00.

 

[screenshot: server temps and fan speeds during the 02:00 download/unpack window]

 

Any chance this gets fixed soon?

Alternatively, @jonp / @itimpi, do you think a quicker fix for me would be to rebuild Unraid from scratch?

 

I mean, one of the other side effects is that SSD write speeds are way down, into the 80 MB/s range... which also sucks.

Edited by CowboyRedBeard
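One way to sanity-check sequential write speed from the console, as a rough sketch (assumes the pool is mounted at /mnt/cache; the test file name is arbitrary, and zeros compress well, so treat the number as an upper bound if compression is enabled):

    # Write 4 GiB with O_DIRECT so the result reflects the device, not RAM
    dd if=/dev/zero of=/mnt/cache/ddtest bs=1M count=4096 oflag=direct status=progress
    # Clean up the test file
    rm /mnt/cache/ddtest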

I guess I will add to this, as a friend and I have the same issue.
When writing to the array from off the server (like offloading GoPro footage), our CPU is absolutely pinned, causing issues with all the dockers and the like, essentially rendering the dockers useless while the transfer is active.

Anything useful we can provide to help resolve the issue?
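One useful starting point is capturing what is actually busy while a transfer runs. A sketch, assuming iostat and iotop are available (they can be added on Unraid, e.g. via the NerdTools plugin):

    top -b -n 1 -o %CPU | head -20   # snapshot of the top CPU consumers
    iostat -x 1 5                    # per-device utilisation and await times
    iotop -b -o -n 3                 # which processes are generating the I/O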

  • 2 weeks later...

I know it is you guys :-). We've had a hell of a time trying to recreate this issue, and then we thought we had it figured out with that SSD partitioning issue, but apparently that's not the silver bullet for everyone. I'm hoping to find some time over the holidays to really beat on the server in the lab again and see if I can reproduce it. That is the key at this point: we have to be able to reproduce this issue to have any chance of solving it. Alternatively, if anybody has a system without production data on it that is exhibiting this issue, I may be interested in requesting remote access to that box directly. I just don't want to do this to anyone's system that has data on it that they are concerned about.

Sent from my Pixel 3 XL using Tapatalk

Link to comment
17 minutes ago, jonp said:

I know it is you guys :-). We've had a hell of a time trying to recreate this issue, and then we thought we had it figured out with that SSD partitioning issue, but apparently that's not the silver bullet for everyone. I'm hoping to find some time over the holidays to really beat on the server in the lab again and see if I can reproduce it. That is the key at this point: we have to be able to reproduce this issue to have any chance of solving it. Alternatively, if anybody has a system without production data on it that is exhibiting this issue, I may be interested in requesting remote access to that box directly. I just don't want to do this to anyone's system that has data on it that they are concerned about.

Sent from my Pixel 3 XL using Tapatalk
 

Well, mine is in production with data... but I'd be happy to run some tests and provide metrics / data / output...

I reproduce it daily. 😪

  • 1 month later...

Hi, is there any progress on this? I'm a new Unraid user and it appears I'm facing a similar issue: CPU spikes while writing to the SSD cache. rtorrent is maxing out at 100% CPU although I only have a few torrents running. When the mover starts it gets even worse and the dockers are barely usable, so currently I'm only letting it run at night. Any help is appreciated; I can offer system diags if it helps.

On 2/19/2021 at 3:15 AM, CowboyRedBeard said:

I've not seen a fix yet; I've been experiencing it for nearly a year.

Does it make sense to disable the cache for media shares entirely and only use it for appdata? How do you work around this until it is addressed in a new Unraid version? I'm still on trial, so any additional information is very much appreciated!

  • 2 weeks later...

I've been able to limp it along by scheduling large write I/O jobs (see the sketch below), but that might not work for everyone.

I'm actually thinking about building another install just to see if it goes away.

Additionally... I've upgraded from beta 25 to 6.9.0 and the issue persists.

Edited by CowboyRedBeard
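One sketch of that kind of scheduling, using a cron entry and assuming the standard mover path (the time and priorities are illustrative; on Unraid this could be wired up through the User Scripts plugin):

    # Run the mover at 04:00 in the idle I/O class so it yields to
    # dockers and VMs whenever they touch the disks
    0 4 * * * ionice -c3 nice -n 19 /usr/local/sbin/mover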
  • 5 weeks later...

I just discovered the same issue. Copying 300 GB to my encrypted btrfs cache (RAID 5) makes the I/O wait time go through the roof, rendering the server useless for the duration of the copy. It completely normalises once the transfer is complete. 

 

I am running 6.9.1. Happy to run tests and provide information, as I can see this thread has been open for a long time. I sympathise with you, @CowboyRedBeard.
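A simple way to watch that I/O wait climb while the copy runs (vmstat ships with stock Linux, including Unraid):

    # 'wa' is the percentage of CPU time stalled on I/O; 'b' is the number
    # of processes blocked waiting for it
    vmstat 1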

  • 1 month later...
  • 3 months later...

Hi,

 

New Unraid user here; I've spent two weeks fighting with it trying to solve this issue. I would like to bump this thread since it really looks like the issue I am experiencing. Every time I need to write to or read from the array, the CPU goes crazy and the I/O wait skyrockets. I have tried different things I have collected from various threads:

 

1. Download torrents to the cache. That is OK, but my cache is 500 GB and I download that amount every day, so I have qBittorrent move the files to the array after downloading; that move is when the issue happens.

2. rclone uploads to the cloud: this reads a folder from the array and uploads it to Google. Same issue; CPU and I/O waits go to the moon. I have scheduled the upload to run during the night.

3. I have changed /config in Docker from /mnt/user/appdata to /mnt/cache/appdata as suggested in other threads. Same result.

4. I have disabled the Tunable (enable Direct IO) under Global Share Settings as suggested in another thread. Same result.

5. Moved all the appdata share to cache with mover. Same result.

6. Changed my rclone upload script to point the upload folder from /mnt/user/download to /mnt/disk1/download to try to bypass Unraid's SHFS (see also the snippet after this list for checking shfs directly). Same result; screenshot of the rclone upload below:

 

[screenshot: system load while running the rclone upload]

 

7. The Folder Caching plugin is not installed. 

8. The cache filesystem and array filesystem match (XFS).

9. My CPU governor is Performance.
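On point 6: whether SHFS is the bottleneck can be checked directly while a transfer runs. A minimal sketch (shfs is the FUSE process behind /mnt/user):

    # If shfs sits near the top while the transfer runs, the FUSE layer is
    # the hot spot; if not, look elsewhere (filesystem, device, etc.)
    top -b -n 1 -o %CPU | grep -E 'shfs|%CPU'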

 

To be honest, I am out of ideas and I don't know where else to look for information on how to solve this, but it is really driving me to despair, as I can't use the server as a media center to serve content to my TV; the content freezes all the time (the freezes on the TV match the CPU spikes from this issue).

Any advice?

EDIT: By the way, this is on the latest stable version, 6.9.2.

 

Edited by Funes

I've been waiting over a year to see a resolution to this problem... I've actually changed how I use Unraid to mitigate its effects on applications and users.


I've not had the time, but I've thought about doing a fresh install in hopes of fixing it, since this install of Unraid is pretty old and has even gone through two physical servers... but new users seeing it makes me think that won't help.

  • 1 month later...
6 hours ago, JorgeB said:

Sorry if this was already suggested, I don't remember, but there have been some reports where it helps for servers with lots of RAM: install the Tips and Tweaks plugin and set "vm.dirty_background_ratio" to 1 and "vm.dirty_ratio" to 2, then test to see if it makes it any better.

 

What does that do? I'm assuming it is something to do with VMs, but I don't understand how that would affect general OS disk I/O.

Mine are currently 10 and 20 on a server with 128 GB of RAM.
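For what it's worth, those two knobs are kernel writeback settings, not VM settings: vm.dirty_background_ratio is the percentage of RAM that may hold dirty (not-yet-written) pages before background flushing starts, and vm.dirty_ratio is the hard ceiling at which writers are blocked until data is flushed. At 10/20 with 128 GB of RAM, roughly 13 to 26 GB of writes can pile up in memory before a big synchronous flush stalls everything. A sketch of inspecting and lowering them by hand (presumably what the Tips and Tweaks plugin applies):

    # Show the current values
    sysctl vm.dirty_background_ratio vm.dirty_ratio
    # Start background flushing at 1% of RAM and block writers at 2%,
    # so large transfers stream out steadily instead of in huge bursts
    sysctl vm.dirty_background_ratio=1 vm.dirty_ratio=2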

