Unraid 6.7.2 - High disk usage



Hello Everyone,

I'm having quite a troubling problem on my Unraid box.

Every time I write something to my VideoLibrary Share - the whole system slows down.

Some CPU cores then sit at 100%, the Plex Docker container stops responding, and the disks keep writing/reading even after the copy has finished.

After 10 minutes or so everything is back to normal.

I can't understand why.

 

 

That CPU usage seems a little extreme for copying a single file...

[four screenshots attached]

 

I've also tried copying a file directly to a disk, but the same issue occurred.

Does anyone have an idea why this is happening and how to fix it? It's really annoying.

 

Edited by TDA
Link to comment
1 hour ago, Benson said:

Is there an abnormally slow disk in the array? Is the parity check speed normal?

Not that I'm aware of.

Also, I've tried copying files directly to a disk, which is a 4 TB WD Red, so it should manage around 100 MB/s.

What I find particularly strange is that after the copy has finished, the disks still "do something" for 10 minutes or so (checked with netdata).

As for the parity check, I don't know. I don't have it on a schedule, but the last one I ran averaged 107.9 MB/s.

Link to comment

Maybe fragmentation plays a role in all of this?
If so, how should I defragment my disks?

Using the command xfs_db -r /dev/mapper/mdX, I found that three of my disks have the following fragmentation factors: 76.77%, 99.44%, and 52.00%.
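
For context on what those percentages mean: as far as I know, the frag command in xfs_db reports an actual and an ideal extent count and derives the factor as (actual - ideal) / actual. Here is a minimal Python sketch of that arithmetic. The function name and the extent counts are made up for illustration and are not taken from the disks above.

```python
def fragmentation_factor(actual_extents: int, ideal_extents: int) -> float:
    """Return the fragmentation factor in percent: (actual - ideal) / actual."""
    if actual_extents <= 0:
        return 0.0
    return (actual_extents - ideal_extents) * 100.0 / actual_extents


# Hypothetical example: 200 files that would ideally need one extent each.
print(round(fragmentation_factor(400, 200), 2))    # 2 extents per file on average
print(round(fragmentation_factor(20000, 200), 2))  # 100 extents per file on average
```

Under that formula, a factor around 50% corresponds to only about two extents per file on average, so a high percentage on its own does not necessarily mean the disks are slow.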

 

Link to comment
On 7/13/2019 at 4:35 PM, TDA said:

the whole system slows down.

Please be more descriptive in what is slowing down.  Does it take longer to navigate around the webGui?

 

On 7/13/2019 at 4:35 PM, TDA said:

Some CPU cores then sit at 100%, the Plex Docker container stops responding, and the disks keep writing/reading even after the copy has finished.

Try disabling Plex and see if the same behavior occurs.  First step is to isolate the issue.

Link to comment
1 hour ago, jonp said:

Please be more descriptive in what is slowing down.  Does it take longer to navigate around the webGui?

The WebGUI isn't much slower, but it is slower.

But access via UNC isn't possible.

 

Try disabling Plex and see if the same behavior occurs.  First step is to isolate the issue.

I'm trying that right now and it's not good at all... the copy itself hangs now and then:

[screenshot: COPYING]

After a little while:

[screenshot: Capture]

Also, why is it both writing and reading? Shouldn't it only write?

 

After finishing the copy:

[two screenshots: copy finished]

Doesn't fragmentation play a role in all of this?

Edited by TDA
Finished copy
Link to comment

Do you have a cache drive?

Is the share you are writing to set to Cache: Yes?

What schedule do you have mover set to run?

 

As for your question about why it both reads and writes: when a file is written to a parity-protected disk, each write operation actually requires a read from both the parity disk and the data disk, followed by a write to both the parity disk and the data disk.
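
To illustrate why the reads are needed, here is a rough Python sketch of a read-modify-write parity update, assuming simple XOR parity. The block values and function names are hypothetical; this is not Unraid's actual driver code.

```python
from functools import reduce
from operator import xor


def parity_of(blocks: list[int]) -> int:
    """Single parity is the XOR of the same block on every data disk."""
    return reduce(xor, blocks, 0)


def read_modify_write(old_data: int, old_parity: int, new_data: int) -> int:
    """Return the new parity after overwriting one data block.

    Needs two reads (old data, old parity) before the two writes
    (new data, new parity); the other data disks are not touched.
    """
    return old_parity ^ old_data ^ new_data


# Toy array: one byte-sized block per data disk (hypothetical values).
data = [0x12, 0x34, 0x56, 0x78]
parity = parity_of(data)

new_block = 0xAB
parity = read_modify_write(data[2], parity, new_block)
data[2] = new_block

# The incrementally updated parity matches a full recomputation.
print(parity == parity_of(data))
```

The point of the sketch is that the new parity cannot be computed without first reading the old data block and the old parity block, which is why both disks show reads and writes at the same time.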

Link to comment
2 minutes ago, remotevisitor said:

Do you have a cache drive?

Is the share you are writing to set to Cache: Yes?

What schedule do you have mover set to run?

 

As for your question about why it both reads and writes: when a file is written to a parity-protected disk, each write operation actually requires a read from both the parity disk and the data disk, followed by a write to both the parity disk and the data disk.

Hello,

For the share in question, cache is not enabled.

Edited by TDA
Link to comment
On 7/14/2019 at 5:35 AM, TDA said:

Every time I write something to my VideoLibrary Share - the whole system slows down.

It depends on how much data you write and on the array's performance. Normally the system has to wait for the write to complete; if you have a lot of RAM cache and a large amount of data sitting in it, you will get the "hang" symptom. The disk array is always slower than a high-performance cache pool.

 

On 7/14/2019 at 5:35 AM, TDA said:

Some CPU cores then sit at 100%, the Plex Docker container stops responding, and the disks keep writing/reading even after the copy has finished.

Since 6.7, CPU usage includes I/O wait, so you don't need to worry too much if the load is I/O wait. As for the Plex Docker container not responding, that also seems to be because the disk array is busy.

 

3 hours ago, TDA said:

After finishing the copy:

I see it still shows 84.5 MB/s R/W, so the speed is normal. I assume you have already turned on "turbo write".

 

Maybe you can try adjusting some tunables to see whether that improves things.

 

 

Edited by Benson
Link to comment
7 hours ago, Benson said:

It depends on how much data you write and on the array's performance. Normally the system has to wait for the write to complete; if you have a lot of RAM cache and a large amount of data sitting in it, you will get the "hang" symptom. The disk array is always slower than a high-performance cache pool.

This share isn't cache enabled. Why should data "sit" in cache when the disks manage about 100 MB/s R/W and the copy has already finished?

Also, I'm aware that SSDs are faster than HDDs, but outside of Unraid I have never seen this behavior (copy files at a speed the HDD supports -> copy finishes -> disk keeps working).

Quote

 

Since 6.7, CPU usage includes I/O wait, so you don't need to worry too much if the load is I/O wait. As for the Plex Docker container not responding, that also seems to be because the disk array is busy.

I see it still shows 84.5 MB/s R/W, so the speed is normal. I assume you have already turned on "turbo write".

No, I haven't. It's set to Auto (Settings --> Disk Settings --> Tunable (md_write_method)).
Also, since all disks have R/W performance of about 100 MB/s, 84.5 MB/s seems normal to me, not turbo at all.

Quote

Maybe you can try adjusting some tunables to see whether that improves things.

I would like to try, but I'm not sure which settings are best for my specs.

On the unraid wiki:

For users with at least 1GB of RAM, the following (conservative) settings are suggested:

md_num_stripes=2048

md_write_limit=768

md_sync_window=1024

 

And inside the post you linked:

Bottom line is this: the greater this number is, the more I/O can be queued down into the disk drives.  However each 'stripe' requires memory in the amount of (4096 x highest disk number in array).  So if you have 20 disks, each stripe will require 81920 bytes of memory; multiplied by 1280 = over 104MB.  The default value was chosen to maximize performance in systems with only 512MB of RAM.  If you have more RAM then you can experiment with higher values.  If you go too high and the system starts running out of memory, what will happen is 'random' processes will start getting killed (not good).

 

You want to make sure the sum of md_write_limit+md_sync_window < md_num_stripes so that reads do not get starved if you starting writing a large file while a parity-sync/check is in process.

 

The thing is, I have 128 GB of RAM and only 6 disks.

According to the post, I shouldn't have a problem if I multiply the "advanced" settings by 4:

Advanced:

md_num_stripes=2048

md_write_limit=768

md_sync_window=1024

 

Plan to use:

md_num_stripes=8192

md_write_limit=3072

md_sync_window=4096

 

Those settings should be fine for a 128 GB system, right?
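
A quick sanity check of that arithmetic, as a rough Python sketch. The 4096 bytes per disk and the 20-disk example come from the quoted post; taking 6 as the highest disk number for this array is an assumption, and the function names are made up for illustration.

```python
STRIPE_UNIT_BYTES = 4096  # per-disk bytes per stripe, from the quoted post


def stripe_memory_bytes(md_num_stripes: int, highest_disk_number: int) -> int:
    """Memory used by the stripe pool: 4096 x highest disk number x stripes."""
    return STRIPE_UNIT_BYTES * highest_disk_number * md_num_stripes


def reads_not_starved(md_num_stripes: int, md_write_limit: int, md_sync_window: int) -> bool:
    """Rule from the quoted post: write_limit + sync_window < num_stripes."""
    return md_write_limit + md_sync_window < md_num_stripes


# The 20-disk, 1280-stripe example from the quoted post (bytes).
print(stripe_memory_bytes(1280, 20))

# Planned values, ASSUMING the highest disk number is 6 (result in MiB).
print(stripe_memory_bytes(8192, 6) / 2**20)
print(reads_not_starved(8192, 3072, 4096))
```

Under these assumptions the planned values need roughly 192 MiB for the stripe pool, which is negligible next to 128 GB of RAM, and they satisfy the rule that the write limit plus the sync window stays below the number of stripes.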

 

 

 

Edited by TDA
Link to comment
1 hour ago, TDA said:

This share isn't cache enabled. Why should data "sit" in cache when the disks manage about 100 MB/s R/W and the copy has already finished?

Also, I'm aware that SSDs are faster than HDDs, but outside of Unraid I have never seen this behavior (copy files at a speed the HDD supports -> copy finishes -> disk keeps working).

I know you said you are not using a cache pool, but I am talking about the RAM cache, and you have 128 GB. If the disk array's write performance is lower than the incoming data rate, a lot of data will sit in memory waiting to be written to the array. That's why the job reports as finished while R/W is still ongoing.
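
One way to see how much data is still waiting in RAM is to look at the Dirty and Writeback counters in /proc/meminfo on Linux. Below is a small Python sketch that parses them. The helper name is hypothetical and the sample text is made up, not a capture from this server.

```python
def pending_writeback_kib(meminfo_text: str) -> dict[str, int]:
    """Extract the Dirty and Writeback counters (in kB) from /proc/meminfo text."""
    wanted = {"Dirty", "Writeback"}
    result = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            result[key] = int(rest.split()[0])
    return result


# Made-up sample in /proc/meminfo format, NOT real data from the system above.
sample = """MemTotal:       131072000 kB
Dirty:           20971520 kB
Writeback:         524288 kB
WritebackTmp:           0 kB
"""
print(pending_writeback_kib(sample))
```

On a live system the same function could be fed the real file, for example pending_writeback_kib(open("/proc/meminfo").read()). If Dirty stays large after the copy dialog closes, that would match the explanation above.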

 

If md_write_method is set to Auto, you should try setting it to On and see whether that makes a difference.

 

1 hour ago, TDA said:

Those settings should be fine for a 128 GB system, right?

You should try it. There is no absolute value, and going too large doesn't help either. For what it's worth, I set mine to 10x the current value.

Edited by Benson
Link to comment
25 minutes ago, Benson said:

I know you said you are not using a cache pool, but I am talking about the RAM cache, and you have 128 GB. If the disk array's write performance is lower than the incoming data rate, a lot of data will sit in memory waiting to be written to the array. That's why the job reports as finished while R/W is still ongoing.

But the disks' performance isn't lower than the copy speed, so why should data be cached?

25 minutes ago, Benson said:

If md_write_method is set to Auto, you should try setting it to On and see whether that makes a difference.

I will try

25 minutes ago, Benson said:

 

You should try it. There is no absolute value, and going too large doesn't help either. For what it's worth, I set mine to 10x the current value.

I'm testing with the test script to see which values give better performance.

Link to comment
20 minutes ago, TDA said:

But the disks' performance isn't lower than the copy speed, so why should data be cached?

That is almost the same as asking why CPUs have cache, why HDDs have cache, or why SSDs do or don't have cache.

 

The Unraid disk array is not high-performance storage; a cache pool or UD (Unassigned Devices) is much better, and that's why they exist.

Edited by Benson
Link to comment
1 hour ago, Benson said:

That is almost the same as asking why CPUs have cache, why HDDs have cache, or why SSDs do or don't have cache.

 

The Unraid disk array is not high-performance storage; a cache pool or UD (Unassigned Devices) is much better, and that's why they exist.

These are the results:

[screenshot]

 

I have also enabled:

Tunable (md_write_method): reconstruct_write (which should be the "turbo mode")

 

I'll test this evening to see whether there is any improvement.

 

If not, I don't really know what else I could do to solve this problem (on Windows Server I never had such problems).

 

Link to comment

None of the above may solve the Plex problem. As far as I know, people usually put the appdata path on a cache pool or on a UD with a suitable device.

 

I mainly focus on getting the best performance for sequential R/W, data transfer, and network transfer.

 

 

Edited by Benson
Link to comment
14 minutes ago, Benson said:

None of the above may solve the Plex problem. As far as I know, people usually put the appdata path on a cache pool or on a UD with a suitable device.

 

I mainly focus on getting the best performance for sequential R/W, data transfer, and network transfer.

 

 

Plex appdata (as for all Docker containers) is on the SSD.

The problem is related only to the data transfer, which saturates the whole box, and this is why Plex isn't working while the disks keep "working" after the copy has finished.

I have to figure out how to eliminate this strange behavior of continuing to write/read after the copy has finished (since, as I said, this never happened on a Windows server).

 

It's understandable that Plex doesn't work during the copy, since the disks' bandwidth is taken up by the copy process, but once it has finished the disks shouldn't keep doing "who knows what" for 10 minutes.

Link to comment
On 7/17/2019 at 1:09 PM, Benson said:

None of the above may solve the Plex problem. As far as I know, people usually put the appdata path on a cache pool or on a UD with a suitable device.

 

I mainly focus on getting the best performance for sequential R/W, data transfer, and network transfer.

 

 

With the new settings, I now have the following scenario:
- Copying from cache to the disks, or from Windows to the disks, is lightning fast (because everything is cached in RAM).

After the copy is done, the disks are still working (obviously).

 

But... well, I can live with that.
The only thing (which is also obvious) is that I can't use Plex while copying because the disks' bandwidth is saturated, but that's a physical limitation.

At least now I don't have to wait 10 minutes for a file to copy at 86 MB/s (which the disks should support) and then wait another 10 minutes. Instead, I copy at 500+ MB/s and then wait for the files to actually be written to the disks. (Luckily I have a UPS, because otherwise a power outage would wipe out all the files cached in RAM :D)
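
As a back-of-the-envelope illustration of why the disks keep working after the copy dialog closes, here is a simplified Python model. The 500 and 86 MB/s figures are the speeds mentioned above; the 50 GB file size and the function name are hypothetical, and the model ignores the kernel's dirty-page limits and throttling.

```python
def drain_seconds_after_copy(size_mb: float, copy_mb_s: float, disk_mb_s: float) -> float:
    """Seconds the array keeps writing after the copy itself reports 'done'.

    Simplified model: data enters the RAM cache at copy_mb_s and leaves it
    at disk_mb_s for the whole duration.
    """
    if copy_mb_s <= disk_mb_s:
        return 0.0  # the disks keep up, nothing piles up in RAM
    copy_time = size_mb / copy_mb_s
    total_write_time = size_mb / disk_mb_s
    return total_write_time - copy_time


# Speeds from the post above; the 50 GB file size is a made-up example.
minutes = drain_seconds_after_copy(50_000, 500, 86) / 60
print(round(minutes, 1))
```

With these example numbers the array would need roughly 8 more minutes to flush what is sitting in RAM, which is in the same ballpark as the 10 minutes of activity seen after a copy.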

 

Another technical limitation is that the disks are SATA and not SCSI (concurrent R/W is much better with SCSI).

 

Thanks for help 🙂

 

Link to comment
  • 2 weeks later...

Hello,

One thing I discovered, and for which I don't have an explanation, is the following:

[screenshot]

 

The file is now being written out to the HDDs, since it was cached in RAM.
That the parity disk is writing is clear - it has to.

The file being copied is actually going to disk 6.

So why on earth are disks 1, 2 and 3 also working (reading)?

 

I've probably missed something about the concept... but this doesn't seem logical to me.

Edited by TDA
Link to comment
12 minutes ago, jonathanm said:

A quick search would have found your answer.

 

Thanks 🙂
Then I think I'll disable turbo mode; every time I copy a new film to my library while I or a friend is watching a movie, it blocks everything 🙂
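
For anyone landing here later, a rough Python sketch of why the other data disks read during a turbo (reconstruct) write: instead of reading the old data and old parity, parity is rebuilt from the current blocks of all the other data disks. The disk layout (data disks 1, 2, 3 and 6), the block values and the function name are assumptions for illustration only, and XOR parity is assumed.

```python
from functools import reduce
from operator import xor


def reconstruct_write(data: dict[int, int], target: int, new_block: int) -> tuple[int, list[int]]:
    """Write new_block to disk `target` and rebuild parity from scratch.

    Returns the new parity and the numbers of the disks that had to be read.
    """
    disks_read = sorted(d for d in data if d != target)  # every other data disk
    data[target] = new_block
    new_parity = reduce(xor, data.values(), 0)
    return new_parity, disks_read


# Hypothetical layout: data disks 1, 2, 3 and 6, one byte-sized block each.
data = {1: 0x11, 2: 0x22, 3: 0x33, 6: 0x66}
parity, disks_read = reconstruct_write(data, target=6, new_block=0xAB)
print(disks_read)
```

In this mode the parity disk and the target disk only write, while all remaining data disks must spin and read, so a stream being played from one of them competes with the copy.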

Link to comment
