[6.7.x] Very slow array concurrent performance

Marshalleq · August 18, 2019

Right, so here's my initial testing with release candidate 6.7.3-rc2. To test, I played a Plex video, preloaded many gigabytes of files to a cached share, invoked the mover manually, and added another copy of a large set of files from a disk to the cache. With all of this going simultaneously, I don't seem to have any issues. The wa (I/O wait) in top only gets up to 8.0 instead of 20.0 under the previous kernel. (I've edited this because I think I originally wrote 0.20, which was incorrect; it was 20.) Also, my write speed (from HDD to SSD) is about normal at 53MB/s. Yes, it's slow, and it's always been slow, even with Seagate enterprise capacity disks; it seems to be an overhead of the Unraid parity.
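For anyone who wants to compare wa figures between kernels more precisely than by watching top: top's wa column is the percentage of CPU time spent idle while waiting on I/O, derived from the cumulative counters in /proc/stat. The Python sketch below (Linux only; all function names are hypothetical helpers, not part of Unraid) samples those counters twice and computes roughly the same percentage, so 8.0 vs 20.0 means about 8% vs 20% of CPU time.

```python
import time


def parse_cpu_line(line: str) -> list[int]:
    """Return the first eight counters of the aggregate "cpu" line of /proc/stat.

    Order: user, nice, system, idle, iowait, irq, softirq, steal.
    guest and guest_nice are skipped because the kernel already counts them
    inside user and nice.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:9]]


def iowait_percent(before: list[int], after: list[int]) -> float:
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return 0.0  # no time elapsed between the samples
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_cpu_counters(path: str = "/proc/stat") -> list[int]:
    with open(path) as f:
        return parse_cpu_line(f.readline())  # first line is the all-CPU total


def sample_iowait(interval_s: float = 1.0) -> float:
    """Sample /proc/stat twice, interval_s apart, and return the iowait share."""
    before = read_cpu_counters()
    time.sleep(interval_s)
    after = read_cpu_counters()
    return iowait_percent(before, after)
```

Calling `sample_iowait()` in a loop while the mover and a Plex stream are running gives a simple log of I/O wait over time that can be compared across releases.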

This is my first and only test so far (I'll try again tomorrow when someone is watching Plex on the Apple TV in the lounge, where the issue was visible today). I'd be interested if anyone else can test, though, by upgrading to 'next'; there are very, very few changes in it, so it should be quite safe.

If the problem goes away for you, I'd say we are very lucky. Otherwise, we shall need to investigate further. Fingers crossed!

Edited August 18, 2019 by Marshalleq

Vr2Io · August 18, 2019

8 hours ago, Marshalleq said:

...with all of this going simultaneously, I don't seem to have any issues. The wa (I/O wait) in top only gets up to 8.0 instead of 20.0 under the previous kernel...

In my testing, the problem only happens when one write session runs simultaneously with another read/write session on the disk array; it doesn't happen on the cache pool or on UD (Unassigned Devices).

If the problem isn't triggered, you should expect disk array read/write speeds of 190MB/s down to 90MB/s for spinning disks in the array, whether or not you have parity.

Edited August 18, 2019 by Benson

Vr2Io · August 18, 2019

8 hours ago, Marshalleq said:

53MB/s - yes it's slow, it's always been slow, even with Seagate enterprise capacity disks - seems to be an overhead of the unraid parity.

Something else must be causing this.

Marshalleq · August 18, 2019

Yes, a write session running simultaneously with multiple read sessions on spinning disks is what I did. There were multiple Plex sessions ongoing while I did a large multi-terabyte copy from the SSD cache to the array. But I could try to be a little harder on it and test again with even more writes and reads.

Regarding the speed, my reading on this forum indicated that 53MB/s was fairly normal for writing with parity. If it's not, the only thing I can think of is a faulty cable, but I have run speed tests on all my drives and they perform at their rated speed individually, so I don't think it's that. I'm doing another speed test now to make sure nothing has gone wrong. I'd be interested in knowing what your configuration is. My drives are mainly on a Dell PERC H310 in IT mode, which seems to have more than enough bandwidth for the job, but perhaps it's that.

Edit: Quick calculation:

The Dell PERC H310 supports 8 drives and runs on the PCI Express 2.0 bus. PCI Express 2.0 supports 500MB/s, so dividing by 8 means each drive would get a maximum of 62.5MB/s. I guess this could be the reason. Individual drive speed tests wouldn't be restricted by the bus speed, which would be why I hadn't seen the issue. I also assume reads would not be impacted, as I don't think a read needs to calculate across all drives. Perhaps I should look into reconstruct write mode again.

Edited August 18, 2019 by Marshalleq
Update

Vr2Io · August 18, 2019

51 minutes ago, Marshalleq said:

I'd be interested in knowing what your configuration is. my drives are mainly on a Dell PERC310 in IT mode

Nothing special: 16 disks, mostly shucked WD disks, a mix of 5400 and 7200 rpm. All connect through a SAS backplane to an LSI 9207-8i in IT mode on PCIe 3.0 x8, but I've confirmed there's no difference when connected to a 9211 in IT mode (PCIe 2.0). The PCIe lanes come directly from the CPU. Switching to a different platform also gives the same speed, and there's not much difference with or without parity.

Yes, it must be reconstruct write mode.
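Some background on why the write mode matters: Unraid's first parity disk holds the byte-wise XOR of the data disks. In the default read/modify/write mode, each write must first read the old data block and the old parity block, then write both back, so the target disk and the parity disk each do an extra read pass. Reconstruct write (often called "turbo write") instead reads the same stripe from every other data disk and computes parity from scratch, which needs all disks spun up but avoids reading back from the two disks being written. The Python sketch below (hypothetical helper names; single XOR parity only, since the second parity disk uses a different calculation) shows that both paths produce the same parity.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks of equal length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def parity_read_modify_write(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    # Touches only the target disk and the parity disk, but both must be
    # read before they are written: new_parity = old_parity ^ old_data ^ new_data.
    return xor_blocks(old_parity, old_data, new_data)


def parity_reconstruct_write(other_data_blocks: Sequence[bytes], new_data: bytes) -> bytes:
    # Reads the same stripe from every *other* data disk (all must be spun up),
    # then writes data and parity without reading the old parity first.
    return xor_blocks(new_data, *other_data_blocks)
```

The trade-off is the one discussed in this thread: read/modify/write keeps most disks idle, while reconstruct write spends extra reads on the other disks in exchange for not waiting on reads from the disks being written.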

Edited August 18, 2019 by Benson

Vr2Io · August 18, 2019

35 minutes ago, Marshalleq said:

PCI Express 2.0 supports 500MB/s.

That is the speed of a single (x1) lane, so x8 gives 4GB/s, which leaves each of 8 disks about 500MB/s of usable bandwidth.
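The same arithmetic as a small Python sketch (function and table names are hypothetical). It computes theoretical per-lane bandwidth from each PCIe generation's transfer rate and line encoding (8b/10b for 1.x and 2.0, 128b/130b for 3.0), then splits the link evenly across drives that are all transferring at once. Real-world throughput is somewhat lower because of packet and protocol overhead, so treat these figures as upper bounds.

```python
# (transfer rate in MT/s, payload bits, total bits per encoded block)
PCIE_GENERATIONS = {
    1: (2500, 8, 10),     # 8b/10b encoding
    2: (5000, 8, 10),     # 8b/10b encoding
    3: (8000, 128, 130),  # 128b/130b encoding
}


def pcie_lane_mb_per_s(gen: int) -> float:
    """Theoretical per-lane, per-direction bandwidth in MB/s after line encoding."""
    mt_per_s, payload_bits, total_bits = PCIE_GENERATIONS[gen]
    return mt_per_s * payload_bits / total_bits / 8  # 8 bits per byte


def per_drive_share_mb_per_s(gen: int, lanes: int, drives: int) -> float:
    """Link bandwidth split evenly across drives that are all busy at once."""
    return pcie_lane_mb_per_s(gen) * lanes / drives
```

For a PCIe 2.0 x8 card with 8 drives, `per_drive_share_mb_per_s(2, 8, 8)` gives 500.0, whereas dividing a single lane's 500MB/s by 8 drives (`per_drive_share_mb_per_s(2, 1, 8)`) gives the 62.5MB/s figure from the earlier estimate.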

Edited August 18, 2019 by Benson

Marshalleq · August 18, 2019

Of course! So it's not that, then. The speed test came out OK. Also, @johnnie.black, I'd suggest that a performance impact that brings a system to its knees in the main area it is designed for should not be categorised as minor. Perhaps we should increase the ticket rating, which may also get it more visibility?

trott · August 19, 2019

I agree, it is not minor. I just moved from Ubuntu Server to Unraid and am still on the trial. I've finished moving my data to the array, and now when qBittorrent is downloading something I cannot watch a movie; it is always buffering.

dalben · August 19, 2019

Glad I found this, as I thought I was going crazy. I have the same symptoms: a video stream will hang/freeze if there is a background write happening on the array. At first I thought it was lag from spinning up a disk for the write, so I set up one spin-up group that spins up all disks when even one is up.

That didn't solve the problem. I was about to spend the next weekend with fingers under the hood moving disks from the onboard sata to the LSI2008 SAS-2 card to see if I could find a sweet spot where the errors disappeared.

If there have been some changes to address this in the latest Beta/RC, I'll happily try it out and see if it works. I see it's flagged as minor. Technically it is, but if you add WAF to the equation, then it's a showstopper.

Edited August 19, 2019 by dalben

JorgeB · August 19, 2019

Minor is the default when a bug report gets created. I can change it to urgent, but I'm sure LT (Lime Technology) has seen the bug report and is working on it. Minor or urgent, it's not going to make any difference to how long it takes to fix it; I expect a new release as soon as there's a fix.

Marshalleq · August 19, 2019

WAF? I can't see how anyone could ever see a bug that kills services on a server as minor though.

Marshalleq · August 19, 2019

Sorry, our posts crossed. @johnnie.black I love your confidence; have you had any confirmation that they've seen it? I thought they were usually pretty good at saying 'Hey, we've seen it', but this one is stunningly quiet. If I had the workload they do, I'd definitely be using the minor/major flags to filter through everything. That's just my 2c, though (born from 28 years in IT!).

dalben · August 19, 2019

24 minutes ago, Marshalleq said:

WAF? I can't see how anyone could ever see a bug that kills services on a server as minor though.

WAF - The most important variable when building a home server used predominantly for media streaming

Wife Acceptance Factor

Marshalleq · August 19, 2019

<FacePalm> You totally baited me. I feel like I got Rickrolled.

Marshalleq · August 19, 2019

Let me know if the beta helps - it'd be great to disprove that theory....

JorgeB · August 19, 2019

27 minutes ago, Marshalleq said:

They're usually pretty good at saying 'Hey we've seen it'

Not in my experience; I would say the opposite is true. But since the bug reports board has so few posts, it's hard to miss one, especially when there are multiple replies.

Marshalleq · August 19, 2019

Well, that's true enough. I'm pretty much an Unraid noob, but I've not seen anyone ignore posts while in the process of fixing them before. Anyway, I am powerless to do anything.

dalben · August 20, 2019

19 hours ago, Marshalleq said:

Let me know if the beta helps - it'd be great to disprove that theory....

Installed the RC this morning. I should be able to give a report tonight as to whether it helps.

Marshalleq · August 20, 2019

I had one freeze last night where all of Plex went offline for about 1 minute, but it was over Wi-Fi in dubious circumstances, so I'm not exactly sure. Definitely keen to hear your experience.

itimpi · August 20, 2019

4 hours ago, dalben said:

Installed the RC this morning. Should be able to give a report tonight as to whether it helps..

I didn't think the RC was even trying to address this problem; instead, it is focused on getting to the bottom of why some users are experiencing SQLite DB corruption. Having said that, I guess the two issues could be related in some way.

Kevek79 · August 20, 2019

25 minutes ago, itimpi said:

I did not think the RC was even trying to address this problem? Instead it is focused on getting to the bottom of why some users are experiencing SQLite DB corruption. Having said that I guess the two issues could be related in some way

I have been reading both threads for a while now, even though I have not experienced the SQLite bug yet.

What seems to be common to both issues is that using a cache drive mitigates both issues in some way.

The SQLite bug seems to affect only users who do not have their appdata on the cache drive.

On the other hand, caching my media share helped a lot with transferring new content to the server while it is being streamed from (just make sure the mover only runs at times when the server isn't in use).

I think it is highly likely that those two issues are connected, and a solution for one of them may well solve both.

@Marshalleq Please keep us updated on what you find out.

dalben · August 20, 2019

OK, ran some tests.

Copied a 1GB file from the cache to the array and saw no video stalling.

Copied a 4.3GB file, and when it got to about 2.8GB the video stalled. Then everything was slow for about 35-40 seconds before it all came good again.

So the latest RC doesn't help with this problem.
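To make this kind of test repeatable without relying on someone watching the stream, one option is a small read-latency probe run alongside the cache-to-array copy. The Python sketch below is a rough stand-in for a media player under stated assumptions: it reads a file in 1 MiB chunks as fast as the disk allows (a real player reads at the video bitrate) and reports any chunk that took longer than a threshold. The 2-second threshold, the chunk size, and the function names are all hypothetical choices.

```python
import time
from typing import Callable, Iterable, Iterator


def timed_reads(read: Callable[[int], bytes], chunk_size: int = 1 << 20,
                clock: Callable[[], float] = time.monotonic) -> Iterator[tuple[int, float]]:
    """Call read(chunk_size) until it returns b"", yielding (chunk_index, seconds_taken)."""
    index = 0
    while True:
        start = clock()
        data = read(chunk_size)
        elapsed = clock() - start
        if not data:  # end of file
            return
        yield index, elapsed
        index += 1


def find_stalls(samples: Iterable[tuple[int, float]],
                threshold_s: float = 2.0) -> list[tuple[int, float]]:
    """Keep only the chunks whose read took longer than threshold_s seconds."""
    return [(i, t) for i, t in samples if t > threshold_s]
```

For example, open a large file on the array with `open(path, "rb", buffering=0)` (the path, such as something under `/mnt/user/`, is up to you) and pass its `read` method to `timed_reads` while the copy runs. Pick a file that hasn't been read recently, because data already in the Linux page cache is returned from RAM and will never show a stall.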

Edited August 20, 2019 by dalben

Marshalleq · August 20, 2019

OK, thanks for clarifying. Can you say what was slow, though: other file copying, the GUI, etc.?

rclifton · August 21, 2019

After pulling my hair out for the last week looking for what I originally assumed was probably a network issue, I found this thread which describes the issue I'm having exactly.

My system is a dual Xeon 2650 setup with 96GB of RAM and dual LSI2008 SAS2 cards, with two cache drives in RAID1 connected to the onboard SATA controller (Intel C600/X79 chipset). The mover is currently configured to run hourly, as my cache drives are relatively small at 120GB for the number of users in my household (8). I was already planning to jump to a 1TB NVMe drive, but I guess I may need to seriously consider downgrading, as my wife's identical twin lives with us, which means WAFx2 is a major issue! 😱

Is there anything major to look out for when downgrading?

JonathanM · August 21, 2019

29 minutes ago, rclifton said:

my wife's identical twin lives with us

Cue George Takei "Ohhh Myyyy!"
