  • [6.7.x] Very slow array concurrent performance


    johnnie.black
    • Solved Urgent

    Since I can remember, Unraid has never been great at simultaneous array disk performance, but it used to be acceptable. Since v6.7, various users have complained of very poor performance, for example when running the mover while trying to stream a movie.

     

    I noticed this myself yesterday when I couldn't even start watching an SD video in Kodi, just because writes were going to a different array disk, and this server doesn't even have a parity drive. So I did a quick test on my test server: the problem is easily reproducible and started with the first v6.7 release candidate, rc1.

     

    How to reproduce:

     

    -Server just needs two assigned array data devices (no parity needed, but the same happens with parity) and one cache device; no encryption, all devices btrfs-formatted

    -Used cp to copy a few video files from cache to disk2

    -While the cp was running, tried to stream a movie from disk1; it took a long time to start and kept stalling/buffering

     

    Tried to copy one file from disk1 (still while the cp to disk2 was running), with v6.6.7:

     

    [screenshot]

     

    with v6.7rc1:

     

    [screenshot]

     

    A few times the transfer goes higher for a couple of seconds, but most of the time it sits at a few KB/s or stalls completely.

     

    Also tried with all devices unencrypted and xfs-formatted, and it was the same:

     

    [screenshot]

     

    The server where the problem was detected and the test server have no hardware in common: one is based on a Supermicro X11 board, the other on an X9 series; one uses HDDs, the test server SSDs. So it's very unlikely to be hardware related.






    I'll be testing array performance over the next while and will report back when testing is complete.


    I can confirm that this issue is resolved in 6.8. Happy days 😁



    Yep - problem fixed! Thanks a lot 😉



    I'm not that lucky with the 6.8 rc. Transfer rates got really bad, and I get *lots* of these on the parity drive whenever I stress the array in any way.

     

    Oct 14 08:19:32 Nasse kernel: sd 10:0:5:0: attempting task abort! scmd(0000000009d51915)
    Oct 14 08:19:32 Nasse kernel: sd 10:0:5:0: [sdg] tag#2081 CDB: opcode=0x12 12 01 00 00 fe 00
    Oct 14 08:19:32 Nasse kernel: scsi target10:0:5: handle(0x000b), sas_address(0x4433221107000000), phy(7)
    Oct 14 08:19:32 Nasse kernel: scsi target10:0:5: enclosure logical id(0x500605b005524f40), slot(0)
    Oct 14 08:19:32 Nasse kernel: sd 10:0:5:0: task abort: SUCCESS scmd(0000000009d51915)
    Oct 14 08:19:32 Nasse kernel: sd 10:0:5:0: Power-on or device reset occurred

     

    I swapped the parity drive to another slot and the issue followed it, so I was afraid the drive was broken, but after going back to 6.7.3rc4 everything is back to normal and transfer speeds are good again.

    11 minutes ago, Ancan said:

    I'm not that lucky with 6.8rc. Transfer rates got really bad, and I got *lots* of these on the parity drive when I stress the array in any way.

     


    Need diagnostics


    I haven't done specific testing yet - just letting everything settle. The system doesn't seem to have exhibited any issues so far. Thank you @limetech, I know this was a challenging one.

     

    I do have the recurring error below in the log, which I assume is unrelated and may have existed before, but it's hard to tell. It does have an open kernel.org ticket.

     

    Oct 14 15:37:07 OBI-WAN kernel: pcieport 0000:40:03.1: AER: Multiple Corrected error received: 0000:00:00.0
    Oct 14 15:37:07 OBI-WAN kernel: pcieport 0000:40:03.1: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
    Oct 14 15:37:07 OBI-WAN kernel: pcieport 0000:40:03.1: AER: device [1022:1453] error status/mask=00001180/00006000

     

    [1022:1453] 40:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge

    1 hour ago, Ancan said:

    Here you go. I upgraded to 6.8, and ran some jobs until the error started again.

    That looks like a hardware problem, possibly power related; replace or swap both cables on the parity disk.

     

    You should also update the LSI firmware since it's very old:

    LSISAS2008: FWVersion(10.00.08.00)

    The current version is 20.00.07.00.


    4 hours ago, johnnie.black said:

    That looks like a hardware problem, possibly power related, replace/swap both cables on the parity disk.

     


    As I wrote, I've already moved the disk to another slot and the issue follows the parity disk, so it shouldn't be cable/power related. On 6.7.3rc4 there's no problem; I've been running unBALANCE for hours now without a single hiccup, while on 6.8 it's fine for a while and then resets devices constantly. It might be related to the kernel and not Unraid per se. Anyway, it's not directly related to this thread, so I'll continue the discussion elsewhere if needed.

     

    Thanks for the heads-up on the f/w. I'll try to upgrade now. It's my only controller, so I hope nothing goes wrong.

    3 minutes ago, Ancan said:

    Thanks for the heads-up on the f/w. Will try to upgrade now.

    You should do that; the resets could be caused by running the new driver against older firmware.


    I don't think that issue is fixed with v6.8, at least not for me and some other users.

    I still have high iowait and a completely unresponsive array on rc5 when the mover is running.

    Streams and shares become inaccessible or stop completely. I'm running two SSDs in RAID 1 with btrfs.

    Somebody said it could be a btrfs issue, but I don't know.

    3 minutes ago, GHunter said:

    Post it as a new issue so it won't be ignored.

    It's already posted and it's not being ignored. Sorry if your particular issue is not the one currently being looked at.

     



    I, and some other users, still seem to have this same performance issue in the latest RCs.

    Why has this been closed as fixed? Or is it tracked in another issue that I haven't found?

    37 minutes ago, patchrules2000 said:

    Why has this been closed as fixed?

    Because this specific issue is easily reproducible (see the original post) and has been fixed since rc1. There might be another issue, but since I can't reproduce it I can't report it, so anyone still having problems needs to file a detailed report, especially describing how to reproduce.



    Thanks for the quick reply, johnnie.

    I was about to gather some data to open this back up when I noticed rc6 had been released, so I thought I might as well try it first.

    It turns out rc6 fixes something that previous RCs did not for me, and now everything seems to be running smoothly.

    I'm pushing 200+ MB/s read + write, with the CPU encoding at 90% usage while delivering 3+ media streams, and not a single slowdown or high-iowait issue to report.

    Suffice it to say I'm very pleased, as long as this doesn't resurface between rc6 and the final release.

     

    Keep the magic unraid sauce flowing please! :D 

     

    [screenshot]



    I recently upgraded to rc7 thinking this problem was behind me. It still persists, and it's very easy to reproduce: I copy several GB of files from an unassigned disk to a /mnt/user path. After the memory buffer fills and writes are flushed to disk, iowait shoots up to 45, disrupting all running dockers. It takes at least 5 minutes for the load to subside and the system to return to normal.

     

    I have a cache pool setup with 2 SSDs (no Samsung drives at this point).

     

    Is btrfs the culprit?

     

    I'll have to go back to rc5, as the lack of Nvidia drivers is killing my performance as well.

     

    tower-diagnostics-20191127-1917.zip

    5 minutes ago, Carlos Talbot said:

    I copy several GB of files from an unassigned disk to a /mnt/user path. After the memory buffer fills and writes are flushed to disk, iowait shoots up to 45, disrupting all running dockers.

    Is the copy going to the array or the cache pool?

    8 minutes ago, johnnie.black said:

    Is the copy going to the array or the cache pool?

    Array - /mnt/user/subfolder

    2 minutes ago, Carlos Talbot said:

    Array - /mnt/user/subfolder

    That doesn't really answer my question: is that share set to use cache?

    21 minutes ago, johnnie.black said:

    That doesn't really answer my question, is that share set to use cache?

    Sorry, yes, it's set to Yes for cache.

     

    This got me thinking. I tried the same copy command to another share that doesn't use cache. Sure enough, the load held steady at 5 and never went higher (this was also with a Plex transcode in the background). Containers were accessible without issue.

     

    So it does appear to be the cache that is affecting this.

     

    [screenshot]

     


    16 minutes ago, Carlos Talbot said:

    So it does appear to be the cache that is affecting this.

    Yep. You might want to try with a single xfs or btrfs cache device just to compare; some users get bad performance with a cache pool, possibly not just with Samsung devices, and this is a very old issue.

    59 minutes ago, Carlos Talbot said:

    Array - /mnt/user/subfolder

    In case you still need something to fill in your understanding

     

    Array = disks in the parity array

    /mnt/user/subfolder = a user share named subfolder

     

    User shares always include cache: unless the share is set to cache-no, all new writes go to the cache.

    18 minutes ago, trurl said:

    In case you still need something to fill in your understanding

     


    Got it. I'm in the process of switching from 2 drives in the cache pool to 1, keeping it btrfs. I'm just surprised this issue is still ongoing, as it's very easy to reproduce.






