Parity check speed problem



Hi All,

 

I am trying to understand why my weekly parity check speed was suddenly divided by 6 after replacing a disk that was causing errors.

I spent hours and hours on the internet and on this forum looking for tips or advice on how to fix this, but nothing worked, so now I am posting this message.

 

First, you can see the problem in the parity check history. On June 25 I had a power cut. I was out of town and tried multiple times to fix it remotely. On June 27 I was finally home, restarted my server, and fixed the issue. My server was off between June 25 and June 27.

 

[Image: parity check history]

 

 

After the fix (array stopped, automatic repair by Unraid), the parity check speed suddenly dropped, and I don't know why.

First I tried reorganizing the disk positions between my motherboard and PCIe SATA controller (parity disks directly on the motherboard SATA controller, data disks on the PCIe SATA controller). No change!

 

 

[Image: array devices]

 

Now I am trying to investigate with the mdadm command, but the output looks different from what I see in other posts and I cannot interpret it correctly.

 

For example:

 

[Image: mdadm output]

 

Do you know what sbSynced2 is, and what the mdResyncAction status 'Q' means?

 

I have posted the diagnostics file in case it helps.

 

To conclude, everything is working (Docker, services, etc.), but slower than before. A lot slower than before! I notice it on every restart, especially with my Pi-hole container, which cannot answer DNS requests in time for several minutes; the DNS service is also very slow during a parity check. That never happened before.

 

So I assume my Unraid is not running at the right speed and something is going wrong. Please help.

 

Tapodufeu, Unraid lover for months :)

 

 

 

tower-diagnostics-20200823-0949.zip

Link to comment
19 minutes ago, tapodufeu said:

Do you know what sbSynced2 is, and what the mdResyncAction status 'Q' means?

Not that it is likely to be of much use to you, but my investigations suggest that sbSynced2 is a timestamp for the last completed sync, and the mdResyncAction value says that the action (the last one, if none is currently running) was checking both parity1 and parity2.
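If you want to look at those values yourself, Unraid's mdcmd helper will dump the md driver state from the console (a quick sketch; the exact variable names can vary between Unraid versions):

# Print the md driver variables and keep the sync-related ones.
# sbSynced2 is the epoch timestamp of the last completed sync;
# mdResyncAction shows the last/current action, e.g. "check P Q"
# for a check of both parity1 (P) and parity2 (Q).
mdcmd status | grep -iE 'sbSynced|mdResync'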

22 minutes ago, tapodufeu said:

I have posted the diagnostics file in case it helps.

I did not notice anything obvious to explain the slowdown. It might be worth installing the DiskSpeed docker to see if it shows any specific disk with slowdown symptoms.

 

Link to comment

I already tried DiskSpeed. I had weird results, with a different disk showing up as slow every time I tried. The speed gap detection is really an issue, with different disks never finishing the process...

 

I also tried checking disk speed with hdparm, but it did not show any disk being slower than another.
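For anyone who wants to repeat this, something along these lines should work from the console (a sketch: hdparm -t only gives a rough sequential read figure, the array should be otherwise idle, and you may want to adjust the glob to skip the USB flash drive):

# Quick sequential-read check of every SATA disk.
# -t times buffered device reads; repeat 2-3 times per disk for a stable number.
for d in /dev/sd?; do
  echo "== $d =="
  hdparm -t "$d"
done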

 

Do you know anything I can do to get a deeper analysis (and logs) next time I run a parity check? I am not familiar with mdadm at all.

 

I will run a new DiskSpeed benchmark later today when my server is free.

 

 

Link to comment

I have just run a DiskSpeed test... nothing really interesting. All Docker containers were running during the test, but with very little activity.

 

The 2 parity disks look slower than the others... but still at least 45 MB/s in read and write.

 

I am not sure the disks are really the source of this issue.

[Image: diskspeed]

Link to comment

Those WD SMR disks (WD10SPZX) are known to be bad performers, especially on writes but also on reads, and that's visible in the graphs, where they are reading on average about 3x slower than they should. The DiskSpeed test only reads a few sectors from the disks, so the first thing would be to run a complete surface test on each of those disks to see how they are really performing.
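A complete read pass can be done from the console with something like this (replace sdX with the disk under test; reading the whole surface takes a few hours per disk, and slow zones show up as drops in the reported rate):

# Read the entire disk and discard the data; iflag=direct bypasses the
# page cache so the figure reflects the drive itself.
dd if=/dev/sdX of=/dev/null bs=1M iflag=direct status=progress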

Link to comment

I totally agree and have read tons of documentation about it. BUT, if you pay attention to the first image showing the check history dates and durations, you can see that suddenly, after fixing the errors, the speed fell. No disks were changed; the array is exactly the same as before. This is why I doubt my problem is related to hardware. But it might be possible that a hardware error suddenly appeared during the xfs repair.

 

I am launching a complete surface test.

 

 

Link to comment
31 minutes ago, tapodufeu said:

if you pay attention to the first image showing the check history dates and durations, you can see that suddenly, after fixing the errors, the speed fell.

That cannot be the reason.

 

31 minutes ago, tapodufeu said:

No disks were changed; the array is exactly the same as before.

Yes, but like all hardware, any disk can go bad at any time, for example by developing slow sectors; just because it was fine yesterday doesn't mean it is today. Though I still think the average speed is too low for it to be a disk issue, you should still rule that out first.

 

Your controller configuration is also not the best, but that shouldn't cause slowdowns out of the blue without any changes.

Link to comment
26 minutes ago, tapodufeu said:

What do you mean by "my controller configuration is not the best"?

You're using a Marvell controller, which is already not recommended, together with (or connected to) a SATA port multiplier, which is also not recommended. Three of your disks are sharing a single SATA port; this isn't good for performance and it tends to produce timeouts, or even drop disks, though it does sometimes work reasonably well for some.

Link to comment

Good to know. Thank you, Johnnie, for this valuable information. I was not aware of this bottleneck in my configuration at all, and maybe a lot of people have the same issue without knowing it.

 

I have an ASRock mini-ITX J5005: https://www.asrock.com/mb/Intel/J5005-ITX/index.asp

It is a very cheap configuration, but it is much more powerful than a more expensive NAS from QNAP or Synology (when it works, haha).

It is not easy to find a SATA controller with 8 ports for a PCIe x1 slot (this is the only slot I have on my motherboard).

 

Based on what you say, I think I found what you were looking at in the syslog file.

 

PCIe SATA controller (Marvell):

Jul 22 09:43:31 Tower kernel: ata4.00: ATA-10: ST1000LM035-1RK172,             WL12V84C, LCM2, max UDMA/100
Jul 22 09:43:31 Tower kernel: ata4.00: configured for UDMA/100

Jul 22 09:43:31 Tower kernel: ata5.00: ATA-10: WDC WD10SPZX-75Z10T2,         WXG1A29LVT1Y, 03.01A03, max UDMA/133
Jul 22 09:43:31 Tower kernel: ata5.00: configured for UDMA/133
Jul 22 09:43:31 Tower kernel: ata6.00: ATA-10: ST1000LM035-1RK172,             WL1FVK5S, LCM2, max UDMA/100
Jul 22 09:43:31 Tower kernel: ata6.00: configured for UDMA/100
Jul 22 09:43:31 Tower kernel: ata6.01: ATA-9: WDC WD10JPVX-75JC3T0,         WX81E73KEER4, 01.01A01, max UDMA/133
Jul 22 09:43:31 Tower kernel: ata6.01: configured for UDMA/133
Jul 22 09:43:31 Tower kernel: ata6.02: ATA-10: WDC WD10SPZX-60Z10T0,      WD-WX71A9849U04, 04.01A04, max UDMA/133
Jul 22 09:43:31 Tower kernel: ata6.02: configured for UDMA/133
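The .01/.02 suffixes seem to be the giveaway: ata6.00, ata6.01 and ata6.02 are three drives sharing the single ata6 link through the multiplier. A grep along these lines should pull the same information out of the syslog (assuming the usual /var/log/syslog location):

# List how the kernel enumerated each SATA device; entries with a .01 or
# higher suffix sit behind a port multiplier and share that link's bandwidth.
grep -E 'ata[0-9]+(\.[0-9]+)?: (ATA-|configured)' /var/log/syslog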

 

This is the controller I bought: Marvell 88SE9215 + JMicron JMB5xx, 8 ports.

https://www.amazon.fr/gp/product/B07ZT31GTD/ref=ppx_yo_dt_b_asin_title_o07_s00?ie=UTF8&psc=1

I should have bought the 6-port version of this controller... it is the same but without the JMicron JMB5xx SATA multiplier!

 

As soon as my parity rebuild is done (at 5 MB/s... 2 days, OMG!), I will change how my disks are connected.

 

Do you think using the Seagate disks for parity and the WD disks for data could help?

Do you think ATA-8, ATA-9 or ATA-10 has any importance? It is the ATA standard version, correct? If I cannot avoid using a SATA port multiplier, I assume I should at least use the same ATA compatibility version and also avoid mixing UDMA/100 and UDMA/133.
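To see which mode each drive has actually negotiated, hdparm can report it (a sketch; replace sdX with each disk in turn):

# Show the supported DMA/UDMA modes; the entry marked with a * is the
# one currently selected by the kernel.
hdparm -I /dev/sdX | grep -i udma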

 

BTW, if you have any recommendation for another SATA controller with 6 SATA ports for a PCIe x1 slot, do not hesitate. Thanks.

Link to comment
8 hours ago, tapodufeu said:

if you have any recommendation for another SATA controller with 6 SATA ports for a PCIe x1 slot, do not hesitate. Thanks.

Unfortunately, for an x1 slot there are only good options for 2-port controllers; a 4-port Marvell 9215 might work OK. Anything x1 with more than 4 ports will use port multipliers, at least AFAIK.

Link to comment

It took me 3-4 days per parity disk to replace my 2 parity disks (WD Blue SMR) with Seagate CMR. Around 3-4 MB/s... OMG, so slow. Almost a week for 2 disks!

 

BUT, I am now back to "regular" performance, around 65 MB/s for a parity check.

 

Conclusion: NEVER USE SMR DISKS FOR PARITY!!!

SMR disks for parity with SMR disks for data... 15 MB/s

CMR disks for parity with SMR disks for data... 65 MB/s

 

I have noticed a limitation on the 8-port RTD1295 card with the port multiplier. If I connect more than 3 drives to the multiplier, performance drops significantly.

Whether a parity disk is attached to the onboard SATA or to the PCIe SATA card has no impact on performance. I even tried connecting a parity drive to a multiplied SATA port... performance remains the same. Only the number of drives connected to a multiplied port has an impact on performance.
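A rough way to see the bottleneck directly is to read several of the multiplied drives at the same time and compare the per-disk rate that dd reports on exit (a sketch; the device names here are placeholders, replace them with the drives actually behind the multiplied port):

# Read 2 GiB from each disk in parallel; dd prints the average MB/s on exit.
# Disks on their own ports should hold their rate; 3+ disks behind the
# multiplier share one link and each drop sharply.
for d in /dev/sdc /dev/sdd /dev/sde; do
  dd if="$d" of=/dev/null bs=1M count=2048 iflag=direct &
done
wait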

 

So I recommend not using the 8-port PCIe card with the RTD1295, but a maximum of 6 ports (3 SATA ports + a 4th port multiplied into 3 SATA). Performance is the same for 4 to 6 drives. Performance is bad to very bad with 7 or 8 drives! And you will save money: in Europe the 6-port SATA cards cost €20... the 8-port cards cost €45. The 8-port card is not worth it at all.

 

I am replacing all my WD Blue SMR disks with older WD Blue disks (pre-2018) that use regular CMR technology. I will let you know the impact on performance next week when all the drives are swapped.

On the second-hand market they all cost the same price, so it is easy to sell my SMR disks and buy CMR without losing money. (They have all been second hand since the beginning of my NAS.)

 

And I will also move my Seagate CMR disks to data. I have noticed a small increase in noise level with the Seagate CMR compared to the WD Blue SMR when I use them for parity. Not only during writes... the noise level is always higher with my Seagate drives.

The noise level is very important for me: my NAS is an HTPC under my TV used for Plex/Pi-hole/torrent/OpenVPN/SMB. That is why I designed my NAS with 2.5" disks instead of 3.5", because they make less noise.

Link to comment

Last thing... maybe important for some people: my setup with Seagate CMR as parity drives consumes 21 to 25 W (8 drives + SSD cache + motherboard).

My setup with WD drives as parity consumed 19 to 21 W.

 

During sleep, the lowest I used to see with my old configuration was 15.6 W...

After 24 hours with this reorganization of drives, I am above 17 W during sleep.

 

My Seagate CMR drives are LM035, so not the latest technology, and my WD SMR drives are from 2019!

 

Link to comment

Yes, those are both single 1TB-platter disks, and they are SMR, but as mentioned, in my experience Seagate has the best firmware for SMR disks. I have used several SMR disks with Unraid, 2.5" and 3.5", and the Seagate disks usually perform much better, most times about the same as CMR. I also have Toshiba SMR (and had some WD in the past), and those perform much worse.

Link to comment
2 weeks later...

So, now I have removed all the WD SMR disks and changed the controller to a 6-port SATA controller (3 SATA + 3 SATA multiplied).

 

The bad performance disappeared and I am back to regular performance.

 

Replacing my 2 parity disks (WD SMR to Seagate) took 2 very slow operations... 3 to 4 MB/s (see check history). It was not just a replacement but a data/parity disk swap, to use the Seagate disks for parity.

 

Then I replaced, one by one, all my WD SMR disks with WD CMR disks. You can see the speed slowly increasing with each replacement (4 disks). All the WD10SPZX were replaced by WD10JPVX (I was lucky to find a guy selling 4 of them, completely new and cheap).

 

The last graph is the disk speed test, which also demonstrates the speed gap between the old CMR and new SMR Western Digital disks. Expect 30 to 40% more just in the speed test, and 5 to 6 times better with Unraid (12 MB/s compared to 65 MB/s during a parity check).

 

To conclude, for all Unraid users: try to avoid SMR disks as much as possible, especially in the 2.5" format; performance is even worse in 2.5" compared to 3.5".

 

Soon I will replace my motherboard with the new Intel J4125B with a PCIe 2.0 x16 slot, allowing me to use an 8-port LSI SAS HBA instead of my 4-6 port PCIe x1 SATA card. I do not expect a huge performance gain (my disks are not that fast!), but surely better performance under heavy load or simultaneous access, and fewer errors.

 

Last point: a disk mysteriously produced some errors when connected to my Marvell card. It was impossible to reproduce those errors with CrystalDiskInfo or any other SMART tools... moreover, the SMART error reported by the Marvell card was no longer reported when the disk was checked on Windows with CrystalDiskInfo.

So I also replaced the 8-port SATA card (under warranty) and downgraded to a 6-port card, which does not produce any errors.

 

Problem solved... hardware choices and configuration were the issue.

 

 

[Image: check history]

[Image: Unraid pool]

[Image: diskspeed 2]

Link to comment
