Parity check speed problem



Hi All,

 

I am trying to understand why my weekly parity check speed was suddenly divided by 6 after replacing a disk that was causing errors.

I spent hours and hours on the internet and on this forum looking for tips or advice on how to fix this, but nothing worked, so now I am posting this message.

 

First, you can see the problem in the parity check history. On June 25 I had a power cut. I was out of town and tried multiple times to fix it remotely. On June 27 I was finally home, restarted my server, and fixed the issue. My server was off between June 25 and June 27.

 

[Image: parity check history]

 

 

After the fix (array stopped, automatic repair by Unraid), the parity check speed suddenly dropped, and I don't know why.

First I tried reorganizing the disk positions between my motherboard and PCIe SATA controller (parity disks directly on the motherboard SATA controller, data disks on the PCIe SATA controller). No change!

 

 

[Image: array devices]

 

Now I am trying to investigate with the mdadm command, but the output looks different from what I see in other posts and I cannot interpret it correctly.

 

For example:

 

[Image: mdadm output]

 

Do you know what sbSynced2 is, and what the mdResyncAction status 'Q' means?

 

I have posted the diagnostics file in case it helps.

 

To conclude, everything is working (Docker, services, etc.), but slower than before. A lot slower than before! I notice it on every restart, especially with my Pi-hole container, which cannot answer DNS requests in time for several minutes; the DNS service is also very slow during a parity check. That never happened before.

 

So I assume my Unraid is not running at the right speed and something is going wrong. Please help.

 

Tapodufeu, Unraid lover for months :)

 

 

 

tower-diagnostics-20200823-0949.zip

Link to comment
19 minutes ago, tapodufeu said:

Do you know what sbSynced2 is, and what the mdResyncAction status 'Q' means?

Not that it is likely to be of much use to you, but my investigations suggest that sbSynced2 is a timestamp for the last completed sync, and the mdResyncAction value says that the action (the last one, if none is currently running) was checking both parity1 and parity2.
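If you want to look at those values yourself, Unraid's mdcmd helper will dump the md driver state from the console (a quick sketch; the exact variable names can vary between Unraid versions):

# Print the md driver variables and keep the sync-related ones.
# sbSynced2 is the epoch timestamp of the last completed sync;
# mdResyncAction shows the last/current action, e.g. "check P Q"
# for a check of both parity1 (P) and parity2 (Q).
mdcmd status | grep -iE 'sbSynced|mdResync'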

22 minutes ago, tapodufeu said:

I have posted the diagnostics file in case it helps.

I did not notice anything obvious to explain the slowdown. It might be worth installing the DiskSpeed docker to see if it shows any specific disk with slowdown symptoms.

 

Link to comment

I already tried DiskSpeed. I had weird results, with a different disk showing up as slow every time I tried. The speed gap detection is really an issue, with different disks never finishing the process...

 

I also tried checking disk speed with hdparm, but it did not show any disk being slower than another.
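For anyone who wants to repeat this, something along these lines should work from the console (a sketch: hdparm -t only gives a rough sequential read figure, the array should be otherwise idle, and you may want to adjust the glob to skip the USB flash drive):

# Quick sequential-read check of every SATA disk.
# -t times buffered device reads; repeat 2-3 times per disk for a stable number.
for d in /dev/sd?; do
  echo "== $d =="
  hdparm -t "$d"
done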

 

Do you know anything I can do to get a deeper analysis (and logs) next time I run a parity check? I am not familiar with mdadm at all.

 

I will run a new DiskSpeed benchmark later today when my server is free.

 

 

Link to comment

I have just run a DiskSpeed test... nothing really interesting. All Docker containers were running during the test, but with very little activity.

 

The 2 parity disks look slower than the others... but still at least 45 MB/s in read and write.

 

I am not sure the disks are really the source of this issue.

[Image: diskspeed]

Link to comment

Those WD SMR disks (WD10SPZX) are known to be bad performers, especially on writes but also on reads, and that's visible in the graphs, where they are reading on average about 3x slower than they should. The DiskSpeed test only reads a few sectors from the disks, so the first thing would be to run a complete surface test on each of those disks to see how they are really performing.
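A complete read pass can be done from the console with something like this (replace sdX with the disk under test; reading the whole surface takes a few hours per disk, and slow zones show up as drops in the reported rate):

# Read the entire disk and discard the data; iflag=direct bypasses the
# page cache so the figure reflects the drive itself.
dd if=/dev/sdX of=/dev/null bs=1M iflag=direct status=progress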

Link to comment

I totally agree and have read tons of documentation about it. BUT, if you pay attention to the first image showing the check history dates and durations, you can see that suddenly, after fixing the errors, the speed fell. No disks were changed; the array is exactly the same as before. This is why I doubt my problem is related to hardware. But it might be possible that a hardware error suddenly appeared during the xfs repair.

 

I am launching a complete surface test.

 

 

Link to comment
31 minutes ago, tapodufeu said:

if you pay attention to the first image showing the check history dates and durations, you can see that suddenly, after fixing the errors, the speed fell.

That cannot be the reason.

 

31 minutes ago, tapodufeu said:

No disks were changed; the array is exactly the same as before.

Yes, but like all hardware, any disk can go bad at any time, for example by developing slow sectors; just because it was fine yesterday doesn't mean it is today. Though I still think the average speed is too low for it to be a disk issue, you should still rule that out first.

 

Your controller configuration is also not the best, but that shouldn't cause slowdowns out of the blue without any changes.

Link to comment
26 minutes ago, tapodufeu said:

What do you mean by "my controller configuration is not the best"?

You're using a Marvell controller, which is already not recommended, together with (or connected to) a SATA port multiplier, which is also not recommended. Three of your disks are sharing a single SATA port; this isn't good for performance and it tends to produce timeouts, or even drop disks, though it does sometimes work reasonably well for some.

Link to comment

Good to know. Thank you, Johnnie, for this valuable information. I was not aware of this bottleneck in my configuration at all, and maybe a lot of people have the same issue without knowing it.

 

I have an ASRock mini-ITX J5005: https://www.asrock.com/mb/Intel/J5005-ITX/index.asp

It is a very cheap configuration, but it is much more powerful than a more expensive NAS from QNAP or Synology (when it works, haha).

It is not easy to find a SATA controller with 8 ports for a PCIe x1 slot (this is the only slot I have on my motherboard).

 

Based on what you say, I think I found what you were looking at in the syslog file.

 

PCIe SATA controller (Marvell):

Jul 22 09:43:31 Tower kernel: ata4.00: ATA-10: ST1000LM035-1RK172,             WL12V84C, LCM2, max UDMA/100
Jul 22 09:43:31 Tower kernel: ata4.00: configured for UDMA/100

Jul 22 09:43:31 Tower kernel: ata5.00: ATA-10: WDC WD10SPZX-75Z10T2,         WXG1A29LVT1Y, 03.01A03, max UDMA/133
Jul 22 09:43:31 Tower kernel: ata5.00: configured for UDMA/133
Jul 22 09:43:31 Tower kernel: ata6.00: ATA-10: ST1000LM035-1RK172,             WL1FVK5S, LCM2, max UDMA/100
Jul 22 09:43:31 Tower kernel: ata6.00: configured for UDMA/100
Jul 22 09:43:31 Tower kernel: ata6.01: ATA-9: WDC WD10JPVX-75JC3T0,         WX81E73KEER4, 01.01A01, max UDMA/133
Jul 22 09:43:31 Tower kernel: ata6.01: configured for UDMA/133
Jul 22 09:43:31 Tower kernel: ata6.02: ATA-10: WDC WD10SPZX-60Z10T0,      WD-WX71A9849U04, 04.01A04, max UDMA/133
Jul 22 09:43:31 Tower kernel: ata6.02: configured for UDMA/133
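The .01/.02 suffixes seem to be the giveaway: ata6.00, ata6.01 and ata6.02 are three drives sharing the single ata6 link through the multiplier. A grep along these lines should pull the same information out of the syslog (assuming the usual /var/log/syslog location):

# List how the kernel enumerated each SATA device; entries with a .01 or
# higher suffix sit behind a port multiplier and share that link's bandwidth.
grep -E 'ata[0-9]+(\.[0-9]+)?: (ATA-|configured)' /var/log/syslog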

 

This is the controller I bought: Marvell 88SE9215 + JMicron JMB5xx, 8 ports.

https://www.amazon.fr/gp/product/B07ZT31GTD/ref=ppx_yo_dt_b_asin_title_o07_s00?ie=UTF8&psc=1

I should have bought the 6-port version of this controller... it is the same but without the JMicron JMB5xx SATA multiplier!

 

As soon as my parity rebuild is done (at 5 MB/s... 2 days, OMG!), I will change how my disks are connected.

 

Do you think using the Seagate disks for parity and the WD disks for data could help?

Do you think ATA-8, ATA-9 or ATA-10 has any importance? It is the ATA standard version, correct? If I cannot avoid using a SATA port multiplier, I assume I should at least use the same ATA compatibility version and also avoid mixing UDMA/100 and UDMA/133.
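To see which mode each drive has actually negotiated, hdparm can report it (a sketch; replace sdX with each disk in turn):

# Show the supported DMA/UDMA modes; the entry marked with a * is the
# one currently selected by the kernel.
hdparm -I /dev/sdX | grep -i udma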

 

BTW, if you have any recommendation for another SATA controller with 6 SATA ports for a PCIe x1 slot, do not hesitate. Thanks.

Link to comment
8 hours ago, tapodufeu said:

if you have any recommendation for another SATA controller with 6 SATA ports for a PCIe x1 slot, do not hesitate. Thanks.

Unfortunately, for an x1 slot there are only good options for 2-port controllers; a 4-port Marvell 9215 might work OK. Anything x1 with more than 4 ports will use port multipliers, at least AFAIK.

Link to comment

It took me 3-4 days per parity disk to replace my 2 parity disks (WD Blue SMR) with Seagate CMR. Around 3-4 MB/s... OMG, so slow. Almost a week for 2 disks!

 

BUT, I am now back to "regular" performance, around 65 MB/s for a parity check.

 

Conclusion: NEVER USE SMR DISKS FOR PARITY!!!

SMR disks for parity with SMR disks for data... 15 MB/s

CMR disks for parity with SMR disks for data... 65 MB/s

 

I have noticed a limitation on the 8-port RTD1295 card with the port multiplier. If I connect more than 3 drives to the multiplier, performance drops significantly.

Whether a parity disk is attached to the onboard SATA or to the PCIe SATA card has no impact on performance. I even tried connecting a parity drive to a multiplied SATA port... performance remains the same. Only the number of drives connected to a multiplied port has an impact on performance.
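A rough way to see the bottleneck directly is to read several of the multiplied drives at the same time and compare the per-disk rate that dd reports on exit (a sketch; the device names here are placeholders, replace them with the drives actually behind the multiplied port):

# Read 2 GiB from each disk in parallel; dd prints the average MB/s on exit.
# Disks on their own ports should hold their rate; 3+ disks behind the
# multiplier share one link and each drop sharply.
for d in /dev/sdc /dev/sdd /dev/sde; do
  dd if="$d" of=/dev/null bs=1M count=2048 iflag=direct &
done
wait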

 

So I recommend not using the 8-port PCIe card with the RTD1295, but a maximum of 6 ports (3 SATA ports + a 4th port multiplied into 3 SATA). Performance is the same for 4 to 6 drives. Performance is bad to very bad with 7 or 8 drives! And you will save money: in Europe the 6-port SATA cards cost €20... the 8-port cards cost €45. The 8-port card is not worth it at all.

 

I am replacing all my WD Blue SMR disks with older WD Blue disks (pre-2018) that use regular CMR technology. I will let you know the impact on performance next week when all the drives are swapped.

On the second-hand market they all cost the same price, so it is easy to sell my SMR disks and buy CMR without losing money. (They have all been second hand since the beginning of my NAS.)

 

And I will also move my Seagate CMR disks to data. I have noticed a small increase in noise level with the Seagate CMR compared to the WD Blue SMR when I use them for parity. Not only during writes... the noise level is always higher with my Seagate drives.

The noise level is very important for me: my NAS is an HTPC under my TV used for Plex/Pi-hole/torrent/OpenVPN/SMB. That is why I designed my NAS with 2.5" disks instead of 3.5", because they make less noise.

Link to comment

Last thing... maybe important for some people: my setup with Seagate CMR as parity drives consumes 21 to 25 W (8 drives + SSD cache + motherboard).

My setup with WD drives as parity consumed 19 to 21 W.

 

During sleep, the lowest I used to see with my old configuration was 15.6 W...

After 24 hours with this reorganization of drives, I am above 17 W during sleep.

 

My Seagate CMR drives are LM035, so not the latest technology, and my WD SMR drives are from 2019!

 

Link to comment

Yes, those are both single 1TB-platter disks, and they are SMR, but as mentioned, in my experience Seagate has the best firmware for SMR disks. I have used several SMR disks with Unraid, 2.5" and 3.5", and the Seagate disks usually perform much better, most times about the same as CMR. I also have Toshiba SMR (and had some WD in the past), and those perform much worse.

Link to comment
2 weeks later...

So, now I have removed all the WD SMR disks and changed the controller to a 6-port SATA controller (3 SATA + 3 SATA multiplied).

 

The bad performance disappeared and I am back to regular performance.

 

Replacing my 2 parity disks (WD SMR to Seagate) took 2 very slow operations... 3 to 4 MB/s (see check history). It was not just a replacement but a data/parity disk swap, to use the Seagate disks for parity.

 

Then I replaced, one by one, all my WD SMR disks with WD CMR disks. You can see the speed slowly increasing with each replacement (4 disks). All the WD10SPZX were replaced by WD10JPVX (I was lucky to find a guy selling 4 of them, completely new and cheap).

 

The last graph is the disk speed test, which also demonstrates the speed gap between the old CMR and new SMR Western Digital disks. Expect 30 to 40% more just in the speed test, and 5 to 6 times better with Unraid (12 MB/s compared to 65 MB/s during a parity check).

 

To conclude, for all Unraid users: try to avoid SMR disks as much as possible, especially in the 2.5" format; performance is even worse in 2.5" compared to 3.5".

 

Soon I will replace my motherboard with the new Intel J4125B with a PCIe 2.0 x16 slot, allowing me to use an 8-port LSI SAS HBA instead of my 4-6 port PCIe x1 SATA card. I do not expect a huge performance gain (my disks are not that fast!), but surely better performance under heavy load or simultaneous access, and fewer errors.

 

Last point: a disk mysteriously produced some errors when connected to my Marvell card. It was impossible to reproduce those errors with CrystalDiskInfo or any other SMART tools... moreover, the SMART error reported by the Marvell card was no longer reported when the disk was checked on Windows with CrystalDiskInfo.

So I also replaced the 8-port SATA card (under warranty) and downgraded to a 6-port card, which does not produce any errors.

 

Problem solved... hardware choices and configuration were the issue.

 

 

[Image: check history]

[Image: Unraid pool]

[Image: diskspeed 2]

Link to comment
