EXTREMELY slow parity sync speed


Jomo


1 hour ago, Jomo said:

no idea why it would have drastically changed so much

You were getting read errors on parity:

 

Jun 23 23:38:29 Tower kernel: sd 8:0:7:0: [sdi] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jun 23 23:38:29 Tower kernel: sd 8:0:7:0: [sdi] tag#5 Sense Key : 0x3 [current] [descriptor]
Jun 23 23:38:29 Tower kernel: sd 8:0:7:0: [sdi] tag#5 ASC=0x11 ASCQ=0x0
Jun 23 23:38:29 Tower kernel: sd 8:0:7:0: [sdi] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 e9 fb dc 40 00 00 04 00 00 00
Jun 23 23:38:29 Tower kernel: print_req_error: critical medium error, dev sdi, sector 3925597778
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597712
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597720
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597728
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597736
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597744
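For reference, the sense data in that log can be decoded. Below is a minimal sketch of a decoder; the tables only cover the codes seen here (they come from the SCSI spec, which defines many more):

```python
# Minimal decoder for the SCSI sense data seen in the log above.
# Sense Key 0x3 with ASC 0x11 / ASCQ 0x0 means the drive itself
# reported an unrecoverable medium error, i.e. a genuinely bad sector.

SENSE_KEYS = {
    0x1: "RECOVERED ERROR",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
}

ASC_ASCQ = {
    (0x11, 0x0): "UNRECOVERED READ ERROR",
}

def decode(sense_key: int, asc: int, ascq: int) -> str:
    sk = SENSE_KEYS.get(sense_key, f"sense key 0x{sense_key:x}")
    detail = ASC_ASCQ.get((asc, ascq), f"ASC=0x{asc:x} ASCQ=0x{ascq:x}")
    return f"{sk}: {detail}"

print(decode(0x3, 0x11, 0x0))  # MEDIUM ERROR: UNRECOVERED READ ERROR
```

In other words, these aren't cabling or controller errors; the disk itself is saying it couldn't read those sectors.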

 


I forgot to format the 2 drives I added (in addition to the parity drive) before I initiated that first sync, so I did the format mid-sync. Could that have caused the errors? If not, I have no idea what caused them, nor why I'm not currently getting any.


Oh, I hit start on the SMART extended test, and it immediately said errors were found and to check the SMART report, so that's when I downloaded the report and attached it. I thought it was done (though I was confused by how quick the "extended test" was).


Now that it is about 70% done, the speed has drastically dropped again.

 

Also, I am unable to check the results of the SMART test for the drive with errors; the page just spins at "Last SMART test results".

Guessing at this point I just need to remove the drive and deal with one parity drive for now.


This is what I got:

Note that since yesterday, I unassigned the drive, ran the parity sync with my other parity drive just fine, and then started a preclear of this drive to see if it got any errors too. So far it hasn't, but it is still on the pre-read (~19 hours later, 91% done...)

Quote

smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.41-Unraid] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HITACHI
Product:              H7230AS60SUN3.0T
Revision:             A6C0
Compliance:           SPC-4
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca03e2602ec
Serial number:        001237RNX1ND        YVGNX1ND
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Jun 25 09:23:33 2019 EDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     33 C
Drive Trip Temperature:        85 C

Manufactured in week 37 of year 2012
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  63
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1878
Elements in grown defect list: 1073

Vendor (Seagate Cache) information
  Blocks sent to initiator = 22452280036950016

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0   986464         0    986464   22970810     776960.878           2
write:         0 10885558         0  10885558    3123961     244165.631           0
verify:        0        0         0         0   19274899         35.033           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   43862                 - [-   -    -]

Long (extended) Self-test duration: 27182 seconds [453.0 minutes]

Background scan results log
  Status: scan is active
    Accumulated power on time, hours:minutes 43878:57 [2632737 minutes]
    Number of background scans performed: 266,  scan progress: 91.56%
    Number of background medium scans performed: 266

   #  when        lba(hex)    [sk,asc,ascq]    reassign_status
   1 43807:33  000000015198c1d1  [3,11,0]   Require Write or Reassign Blocks command
   2 43720:18  00000000e9fbe684  [3,11,0]   Reassigned by app, has valid data
   3 43720:18  00000000e9fbde52  [3,11,0]   Reassigned by app, has valid data

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 20
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: loss of dword synchronization
    reason: unknown
    negotiated logical link rate: phy enabled; 6 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000cca03e2602ed
    attached SAS address = 0x500304800020a127
    attached phy identifier = 7
    Invalid DWORD count = 1388
    Running disparity error count = 1311
    Loss of DWORD synchronization = 344
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 1388
     Running disparity error count: 1311
     Loss of dword synchronization count: 344
     Phy reset problem count: 0
relative target port id = 2
  generation code = 20
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: power on
    negotiated logical link rate: phy enabled; unknown
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000cca03e2602ee
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0

 


I've been watching this thread as I am in the midst of upgrading my parity drive to a new 10TB Ironwolf. What's interesting is that it started out at speeds around 120MB/s, and as the parity rebuild/sync continued it got progressively slower, dropping to 31MB/s at the lowest point I saw. I suspect this is because it was taking more time to calculate the parity across all drives in the array when dealing with actual data vs empty/zeroed space.

 

My array currently has 2 x 8TB, 1 x 6TB, and 4 x 4TB drives. When the parity rebuild started, all drives were spun up and being read, as parity is created by reading the sectors from every drive. Now that the parity rebuild is at the 72% mark, I noticed that the 4TB and 6TB drives are all spun down, i.e. since the sync is past the size of those drives, it now only needs to read the 2 x 8TB drives. And my parity rebuild speed is back to 112MB/s. I suspect once it hits the 80% mark, it'll go even faster, as I don't have any 10TB data drives in the array, so the last 2TB of parity should be all zeros.

 

Once the parity rebuild completes, I have 2 more new and precleared 8TB drives to add, which again shouldn't impact the parity as they are zeroed. Then I can start migrating the data from my 5 x 10TB USB drives (UD mounted) to the array. As each 10TB drive finishes copying to the array, it'll be shucked and inserted into a reserved slot in my configuration. I'll then preclear them and add them to the data pool, giving me more storage for the next 10TB drive. I have another 6TB and one more 4TB drive that have to have their data migrated as well, so in the end my data array will consist of 5 x 10TB, 4 x 8TB, 2 x 6TB and 5 x 4TB drives - 114TB of raw storage, about 80% full by my estimate.

 

In any case, I think the slow-down you were seeing @Jomo was related to your read errors, but in general a parity rebuild/resync will slow down while it's reading sectors across all drives in the array. The 4TB drives dropped out of the calculation at the 40% mark and the 6TB drive at 60%. Since it no longer needs to read those drives (their portion of parity is already complete), the speed is limited only by the 2 x 8TB drives. Makes sense to me, so I hope this scenario and explanation helps others understand the speed of a parity rebuild/resync.
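As a sanity check on those drop-off points, here's a quick sketch using the drive sizes listed above (plain integer TB, purely for illustration):

```python
# Which data disks still participate in the parity calculation at a
# given point of the sync? A disk drops out once the sync position
# passes its size, so fewer and fewer disks are read as the sync runs.

def active_disks(position_tb: float, disk_sizes_tb: list) -> list:
    return [size for size in disk_sizes_tb if size > position_tb]

parity_tb = 10
disks = [8, 8, 6, 4, 4, 4, 4]  # the array described above

for pct in (30, 50, 70, 90):
    pos = parity_tb * pct / 100
    print(pct, active_disks(pos, disks))
# At 30% (3TB) all seven disks are read; past 40% the 4TB disks spin
# down, past 60% the 6TB drops out, and past 80% no data disk is left,
# so the remaining 2TB of parity is just zeros.
```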

 

3 minutes ago, AgentXXL said:

What's interesting is that it started out with speeds around 120MB/s and as the parity rebuild/sync continued it got progressively slower, dropping to 31MB/s for the lowest I saw. I suspect this is because it was taking more time to calculate the parity across all drives in the array, when dealing with actual data vs empty/zeroed space.

No, parity sync speed isn't affected by the data. With no controller bottlenecks and an adequate CPU, it will be limited by the slowest disk at any point of the sync; for example, with recent 7200rpm disks (and even some 5400rpm ones) a typical start speed is 200MB/s and end speed is 100MB/s.
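A rough way to picture that "limited by the slowest disk" behaviour: assume, purely for illustration, that each disk's sequential speed falls linearly from 200MB/s at its outer tracks to 100MB/s at its inner ones (real drives aren't exactly linear, but the outer-to-inner falloff has roughly this shape):

```python
# Hedged model: sync speed at any point is the minimum of the speeds
# of all disks still being read at that position.

def disk_speed(position_tb: float, size_tb: float,
               outer: float = 200.0, inner: float = 100.0) -> float:
    frac = position_tb / size_tb  # 0.0 at outer edge, 1.0 at inner edge
    return outer - (outer - inner) * frac

def sync_speed(position_tb: float, disk_sizes_tb: list) -> float:
    speeds = [disk_speed(position_tb, s) for s in disk_sizes_tb
              if s > position_tb]
    return min(speeds) if speeds else float("inf")

# A small, nearly-finished disk drags the whole sync down; once it
# drops out, speed jumps back up to whatever the larger disks can do.
print(sync_speed(3.9, [8, 8, 4]))  # 102.5 - limited by the 4TB disk
print(sync_speed(4.1, [8, 8, 4]))  # 148.75 - 4TB disk no longer read
```

This also matches AgentXXL's observation of the speed jumping back up once the smaller drives dropped out of the calculation.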

9 minutes ago, johnnie.black said:

No, parity sync speed isn't affected by the data. With no controller bottlenecks and an adequate CPU, it will be limited by the slowest disk at any point of the sync; for example, with recent 7200rpm disks (and even some 5400rpm ones) a typical start speed is 200MB/s and end speed is 100MB/s.

Then I suspect my speed drops were related to the older 6TB and 4TB drives, which are 5400rpm and have less cache than newer models. Regardless, the lower speeds only occurred while those smaller drives were part of the parity calculation. I am only using a 4-core i7-6700K right now, with unRAID and other dockers/tasks pinned to the 1st core (2 threads with HT), and Plex pinned to the other 3 (6 threads with HT). I'll be upgrading my motherboard, CPU and RAM when I save up enough money in a couple of months, likely to a 16-core Ryzen 2 setup.

 

I may also replace the PCIe 2.0 LSI controller with a PCIe 3.0 variant to improve disk speed performance, but for now I'm content as I've been able to stream my highest bitrate 4K titles (86.5Mbps average according to Plex) with no issues.

 

 

5 minutes ago, johnnie.black said:

You can use the diskspeed docker to give you an idea of each disk's performance; sometimes disks develop a few slow sector zones.

Thanks! I'm OK with the slower speeds, as those older disks are mostly full now anyhow. So far their read speeds for use with Plex have been fine (all movies and TV, mostly remux vs re-encode). And eventually, as I save up enough cash, I'll replace the older drives with newer drives of higher capacity.

