EXTREMELY slow parity sync speed


Jomo


1 hour ago, Jomo said:

no idea why it would have drastically changed so much

You were getting read errors on parity:

 

Jun 23 23:38:29 Tower kernel: sd 8:0:7:0: [sdi] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jun 23 23:38:29 Tower kernel: sd 8:0:7:0: [sdi] tag#5 Sense Key : 0x3 [current] [descriptor]
Jun 23 23:38:29 Tower kernel: sd 8:0:7:0: [sdi] tag#5 ASC=0x11 ASCQ=0x0
Jun 23 23:38:29 Tower kernel: sd 8:0:7:0: [sdi] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 e9 fb dc 40 00 00 04 00 00 00
Jun 23 23:38:29 Tower kernel: print_req_error: critical medium error, dev sdi, sector 3925597778
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597712
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597720
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597728
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597736
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597744
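For reference, the sense data in that log can be decoded. Below is a minimal sketch of a decoder; the tables only cover the codes seen here (they come from the SCSI spec, which defines many more):

```python
# Minimal decoder for the SCSI sense data seen in the log above.
# Sense Key 0x3 with ASC 0x11 / ASCQ 0x0 means the drive itself
# reported an unrecoverable medium error, i.e. a genuinely bad sector.

SENSE_KEYS = {
    0x1: "RECOVERED ERROR",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
}

ASC_ASCQ = {
    (0x11, 0x0): "UNRECOVERED READ ERROR",
}

def decode(sense_key: int, asc: int, ascq: int) -> str:
    sk = SENSE_KEYS.get(sense_key, f"sense key 0x{sense_key:x}")
    detail = ASC_ASCQ.get((asc, ascq), f"ASC=0x{asc:x} ASCQ=0x{ascq:x}")
    return f"{sk}: {detail}"

print(decode(0x3, 0x11, 0x0))  # MEDIUM ERROR: UNRECOVERED READ ERROR
```

In other words, these aren't cabling or controller errors; the disk itself is saying it couldn't read those sectors.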

 


I forgot to format the 2 drives I added (in addition to the parity drive) before I initiated that first sync, so I did the format mid-sync. Could that have caused the errors? If not, I have no idea what caused them, nor why I'm not currently getting any.


Oh, I hit start on the SMART extended test, and it immediately said errors were found and to check the SMART report, so that's when I downloaded the report and attached it. I thought it was done (though I was confused by how quick the "extended test" was).


Now that it is about 70% done, the speed has drastically dropped again.

 

Also, I am unable to check the results of the SMART test for the drive with errors; the page just spins at "Last SMART test results".

Guessing at this point I just need to remove the drive and deal with one parity drive for now.


This is what I got:

Note that since yesterday, I unassigned the drive, ran the parity sync with my other parity drive just fine, and then started a preclear of this drive to see if it got any errors too. So far it hasn't, but it is still on the pre-read (~19 hours later, 91% done...)

Quote

smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.41-Unraid] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HITACHI
Product:              H7230AS60SUN3.0T
Revision:             A6C0
Compliance:           SPC-4
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca03e2602ec
Serial number:        001237RNX1ND        YVGNX1ND
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Jun 25 09:23:33 2019 EDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     33 C
Drive Trip Temperature:        85 C

Manufactured in week 37 of year 2012
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  63
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1878
Elements in grown defect list: 1073

Vendor (Seagate Cache) information
  Blocks sent to initiator = 22452280036950016

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0   986464         0    986464   22970810     776960.878           2
write:         0 10885558         0  10885558    3123961     244165.631           0
verify:        0        0         0         0   19274899         35.033           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   43862                 - [-   -    -]

Long (extended) Self-test duration: 27182 seconds [453.0 minutes]

Background scan results log
  Status: scan is active
    Accumulated power on time, hours:minutes 43878:57 [2632737 minutes]
    Number of background scans performed: 266,  scan progress: 91.56%
    Number of background medium scans performed: 266

   #  when        lba(hex)    [sk,asc,ascq]    reassign_status
   1 43807:33  000000015198c1d1  [3,11,0]   Require Write or Reassign Blocks command
   2 43720:18  00000000e9fbe684  [3,11,0]   Reassigned by app, has valid data
   3 43720:18  00000000e9fbde52  [3,11,0]   Reassigned by app, has valid data

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 20
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: loss of dword synchronization
    reason: unknown
    negotiated logical link rate: phy enabled; 6 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000cca03e2602ed
    attached SAS address = 0x500304800020a127
    attached phy identifier = 7
    Invalid DWORD count = 1388
    Running disparity error count = 1311
    Loss of DWORD synchronization = 344
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 1388
     Running disparity error count: 1311
     Loss of dword synchronization count: 344
     Phy reset problem count: 0
relative target port id = 2
  generation code = 20
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: power on
    negotiated logical link rate: phy enabled; unknown
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000cca03e2602ee
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0

 


I've been watching this thread as I am in the midst of upgrading my parity drive to a new 10TB Ironwolf. What's interesting is that it started out at speeds around 120MB/s, and as the parity rebuild/sync continued it got progressively slower, dropping to 31MB/s at the lowest point I saw. I suspect this is because it was taking more time to calculate the parity across all drives in the array when dealing with actual data vs empty/zeroed space.

 

My array currently has 2 x 8TB, 1 x 6TB, and 4 x 4TB drives. When the parity rebuild started, all drives were spun up and being read, as parity is created by reading the sectors from every drive. Now that the parity rebuild is at the 72% mark, I noticed that the 4TB and 6TB drives are all spun down, i.e. since the sync is past the size of those drives, it now only needs to read the 2 x 8TB drives. And my parity rebuild speed is back to 112MB/s. I suspect once it hits the 80% mark, it'll go even faster, as I don't have any 10TB data drives in the array, so the last 2TB of parity should be all zeros.

 

Once the parity rebuild completes, I have 2 more new and precleared 8TB drives to add, which again shouldn't impact the parity as they are zeroed. Then I can start migrating the data from my 5 x 10TB USB drives (UD mounted) to the array. As each 10TB drive finishes copying to the array, it'll be shucked and inserted into a reserved slot in my configuration. I'll then preclear them and add them to the data pool, giving me more storage for the next 10TB drive. I have another 6TB and one more 4TB drive that have to have their data migrated as well, so in the end my data array will consist of 5 x 10TB, 4 x 8TB, 2 x 6TB and 5 x 4TB drives - 114TB of raw storage, about 80% full by my estimate.

 

In any case, I think the slow-down you were seeing @Jomo was related to your read errors, but in general a parity rebuild/resync will slow down while it's reading sectors across all drives in the array. The 4TB drives dropped out of the calculation at the 40% mark and the 6TB drive at 60%. Since it no longer needs to read those drives (their portion of parity is already complete), the speed is limited only by the 2 x 8TB drives. Makes sense to me, so I hope this scenario and explanation helps others understand the speed of a parity rebuild/resync.
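As a sanity check on those drop-off points, here's a quick sketch using the drive sizes listed above (plain integer TB, purely for illustration):

```python
# Which data disks still participate in the parity calculation at a
# given point of the sync? A disk drops out once the sync position
# passes its size, so fewer and fewer disks are read as the sync runs.

def active_disks(position_tb: float, disk_sizes_tb: list) -> list:
    return [size for size in disk_sizes_tb if size > position_tb]

parity_tb = 10
disks = [8, 8, 6, 4, 4, 4, 4]  # the array described above

for pct in (30, 50, 70, 90):
    pos = parity_tb * pct / 100
    print(pct, active_disks(pos, disks))
# At 30% (3TB) all seven disks are read; past 40% the 4TB disks spin
# down, past 60% the 6TB drops out, and past 80% no data disk is left,
# so the remaining 2TB of parity is just zeros.
```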

 

3 minutes ago, AgentXXL said:

What's interesting is that it started out with speeds around 120MB/s and as the parity rebuild/sync continued it got progressively slower, dropping to 31MB/s for the lowest I saw. I suspect this is because it was taking more time to calculate the parity across all drives in the array, when dealing with actual data vs empty/zeroed space.

No, parity sync speed isn't affected by the data. With no controller bottlenecks and an adequate CPU, it will be limited by the slowest disk at any point of the sync; for example, with recent 7200rpm disks (and even some 5400rpm ones) a typical start speed is 200MB/s and end speed is 100MB/s.
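A rough way to picture that "limited by the slowest disk" behaviour: assume, purely for illustration, that each disk's sequential speed falls linearly from 200MB/s at its outer tracks to 100MB/s at its inner ones (real drives aren't exactly linear, but the outer-to-inner falloff has roughly this shape):

```python
# Hedged model: sync speed at any point is the minimum of the speeds
# of all disks still being read at that position.

def disk_speed(position_tb: float, size_tb: float,
               outer: float = 200.0, inner: float = 100.0) -> float:
    frac = position_tb / size_tb  # 0.0 at outer edge, 1.0 at inner edge
    return outer - (outer - inner) * frac

def sync_speed(position_tb: float, disk_sizes_tb: list) -> float:
    speeds = [disk_speed(position_tb, s) for s in disk_sizes_tb
              if s > position_tb]
    return min(speeds) if speeds else float("inf")

# A small, nearly-finished disk drags the whole sync down; once it
# drops out, speed jumps back up to whatever the larger disks can do.
print(sync_speed(3.9, [8, 8, 4]))  # 102.5 - limited by the 4TB disk
print(sync_speed(4.1, [8, 8, 4]))  # 148.75 - 4TB disk no longer read
```

This also matches AgentXXL's observation of the speed jumping back up once the smaller drives dropped out of the calculation.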

9 minutes ago, johnnie.black said:

No, parity sync speed isn't affected by the data. With no controller bottlenecks and an adequate CPU, it will be limited by the slowest disk at any point of the sync; for example, with recent 7200rpm disks (and even some 5400rpm ones) a typical start speed is 200MB/s and end speed is 100MB/s.

Then I suspect my speed drops were related to the older 6TB and 4TB drives, which are 5400rpm and have less cache than newer models. Regardless, the lower speeds only occurred while those smaller drives were part of the parity calculation. I am only using a 4-core i7-6700K right now, with unRAID and other dockers/tasks pinned to the 1st core (2 threads with HT), and Plex pinned to the other 3 (6 threads with HT). I'll be upgrading my motherboard, CPU and RAM when I save up enough money in a couple of months, likely to a 16-core Ryzen 2 setup.

 

I may also replace the PCIe 2.0 LSI controller with a PCIe 3.0 variant to improve disk speed performance, but for now I'm content as I've been able to stream my highest bitrate 4K titles (86.5Mbps average according to Plex) with no issues.

 

 

5 minutes ago, johnnie.black said:

You can use the diskspeed docker to give you an idea of each disk's performance; sometimes disks develop a few slow sector zones.

Thanks! I'm OK with the slower speeds, as those older disks are mostly full now anyhow. So far their read speeds for use with Plex have been fine (all movies and TV, mostly remux vs re-encode). And eventually, as I save up enough cash, I'll replace the older drives with newer drives of higher capacity.

