Jomo Posted June 24, 2019 (edited)

The issue: I'm doing a parity sync and the speed keeps dropping really low, down to ~1 MB/s.

EDIT: Speed has slowed even further, to ~300 KB/s.

EDIT 2: Canceled the parity sync; the new one is running at 150 MB/s, no idea why it would have changed so drastically.

Unraid version: 6.7.0
Attached is the diagnostic: tower-diagnostics-20190624-1124.zip
JorgeB Posted June 24, 2019

1 hour ago, Jomo said:
    no idea why it would have drastically changed so much

You were getting read errors on parity:

Jun 23 23:38:29 Tower kernel: sd 8:0:7:0: [sdi] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jun 23 23:38:29 Tower kernel: sd 8:0:7:0: [sdi] tag#5 Sense Key : 0x3 [current] [descriptor]
Jun 23 23:38:29 Tower kernel: sd 8:0:7:0: [sdi] tag#5 ASC=0x11 ASCQ=0x0
Jun 23 23:38:29 Tower kernel: sd 8:0:7:0: [sdi] tag#5 CDB: opcode=0x88 88 00 00 00 00 00 e9 fb dc 40 00 00 04 00 00 00
Jun 23 23:38:29 Tower kernel: print_req_error: critical medium error, dev sdi, sector 3925597778
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597712
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597720
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597728
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597736
Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597744
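As an aside, errors like these can be pulled straight out of the syslog from the console. A minimal sketch; the sample line is copied from the log excerpt above, and on a live Unraid box you would point grep at /var/log/syslog instead:

```shell
# Count how many md read errors disk0 (parity) has logged. The sample
# line here stands in for a real syslog so the command is self-contained.
sample='Jun 23 23:38:29 Tower kernel: md: disk0 read error, sector=3925597712'
echo "$sample" | grep -c 'md: disk0 read error'

# On the server itself:
#   grep -c 'md: disk0 read error' /var/log/syslog
```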
Jomo Posted June 24, 2019

I forgot to format the 2 drives I added (in addition to the parity drive) before I initiated that first sync, so I did the format mid-sync. Could that have caused the errors? If not, I have no idea what would have caused them, nor why I'm currently not getting any.
JorgeB Posted June 24, 2019

Formatting wouldn't cause read errors; besides, this was parity, and parity isn't formatted. Check SMART for that disk (it's not in the diags) and run a long test.
Jomo Posted June 24, 2019

tower-smart-20190624-0919.zip
JorgeB Posted June 24, 2019

SAS SMART is not as easy for me to analyze as ATA SMART, but there are a couple of uncorrected errors, which is not a good sign, and what appear to be sectors needing reallocation. Still, wait for the long test to finish.
Jomo Posted June 24, 2019

Oh, I hit start for the SMART extended test, then it immediately said "errors found, check SMART report", so that's when I downloaded the report and attached it. I thought it was done (though I was confused by how quick the "extended test" was).
JorgeB Posted June 24, 2019

The long test is still in progress; I believe the GUI test buttons don't work 100% with SAS devices.
Jomo Posted June 24, 2019

How can I tell when it is done?
Jomo Posted June 24, 2019

Now that it is about 70% done, the speed has drastically dropped again. Also, I am unable to check the results of the SMART test for the drive with errors; it just spins at "Last SMART test results". Guessing at this point I just need to remove the drive and deal with one parity drive for now.
JorgeB Posted June 25, 2019

Use the console to get the SMART report:

smartctl -x /dev/sdX
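For SAS drives the long test can also be started and monitored entirely from the console, which sidesteps the GUI buttons. A sketch; sdX is a placeholder for the actual device letter:

```shell
# Start a long (extended) self-test:
#   smartctl -t long /dev/sdX
#
# SAS drives report progress under "Background scan results log" in the
# full report, so completion can be checked by polling:
#   smartctl -x /dev/sdX | grep -i 'progress'
#
# Extracting just the percentage from such a line (sample taken from
# the report posted later in this thread):
line='Number of background scans performed: 266,  scan progress:  91.56%'
echo "$line" | grep -o '[0-9]*\.[0-9]*%'
```

When the test finishes, the percentage line disappears and the result shows up in the "SMART Self-test log" section of the same report.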
Jomo Posted June 25, 2019 (edited)

This is what I got. Note that since yesterday, I unassigned the drive, ran the parity sync with my other parity drive just fine, and then started a preclear of this drive to see if that got any errors too; so far it hasn't, but it is still on the preread (~19 hours later, 91% done...)

smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.19.41-Unraid] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HITACHI
Product:              H7230AS60SUN3.0T
Revision:             A6C0
Compliance:           SPC-4
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca03e2602ec
Serial number:        001237RNX1ND YVGNX1ND
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Jun 25 09:23:33 2019 EDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     33 C
Drive Trip Temperature:        85 C

Manufactured in week 37 of year 2012
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  63
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1878
Elements in grown defect list: 1073

Vendor (Seagate Cache) information
  Blocks sent to initiator = 22452280036950016

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0   986464         0    986464   22970810     776960.878           2
write:         0 10885558         0  10885558    3123961     244165.631           0
verify:        0        0         0         0   19274899         35.033           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status     segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                  number   (hours)
# 1  Background long   Completed       -     43862       - [-   -    -]

Long (extended) Self-test duration: 27182 seconds [453.0 minutes]

Background scan results log
  Status: scan is active
  Accumulated power on time, hours:minutes 43878:57 [2632737 minutes]
  Number of background scans performed: 266,  scan progress:  91.56%
  Number of background medium scans performed: 266

   #  when      lba(hex)          [sk,asc,ascq]  reassign_status
   1 43807:33  000000015198c1d1  [3,11,0]        Require Write or Reassign Blocks command
   2 43720:18  00000000e9fbe684  [3,11,0]        Reassigned by app, has valid data
   3 43720:18  00000000e9fbde52  [3,11,0]        Reassigned by app, has valid data

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 20
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: loss of dword synchronization
    reason: unknown
    negotiated logical link rate: phy enabled; 6 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000cca03e2602ed
    attached SAS address = 0x500304800020a127
    attached phy identifier = 7
    Invalid DWORD count = 1388
    Running disparity error count = 1311
    Loss of DWORD synchronization = 344
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 1388
     Running disparity error count: 1311
     Loss of dword synchronization count: 344
     Phy reset problem count: 0
relative target port id = 2
  generation code = 20
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: power on
    negotiated logical link rate: phy enabled; unknown
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000cca03e2602ee
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0
JorgeB Posted June 25, 2019

The long test completed. If there are also no errors during the preclear, you can add it back to the array and see how it goes; I would recommend swapping cables/backplane slot with another disk to rule them out in case there are more errors in the future.
Jomo Posted June 25, 2019

I think this preclear is a lost cause too. It's still on the pre-read section, 21 hours later, at 93%. Last time I did a preclear, the pre-read took 6 hours.
AgentXXL Posted June 25, 2019

I've been watching this thread as I am in the midst of upgrading my parity drive to a new 10TB Ironwolf. What's interesting is that it started out with speeds around 120MB/s and got progressively slower as the parity rebuild/sync continued, dropping to 31MB/s at the lowest I saw. I suspect this is because it was taking more time to calculate the parity across all drives in the array when dealing with actual data vs empty/zeroed space.

My array currently has 2 x 8TB, 1 x 6TB, and 4 x 4TB drives. When the parity rebuild started, it was reading across all drives and all were spun up, as it had to create parity by reading the sectors from every drive. Now that the rebuild is at the 72% mark, I noticed that the 4TB and 6TB drives are all spun down, i.e. since it's past the size of those drives, it now only needs to read the 2 x 8TB drives. And my parity rebuild speed is back to 112MB/s. I suspect once it hits the 80% mark it'll go even faster, as I don't have any 10TB data drives in the array, so the last 2TB of parity should be null.

Once the parity rebuild completes, I have 2 more new and precleared 8TB drives to add, which again shouldn't impact the parity as they are zeroed. Then I can start migrating the data from my 5 x 10TB USB drives (UD mounted) to the array. As each 10TB drive finishes copying to the array, it'll be shucked and inserted into a reserved slot in my configuration. I'll then preclear them and add them to the data pool, giving me more storage for the next 10TB drive. I have another 6TB and one more 4TB drive whose data has to be migrated as well, so in the end my data array will consist of 5 x 10TB, 4 x 8TB, 2 x 6TB and 5 x 4TB drives: 114TB of raw storage, about 80% full by my estimate.
In any case, I think the slow-down you were seeing @Jomo was related to your read errors, but overall a parity rebuild/resync will slow down while it's reading sectors across all drives in the array. The 4TB drives dropped off the calculation at the 40% mark and the 6TB drive at 60%; since it no longer needs to calculate parity for those drives (already completed), the speed is only that of the 2 x 8TB drives. Makes sense to me, so I hope this scenario and explanation helps others understand the speed of a parity rebuild/resync.
JorgeB Posted June 25, 2019

3 minutes ago, AgentXXL said:
    What's interesting is that it started out with speeds around 120MB/s and as the parity rebuild/sync continued it got progressively slower, dropping to 31MB/s for the lowest I saw.

No, parity sync speed isn't affected by data. With no controller bottlenecks and an adequate CPU, it will be limited by the slowest disk at any point in the sync; with recent 7200rpm disks (and even some 5400rpm ones), for example, typical start speed is 200MB/s and end speed is 100MB/s.
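Those figures can be turned into a back-of-the-envelope duration estimate. A minimal sketch, assuming speed falls roughly linearly from 200 MB/s to 100 MB/s over the sync (so ~150 MB/s on average) and a hypothetical 10 TB parity disk:

```shell
# Estimated parity sync time = disk size / average speed.
# 10 TB at an average of (200 + 100) / 2 = 150 MB/s.
awk 'BEGIN {
    size_tb  = 10
    avg_mbps = (200 + 100) / 2
    secs     = size_tb * 1e12 / (avg_mbps * 1e6)
    printf "%.1f hours\n", secs / 3600
}'
# -> 18.5 hours
```

Which lines up with the roughly-a-day syncs people report for large drives, and shows why a drop to KB/s-range speeds (as in the first post) points at a hardware problem rather than normal behavior.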
AgentXXL Posted June 25, 2019 (edited)

9 minutes ago, johnnie.black said:
    No, parity sync speed isn't affected by data, if no controller bottlenecks and using an adequate CPU it will be limited by the slowest disk at any point of the sync.

Then I suspect my speed drops were related to the older 6TB and 4TB drives, which are 5400rpm and have less cache than newer models. Regardless, the lower speeds only occurred while those smaller drives were part of the parity calculation.

I'm only using a 4-core i7-6700K right now, with unRAID and other dockers/tasks pinned to the 1st core (2 threads with HT) and Plex pinned to the other 3 (6 threads with HT). I'll be upgrading my motherboard, CPU and RAM when I've saved up enough money in a couple of months, likely to a Ryzen 2 16-core setup. I may also replace the PCIe 2.0 LSI controller with a PCIe 3.0 variant to improve disk performance, but for now I'm content, as I've been able to stream my highest-bitrate 4K titles (86.5Mbps average according to Plex) with no issues.
Jomo Posted June 25, 2019

When I canceled the parity sync the first time, the speed had dropped to around 300 KB/s, with an estimated remaining time of ~40 days.
JorgeB Posted June 25, 2019

7 minutes ago, AgentXXL said:
    Then I suspect my speed drops were related to the older 6TB and 4TB drives that are 5400rpm and lower cache on each than newer models.

You can use the diskspeed docker to get an idea of each disk's performance; sometimes disks develop a few slow sector zones.
AgentXXL Posted June 25, 2019

5 minutes ago, johnnie.black said:
    You can use the diskspeed docker to give you an idea of each disk performance, sometimes disks develop a few slow sector zones.

Thanks! I'm OK with the slower speeds, as those older disks are mostly full now anyhow. And so far their read speeds have been fine for use with Plex (all movies and TV, mostly remux vs re-encode). Eventually, as I save up enough cash, I'll replace the older drives with newer ones of higher capacity.
Jomo Posted June 25, 2019

This is what I got for my drive:
Vr2Io Posted June 25, 2019 (edited)

From your SMART, the writeback cache is disabled!? It shouldn't affect the read speed test, but you've got a problem curve.
Jomo Posted June 25, 2019

11 minutes ago, Benson said:
    From your SMART, the writeback cache is disabled!?

I'm, obviously, a noob at this stuff, so yeah, I don't know what that means.
Vr2Io Posted June 25, 2019

16 minutes ago, Jomo said:
    I'm, obviously, a noob at this stuff, so yeah, I don't know what that means.

Maybe refer to the linked post to confirm the SAS drive's writeback cache status, or how to enable it.
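For reference, the "Writeback Cache is: Disabled" line in the SMART report above corresponds to the SCSI WCE (Write Cache Enable) bit, which can be inspected and set from the console. A sketch, assuming the sdparm tool is available (it is not part of stock Unraid) and with /dev/sdX as a placeholder for the real device:

```shell
# Query the current Write Cache Enable bit on the SAS drive:
sdparm --get=WCE /dev/sdX

# Enable the write cache; --save asks the drive to persist the
# setting across power cycles:
sdparm --set=WCE --save /dev/sdX
```

Write caching only affects write speeds, so as Vr2Io notes it doesn't explain a bad read-speed curve; it's just worth enabling once the drive is trusted again.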
Jomo Posted June 25, 2019

Thank you, though it is not clear to me what that setting does.