mcascio Posted January 9, 2012
I recently ran a pre-clear on a WD20EARX 2 TB drive and it took around 25-30 hours. I just received a replacement WD20EARS 2 TB drive that I am currently running preclear on. It's been 70 hours and the Post-Read is only 12% complete, running at 2.4 MB/s. Is this normal? Here's a SMART report I took while the drive was still going through pre-clear:

smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model: WDC WD20EARS-60MVWB0
Serial Number: WD-WCAZAC335127
Firmware Version: 51.0AB51
User Capacity: 2,000,398,934,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Mon Jan 9 15:33:21 2012 Local time zone must be set--see zic m
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity was never started. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (40200) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
SCT capabilities: (0x303d) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always  -           0
  3 Spin_Up_Time            0x0027 100   253   021    Pre-fail Always  -           0
  4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always  -           5
  5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x002f 100   253   051    Pre-fail Always  -           0
  9 Power_On_Hours          0x0032 100   100   000    Old_age  Always  -           71
 10 Spin_Retry_Count        0x0033 100   253   051    Pre-fail Always  -           0
 11 Calibration_Retry_Count 0x0032 100   253   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           4
184 End-to-End_Error        0x0033 100   100   097    Pre-fail Always  -           0
187 Reported_Uncorrect      0x0032 100   100   000    Old_age  Always  -           0
188 Command_Timeout         0x0032 100   100   000    Old_age  Always  -           0
190 Airflow_Temperature_Cel 0x0022 075   070   040    Old_age  Always  -           25 (Lifetime Min/Max 23/30)
192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           2
193 Load_Cycle_Count        0x0032 200   200   000    Old_age  Always  -           2
196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032 200   200   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0030 100   253   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x0008 100   253   000    Old_age  Offline -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
   1       0       0 Not_testing
   2       0       0 Not_testing
   3       0       0 Not_testing
   4       0       0 Not_testing
   5       0       0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
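For scale, here is a back-of-the-envelope sketch of how long one full sequential pass over this drive takes at the two speeds mentioned in this thread. It assumes "MB" means 10^6 bytes and a steady transfer rate; real drives slow toward the inner tracks, and a preclear makes several passes (pre-read, zeroing, post-read), so treat the numbers as rough orders of magnitude only.

```python
# Rough time for one full sequential pass over the drive at a steady rate.
# Assumption: MB means 10**6 bytes (decimal), matching how capacity is quoted.
CAPACITY_BYTES = 2_000_398_934_016  # "User Capacity" from the smartctl report


def pass_hours(rate_mb_per_s: float, capacity_bytes: int = CAPACITY_BYTES) -> float:
    """Hours needed to touch every byte once at rate_mb_per_s."""
    seconds = capacity_bytes / (rate_mb_per_s * 1_000_000)
    return seconds / 3600


if __name__ == "__main__":
    for rate in (100.0, 2.4):
        print(f"{rate:6.1f} MB/s -> {pass_hours(rate):7.1f} h per pass")
```

At about 100 MB/s a pass takes under 6 hours, while at 2.4 MB/s the same pass stretches to roughly 231 hours, which is why a Post-Read crawling at 2.4 MB/s is far outside normal.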
elkay14 Posted January 9, 2012
That is way too slow. Are there any relevant lines in your syslog?
mcascio (Author) Posted January 9, 2012
Thanks for chiming in. Here's my syslog. BTW, I also had this same problem with a factory-reconditioned Seagate 1.5TB drive. syslog-2012-01-09.zip
mcascio (Author) Posted January 9, 2012
The drive is plugged into an iStar BPA350 5-bay cage, which then connects to the motherboard.
mcascio (Author) Posted January 9, 2012
I found these errors in the syslog for the disk; note the lines tagged (Errors). Does anyone know what they mean?

Jan 6 16:46:00 MLDataServer kernel: ------------[ cut here ]------------ (Drive related)
Jan 6 16:46:00 MLDataServer kernel: ---[ end trace 4eaa2a86a8e2da22 ]--- (Drive related)
Jan 6 16:46:00 MLDataServer kernel: scsi4 : ahci (Drive related)
Jan 6 16:46:00 MLDataServer kernel: ata4: SATA max UDMA/133 abar m1024@0xfeb4f000 port 0xfeb4f280 irq 19 (Drive related)
Jan 6 16:46:00 MLDataServer kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Jan 6 16:46:00 MLDataServer kernel: ata4.00: ATA-8: WDC WD20EARS-60MVWB0, 51.0AB51, max UDMA/100 (Drive related)
Jan 6 16:46:00 MLDataServer kernel: ata4.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA (Drive related)
Jan 6 16:46:00 MLDataServer kernel: ata4.00: configured for UDMA/100 (Drive related)
Jan 6 16:46:00 MLDataServer kernel: scsi 4:0:0:0: Direct-Access ATA WDC WD20EARS-60M 51.0 PQ: 0 ANSI: 5 (Drive related)
Jan 6 16:46:00 MLDataServer kernel: sd 4:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) (Drive related)
Jan 6 16:46:00 MLDataServer kernel: sd 4:0:0:0: [sdd] 4096-byte physical blocks (Drive related)
Jan 6 16:46:00 MLDataServer kernel: sd 4:0:0:0: [sdd] Write Protect is off (Drive related)
Jan 6 16:46:00 MLDataServer kernel: sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00 (Drive related)
Jan 6 16:46:00 MLDataServer kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA (Drive related)
Jan 6 16:46:00 MLDataServer kernel: sdd: (Drive related)
Jan 6 16:46:00 MLDataServer kernel: sd 4:0:0:0: [sdd] Attached SCSI disk (Drive related)
Jan 6 16:46:00 MLDataServer emhttp: pci-0000:00:11.0-scsi-3:0:0:0 host4 (sdd) WDC_WD20EARS-60MVWB0_WD-WCAZAC335127 (Drive related)
Jan 6 16:46:00 MLDataServer kernel: md: unRAID driver 1.1.1 installed (System)
Jan 6 16:46:00 MLDataServer kernel: md: import disk0: [3,0] (hda) WDC WD20EARS-00S8B1 WD-WCAVY3915117 size: 1953514552 (Drive related)
Jan 6 16:46:00 MLDataServer kernel: md: import disk1: [8,80] (sdf) ST31500541AS 6XW020BT size: 1465138552 (Drive related)
Jan 6 16:46:00 MLDataServer kernel: md: import disk2: [8,96] (sdg) ST31500541AS 6XW00EZS size: 1465138552 (Drive related)
Jan 6 16:46:00 MLDataServer kernel: md: import disk3: [8,112] (sdh) ST31500541AS 6XW020P2 size: 1465138552 (Drive related)
Jan 6 16:46:00 MLDataServer kernel: md: import disk4: [8,128] (sdi) ST31500541AS 6XW00HSV size: 1465138552 (Drive related)
Jan 6 16:46:00 MLDataServer kernel: md: import disk5: [8,16] (sdb) WDC WD20EARX-00P WD-WMAZA5535730 size: 1953514552 (Drive related)
Jan 6 16:46:00 MLDataServer kernel: md: recovery thread woken up ... (Drive related)
Jan 6 16:46:00 MLDataServer kernel: md: recovery thread has nothing to resync (Drive related)
Jan 6 16:56:31 MLDataServer kernel: md: recovery thread woken up ... (Drive related)
Jan 6 16:56:31 MLDataServer kernel: md: recovery thread has nothing to resync (Drive related)
Jan 6 16:57:13 MLDataServer kernel: sdd: unknown partition table (Drive related)
Jan 7 07:01:56 MLDataServer kernel: sdd: sdd1 (Drive related)
Jan 9 19:42:02 MLDataServer kernel: ata4.00: exception Emask 0x10 SAct 0x1 SErr 0x40d0202 action 0xe frozen (Errors)
Jan 9 19:42:02 MLDataServer kernel: ata4.00: irq_stat 0x00400040, connection status changed (Drive related)
Jan 9 19:42:02 MLDataServer kernel: ata4: SError: { RecovComm Persist PHYRdyChg CommWake 10B8B DevExch } (Errors)
Jan 9 19:42:02 MLDataServer kernel: ata4.00: failed command: READ FPDMA QUEUED (Minor Issues)
Jan 9 19:42:02 MLDataServer kernel: ata4.00: cmd 60/00:00:38:3e:77/02:00:1e:00:00/40 tag 0 ncq 262144 in (Drive related)
Jan 9 19:42:02 MLDataServer kernel: ata4.00: status: { DRDY } (Drive related)
Jan 9 19:42:02 MLDataServer kernel: ata4: hard resetting link (Minor Issues)
Jan 9 19:42:03 MLDataServer kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Jan 9 19:42:03 MLDataServer kernel: ata4.00: configured for UDMA/100 (Drive related)
Jan 9 19:42:03 MLDataServer kernel: ata4: EH complete (Drive related)
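When a syslog is this long, it can help to count link-level events per ATA port rather than reading every line. The sketch below assumes the kernel message wording shown in the log above ("exception Emask", "SError:", "hard resetting link"); it only tallies those lines and does not interpret the SError bits. The default path /var/log/syslog is an assumption about where the log lives.

```python
import re
import sys
from collections import Counter

# libata messages that, in the log above, accompany a link drop and reset.
# Repeated hits on one port point toward cable/backplane/controller trouble
# rather than the disk surface (a heuristic, not a diagnosis).
LINK_EVENT = re.compile(
    r"kernel: (ata\d+)(?:\.\d+)?: (hard resetting link|exception Emask|SError:)"
)


def link_events_per_port(lines):
    """Count link-level error events for each ataN port in syslog lines."""
    counts = Counter()
    for line in lines:
        match = LINK_EVENT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Assumed default log location; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    with open(path, errors="replace") as log:
        for port, n in sorted(link_events_per_port(log).items()):
            print(f"{port}: {n} link event(s)")
```

If the count for one port keeps climbing after the drive is moved to a different cage slot or cable, the slot and cable can probably be ruled out.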
mcascio (Author) Posted January 9, 2012
My power supply is a Thermaltake XT 750 W: http://www.thermaltakeusa.com/Product.aspx?C=1264&ID=1854
dgaschk Posted January 9, 2012
Probably a loose or bad cable. See here: http://lime-technology.com/wiki/index.php?title=The_Analysis_of_Drive_Issues#Drive_Interface_Issues It's not the power supply unless you have a lot of drives.
elkay14 Posted January 9, 2012
Yeah, or the backplane connection on the cage is flaky. Try another slot in the cage, or connect the drive directly to the motherboard.
vca Posted January 11, 2012
I've seen this sort of thing twice with brand new WD 2TB drives. Both times there were no issues in the SMART reports or syslogs, so I took them back to my dealer and got replacements. I think some people have stated that once the first preclear pass was done the second pass ran at normal speed, but that was not the case for me. Regards, Stephen
mcascio (Author) Posted January 11, 2012
I tried all sorts of hardware changes and different cables, none of which worked. I noticed a "Disabling IRQ #19" error being reported, which led me to do some more searching. I went into the BIOS, set the SATA mode to AHCI, and also set the CPU to run at maximum performance. This latest Pre-Clear on the same drive is doing much better: I'm running at 98 MB/s with 24% of the Post-Read complete, about 17 hours in so far. So I'm hoping one of those two changes solved the problem. Fingers crossed! If this one works, I'll run a 2nd pre-clear on this drive and on the Seagate that also had a slow pre-clear.
elkay14 Posted January 12, 2012
In my experience the ports should always be set to AHCI for unRAID use. The AHCI Linux driver seems solid; some of the others might not be so good.
mcascio (Author) Posted January 12, 2012
Doh. Woke up this morning to the Pre-Clear Post-Read at 77% complete and 2.5 MB/s. I have 9 drives in the system including parity and cache.
mcascio (Author) Posted January 12, 2012
I just updated the BIOS on my Asus E35M1-M Pro from 1002 to 1502: E35M1-M-PRO BIOS 1502 http://www.asus.com/Motherboards/AMD_CPU_on_Board/E35M1M_PRO/#download Running Pre-Clear again. When did hard drives become so complicated?
mcascio (Author) Posted January 14, 2012
Well, I'm happy to report that I finally made it through the PreClear at normal speeds after updating the BIOS to the latest version. So hopefully that was it. I'm running it a 2nd time to see if it makes it through again.
mcascio (Author) Posted January 17, 2012
Ok. Spoke too soon. The 2nd Pre-Clear on the same drive slowed to 2.4 MB/s after 30% of the Post-Read, about 30 hours in. So I decided to try a brand new WD20EARX in the same slot. It suffered the same fate and slowed to 2.4 MB/s during the Post-Read. It seems a lot of users are having similar problems. Are we sure there isn't something bigger at work here? Is there another BIOS setting that might need changing? Or could it even be something in the Pre-Clear software? Can I still use these drives if I just let them make their way through the Pre-Clear? Here are some of the errors in the syslog:

Jan 16 14:47:33 MLDataServer kernel: ACPI Error: No handler for Region [sACS] (f74c21b8) [PCI_Config] (20090903/evregion-319) (Errors)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error: Region PCI_Config(2) has no handler (20090903/exfldio-295) (Errors)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error (psparse-0537): Method parse/execution failed [\PRID.P_D0._STA] (Node f741e7b0), AE_NOT_EXIST (Minor Issues)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error (uteval-0250): Method execution failed [\PRID.P_D0._STA] (Node f741e7b0), AE_NOT_EXIST (Minor Issues)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error: No handler for Region [sACS] (f74c21b8) [PCI_Config] (20090903/evregion-319) (Errors)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error: Region PCI_Config(2) has no handler (20090903/exfldio-295) (Errors)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error (psparse-0537): Method parse/execution failed [\PRID.P_D1._STA] (Node f741e858), AE_NOT_EXIST (Minor Issues)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error (uteval-0250): Method execution failed [\PRID.P_D1._STA] (Node f741e858), AE_NOT_EXIST (Minor Issues)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error: No handler for Region [sACS] (f74c21b8) [PCI_Config] (20090903/evregion-319) (Errors)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error: Region PCI_Config(2) has no handler (20090903/exfldio-295) (Errors)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error (psparse-0537): Method parse/execution failed [\SECD.S_D0._STA] (Node f741e9c0), AE_NOT_EXIST (Minor Issues)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error (uteval-0250): Method execution failed [\SECD.S_D0._STA] (Node f741e9c0), AE_NOT_EXIST (Minor Issues)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error: No handler for Region [sACS] (f74c21b8) [PCI_Config] (20090903/evregion-319) (Errors)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error: Region PCI_Config(2) has no handler (20090903/exfldio-295) (Errors)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error (psparse-0537): Method parse/execution failed [\SECD.S_D1._STA] (Node f741ea68), AE_NOT_EXIST (Minor Issues)
Jan 16 14:47:33 MLDataServer kernel: ACPI Error (uteval-0250): Method execution failed [\SECD.S_D1._STA] (Node f741ea68), AE_NOT_EXIST (Minor Issues)
Jan 17 11:37:13 MLDataServer kernel: irq 19: nobody cared (try booting with the "irqpoll" option) (Errors)
Jan 17 11:37:13 MLDataServer kernel: Pid: 24180, comm: sum Tainted: G W 2.6.32.9-unRAID #8 (Errors)
Jan 17 11:37:13 MLDataServer kernel: Call Trace: (Errors)
Jan 17 11:37:13 MLDataServer kernel: [<c10451cf>] __report_bad_irq+0x2e/0x6f (Errors)
Jan 17 11:37:13 MLDataServer kernel: [<c1045305>] note_interrupt+0xf5/0x13c (Errors)
Jan 17 11:37:13 MLDataServer kernel: [<c1045a14>] handle_fasteoi_irq+0x5f/0x9d (Errors)
Jan 17 11:37:13 MLDataServer kernel: [<c1004a82>] handle_irq+0x1a/0x24 (Errors)
Jan 17 11:37:13 MLDataServer kernel: [<c1004285>] do_IRQ+0x40/0x96 (Errors)
Jan 17 11:37:13 MLDataServer kernel: [<c1002f29>] common_interrupt+0x29/0x30 (Errors)
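The two IRQ messages discussed in this thread ("irq N: nobody cared" above, and the "Disabling IRQ #19" error reported earlier) can be pulled out of a syslog with a couple of regular expressions. This is a minimal sketch that assumes exactly that wording; it reports which IRQ numbers were flagged so they can be compared with the irq number printed on the ataN probe line (irq 19 for ata4 in the earlier log).

```python
import re
import sys

# Wording as seen in this thread's logs; other kernels may phrase it differently.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")


def flagged_irqs(lines):
    """Return (IRQs reported as 'nobody cared', IRQs the kernel disabled)."""
    unhandled, disabled = set(), set()
    for line in lines:
        if (match := NOBODY_CARED.search(line)):
            unhandled.add(int(match.group(1)))
        if (match := DISABLING.search(line)):
            disabled.add(int(match.group(1)))
    return unhandled, disabled


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"  # assumed path
    with open(path, errors="replace") as log:
        unhandled, disabled = flagged_irqs(log)
    print("nobody cared:", sorted(unhandled))
    print("disabled:", sorted(disabled))
```

If the disabled IRQ is the same one the SATA controller's port was assigned at boot, that matches the drop to about 2.4 MB/s described here better than a failing drive does.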
Hoopster Posted January 17, 2012
You and I have the same issue. See my thread in this forum for a detailed account of my "disabling IRQ #x" saga and everything I have tried. I have a different MB on order with a different BIOS and NIC. We'll see if that improves things. http://lime-technology.com/forum/index.php?topic=17823.0
Hoopster Posted January 17, 2012
Your problem has nothing to do with the hard drives themselves. It has to do with whatever is messing with the IRQs and disabling the IRQ associated with the drives. In my case, all the drives on the PCIe SATA controller get assigned to IRQ #16, since that is where the BIOS assigns the sata_mv Linux driver. Something happens to disable that IRQ, and then performance tanks. Just like in your case, I go from >100 MB/s preclears to 2.6 MB/s preclears when the IRQ gets disabled. A BIOS update did nothing for me, but at least (after many BIOS tweaks) the cause for me seems to be related to switching video inputs on my monitor. Why that affects IRQs I don't know, but it is now 100% repeatable.
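To see which IRQ a given SATA driver (ahci, sata_mv) landed on, and what else shares it, /proc/interrupts lists each IRQ with its per-CPU counts and the driver names attached to it. The sketch below assumes the usual layout (a numeric IRQ, a colon, counts, the interrupt chip type, then comma-separated device names); the sample text in the usage note is hypothetical, not taken from either poster's system.

```python
from pathlib import Path


def irqs_for_driver(text, driver):
    """Return IRQ numbers whose /proc/interrupts row lists `driver`."""
    found = []
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        # Skip the CPU header row and named rows such as NMI or LOC.
        if not sep or not head.strip().isdigit():
            continue
        tokens = rest.replace(",", " ").split()
        if driver in tokens:
            found.append(int(head))
    return found


if __name__ == "__main__":
    text = Path("/proc/interrupts").read_text()
    for name in ("ahci", "sata_mv"):
        print(name, "->", irqs_for_driver(text, name))
```

Hypothetical usage: for a row such as ` 16:  123456  IO-APIC-fasteoi  sata_mv, ohci_hcd:usb3`, `irqs_for_driver(text, "sata_mv")` returns `[16]`, and the other names on the same row show which devices share that line. Running it before and after the slowdown could show whether the count for that IRQ stops increasing.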
mcascio (Author) Posted January 18, 2012
Interesting. I'm anxious to see what you come up with to solve the problem. I've already completely rebuilt my system due to other issues, and I was hoping it was going to be smooth sailing. Apparently it's extremely difficult to find pieces that play nicely together. As a side note, I've disabled the onboard Realtek NIC in the BIOS. I didn't think it mattered, since I was already using a PCI-E Rosewill NIC based on forum reports that the Realtek has issues. I'm about 74% through the Post-Read, currently at 72.3 MB/s. I'm hoping it makes it through this Pre-Clear and 2 more. I have my unRAID system connected to a 100 Mbps switch (while it sits on my desk out of the rack), which runs to a Gigabit switch. That wouldn't be causing issues, would it?
mcascio (Author) Posted January 19, 2012
Ok. I've successfully finished the first Pre-Clear at normal speeds since disabling the Realtek NIC. I'm currently on the 2nd Pre-Clear, and it is still running at 100 MB/s, 25% through the Post-Read. Fingers crossed.
mcascio (Author) Posted January 20, 2012
I made it through the 2nd Pre-Clear successfully. On to the third; I will report back.
mcascio (Author) Posted January 22, 2012
I successfully pre-cleared the same drive 3x. Then I tried another 2TB drive and it slowed down again. Should all 2 TB drives start their partition on sector 64, or do older drives still start on sector 63? I'm running unRAID 4.7.
dgaschk Posted January 23, 2012
All new drives should be 4k-aligned. Existing non-AF drives can be left non-aligned. There is no reason to change them.
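The sector 63 vs. 64 question comes down to arithmetic: with 512-byte logical sectors (as the syslog reports for these drives), a partition is 4k-aligned when its starting byte offset is a multiple of 4096. Sector 63 starts at byte 32,256 (7.875 × 4096), so every 4 KiB filesystem block straddles two physical sectors; sector 64 starts at byte 32,768 (exactly 8 × 4096). A minimal sketch of that check:

```python
LOGICAL_SECTOR = 512    # bytes per logical sector, as reported in the syslog
PHYSICAL_SECTOR = 4096  # bytes per physical sector on Advanced Format drives


def is_4k_aligned(start_sector, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """True if a partition starting at start_sector begins on a physical-sector boundary."""
    return (start_sector * logical) % physical == 0


if __name__ == "__main__":
    for start in (63, 64):
        print(start, is_4k_aligned(start))
```

On a drive with 512-byte physical sectors (non-AF), any start sector is aligned, which is why existing non-AF drives gain nothing from being changed.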
mcascio (Author) Posted January 26, 2012
Quoting dgaschk: "All new drives should be 4k-aligned. Existing non-AF drives can be left non-aligned. There is no reason to change them."
Does it matter, though, if they are 4k-aligned? I have several older drives that have come back refurbished. How do I tell whether they are non-AF drives? And would it do any harm if all future drives added to the system are 4k-aligned, since that's my default Pre-Clear setting?
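One way to check whether a drive is Advanced Format is to ask the kernel what physical sector size it detected: the syslog line "[sdd] 4096-byte physical blocks" above is the same information. On Linux kernels that expose block topology in sysfs (the 2.6.32 kernel in the earlier syslog should), it can be read from /sys/block/<dev>/queue/physical_block_size. This is a hedged sketch: some early AF drives may report 512 here even though they use 4K sectors internally, so a 512 result is not conclusive and the model number should be checked too.

```python
from pathlib import Path


def physical_block_size(dev, sysfs_root="/sys/block"):
    """Physical sector size the kernel reports for a disk such as 'sdd'."""
    path = Path(sysfs_root) / dev / "queue" / "physical_block_size"
    return int(path.read_text().strip())


def reports_advanced_format(dev, sysfs_root="/sys/block"):
    """True if the drive reports 4096-byte physical sectors."""
    return physical_block_size(dev, sysfs_root) == 4096


if __name__ == "__main__":
    for disk in sorted(p.name for p in Path("/sys/block").glob("sd*")):
        print(disk, physical_block_size(disk))
```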
mcascio (Author) Posted January 26, 2012
Can anyone tell me if this Pre-Clear result is good? It says "No SMART attributes are failing_now", but I do see Spin_Retry_Count, End-to-End_Error, and Airflow_Temperature_Cel reported as Near-Thresh. preclear_finish__6XW1DY91_2012-01-25.txt
This topic is now archived and is closed to further replies.