dogbowl Posted June 17, 2015

I've got a server with 10 drives that has been running well for a number of years. It's based on a Supermicro motherboard with two AOC-SASLP-MV8 cards, running unRAID 5.0.5.

I recently swapped out a drive for a larger one, and the rebuild has been very slow, running at around 2.5 MB/sec. There are some SMART errors that could be the cause, and some issues in the log too. I'm not sure which of these I need to be concerned about and what should be tackled first. My gut says to let the rebuild finish and then resolve the problems.

On boot-up, I get this. Is this something to correct immediately?

Jun 16 22:43:49 Tower kernel: ata10.00: HPA detected: current 3907027055, native 3907029168 (Errors)

That same drive (the one I just added) is throwing the UDMA CRC errors seen in the screenshot; however, that number is not increasing over time.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   188   180   021    Pre-fail  Always       -       5575
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       52
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       16056
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       36
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       25
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       26
194 Temperature_Celsius     0x0022   121   108   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       33956
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

On an older drive, I'm getting Multi_Zone_Error_Rate errors. Is that an indicator that the drive is failing, and could it be the cause of the slow rebuild?

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2
  3 Spin_Up_Time            0x0027   175   172   021    Pre-fail  Always       -       4233
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3741
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       22896
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       61
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       35
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3707
194 Temperature_Celsius     0x0022   115   107   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       9

Attachment: syslog-2015-06-17.txt
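The two SMART reports above can be triaged programmatically. Below is a minimal Python sketch, assuming each report was saved as the plain-text attribute table that `smartctl -A` prints, with RAW_VALUE as the tenth column. The watch list of attribute IDs and the helper names (`parse_smart_attributes`, `flag_attributes`, `raw_deltas`) are hypothetical choices for illustration, not part of unRAID or smartmontools. `raw_deltas` compares two snapshots of the same drive, which is one way to confirm that a count such as UDMA_CRC_Error_Count has stopped increasing.

```python
"""Triage smartctl -A attribute tables (illustrative sketch, not a standard).

Assumption: each report is the plain-text table printed by `smartctl -A`,
one attribute per row, with RAW_VALUE as the tenth whitespace-separated field.
"""
from typing import Dict

# Hypothetical watch list: attributes often discussed as warning signs.
WATCHED_IDS = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",   # usually interface/cabling, not the platters
    200: "Multi_Zone_Error_Rate",
}


def parse_smart_attributes(report: str) -> Dict[int, int]:
    """Map attribute ID -> raw value for every attribute row in the report."""
    attrs: Dict[int, int] = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 fields;
        # header lines ("ID# ATTRIBUTE_NAME ...") are skipped.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = int(fields[9])
    return attrs


def flag_attributes(attrs: Dict[int, int]) -> Dict[str, int]:
    """Return the watched attributes whose raw value is nonzero."""
    return {name: attrs[i] for i, name in WATCHED_IDS.items() if attrs.get(i, 0) > 0}


def raw_deltas(before: Dict[int, int], after: Dict[int, int]) -> Dict[int, int]:
    """Raw-value changes between two snapshots of the same drive."""
    return {i: after[i] - before[i]
            for i in after if i in before and after[i] != before[i]}


# Rows copied from the two reports above.
NEW_DRIVE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       33956
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
"""
OLD_DRIVE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       9
"""

if __name__ == "__main__":
    print(flag_attributes(parse_smart_attributes(NEW_DRIVE)))  # {'UDMA_CRC_Error_Count': 33956}
    print(flag_attributes(parse_smart_attributes(OLD_DRIVE)))  # {'Multi_Zone_Error_Rate': 9}
    # Comparing a snapshot with itself (an unchanged count) yields no deltas.
    snapshot = parse_smart_attributes(NEW_DRIVE)
    print(raw_deltas(snapshot, snapshot))  # {}
```

A nonzero raw value is only a prompt to look closer: UDMA CRC errors are commonly attributed to cabling or controller problems rather than the media, and a count that stays flat between snapshots suggests the underlying cause is no longer occurring.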
dgaschk Posted June 17, 2015

http://lime-technology.com/forum/index.php?topic=9880.0
dogbowl Posted June 17, 2015

I'm not sure I follow. I've listed my version and attached my full syslog. Are you suggesting I need to upgrade to 5.0.6 before posting?
dgaschk Posted June 17, 2015

Check for BIOS and SATA card firmware updates.
dogbowl Posted June 18, 2015

Is it okay to power off the server while a data rebuild is in progress? I don't recall offhand which firmware versions I have, and powering off would be the only way to check. Firmware isn't something I would suspect, since the only thing I've changed is replacing a 1 TB drive with a 2 TB drive. After 24 hours, I'm only at 10% complete. Is this stressing my other drives?
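The "10% after 24 hours" figure can be extrapolated and compared with what a 2 TB rebuild should take at a healthier rate. A rough Python sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB/s = 10^6 bytes/s), a constant rate, and that the whole 2 TB is processed; the 80 MB/s comparison rate is an illustrative figure, and the helper names are hypothetical.

```python
"""Back-of-the-envelope rebuild-time estimates (illustrative sketch).

Assumptions: decimal units (1 TB = 10**12 bytes, 1 MB/s = 10**6 bytes/s),
a constant rate for the whole rebuild, and that the full 2 TB is processed.
"""

SECONDS_PER_HOUR = 3600


def projected_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linear extrapolation: total time if progress continues at the same pace."""
    return elapsed_hours * 100 / percent_done


def rebuild_hours(capacity_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to process capacity_tb terabytes at rate_mb_per_s MB/s."""
    seconds = capacity_tb * 10**12 / (rate_mb_per_s * 10**6)
    return seconds / SECONDS_PER_HOUR


if __name__ == "__main__":
    # 10% done after 24 hours extrapolates to 240 hours in total.
    print(projected_total_hours(24, 10) / 24)      # 10.0 (days)
    # 2 TB at the observed 2.5 MB/s versus an assumed 80 MB/s.
    print(round(rebuild_hours(2, 2.5) / 24, 1))    # 9.3 (days)
    print(round(rebuild_hours(2, 80), 1))          # 6.9 (hours)
```

At 2.5 MB/s the arithmetic gives roughly nine to ten days, in line with the linear extrapolation, whereas at 80 MB/s the same 2 TB would take about seven hours. Real transfer rates vary across the platter, so treat these as order-of-magnitude estimates.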
dogbowl Posted June 18, 2015

Latest syslog attached. Not much has been added since the initial boot-up. My plan for now is to wait out this slow rebuild and then replace the drive with the Multi_Zone_Error_Rate errors. Once that rebuild is done, I'll address the HPA size mismatch.

Thanks for any help.

Attachment: syslog-2015-06-18.txt
dogbowl Posted June 24, 2015

Updating my posts here. After 6 days of rebuilding (with 4 more to go), I couldn't take it any longer. I stopped the rebuild and rebooted into safe mode, and my speed came back to normal (80 MB/sec). My go file had some suspicious entries, so I've now commented those out.

My data restore finally finished, and I first corrected the drive size with the hdparm -N command, which I was able to run from the console. I'm currently rebuilding the data again, and once that is done I will address the SMART errors that have come up through all this disk thrashing.

I'm guessing my slowness had something to do with an old version of Samba being loaded; I can't recall why that was in my go file.
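The HPA message quoted in the first post can be checked with a little arithmetic: the hidden area is the native sector count minus the current one. Below is a minimal Python sketch, assuming 512-byte logical sectors; `parse_hpa`, `hdparm_restore_cmd` and the device name `/dev/sdX` are hypothetical placeholders (the kernel reports the port as ata10, not a /dev name). It only builds an `hdparm -N` command line and never runs it. Per hdparm's documentation, a value prefixed with `p` is meant to persist across power cycles, and changing the visible sector count is risky, so the target device must be double-checked.

```python
"""Work out the size of a Host Protected Area (HPA) from a kernel log line.

Assumptions: 512-byte logical sectors; "/dev/sdX" is a placeholder device.
The hdparm command is only built and printed, never executed.
"""
import re
import shlex
from typing import Optional, Tuple

HPA_PATTERN = re.compile(r"HPA detected: current (\d+), native (\d+)")
SECTOR_BYTES = 512


def parse_hpa(log_line: str) -> Optional[Tuple[int, int]]:
    """Return (current, native) sector counts, or None if no HPA message."""
    match = HPA_PATTERN.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def hdparm_restore_cmd(native: int, device: str, permanent: bool = True) -> str:
    """Build (but do not run) an hdparm command exposing all native sectors.

    A leading 'p' on the value asks hdparm to make the change persistent.
    """
    value = f"p{native}" if permanent else str(native)
    return shlex.join(["hdparm", "-N", value, device])


LOG_LINE = ("Jun 16 22:43:49 Tower kernel: ata10.00: "
            "HPA detected: current 3907027055, native 3907029168")

if __name__ == "__main__":
    current, native = parse_hpa(LOG_LINE)
    hidden = native - current
    print(hidden, "sectors hidden =", hidden * SECTOR_BYTES, "bytes")
    # 2113 sectors hidden = 1081856 bytes
    print(hdparm_restore_cmd(native, "/dev/sdX"))
    # hdparm -N p3907029168 /dev/sdX
```

For this drive the hidden area is 2113 sectors, about 1 MiB, and 3907029168 native sectors of 512 bytes come to roughly 2.0 TB, consistent with the 2 TB replacement drive.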
dogbowl Posted June 25, 2015

Problems: I replaced the drive that showed the CRC errors (with a larger drive). That data rebuild ran overnight and completed this afternoon. The array is up and running as expected; however, I've now got a notification that "Parity updated 192062696 times to address sync errors". Disk 2 (the one I ran hdparm against) shows 192062778 "errors". At this point, my best guess is to run a parity check and see how it turns out.

My syslog is 130 MB uncompressed (compressed with 7-Zip but attached with a .zip extension).

Update: when I attempted to run the parity check, unRAID immediately took that drive offline. It says "Disabled, old disk present". So it looks like a third drive swap is in store for me.

Attachment: syslog-2015-06-25_small.zip