June 17, 2015

I've got a server with 10 drives that has been running well for a number of years. It is based on a Supermicro motherboard with 2 AOC-SASLP-MV8 cards and is running 5.0.5.

I recently swapped out a drive for a larger one and have been hit with a very slow rebuild, running at around 2.5 MB/sec. There are some SMART errors that could be the cause, and some issues in the log too. I'm not sure what I need to be concerned with and what should be tackled first. My gut is to just let it finish and then resolve the problems.

On boot up, I get this. Is this something to correct immediately?

Jun 16 22:43:49 Tower kernel: ata10.00: HPA detected: current 3907027055, native 3907029168 (Errors)

That same drive (the one I just added) is throwing the UDMA CRC errors seen in the screenshot; however, that number is not increasing over time.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   188   180   021    Pre-fail  Always       -       5575
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       52
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       16056
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       36
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       25
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       26
194 Temperature_Celsius     0x0022   121   108   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       33956
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

On an older drive, I'm getting this multi_zone_error. Is that an indicator that the drive is failing, and could this be the cause of the slow rebuild?

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2
  3 Spin_Up_Time            0x0027   175   172   021    Pre-fail  Always       -       4233
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3741
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       22896
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       61
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       35
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3707
194 Temperature_Celsius     0x0022   115   107   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       9

Attachment: syslog-2015-06-17.txt
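For anyone triaging SMART tables like the two above, it can help to look only at the raw values of the handful of attributes that tend to matter. The following is a minimal sketch, assuming the rows follow the usual ten-column `smartctl -A` layout shown in this post. The helper name `flag_attributes` and the `WATCH` list are illustrative choices, not part of unRAID or smartmontools.

```python
# Hypothetical helper: report watched SMART attributes whose raw value is non-zero.
# WATCH is an assumed shortlist (reallocated, pending, uncorrectable, CRC, multi-zone).
WATCH = {5, 197, 198, 199, 200}

def flag_attributes(smart_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in smart_text.splitlines():
        parts = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        attr_id, name, raw = int(parts[0]), parts[1], parts[9]
        if attr_id in WATCH and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged

# A few rows copied from the newly added drive's table above.
new_drive = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct  0x0033 200 200 140 Pre-fail Always  - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age  Always  - 0
199 UDMA_CRC_Error_Count   0x0032 200 200 000 Old_age  Always  - 33956
200 Multi_Zone_Error_Rate  0x0008 100 253 000 Old_age  Offline - 0
"""

print(flag_attributes(new_drive))  # {'UDMA_CRC_Error_Count': 33956}
```

A non-zero UDMA CRC count is commonly associated with cabling or connector problems rather than the disk surface, and the raw counter is cumulative, so a large value that has stopped increasing may reflect an earlier problem rather than a current one.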
June 17, 2015, Author

I'm not sure I follow. I've listed my version and attached my full syslog. Are you suggesting I need to upgrade to 5.0.6 before posting?
June 18, 2015, Author

Is it okay to power off the server while a data rebuild is in progress? I don't recall offhand which firmware versions I've got, and powering off would be the only way to check. This isn't something I would suspect, as the only thing I've changed is to replace a 1 TB drive with a 2 TB drive...

After 24 hours, I'm only at 10% complete. Is this stressing my other drives?
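The 10% per day figure is about what the reported 2.5 MB/sec predicts. A rough back-of-the-envelope sketch, assuming the new drive is 2 TB in decimal units (2 × 10^12 bytes), that 1 MB means 10^6 bytes, and that the rate stays constant; the function name `rebuild_days` is made up for illustration:

```python
def rebuild_days(size_bytes, rate_bytes_per_sec):
    """Days needed to write size_bytes at a constant rate."""
    seconds = size_bytes / rate_bytes_per_sec
    return seconds / 86400  # 86400 seconds per day

# Assumed inputs: 2 TB drive, 2.5 MB/sec as reported in the first post.
days = rebuild_days(2e12, 2.5e6)
print(f"{days:.1f} days total")            # 9.3 days total
print(f"{100 / days:.1f}% per 24 hours")   # 10.8% per 24 hours
```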
June 18, 2015, Author

Latest syslog attached. Not much added since the initial boot up. My plan for now is to wait out this slow rebuild and then replace the drive with multi_zone_errors. Once I've rebuilt from that, then I'll address the HPA size mismatch. Thanks for any help.

Attachment: syslog-2015-06-18.txt
June 24, 2015, Author

Updating my posts here. After 6 days of rebuilding (with 4 more to go) I couldn't take it any longer. I stopped the rebuild and rebooted into safe mode, and my speed came back to normal (80 MB/sec). My GO file had some suspicious entries, so I've now commented those out.

My data restore finally finished, and I first corrected the drive size with the hdparm -N command. I was able to do that through the console. I'm currently rebuilding the data again, and once that is done I will address the SMART errors that have come up through all this disk thrashing.

I'm guessing my slowness had something to do with an old version of Samba being loaded. I can't recall why that was in my GO file.
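For reference, the size of the hidden area can be worked out from the kernel message quoted in the first post. The sketch below is only an illustration: it assumes 512-byte logical sectors, `hpa_summary` is a made-up helper name, and `/dev/sdX` is a placeholder for the real device. The exact hdparm invocation used here was not posted. `hdparm -N p<sectors>` is the form that sets the visible sector count permanently, and newer hdparm builds may ask for an extra confirmation flag, so check the man page before running anything like it.

```python
import re

# Matches the libata message format seen in the syslog excerpt.
HPA_RE = re.compile(r"HPA detected: current (\d+), native (\d+)")

def hpa_summary(log_line, sector_size=512):
    """Parse an 'HPA detected' kernel line; sector_size is an assumption."""
    m = HPA_RE.search(log_line)
    if m is None:
        return None
    current, native = int(m.group(1)), int(m.group(2))
    hidden = native - current
    return {
        "current": current,
        "native": native,
        "hidden_sectors": hidden,
        "hidden_bytes": hidden * sector_size,
    }

line = ("Jun 16 22:43:49 Tower kernel: ata10.00: HPA detected: "
        "current 3907027055, native 3907029168")
info = hpa_summary(line)
print(info["hidden_sectors"], info["hidden_bytes"])  # 2113 1081856

# Placeholder device name; printed for inspection only, not executed.
print(f"hdparm -N p{info['native']} /dev/sdX")
```

So the HPA hides 2113 sectors, roughly 1 MB, at the end of the disk.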
June 25, 2015, Author

Problems: I replaced the drive that showed crc_errors (with a larger drive). That data rebuild ran overnight and completed this afternoon. The array is up and running as expected; however, I've now got a notification that "Parity updated 192062696 times to address sync errors". Disk 2 (the one I ran hdparm against) shows 192062778 "errors".

At this point, my best guess is to run a parity check and see how it turns out. My syslog is 130 MB uncompressed (compressed as 7zip but attached with a .zip extension).

Update: when I attempted to run the parity check, unRAID immediately brought that drive offline. It says "Disabled, old disk present". So it looks like a 3rd drive swap is in store for me...

Attachment: syslog-2015-06-25_small.zip