Slow rebuild - advice needed


I've got a server with 10 drives that has been running well for a number of years. It's based on a Supermicro motherboard with 2 AOC-SASLP-MV8 cards and is running unRAID 5.0.5.

I recently swapped out a drive for a larger one and I've been hit with a very slow rebuild, running at around 2.5 MB/sec.

 

There are some SMART errors that could be the cause, and some issues in the log too.  I'm not sure what I need to be concerned with and what should be tackled first.  My gut feeling is to just let it finish and then resolve the problems.

 

On boot-up, I get the line below.  Is this something to correct immediately?

Jun 16 22:43:49 Tower kernel: ata10.00: HPA detected: current 3907027055, native 3907029168 (Errors)
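For a sense of scale, the gap between the two sector counts in that line can be worked out directly. This is only a rough sketch: the function name `hpa_gap` is made up for illustration, and the 512-byte logical sector size is an assumption that is not stated anywhere in the log.

```python
import re

# Kernel line copied from the post above (viewer tag omitted).
LINE = ("Jun 16 22:43:49 Tower kernel: ata10.00: HPA detected: "
        "current 3907027055, native 3907029168")

# Assumption: 512-byte logical sectors (not stated in the log).
SECTOR_BYTES = 512


def hpa_gap(line, sector_bytes=SECTOR_BYTES):
    """Return (hidden_sectors, hidden_bytes) for an 'HPA detected' line.

    Returns None if the line does not contain the expected pattern.
    """
    m = re.search(r"HPA detected: current (\d+), native (\d+)", line)
    if m is None:
        return None
    current, native = int(m.group(1)), int(m.group(2))
    hidden = native - current
    return hidden, hidden * sector_bytes


if __name__ == "__main__":
    print(hpa_gap(LINE))
```

Under that assumption the hidden area is 2113 sectors, or roughly 1 MB, so the drive is reporting itself as slightly smaller than its native size.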

 

That same drive (the one I just added) is throwing the UDMA CRC errors seen in the screenshot; however, that number is not increasing over time.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   188   180   021    Pre-fail  Always       -       5575
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       52
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       16056
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       36
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       25
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       26
194 Temperature_Celsius     0x0022   121   108   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       33956
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

 

On an older drive, I'm getting a non-zero Multi_Zone_Error_Rate.  Is that an indicator that the drive is failing, and could this be the cause of the slow rebuild?

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2
  3 Spin_Up_Time            0x0027   175   172   021    Pre-fail  Always       -       4233
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3741
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       22896
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       61
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       35
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3707
194 Temperature_Celsius     0x0022   115   107   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       9
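To make tables like the two above easier to scan, here is a minimal sketch that parses the attribute rows and lists which raw counters are non-zero and whether any normalized value has reached its threshold. The watch list of attribute IDs is my own hypothetical choice rather than any standard, and the parser assumes the raw value is a plain integer, which holds for these rows but not for every drive.

```python
# Rows copied from the second table above (the older drive).
SMART_TEXT = """\
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       9
"""

# Hypothetical watch list of attribute IDs; adjust to taste.
WATCH_IDS = {1, 5, 197, 198, 199, 200}


def parse_attributes(text):
    """Parse attribute rows into dicts; lines that are not rows are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "raw": int(parts[9]),  # assumes a plain integer raw value
        })
    return rows


def nonzero_raw(rows, ids=WATCH_IDS):
    """Return (name, raw) for watched attributes whose raw counter is above 0."""
    return [(r["name"], r["raw"]) for r in rows if r["id"] in ids and r["raw"] > 0]


def at_or_below_threshold(rows):
    """Return names whose normalized value has reached a non-zero threshold."""
    return [r["name"] for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


if __name__ == "__main__":
    rows = parse_attributes(SMART_TEXT)
    print(nonzero_raw(rows))
    print(at_or_below_threshold(rows))
```

For these rows, the non-zero counters are Raw_Read_Error_Rate and Multi_Zone_Error_Rate, and nothing has reached its threshold. That only reflects the drive's own pass/fail criteria, so it doesn't settle whether the drive is on its way out.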

 

syslog-2015-06-17.txt

unraid.PNG

Is it okay to power off the server while a data rebuild is in progress?  I don't recall offhand which firmware versions I've got, and powering off would be the only way to check.

This isn't something I would suspect, as the only thing I've changed is to replace a 1 TB drive with a 2 TB drive...

 

After 24 hours, I'm only at 10% complete.  Is this stressing my other drives?
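As a sanity check on those numbers, here is a back-of-the-envelope projection. It assumes the rebuild proceeds at a steady rate, that the percentage refers to the full size of the new 2 TB drive, and that the drive holds 3907029168 sectors of 512 bytes each (the native count from the boot log, with the sector size being my assumption). The function name is made up for illustration.

```python
# Assumption: native sector count from the boot log times 512-byte sectors.
DRIVE_BYTES = 3907029168 * 512


def project_rebuild(elapsed_hours, fraction_done, drive_bytes=DRIVE_BYTES):
    """Project a rebuild from its progress so far, assuming a steady rate.

    Returns (total_hours, remaining_hours, rate_mb_per_s), where MB means
    10**6 bytes.
    """
    total_hours = elapsed_hours / fraction_done
    remaining_hours = total_hours - elapsed_hours
    bytes_done = drive_bytes * fraction_done
    rate_mb_per_s = bytes_done / (elapsed_hours * 3600) / 1e6
    return total_hours, remaining_hours, rate_mb_per_s


if __name__ == "__main__":
    total, remaining, rate = project_rebuild(24, 0.10)
    print(f"total ~{total / 24:.1f} days, remaining ~{remaining / 24:.1f} days, "
          f"average ~{rate:.2f} MB/s")
```

With 10% done after 24 hours, this projects about 10 days in total at an average of roughly 2.3 MB/s, which is in line with the 2.5 MB/sec shown during the rebuild.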

Updating my posts here.

 

After 6 days of rebuilding (with 4 more to go) I couldn't take it any longer.  I stopped the rebuild and rebooted into safe mode, and my speed came back to normal (80 MB/sec).  My GO file has some suspicious entries, so I've now commented those out.

 

My data restore finally finished, and I first corrected the drive size with the hdparm -N command.  I was able to do that through the console.

I'm currently rebuilding the data again, and once that is done I will address the SMART errors that have come up through all this disk thrashing.

 

I'm guessing my slowness had something to do with an old version of Samba being loaded.  I can't recall why that was in my GO file.

Problems --

 

I replaced the drive that showed the CRC errors (with a larger drive).  That data rebuild ran overnight and completed this afternoon.  The array is up and running as expected; however, I've now got a notification that "Parity updated 192062696 times to address sync errors".

Disk 2 (the one I ran hdparm against) shows 192062778 "errors".

 

At this point, my best guess is to run a parity check and see how it turns out.  My syslog is 130 MB uncompressed (compressed with 7-Zip but attached with a .zip extension).

 

Update: when I attempted to run the parity check, unRAID immediately took that drive offline.  It says "Disabled, old disk present".  So it looks like a 3rd drive swap is in store for me...

syslog-2015-06-25_small.zip

Archived

This topic is now archived and is closed to further replies.