January 20, 2014
Been running unRAID 4.7 for a few years with 6x2TB WD Black drives. Until a few hours ago, everything was working great (coincidentally, I had run a parity check last night and everything was fine). All of a sudden, performance is very slow. I stopped the array, shut down the server and rebooted. Still slow. I'm running a parity check now and it is at 900 KB/sec. I've grabbed the syslog (attached), but it is meaningless to me.

I am guessing drive 3 has gone bad based on the unMENU SMART report (below). I do have a spare HD on hand to swap. The parity check says it is going to take another 18,781 minutes, so any help before that is appreciated.

Short SMART test result from drive 3. Agree I should replace this drive?

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   192   192   051    Pre-fail  Always       -       67336
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       7650
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       29
  5 Reallocated_Sector_Ct   0x0033   077   077   140    Pre-fail  Always   FAILING_NOW 983
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22078
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       20
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       8
194 Temperature_Celsius     0x0022   103   096   000    Old_age   Always       -       49
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       951
197 Current_Pending_Sector  0x0032   198   198   000    Old_age   Always       -       827
198 Offline_Uncorrectable   0x0030   200   198   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   183   001   000    Old_age   Offline      -       3521

Thanks,
Chad
syslog.txt
January 20, 2014 (Author)
Thank you. Replaced the drive and the array rebuild is in progress. Speed appears to be back to normal.
January 20, 2014
Glad to hear that. I have never seen a hard drive give a FAILING_NOW indicator on Reallocated_Sector_Ct before! That looks like one angry hard drive! If you haven't already, you may be able to RMA that drive, since it should have a 5-year warranty.
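For anyone wondering why that one row is flagged: as far as I understand smartctl's convention, WHEN_FAILED is derived from the normalized VALUE and WORST columns compared against THRESH, not from RAW_VALUE. A rough sketch of that comparison in Python follows; the function name `when_failed` is made up for illustration, and treating a threshold of 0 as "never fails" is an assumption on my part.

```python
def when_failed(value, worst, thresh):
    """Approximate the WHEN_FAILED column from the normalized columns.

    Assumption: an attribute counts as failed when its normalized value
    is less than or equal to its threshold, and a threshold of 0 means
    the attribute can never be reported as failed.
    """
    if thresh == 0:
        return "-"
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


# Reallocated_Sector_Ct on the failed drive: VALUE 077, WORST 077, THRESH 140
print(when_failed(77, 77, 140))
# Same attribute on a healthy drive: VALUE 200, WORST 200, THRESH 140
print(when_failed(200, 200, 140))
```

On the failed drive the normalized value (77) has dropped below the threshold (140), which is what trips the flag; the raw count of 983 is what drove the normalized value down.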
January 20, 2014
Keep an eye on the following attributes on all your drives:
    Reallocated_Sector_Ct
    Reallocated_Event_Count
    Current_Pending_Sector
    Offline_Uncorrectable
If any are showing a non-zero value, replace that drive ASAP.
If UDMA_CRC_Error_Count is showing a non-zero value and is incrementing, change out your SATA cable and watch to see if the value stabilizes.
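If you want to script that check instead of eyeballing each report, here is a minimal sketch in Python. It assumes the attribute table has been saved as text in the usual ten-column smartctl layout (as in the reports posted in this thread); `WATCHED`, `parse_raw_values` and `flag_attributes` are hypothetical names for illustration, not part of unRAID or unMENU.

```python
# Attributes named in the post above; their raw value should ideally stay at 0.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_raw_values(report):
    """Map attribute name -> raw value from a smartctl-style attribute table."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID;
        # the header row ("ID# ATTRIBUTE_NAME ...") is skipped by this test.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def flag_attributes(report):
    """Return {name: raw} for watched attributes with a non-zero raw value."""
    raw = parse_raw_values(report)
    return {name: raw[name] for name in WATCHED if raw.get(name, 0) > 0}


# A few rows copied from the drive 3 report at the top of this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   077   077   140    Pre-fail  Always   FAILING_NOW 983
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       951
197 Current_Pending_Sector  0x0032   198   198   000    Old_age   Always       -       827
198 Offline_Uncorrectable   0x0030   200   198   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

print(flag_attributes(SAMPLE))
```

UDMA_CRC_Error_Count is left out of `WATCHED` on purpose, since the advice for it is different (swap the cable and see whether the count keeps climbing).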
January 20, 2014
It's fortunate that yesterday's parity check was good; that means the rebuild should be fine. As noted above, be sure to check the failed drive's serial number on WD's site, since it may very well still be in warranty. The "bad" news is that they're very likely to replace it with a larger drive, so once your rebuild finishes and you do a confirming parity check (always a good idea after a rebuild), you should consider upgrading to v5.0.4. That way you'll be ready to swap your parity drive for a larger one if WD does send you a 3 or 4 TB drive.
January 21, 2014 (Author)
Yep, under warranty.

Quote: "Keep an eye out on the following attributes on all your drives: Reallocated_Sector_Ct, Reallocated_Event_Count, Current_Pending_Sector, Offline_Uncorrectable. If any are showing a non-zero value, replace that drive ASAP. If UDMA_CRC_Error_Count is showing a non-zero value and is incrementing, change out your SATA cable and watch to see if the value stabilizes."

Well, good news, bad news. Four of my remaining drives pass this test, but one does not. I guess once the array rebuild is complete, it is time to swap another drive. Woo hoo! Hopefully no issues with the rebuild or parity check. Thank you all for the assistance.
January 21, 2014
Well, the good news is that with WD you can do an advance replacement, where they send you the replacement drive before you send in the bad one.

Quote: "Keep an eye out on the following attributes on all your drives: Reallocated_Sector_Ct, Reallocated_Event_Count, Current_Pending_Sector, Offline_Uncorrectable. If any are showing a non-zero value, replace that drive ASAP."

Also, if your drive has a Reallocated_Sector_Ct, Reallocated_Event_Count, or Offline_Uncorrectable value of 1 to 10, that is not the end of the world. I have had a drive with a value of 1 for the entire time I have used unRAID and the drive is fine. Just make sure that the count does not increase over a short period of time. If it does, then replace ASAP.
January 21, 2014
Reallocated sectors are no big deal as long as the count doesn't continually increase. Modern drives are DESIGNED to automatically re-allocate bad sectors ... that's why they have a bunch of spare sectors. Several of my oldest drives (some of which have over 50,000 hours of runtime) have a few reallocated sectors ... but they've been working perfectly for years and continue to do so.

I'd do an advance replacement for the drive you're now rebuilding; then, when you get the replacement drive for it, upgrade your parity if it's larger than 2TB, or just replace the other failing drive if it's 2TB. Then do another advance replacement for the 2nd drive.
January 21, 2014 (Author)
Here is the SMART report for the other drive with errors. Not nearly as bad as the first one, but.....
    Current_Pending_Sector is "6"
    Offline_Uncorrectable is "1"
I'll watch it and see what happens. Just want to get the unRAID array rebuilt without issues....

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       8766
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22058
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       19
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       17
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       14
194 Temperature_Celsius     0x0022   110   097   000    Old_age   Always       -       42
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       9
January 21, 2014
CHANGES are much more significant than a few non-zero numbers ... so you've got the right idea (do frequent SMART checks and see if it's changing). Also pay attention to the warranty expiration date ... if it's close, and there are issues, I'd go ahead and do an advance RMA and get the drive replaced.
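One way to track changes rather than absolute numbers is to keep the raw values from each SMART check and compare them with the previous run. A small sketch in Python, assuming each snapshot is a plain dict of attribute name to raw value; `changed_attributes` is a hypothetical helper name. The "before" numbers are the ones reported for the second drive in this thread, and the "after" numbers are made up purely to show the output.

```python
def changed_attributes(before, after):
    """Return {name: (old, new)} for raw values that increased between checks."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and after[name] > before[name]
    }


# Raw values reported for the second drive in this thread.
before = {
    "Reallocated_Sector_Ct": 0,
    "Current_Pending_Sector": 6,
    "Offline_Uncorrectable": 1,
}

# Hypothetical later snapshot, invented for illustration only.
after = {
    "Reallocated_Sector_Ct": 0,
    "Current_Pending_Sector": 9,
    "Offline_Uncorrectable": 1,
}

print(changed_attributes(before, after))
```

An empty result between checks means the counts are stable; anything that keeps showing up here is the drive to replace first.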