February 17, 2013

I've been running unRAID for about a year without any major issues, until the main menu issues a week ago, which have since been fixed. Since then I have moved my unRAID setup into a new case with a recommended Seasonic PSU and a new cache drive, and reassigned the old cache drive after another preclear.

At first no errors were showing, but this is where the story starts (it probably would have been picked up sooner had I inspected the SMART report). Anyway: the first parity check reported 0 sync errors but ~2170 errors (normally 0). The second parity check reported 0 sync errors but ~270 errors (fewer, but still there). After both parity checks, "current_pending_sector=7" appears under "myMain" > "[smart]".

The logs show many disk errors on the parity drive. They look like this:

Feb 17 13:21:42 Fridge kernel: handle_stripe read error: 969418664/0, count: 1 (Errors)
Feb 17 13:21:42 Fridge kernel: md: disk0 read error (Errors)

Here are the results of the SMART test:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)  Offline data collection activity
                                         was suspended by an interrupting command from host.
                                         Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121)  The previous self-test completed having
                                         the read element of the test failed.
Total time to complete Offline data collection: (51480) seconds.
Offline data collection capabilities: (0x7b)  SMART execute Offline immediate.
                                         Auto Offline data collection on/off support.
                                         Suspend Offline collection upon new command.
                                         Offline surface scan supported.
                                         Self-test supported.
                                         Conveyance Self-test supported.
                                         Selective Self-test supported.
SMART capabilities:            (0x0003)  Saves SMART data before entering power-saving mode.
                                         Supports SMART auto save timer.
Error logging capability:        (0x01)  Error logging supported.
                                         General Purpose Logging supported.
Short self-test routine recommended polling time:      (   2) minutes.
Extended self-test routine recommended polling time:   ( 255) minutes.
Conveyance self-test routine recommended polling time: (   5) minutes.
SCT capabilities:              (0x3035)  SCT Status supported.
                                         SCT Feature Control supported.
                                         SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f  200   200   051    Pre-fail Always  -           0
  3 Spin_Up_Time            0x0027  152   148   021    Pre-fail Always  -           9400
  4 Start_Stop_Count        0x0032  100   100   000    Old_age  Always  -           958
  5 Reallocated_Sector_Ct   0x0033  200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x002e  200   200   000    Old_age  Always  -           0
  9 Power_On_Hours          0x0032  089   089   000    Old_age  Always  -           8566
 10 Spin_Retry_Count        0x0032  100   100   000    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0032  100   253   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032  100   100   000    Old_age  Always  -           83
192 Power-Off_Retract_Count 0x0032  200   200   000    Old_age  Always  -           49
193 Load_Cycle_Count        0x0032  200   200   000    Old_age  Always  -           2965
194 Temperature_Celsius     0x0022  118   102   000    Old_age  Always  -           34
196 Reallocated_Event_Count 0x0032  200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032  200   200   000    Old_age  Always  -           7
198 Offline_Uncorrectable   0x0030  200   200   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032  200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x0008  200   200   000    Old_age  Offline -           2

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed: read failure  90%        8566             969308424
# 2  Short offline     Completed: read failure  90%        8565             969308424
# 3  Short offline     Completed: read failure  90%        1004             519639880
# 4  Short offline     Completed: read failure  90%        1004             519639880
# 5  Short offline     Completed without error  00%         442             -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Judging by the SMART report, the errors on this drive may date back to a lifetime of 1004 hours or earlier without my noticing; the latest test was at 8566 hours. What would you advise? From what I've read, if it had completed without error, I could rebuild parity, which should clear the pending sector count and fix the issue. That isn't the case here, though. Parity shows as valid.
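The conclusion above (failures going back to 1004 hours, plus a non-zero Current_Pending_Sector) can be read off the report mechanically. What follows is a minimal sketch, assuming the smartctl column layout pasted above; the helper names parse_attributes and read_failures and the WATCHED set are hypothetical, and treating any non-zero raw value in those three attributes as a warning is a common rule of thumb, not a vendor threshold.

```python
"""Flag worrying SMART attributes and self-test read failures in smartctl text.

Sketch only: assumes the attribute-table and self-test-log column layouts
shown in the report; real smartctl output varies between drives and versions.
"""

# Rule of thumb (assumption): non-zero raw values here usually mean media trouble.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_attributes(text: str) -> dict[str, int]:
    """Return {attribute_name: raw_value} for each attribute-table row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have 10 columns;
        # RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def read_failures(text: str) -> list[tuple[int, int]]:
    """Return (lifetime_hours, lba_of_first_error) for each failed self-test."""
    failures = []
    for line in text.splitlines():
        if line.lstrip().startswith("#") and "read failure" in line:
            fields = line.split()
            # Last two columns: LifeTime(hours) and LBA_of_first_error.
            failures.append((int(fields[-2]), int(fields[-1])))
    return failures


# Excerpt of the report above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always  - 0
  9 Power_On_Hours          0x0032 089 089 000 Old_age  Always  - 8566
197 Current_Pending_Sector  0x0032 200 200 000 Old_age  Always  - 7
198 Offline_Uncorrectable   0x0030 200 200 000 Old_age  Offline - 0
# 1  Short offline  Completed: read failure  90%  8566  969308424
# 2  Short offline  Completed: read failure  90%  8565  969308424
# 3  Short offline  Completed: read failure  90%  1004  519639880
# 4  Short offline  Completed: read failure  90%  1004  519639880
# 5  Short offline  Completed without error  00%   442  -
"""

if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    for name in sorted(WATCHED):
        if attrs.get(name, 0) > 0:
            print(f"WARNING: {name} raw value is {attrs[name]}")
    failures = read_failures(SAMPLE)
    if failures:
        first_hours = min(hours for hours, _ in failures)
        lbas = sorted({lba for _, lba in failures})
        print(f"Read failures since {first_hours} h at LBAs {lbas}")
```

On this excerpt it warns only about Current_Pending_Sector (7) and reports read failures since 1004 h at two distinct LBAs, which matches the reading of the self-test log above.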
February 18, 2013 (Author)

Quote: "You should be able to fix that with a correcting parity check."

OK, I'll try this. It will take at least 12 hours to complete; I'll report back afterwards.
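The correcting-check suggestion relies on single parity being the XOR of the data disks. The toy model below is a hedged sketch, not unRAID's md driver: disks are lists of integers, a failed read is modelled as None, and it assumes (as an illustration) that a block regenerated after a read error is written back to the disk. It also shows how a check can report 0 sync errors alongside many read errors, as in the first post.

```python
"""Toy single-parity array and a correcting parity check (illustration only)."""
from functools import reduce
from operator import xor
from typing import Optional


def compute_parity(data_disks: list[list[int]], i: int) -> int:
    """Parity block i is the XOR of block i on every data disk."""
    return reduce(xor, (disk[i] for disk in data_disks), 0)


def correcting_check(data_disks: list[list[int]],
                     parity: list[Optional[int]]) -> tuple[int, int]:
    """Recompute parity, rewrite bad blocks in place; return (sync_errors, read_errors)."""
    sync_errors = read_errors = 0
    for i in range(len(parity)):
        expected = compute_parity(data_disks, i)
        if parity[i] is None:
            # Read error on the parity disk (assumption: value is regenerated
            # and written back). On a real drive, writing a pending sector lets
            # the firmware either reuse it or reallocate it.
            read_errors += 1
            parity[i] = expected
        elif parity[i] != expected:
            # Sync error: stored parity disagrees with the data disks.
            # A correcting check overwrites it; a non-correcting one only counts it.
            sync_errors += 1
            parity[i] = expected
    return sync_errors, read_errors


if __name__ == "__main__":
    data = [[0b1010, 0b0001], [0b0110, 0b1111]]  # two data disks, two blocks each
    parity = [None, 0b0000]  # block 0 unreadable, block 1 stale
    print(correcting_check(data, parity))  # (1, 1)
    print(parity)                          # [12, 14]
```

Running the check a second time on the same arrays returns (0, 0), since every bad block was rewritten on the first pass.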
February 18, 2013 (Author)

Edit: Removed previous message. I decided to use "New Config" and remove an HDD I suspected was the problem (the drive had only 4GB of data on it and had passed preclear). The parity rebuild finished with zero errors and the current pending sector count is gone. I'm running another preclear on the drive, but I guess I may just have to stop using it?

Edit/Update: It turns out the 3TB parity drive was the cause of the parity errors.