rsbonini Posted April 19, 2021

New to setting up and using unRAID, so please bear with me. I initially had a 4TB data drive and a 6TB drive for parity. I added two 8TB drives as parity drives to expand to 10TB of storage with additional redundancy, and to allow for future expansion using 8TB drives. The steps I followed were:

1) Added an 8TB drive (sde) to the array as Parity 2, and had the system build parity on it.
2) Unassigned the 6TB drive (sdb), reassigned it as Disk 2, and had the array preclear it.
3) Added the second 8TB drive (sdd) to the array and started building parity on it as well.

About halfway through the parity build on sdd, the system posted a notification that there were errors on sde and disabled the drive. After the parity build on sdd completed, I ran an extended self-test on sde. The SMART results (below) give a health status of PASSED, and I believe the issue to be a single bad sector. I plan to shut down and clean/re-seat all connections just in case. sdd (Parity), sdb (Disk 2), and sdc (Disk 1) currently show normal operation/active, while sde shows disabled.

From what I can tell, my options are either:

a) unassign/reassign the sde drive to Parity 2, which will cause it to be entirely rebuilt; or
b) perform a New Config and run a parity check with "Write corrections to parity" enabled, which will fix the problems on sde (Parity 2).

Are there any other options or steps to be aware of? What are the relative pros and cons of the foregoing options? I'd prefer not to have to rebuild parity, but if there is a distinct advantage to it, I'm happy to do so. Thanks!

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)  Offline data collection activity
                                         was never started.
                                         Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121)  The previous self-test completed having
                                         the read element of the test failed.
Total time to complete Offline
data collection:                 (    0) seconds.
Offline data collection
capabilities:                    (0x73)  SMART execute Offline immediate.
                                         Auto Offline data collection on/off support.
                                         Suspend Offline collection upon new command.
                                         No Offline surface scan supported.
                                         Self-test supported.
                                         Conveyance Self-test supported.
                                         Selective Self-test supported.
SMART capabilities:              (0x0003) Saves SMART data before entering
                                         power-saving mode.
                                         Supports SMART auto save timer.
Error logging capability:        (0x01)  Error logging supported.
                                         General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 991) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:                (0x30a5) SCT Status supported.
                                         SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   100   064   006    -    5750
  3 Spin_Up_Time            PO----   092   091   000    -    0
  4 Start_Stop_Count        -O--CK   098   098   020    -    2650
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   081   060   045    -    141031912
  9 Power_On_Hours          -O--CK   087   087   000    -    11851 (133 112 0)
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    673
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   099   099   000    -    1
188 Command_Timeout         -O--CK   099   099   000    -    0 0 3
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   073   055   040    -    27 (Min/Max 23/35)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    388
193 Load_Cycle_Count        -O--CK   097   097   000    -    6260
194 Temperature_Celsius     -O---K   027   045   000    -    27 (0 21 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   100   064   000    -    5750
197 Current_Pending_Sector  -O--C-   100   100   000    -    8
198 Offline_Uncorrectable   ----C-   100   100   000    -    8
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    7958h+18m+28.883s
241 Total_LBAs_Written      ------   100   253   000    -    82290957455
242 Total_LBAs_Read         ------   100   253   000    -    68402183133
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O   2048  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    512  Current Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      24  Device vendor specific log
0xa2       GPL     VS    8160  Device vendor specific log
0xa6       GPL     VS     192  Device vendor specific log
0xa8-0xa9  GPL,SL  VS     136  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    9048  Device vendor specific log
0xbd       GPL     VS       8  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL,SL  VS      16  Device vendor specific log
0xc3       GPL,SL  VS       8  Device vendor specific log
0xc4       GPL,SL  VS      24  Device vendor specific log
0xd1       GPL     VS     264  Device vendor specific log
0xd3       GPL     VS    1920  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 1
    CR     = Command Register
    FEATR  = Features Register
    COUNT  = Count (was: Sector Count) Register
    LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    LH     = LBA High (was: Cylinder High) Register    ]   LBA
    LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    LL     = LBA Low (was: Sector Number) Register     ]
    DV     = Device (was: Device/Head) Register
    DC     = Device Control Register
    ER     = Error register
    ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 [0] occurred at disk power-on lifetime: 11840 hours (493 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 7a 54 10 08 00 00  Error: UNC at LBA = 0x17a541008 = 6347296776

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 04 00 00 01 7a 54 0e c0 40 00  1d+07:28:33.300  READ FPDMA QUEUED
  60 00 00 04 00 00 01 7a 54 0a c0 40 00  1d+07:28:33.296  READ FPDMA QUEUED
  60 00 00 03 b8 00 01 7a 3f b6 c0 40 00  1d+07:28:29.005  READ FPDMA QUEUED
  60 00 00 04 00 00 01 7a 3f b2 c0 40 00  1d+07:28:29.002  READ FPDMA QUEUED
  60 00 00 04 00 00 01 7a 3f ae c0 40 00  1d+07:28:28.998  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure        90%  11851            6347296776
# 2  Short offline       Aborted by host                90%  11841            -
# 3  Short offline       Completed without error        00%  11835            -
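Whichever option is chosen, the Current_Pending_Sector raw value (8 in the report above) is the number worth watching before and after re-seating the cables. A minimal sketch of pulling that value out with awk; here it is fed from the line in the report above, whereas on the live system you would pipe `smartctl -A /dev/sde` (device name from this thread) into the same command:

```shell
# Extract the raw Current_Pending_Sector count from smartctl attribute output.
# Sample input is copied verbatim from the SMART report in this post; on a
# running server, replace the here-document with: smartctl -A /dev/sde
pending=$(awk '/Current_Pending_Sector/ {print $NF}' <<'EOF'
197 Current_Pending_Sector  -O--C-   100   100   000    -    8
EOF
)
echo "Pending sectors: $pending"
```

If the count drops to 0 after the pending sectors are rewritten, the sectors were recovered in place; if Reallocated_Sector_Ct rises instead, the drive has remapped them to spares.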
itimpi Posted April 19, 2021

There is not much difference between your options, as both require every sector on the disabled parity disk to be accessed. The only difference is that in one you are reading every sector, and in the other you are writing them. I would personally go with rebuilding parity: since you have already had a write to the drive fail (which is why it was disabled in the first place), you now want to know whether you can reliably write to that drive without errors; if not, the drive will need replacing.
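The point that both options traverse the whole disk can be put in rough numbers. This is only a sketch; the ~150 MB/s average throughput is an assumption for a typical 8TB drive across the whole platter, not a measurement of this one:

```shell
# Back-of-envelope duration for either option: a correcting check reads
# every sector of the 8 TB parity disk, a rebuild writes every sector,
# so both take on the order of the disk size divided by sustained speed.
disk_bytes=8000000000000    # 8 TB parity disk
avg_rate=150000000          # bytes/sec -- ASSUMED average, not measured
hours=$(( disk_bytes / avg_rate / 3600 ))
echo "roughly ${hours}+ hours either way"
```

Either way the wall-clock cost is comparable, which is why the deciding factor is what each option tells you about the drive, not how long it takes.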
rsbonini Posted April 19, 2021

From what I can tell, it was a failure on read, as it occurred while building the other parity drive. I don't think this changes your point (and, barring any other input, I'll take your advice), but I wanted to clarify.
itimpi Posted April 19, 2021

Just now, rsbonini said:
"From what I can tell, it was a failure on read, as it occurred while building the other parity drive. I don't think this changes your point (and, barring any other input, I'll take your advice), but I wanted to clarify."

No — it was a write failure, as that is the only time unRAID disables a drive. The write failure could have been triggered by a read failure: when a read fails, unRAID tries to correct it by recomputing the sector's contents and rewriting the sector it had just failed to read, and it was that write that failed.
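For the first parity disk (P), each parity sector is simply the XOR of the corresponding sectors on the data disks, which is what lets unRAID recompute and rewrite a parity sector it failed to read. A toy sketch of that recomputation; the byte values are made up, and note that Parity 2 (Q) uses a different, Reed-Solomon-style code rather than plain XOR:

```shell
# Toy illustration: the P-parity byte is the XOR of the data-disk bytes,
# so a parity sector that cannot be read can be regenerated from the data
# disks and written back. If that write-back fails, the drive is disabled.
d1=0xA5                      # hypothetical byte from Disk 1
d2=0x3C                      # hypothetical byte from Disk 2
parity=$(( d1 ^ d2 ))        # what should be stored on the P parity disk
printf 'recomputed parity byte: 0x%02X\n' "$parity"
```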
rsbonini Posted April 19, 2021

OK, interesting. Thank you for the info, much appreciated.