mrcrlee Posted May 8, 2016

Hello,

I have attached my syslog and smartctl reports for the drive in question, along with a screen capture of my array configuration. I think the relevant syslog entry starts at Apr 24 20:45:42.

This drive also had a similar issue in January, referenced here: http://lime-technology.com/forum/index.php?topic=45375.msg433138

I rebooted the system and ran smartctl (long report attached). The report shows the second instance of the error but states that the drive PASSED. I have not restarted the array.

I believe there is enough space on disks 8 and 9 to hold the data from disk 7, if moving it there is a smart thing to do. I also have a precleared drive at /dev/sdl, if replacing sdh (the bad one) in the array is the best course of action.

Any opinions on what the professionals would do are greatly appreciated.

-Chris

Attachments: smart.txt, syslog.txt
JorgeB Posted May 8, 2016

SMART looks OK, but there are some recent sector errors that would make me replace this disk:

  9 Power_On_Hours          0x0012   095   095   000    Old_age   Always       -       39564

Error 2 occurred at disk power-on lifetime: 39264 hours (1636 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 90 4e 1f 02  Error: UNC 8 sectors at LBA = 0x021f4e90 = 35606160

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 90 4e 1f e0 00   2d+01:50:22.486  READ DMA EXT
  ca 00 08 58 1d 00 e0 00   2d+01:50:19.012  WRITE DMA
  c8 00 08 58 1d 00 e0 00   2d+01:50:19.012  READ DMA
  ea 00 00 00 00 00 a0 00   2d+01:50:19.002  FLUSH CACHE EXT
  ca 00 98 c0 1c 00 e0 00   2d+01:50:19.002  WRITE DMA

Error 1 occurred at disk power-on lifetime: 36555 hours (1523 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 09 67 e6 40 01  Error: UNC 9 sectors at LBA = 0x0140e667 = 21030503

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 18 58 e6 40 e1 00      05:41:15.646  READ DMA
  c8 00 08 50 e6 40 e1 00      05:41:15.646  READ DMA
  c8 00 20 30 e6 40 e1 00      05:41:15.646  READ DMA
  c8 00 08 28 e6 40 e1 00      05:41:15.645  READ DMA
  c8 00 08 20 e6 40 e1 00      05:41:15.645  READ DMA
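For readers following along, here is a rough Python sketch of how the numbers in that error log relate to each other. It assumes the SN, CL, CH and DH register bytes combine into a 28-bit LBA (low nibble of DH as the top bits), which is consistent with the LBA values smartctl printed above. The function name is made up for illustration. It also subtracts each error's power-on timestamp from the current Power_On_Hours value to show how recently each error happened. Power-on hours only count time the drive was powered, so the result is a lower bound on elapsed calendar time.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA register bytes into a 28-bit LBA (illustrative helper).

    Assumes the low nibble of DH holds LBA bits 24-27, then CH, CL, SN
    follow from most to least significant.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register bytes copied from the two error entries in the post above.
error2_lba = lba28_from_registers(sn=0x90, cl=0x4E, ch=0x1F, dh=0x02)
error1_lba = lba28_from_registers(sn=0x67, cl=0xE6, ch=0x40, dh=0x01)

# Attribute 9 (Power_On_Hours) raw value and the error timestamps.
POWER_ON_HOURS = 39564
hours_since_error2 = POWER_ON_HOURS - 39264
hours_since_error1 = POWER_ON_HOURS - 36555

print(f"Error 2: LBA {error2_lba} (0x{error2_lba:08x}), "
      f"{hours_since_error2} power-on hours ago")
print(f"Error 1: LBA {error1_lba} (0x{error1_lba:08x}), "
      f"{hours_since_error1} power-on hours ago")
```

The decoded LBAs match the values in the log, and the most recent error is 300 power-on hours (about 12.5 days of run time) old, which is why it counts as recent.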
mrcrlee Posted May 8, 2016

Quote (JorgeB): "SMART look ok but there are some recent sector errors that would make me replace this disk"

Thank you for the assessment and advice. The unused drive, /dev/sdl, is precleared and ready to use. Is this the proper procedure to follow?

1. Stop the array.
2. Unassign the old drive from disk 7 (/dev/sdh).
3. Assign the new drive to the slot of the old drive (it is already installed and precleared).
4. Go to the Main -> Array Operation section.
5. Put a check in the "Yes, I'm sure" checkbox (next to the information indicating the drive will be rebuilt), and click the Start button.

The rebuild will begin, with heavy disk activity on all drives: lots of writes on the new drive and lots of reads on all other drives. All of the contents of the old drive will be rebuilt onto the new drive, making it an exact replacement, except possibly with more capacity than the old drive.
JorgeB Posted May 8, 2016

Looks OK. Before starting the rebuild, check SMART for the other disks to make sure there are no pending sectors, and keep the old disk intact until the rebuild is finished.
mrcrlee Posted May 8, 2016

Quote (JorgeB): "Looks ok, before starting the rebuild check SMART for the other disks, make sure there are no pending sectors, and keep the old disk intact until the rebuild is finished."

Thank you. Is the pending sector check a long test or a short test?
JorgeB Posted May 8, 2016

Just check the SMART attributes:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   134   134   054    Pre-fail  Offline      -       111
  3 Spin_Up_Time            0x0007   125   125   024    Pre-fail  Always       -       559 (Average 554)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       3085
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   132   132   020    Pre-fail  Offline      -       32
  9 Power_On_Hours          0x0012   095   095   000    Old_age   Always       -       39564
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       39
192 Power-Off_Retract_Count 0x0032   098   098   000    Old_age   Always       -       3144
193 Load_Cycle_Count        0x0012   098   098   000    Old_age   Always       -       3144
194 Temperature_Celsius     0x0002   253   253   000    Old_age   Always       -       23 (Min/Max 11/33)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
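If you have many disks to check, something like the following Python sketch can pull the relevant raw values out of the attribute table instead of reading each report by eye. It assumes the ten-column layout shown above, as printed by `smartctl -A`. The function name, the `WATCHED` list and the choice to also look at attributes 5 and 198 are illustrative assumptions, not part of any tool. The sample text is a subset of the rows from the table above.

```python
import re

# Attributes worth a glance before a rebuild (an assumption for this sketch;
# the thread itself only asks about Current_Pending_Sector).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Map attribute name -> leading integer of its RAW_VALUE column."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)
        # Attribute rows have ten columns and start with a numeric ID,
        # which skips the header row and any surrounding text.
        if len(fields) == 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                attrs[fields[1]] = int(match.group())
    return attrs


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0007   125   125   024    Pre-fail  Always       -       559 (Average 554)
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

attrs = parse_smart_attributes(SAMPLE)
nonzero = {name: attrs[name] for name in WATCHED if attrs.get(name, 0) != 0}

if nonzero:
    print("Check before rebuilding:", nonzero)
else:
    print("No pending or reallocated sectors reported.")
```

To use it on a real disk, pass in the text output of `smartctl -A` for that device in place of `SAMPLE`.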
mrcrlee Posted May 8, 2016

Alright. I checked the smartctl attributes for all disks, and every one of them shows:

197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0

I will move forward with the replacement and rebuild.

On another note, is there a threshold for the number of power-on hours you typically use before replacing a disk?