Salzgablah Posted March 2, 2023
Today I received a warning that disk3 of my array had an error and is now disabled. The below is from the syslog in the diagnostics.
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9163 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=19s
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9163 Sense Key : 0x2 [current]
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9163 ASC=0x4 ASCQ=0x0
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9163 CDB: opcode=0x88 88 00 00 00 00 03 93 cc 06 60 00 00 00 40 00 00
Mar 1 14:41:09 Tower kernel: I/O error, dev sde, sector 15364523616 op 0x0:(READ) flags 0x0 phys_seg 8 prio class 0
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523552
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523560
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523568
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523576
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523584
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523592
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523600
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523608
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9164 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=19s
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9164 Sense Key : 0x2 [current]
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9164 ASC=0x4 ASCQ=0x0
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9164 CDB: opcode=0x88 88 00 00 00 00 03 93 cc 06 c0 00 00 00 60 00 00
Mar 1 14:41:09 Tower kernel: I/O error, dev sde, sector 15364523712 op 0x0:(READ) flags 0x0 phys_seg 12 prio class 0
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523648
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523656
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523664
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523672
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523680
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523688
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523696
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523704
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523712
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523720
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523728
Mar 1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523736
Mar 1 14:41:09 Tower emhttpd: read SMART /dev/sde
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9173 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=0s
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9173 Sense Key : 0x2 [current]
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9173 ASC=0x4 ASCQ=0x0
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9173 CDB: opcode=0x8a 8a 00 00 00 00 03 93 cc 06 60 00 00 00 40 00 00
Mar 1 14:41:09 Tower kernel: I/O error, dev sde, sector 15364523616 op 0x1:(WRITE) flags 0x0 phys_seg 8 prio class 0
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523552
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523560
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523568
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523576
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523584
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523592
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523600
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523608
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9179 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=0s
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9179 Sense Key : 0x2 [current]
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9179 ASC=0x4 ASCQ=0x0
Mar 1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9179 CDB: opcode=0x8a 8a 00 00 00 00 03 93 cc 06 c0 00 00 00 60 00 00
Mar 1 14:41:09 Tower kernel: I/O error, dev sde, sector 15364523712 op 0x1:(WRITE) flags 0x0 phys_seg 12 prio class 0
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523648
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523656
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523664
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523672
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523680
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523688
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523696
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523704
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523712
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523720
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523728
Mar 1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523736
I've also got the SMART results from the diag package, see below and attached.
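A side note on reading the kernel lines above: the hex bytes after `CDB:` are the raw SCSI command, and the sense codes say how the device answered. The following is a minimal Python sketch, not part of the original post, that decodes one of the logged commands. The opcode names and the sense-code text follow the standard SCSI tables; the function name `decode_cdb16` and the constant `PARTITION_START` are made up for illustration, and the 64-sector partition offset is only an assumption inferred from the difference between the `sde` and `md` sector numbers in the log.

```python
import struct

# Opcodes and sense codes that appear in the log excerpt (standard SCSI names).
OPCODES = {0x88: "READ(16)", 0x8A: "WRITE(16)"}
SENSE_KEYS = {0x2: "NOT READY", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}
ASC_ASCQ = {(0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE"}

# Assumption inferred from the log: md sector numbers are 64 lower than the
# raw device sector numbers, i.e. the data partition starts at sector 64.
PARTITION_START = 64


def decode_cdb16(cdb_hex):
    """Decode a 16-byte READ(16)/WRITE(16) CDB given as space-separated hex."""
    raw = bytes.fromhex(cdb_hex)
    if len(raw) != 16:
        raise ValueError("expected a 16-byte CDB")
    # Layout: opcode, flags, 8-byte LBA, 4-byte transfer length, group, control.
    opcode, _flags, lba, blocks, _group, _control = struct.unpack(">BBQIBB", raw)
    return {"op": OPCODES.get(opcode, hex(opcode)), "lba": lba, "blocks": blocks}


# First failed read from the log (tag#9163).
first_read = decode_cdb16("88 00 00 00 00 03 93 cc 06 60 00 00 00 40 00 00")
print(first_read)
print("md sector:", first_read["lba"] - PARTITION_START)
print("sense:", SENSE_KEYS[0x2], "/", ASC_ASCQ[(0x04, 0x00)])
```

The decoded LBA is 15364523616, the same sector the kernel reports in the `I/O error` line, and subtracting 64 gives 15364523552, the first `md: disk3 read error` sector. The sense key here is NOT READY rather than MEDIUM ERROR, which suggests the drive was not answering commands at that moment rather than reporting an unreadable sector.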
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   074   064   044    -    26762032
  3 Spin_Up_Time            PO----   082   080   000    -    0
  4 Start_Stop_Count        -O--CK   099   099   020    -    1587
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   082   060   045    -    175723966
  9 Power_On_Hours          -O--CK   080   080   000    -    17990
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    97
 18 Head_Health             PO-R--   100   100   050    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   099   099   000    -    4295032833
190 Airflow_Temperature_Cel -O---K   066   055   040    -    34 (Min/Max 27/40)
192 Power-Off_Retract_Count -O--CK   100   100   000    -    27
193 Load_Cycle_Count        -O--CK   090   090   000    -    20911
194 Temperature_Celsius     -O---K   034   045   000    -    34 (0 23 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   074   064   000    -    26762032
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    6332h+12m+55.571s
241 Total_LBAs_Written      ------   100   253   000    -    29052003840
242 Total_LBAs_Read         ------   100   253   000    -    555730377236
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
This drive is connected through an LSI 9207 SAS HBA along with 3 other drives. I've reseated the SAS-to-SATA cables and all power cords. I'm now running an extended SMART test on the drive. If that comes back clean, and with the above info, do you think the drive is OK or should I start an RMA with Seagate? If you think it's still got some life left, what is the process to re-enable the drive? I couldn't find much in the wiki, and links in past posts went 404.
ST8000VN004-2M2101-20230301-1548 disk3 (sde) - DISK_DSBL.txt
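A side note on the large raw values in the SMART table above: on Seagate drives these are often read as packed fields rather than single counters. The sketch below is not from the original post and rests on that commonly reported, vendor-undocumented convention, so treat the interpretation as an assumption: `Command_Timeout` as three 16-bit words, and the `Raw_Read_Error_Rate` and `Seek_Error_Rate` raw values as an error count in the upper 16 bits over an operation count in the lower 32 bits. The function names are hypothetical.

```python
from typing import Tuple


def split_words16(raw: int) -> Tuple[int, int, int]:
    """Split a 48-bit SMART raw value into three 16-bit words, high word first."""
    if not 0 <= raw < (1 << 48):
        raise ValueError("SMART raw values are 48 bits wide")
    return ((raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF)


def split_rate(raw: int) -> Tuple[int, int]:
    """Split a raw value into (upper 16 bits, lower 32 bits).

    Assumed reading for Seagate rate attributes: (errors, operations).
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("SMART raw values are 48 bits wide")
    return (raw >> 32, raw & 0xFFFFFFFF)


# Raw values copied from the SMART table in this post.
print("Command_Timeout words:", split_words16(4295032833))
print("Seek_Error_Rate (errors, seeks):", split_rate(175723966))
print("Raw_Read_Error_Rate (errors, reads):", split_rate(26762032))
```

Under that reading, the `Command_Timeout` raw value of 4295032833 breaks down into three words of 1 each, and both rate attributes have zero in the error field, which would fit with Reallocated_Sector_Ct, Current_Pending_Sector and Reported_Uncorrect all being 0.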
JorgeB Posted March 2, 2023
Please post the diagnostics.
Salzgablah Posted March 2, 2023
See attached. The disk in question is disk3, but the SMART file already has DISK_DSBL in the filename.
tower-diagnostics-20230301-1548.zip
JorgeB Posted March 2, 2023 (Solution)
It's not logged as a disk problem. If possible, try connecting that disk (and the other one of the same model) to the onboard SATA controller. This can also help:
Salzgablah Posted March 2, 2023
Very interesting. I'll connect it to the MOBO through the SATA port later today instead of the HBA card. I think I have one port left. Wild that it impacts a specific HD model like that.
How do I initiate a rebuild using the same drive? The "Replacing a Data Drive" procedure seems to assume a new disk that hasn't been in the array before.
Quote:
The procedure
If you are running a very old version of unRAID, such as v4.7 or older, skip down to the next section.
1. Stop the array
2. Unassign the old drive if still assigned (to unassign, set it to No Device)
3. Power down
4. [ Optional ] Pull the old drive (you may want to leave it installed for Preclearing or testing)
5. Install the new drive
6. Power on
7. Assign the new drive in the slot of the old drive
8. Go to the Main -> Array Operation section
9. Put a check in the Yes, I'm sure checkbox (next to the information indicating the drive will be rebuilt), and click the Start button
10. The rebuild will begin, with hefty disk activity on all drives, lots of writes on the new drive and lots of reads on all other drives
I actually picked up a new 16TB drive to increase parity, which would allow me to replace this 8TB with my 14TB parity. But it sounds like I'll need to rebuild disk3 before I can replace and rebuild parity.
JorgeB Posted March 2, 2023
https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself
Salzgablah Posted March 2, 2023
I'll come back here if I need anything else, but it looks like I've got everything I need to move forward. Thanks a bunch for your help, JorgeB!