Device is Disabled - Disk3 is in Error


Go to solution Solved by JorgeB,


Today I received a warning that disk3 of my array had an error and is now disabled. The excerpt below is from the syslog in the diagnostics.

 

Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9163 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=19s
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9163 Sense Key : 0x2 [current] 
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9163 ASC=0x4 ASCQ=0x0 
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9163 CDB: opcode=0x88 88 00 00 00 00 03 93 cc 06 60 00 00 00 40 00 00
Mar  1 14:41:09 Tower kernel: I/O error, dev sde, sector 15364523616 op 0x0:(READ) flags 0x0 phys_seg 8 prio class 0
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523552
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523560
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523568
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523576
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523584
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523592
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523600
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523608
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9164 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=19s
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9164 Sense Key : 0x2 [current] 
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9164 ASC=0x4 ASCQ=0x0 
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9164 CDB: opcode=0x88 88 00 00 00 00 03 93 cc 06 c0 00 00 00 60 00 00
Mar  1 14:41:09 Tower kernel: I/O error, dev sde, sector 15364523712 op 0x0:(READ) flags 0x0 phys_seg 12 prio class 0
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523648
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523656
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523664
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523672
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523680
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523688
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523696
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523704
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523712
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523720
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523728
Mar  1 14:41:09 Tower kernel: md: disk3 read error, sector=15364523736
Mar  1 14:41:09 Tower  emhttpd: read SMART /dev/sde
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9173 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=0s
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9173 Sense Key : 0x2 [current] 
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9173 ASC=0x4 ASCQ=0x0 
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9173 CDB: opcode=0x8a 8a 00 00 00 00 03 93 cc 06 60 00 00 00 40 00 00
Mar  1 14:41:09 Tower kernel: I/O error, dev sde, sector 15364523616 op 0x1:(WRITE) flags 0x0 phys_seg 8 prio class 0
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523552
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523560
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523568
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523576
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523584
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523592
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523600
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523608
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9179 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=0s
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9179 Sense Key : 0x2 [current] 
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9179 ASC=0x4 ASCQ=0x0 
Mar  1 14:41:09 Tower kernel: sd 6:0:0:0: [sde] tag#9179 CDB: opcode=0x8a 8a 00 00 00 00 03 93 cc 06 c0 00 00 00 60 00 00
Mar  1 14:41:09 Tower kernel: I/O error, dev sde, sector 15364523712 op 0x1:(WRITE) flags 0x0 phys_seg 12 prio class 0
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523648
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523656
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523664
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523672
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523680
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523688
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523696
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523704
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523712
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523720
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523728
Mar  1 14:41:09 Tower kernel: md: disk3 write error, sector=15364523736
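
For anyone reading along, here is a rough Python sketch of how the CDB lines above can be decoded. It assumes the standard 16-byte READ(16)/WRITE(16) layout (opcode in byte 0, big-endian LBA in bytes 2-9, transfer length in bytes 10-13); the function name `decode_cdb16` is just made up for this example. The sense lookup only covers the one combination seen in this log (sense key 0x2 with ASC 0x04 / ASCQ 0x00, i.e. the drive reporting "not ready").

```python
# Sketch only: decode the 16-byte CDBs printed by the kernel in the log above.
# Assumes the standard READ(16)/WRITE(16) field layout.
OPCODES = {0x88: "READ(16)", 0x8A: "WRITE(16)"}

# Only the sense combination that appears in this log is listed here.
SENSE = {(0x2, 0x04, 0x00): "NOT READY: logical unit not ready, cause not reportable"}


def decode_cdb16(hex_bytes):
    """Return opcode name, starting LBA and block count from a CDB hex string."""
    cdb = bytes.fromhex(hex_bytes)  # fromhex ignores the spaces between bytes
    if len(cdb) != 16 or cdb[0] not in OPCODES:
        raise ValueError("expected a 16-byte READ(16)/WRITE(16) CDB")
    return {
        "op": OPCODES[cdb[0]],
        "lba": int.from_bytes(cdb[2:10], "big"),      # bytes 2-9
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10-13
    }


first_read = decode_cdb16("88 00 00 00 00 03 93 cc 06 60 00 00 00 40 00 00")
print(first_read)
print(SENSE[(0x2, 0x04, 0x00)])
```

The decoded LBA of the first CDB (15364523616) matches the sector in the "I/O error, dev sde" line, and the 64-block length matches the eight md errors that follow, each 8 sectors apart. The md sector numbers are 64 lower than the device sector numbers, which would be consistent with a partition starting at sector 64, but that is inferred from the numbers and not stated anywhere in the log.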

 

I've also got the SMART results from the diagnostics package; see below and attached.

 

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   074   064   044    -    26762032
  3 Spin_Up_Time            PO----   082   080   000    -    0
  4 Start_Stop_Count        -O--CK   099   099   020    -    1587
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   082   060   045    -    175723966
  9 Power_On_Hours          -O--CK   080   080   000    -    17990
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    97
 18 Head_Health             PO-R--   100   100   050    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   099   099   000    -    4295032833
190 Airflow_Temperature_Cel -O---K   066   055   040    -    34 (Min/Max 27/40)
192 Power-Off_Retract_Count -O--CK   100   100   000    -    27
193 Load_Cycle_Count        -O--CK   090   090   000    -    20911
194 Temperature_Celsius     -O---K   034   045   000    -    34 (0 23 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   074   064   000    -    26762032
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    6332h+12m+55.571s
241 Total_LBAs_Written      ------   100   253   000    -    29052003840
242 Total_LBAs_Read         ------   100   253   000    -    555730377236
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
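
The large raw values on attributes 1, 7 and 188 can look alarming, so here is a small Python sketch that splits them up. This relies on an assumption: the commonly cited community interpretation that Seagate packs several counters into one 48-bit raw value (error count in the upper 16 bits and operation count in the lower 32 bits for the error-rate attributes, and three 16-bit counters for Command_Timeout). Nothing in the SMART output above confirms that layout, and the helper names are made up for this example.

```python
# Sketch only: split Seagate-style packed SMART raw values.
# The field layout is an ASSUMPTION (community interpretation), not
# something confirmed by the SMART output itself.

def split_words(raw):
    """Split a 48-bit raw value into three 16-bit words, high to low."""
    return ((raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF)


def split_error_rate(raw):
    """Assumed layout: (error count in upper 16 bits, operations in lower 32 bits)."""
    return ((raw >> 32) & 0xFFFF, raw & 0xFFFFFFFF)


print("Command_Timeout     ", split_words(4295032833))
print("Raw_Read_Error_Rate ", split_error_rate(26762032))
print("Seek_Error_Rate     ", split_error_rate(175723966))
```

Under that assumption, the Command_Timeout raw value breaks down into three small counters of 1 each, and both error-rate attributes show an error count of 0, with the big number being only the operation count.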

 

This drive is connected through an LSI 9207 SAS HBA along with 3 other drives. I've reseated the SAS-to-SATA cables and all power cables, and I'm now running an extended SMART test on the drive. If that comes back clean, and given the above info, do you think the drive is OK, or should I start an RMA with Seagate? If you think it still has some life left, what is the process to re-enable the drive? I couldn't find much in the wiki, and the links in past posts went 404.

ST8000VN004-2M2101-20230301-1548 disk3 (sde) - DISK_DSBL.txt


Very interesting. I'll connect it to the MOBO through a SATA port later today instead of the HBA card. I think I have one port left. Wild that it impacts a specific HD model like that. How do I initiate a rebuild using the same drive? The "Replacing a Data Drive" procedure seems to assume a new disk that hasn't been in the array before.
 

Quote

 

The procedure

If you are running a very old version of unRAID, such as v4.7 or older, skip down to the next section.

Stop the array

Unassign the old drive if still assigned (to unassign, set it to No Device)

Power down

[ Optional ] Pull the old drive (you may want to leave it installed for Preclearing or testing)

Install the new drive

Power on

Assign the new drive in the slot of the old drive

Go to the Main -> Array Operation section

Put a check in the Yes, I'm sure checkbox (next to the information indicating the drive will be rebuilt), and click the Start button

The rebuild will begin, with hefty disk activity on all drives, lots of writes on the new drive and lots of reads on all other drives

 

 

I actually picked up a new 16TB drive to increase parity, which would allow me to replace this 8TB drive with my current 14TB parity drive. But it sounds like I'll need to rebuild Disk3 before I can replace and rebuild parity.
