Red balled drive advice



Hi

I had a drive redballed yesterday.

The syslog contains the following lines (the redballed drive is sdb):

 

Mar 18 15:51:41 shortie kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 18 15:51:41 shortie kernel: ata2.00: failed command: READ DMA EXT
Mar 18 15:51:41 shortie kernel: ata2.00: cmd 25/00:c0:c8:e7:f3/00:00:5a:01:00/e0 tag 0 dma 98304 in
Mar 18 15:51:41 shortie kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 18 15:51:41 shortie kernel: ata2.00: status: { DRDY }
Mar 18 15:51:41 shortie kernel: ata2: hard resetting link
Mar 18 15:51:51 shortie kernel: ata2: softreset failed (device not ready)
Mar 18 15:51:51 shortie kernel: ata2: hard resetting link
Mar 18 15:52:01 shortie kernel: ata2: softreset failed (device not ready)
Mar 18 15:52:01 shortie kernel: ata2: hard resetting link
Mar 18 15:52:12 shortie kernel: ata2: link is slow to respond, please be patient (ready=0)
Mar 18 15:52:36 shortie kernel: ata2: softreset failed (device not ready)
Mar 18 15:52:36 shortie kernel: ata2: limiting SATA link speed to 1.5 Gbps
Mar 18 15:52:36 shortie kernel: ata2: hard resetting link
Mar 18 15:52:41 shortie kernel: ata2: softreset failed (device not ready)
Mar 18 15:52:41 shortie kernel: ata2: reset failed, giving up
Mar 18 15:52:41 shortie kernel: ata2.00: disabled
Mar 18 15:52:41 shortie kernel: ata2.00: device reported invalid CHS sector 0
Mar 18 15:52:41 shortie kernel: ata2: EH complete
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb]  Result: hostbyte=0x04 driverbyte=0x00
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] CDB: cdb[0]=0x88: 88 00 00 00 00 01 5a f3 e7 c8 00 00 00 c0 00 00
Mar 18 15:52:41 shortie kernel: end_request: I/O error, dev sdb, sector 5820901320
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb]  Result: hostbyte=0x04 driverbyte=0x00
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] CDB: cdb[0]=0x88: 88 00 00 00 00 01 5a f3 e8 88 00 00 00 48 00 00
Mar 18 15:52:41 shortie kernel: end_request: I/O error, dev sdb, sector 5820901512
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb]  Result: hostbyte=0x04 driverbyte=0x00
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] CDB: cdb[0]=0x8a: 8a 00 00 00 00 01 5a f3 e7 b8 00 00 00 10 00 00
Mar 18 15:52:41 shortie kernel: end_request: I/O error, dev sdb, sector 5820901304
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901256/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901264/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901272/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901280/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901288/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901296/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901304/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901312/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901320/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901328/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901336/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901344/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901352/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901360/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901368/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901376/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901384/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901392/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901400/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901408/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901416/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901424/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901432/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901440/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901448/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901456/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901464/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901472/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901480/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901488/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901496/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901504/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901512/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901240/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901248/1, count: 1
Mar 18 15:52:41 shortie kernel: md: recovery thread woken up ...
Mar 18 15:52:41 shortie kernel: md: recovery thread has nothing to resync
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb]  Result: hostbyte=0x04 driverbyte=0x00
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] CDB: cdb[0]=0x8a: 8a 00 00 00 00 01 5a f3 e7 c8 00 00 01 08 00 00
Mar 18 15:52:41 shortie kernel: end_request: I/O error, dev sdb, sector 5820901320
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901256/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901264/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901272/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901280/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901288/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901296/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901304/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901312/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901320/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901328/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901336/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901344/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901352/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901360/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901368/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901376/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901384/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901392/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901400/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901408/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901416/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901424/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901432/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901440/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901448/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901456/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901464/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901472/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901480/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901488/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901496/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901504/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901512/1, count: 1
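As a cross-check on the log above, the failing sector numbers can be recovered from the CDB bytes the kernel prints. This is a minimal sketch, assuming the standard 16-byte SCSI READ(16)/WRITE(16) layout (opcode 0x88 or 0x8a, an 8-byte big-endian LBA in bytes 2 to 9, a 4-byte transfer length in bytes 10 to 13); the function name `decode_rw16_cdb` is made up for illustration.

```python
def decode_rw16_cdb(cdb_hex: str) -> dict:
    """Decode a READ(16)/WRITE(16) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] not in (0x88, 0x8A):
        raise ValueError("expected a 16-byte READ(16) or WRITE(16) CDB")
    return {
        "op": "READ(16)" if cdb[0] == 0x88 else "WRITE(16)",
        # Bytes 2..9: starting logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:10], "big"),
        # Bytes 10..13: number of sectors requested, big-endian.
        "sectors": int.from_bytes(cdb[10:14], "big"),
    }


# First failed command from the syslog excerpt.
first = decode_rw16_cdb("88 00 00 00 00 01 5a f3 e7 c8 00 00 00 c0 00 00")
print(first)
```

The first CDB decodes to a READ(16) at LBA 5820901320 for 192 sectors, which matches the "end_request: I/O error, dev sdb, sector 5820901320" line. Assuming 512-byte sectors, 192 sectors is 98304 bytes, the same figure as "dma 98304 in" on the failed ATA command.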

 

When I tried to run a SMART test on the drive, it failed with:

 

ioctl error: -5

 

After a clean powerdown and a quick reseat of the drives, the drive is accessible again but obviously still redballed.

 

The SMART results look clean to me (but I'm a rookie):

 

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       86
  3 Spin_Up_Time            0x0007   126   126   024    Pre-fail  Always       -       615 (Average 615)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       967
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   135   135   020    Pre-fail  Offline      -       26
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       14099
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       63
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1085
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       1085
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 22/47)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
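One way to sanity-check a table like the one above is sketched below. It assumes the usual smartctl column order (ID, name, flag, VALUE, WORST, THRESH, type, updated, when-failed, raw value) and applies two rules: an attribute counts as failed when VALUE is less than or equal to THRESH, and the raw counts of a few attributes should be zero. The list of watched IDs is a common rule of thumb, not something taken from this thread, and the function name `check_smart` is hypothetical.

```python
# Assumption: rule-of-thumb attributes whose raw count should stay at 0
# (reallocated, reallocation events, pending, offline uncorrectable, CRC).
WATCHED_RAW = {5, 196, 197, 198, 199}

SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
"""


def check_smart(table: str) -> list[str]:
    """Return a list of human-readable problems found in a SMART table."""
    problems = []
    for line in table.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip the header row and blank lines
        attr_id, name = int(fields[0]), fields[1]
        value, thresh = int(fields[3]), int(fields[5])
        if value <= thresh:
            problems.append(f"{name}: VALUE {value} <= THRESH {thresh}")
        # Only the first token of RAW_VALUE is read, and only for watched IDs.
        if attr_id in WATCHED_RAW and int(fields[9]) != 0:
            problems.append(f"{name}: raw value {fields[9]}")
    return problems


print(check_smart(SAMPLE) or "no problems flagged")
```

For the three sample rows copied from the table, nothing is flagged. A clean result here does not rule out the drive; it only means these particular counters give no sign of media or cabling errors.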

 

I've bought a couple of replacement drives (one to rebuild onto tonight; one to preclear and keep as a hot spare). I'll obviously stress test the redballed drive once the array is protected again, but given the above, is this likely a drive issue or a problem with the PC/controller?

 

The server is built in an HP ProLiant MicroServer (36L); the disks are Hitachi 7K3000s. unRAID is 5.0 beta 8.

 

Full syslog attached as a zip file.

 

Thanks

 

Eric

syslog-20130318-203007.zip


Thanks, I did a couple of long SMART tests and the results looked identical to me. I've swapped out the drive (for a Toshiba 3TB equivalent) and the array is fault tolerant again.

 

I am now preclearing a second new 3TB disk to keep in a box as a spare, and the original is on a 4x preclear stress test in my backup server.

 

As an aside, I'm noticing how much quicker the new 1TB-platter drives are at sequential reads and writes than my Hitachi 7K3000s; it's really visible on the preclears, which I started at about the same time.

 

What is an ioctl error? Google doesn't seem to help much.

 

Eric
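On the "ioctl error: -5" message: assuming the number is a negated Linux errno value (which is how kernel and driver return codes are commonly reported, though the thread does not confirm it for this tool), it can be looked up with Python's standard `errno` module. The helper name `describe_errno` is made up for illustration.

```python
import errno
import os


def describe_errno(ret: int) -> str:
    """Map a (possibly negated) errno value to its symbolic name and message."""
    code = abs(ret)
    name = errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"


print(describe_errno(-5))
```

Under that assumption, 5 is EIO, the generic input/output error. That would be consistent with the syslog, where the kernel had already disabled ata2.00, so any command sent to the drive (including a SMART request) could not reach it.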
