
Syslog errors - controller or drive problem? Preclear issues, too.



So for those of you that have been following along with my saga, here is the next installment.

 

Short summary: I had a red-balled disk (call it sdn), copied the data off it, and was running it through preclear when the system hung. After the reboot there were 2 red-balled disks. The second disk is FUBAR and cannot be read by any machine on any connection. I reinitialized the array without these 2 disks and rebuilt parity, and all is happy. I have 2 new HDs to replace the red-balled disks. I am preclearing 3 disks right now: the 2 new ones (sdl and sdm) and the original red-ball (sdn). All drives are 2TB.

 

2 things of note:

 

1. Two of the preclears are MUCH slower than the other one. For example, this morning all 3 drives are on step 2 (clearing the disk), but sdm and sdn are at 35% complete, whereas sdl is at 94%.

 

2. There are a ton of (seemingly) drive-related errors at the end of the syslog. I am hoping they are related to sdn, as I would expect that drive to be in a strange state after the system hung during preclear. But I'm not exactly sure how to decipher the errors. A snippet is in this code box, and the whole syslog is attached.

 

Oct 16 03:17:18 Tower kernel: ata11.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Oct 16 03:17:18 Tower kernel: ata11.01: failed command: READ DMA EXT (Minor Issues)
Oct 16 03:17:18 Tower kernel: ata11.01: cmd 25/00:00:00:81:f6/00:01:8c:00:00/f0 tag 0 dma 131072 in (Drive related)
Oct 16 03:17:18 Tower kernel:          res 51/40:5f:98:81:f6/40:00:8c:00:00/f0 Emask 0x9 (media error) (Errors)
Oct 16 03:17:18 Tower kernel: ata11.01: status: { DRDY ERR } (Drive related)
Oct 16 03:17:18 Tower kernel: ata11.01: error: { UNC } (Errors)
Oct 16 03:17:18 Tower kernel: ata11.00: configured for UDMA/133 (Drive related)
Oct 16 03:17:18 Tower kernel: ata11.01: configured for UDMA/133 (Drive related)
Oct 16 03:17:18 Tower kernel: ata11: EH complete (Drive related)
Oct 16 03:17:21 Tower kernel: ata11.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Oct 16 03:17:21 Tower kernel: ata11.01: failed command: READ DMA EXT (Minor Issues)
Oct 16 03:17:21 Tower kernel: ata11.01: cmd 25/00:00:00:81:f6/00:01:8c:00:00/f0 tag 0 dma 131072 in (Drive related)
Oct 16 03:17:21 Tower kernel:          res 51/40:5f:a0:81:f6/40:00:8c:00:00/f0 Emask 0x9 (media error) (Errors)
Oct 16 03:17:21 Tower kernel: ata11.01: status: { DRDY ERR } (Drive related)
Oct 16 03:17:21 Tower kernel: ata11.01: error: { UNC } (Errors)
Oct 16 03:17:21 Tower kernel: ata11.00: configured for UDMA/133 (Drive related)
Oct 16 03:17:21 Tower kernel: ata11.01: configured for UDMA/133 (Drive related)
Oct 16 03:17:21 Tower kernel: ata11: EH complete (Drive related)
Oct 16 03:17:24 Tower kernel: ata11.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Oct 16 03:17:24 Tower kernel: ata11.01: failed command: READ DMA EXT (Minor Issues)
Oct 16 03:17:24 Tower kernel: ata11.01: cmd 25/00:00:00:81:f6/00:01:8c:00:00/f0 tag 0 dma 131072 in (Drive related)
Oct 16 03:17:24 Tower kernel:          res 51/40:5f:a0:81:f6/40:00:8c:00:00/f0 Emask 0x9 (media error) (Errors)
Oct 16 03:17:24 Tower kernel: ata11.01: status: { DRDY ERR } (Drive related)
Oct 16 03:17:24 Tower kernel: ata11.01: error: { UNC } (Errors)
Oct 16 03:17:24 Tower kernel: ata11.00: configured for UDMA/133 (Drive related)
Oct 16 03:17:24 Tower kernel: ata11.01: configured for UDMA/133 (Drive related)
Oct 16 03:17:24 Tower kernel: ata11: EH complete (Drive related)
Oct 16 03:17:27 Tower kernel: ata11.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Oct 16 03:17:27 Tower kernel: ata11.01: failed command: READ DMA EXT (Minor Issues)
Oct 16 03:17:27 Tower kernel: ata11.01: cmd 25/00:00:00:81:f6/00:01:8c:00:00/f0 tag 0 dma 131072 in (Drive related)
Oct 16 03:17:27 Tower kernel:          res 51/40:5f:a0:81:f6/40:00:8c:00:00/f0 Emask 0x9 (media error) (Errors)
Oct 16 03:17:27 Tower kernel: ata11.01: status: { DRDY ERR } (Drive related)
Oct 16 03:17:27 Tower kernel: ata11.01: error: { UNC } (Errors)
Oct 16 03:17:27 Tower kernel: ata11.00: configured for UDMA/133 (Drive related)
Oct 16 03:17:27 Tower kernel: ata11.01: configured for UDMA/133 (Drive related)
Oct 16 03:17:27 Tower kernel: ata11: EH complete (Drive related)
Oct 16 03:17:30 Tower kernel: ata11.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Oct 16 03:17:30 Tower kernel: ata11.01: failed command: READ DMA EXT (Minor Issues)
Oct 16 03:17:30 Tower kernel: ata11.01: cmd 25/00:00:00:81:f6/00:01:8c:00:00/f0 tag 0 dma 131072 in (Drive related)
Oct 16 03:17:30 Tower kernel:          res 51/40:5f:98:81:f6/40:00:8c:00:00/f0 Emask 0x9 (media error) (Errors)
Oct 16 03:17:30 Tower kernel: ata11.01: status: { DRDY ERR } (Drive related)
Oct 16 03:17:30 Tower kernel: ata11.01: error: { UNC } (Errors)
Oct 16 03:17:30 Tower kernel: ata11.00: configured for UDMA/133 (Drive related)
Oct 16 03:17:30 Tower kernel: ata11.01: configured for UDMA/133 (Drive related)
Oct 16 03:17:30 Tower kernel: sd 4:0:1:0: [sdn] Unhandled sense code (Drive related)
Oct 16 03:17:30 Tower kernel: sd 4:0:1:0: [sdn] Result: hostbyte=0x00 driverbyte=0x08 (System)
Oct 16 03:17:30 Tower kernel: sd 4:0:1:0: [sdn] Sense Key : 0x3 [current] [descriptor] (Drive related)
Oct 16 03:17:30 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Oct 16 03:17:30 Tower kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
Oct 16 03:17:30 Tower kernel:         8c f6 81 98 
Oct 16 03:17:30 Tower kernel: sd 4:0:1:0: [sdn] ASC=0x11 ASCQ=0x4 (Drive related)
Oct 16 03:17:30 Tower kernel: sd 4:0:1:0: [sdn] CDB: cdb[0]=0x28: 28 00 8c f6 81 00 00 01 00 00 (Drive related)
Oct 16 03:17:30 Tower kernel: end_request: I/O error, dev sdn, sector 2364965272 (Errors)
Oct 16 03:17:30 Tower kernel: Buffer I/O error on device sdn, logical block 295620659 (Errors)
Oct 16 03:17:30 Tower kernel: Buffer I/O error on device sdn, logical block 295620660 (Errors)
Oct 16 03:17:30 Tower kernel: Buffer I/O error on device sdn, logical block 295620661 (Errors)
Oct 16 03:17:30 Tower kernel: Buffer I/O error on device sdn, logical block 295620662 (Errors)
Oct 16 03:17:30 Tower kernel: Buffer I/O error on device sdn, logical block 295620663 (Errors)
Oct 16 03:17:30 Tower kernel: Buffer I/O error on device sdn, logical block 295620664 (Errors)
Oct 16 03:17:30 Tower kernel: Buffer I/O error on device sdn, logical block 295620665 (Errors)
Oct 16 03:17:30 Tower kernel: Buffer I/O error on device sdn, logical block 295620666 (Errors)
Oct 16 03:17:30 Tower kernel: Buffer I/O error on device sdn, logical block 295620667 (Errors)
Oct 16 03:17:30 Tower kernel: Buffer I/O error on device sdn, logical block 295620668 (Errors)
Oct 16 03:17:30 Tower kernel: ata11: EH complete (Drive related)
Oct 16 03:17:33 Tower kernel: ata11.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Oct 16 03:17:33 Tower kernel: ata11.01: failed command: READ DMA EXT (Minor Issues)
Oct 16 03:17:33 Tower kernel: ata11.01: cmd 25/00:08:98:81:f6/00:00:8c:00:00/f0 tag 0 dma 4096 in (Drive related)
Oct 16 03:17:33 Tower kernel:          res 51/40:08:98:81:f6/40:00:8c:00:00/f0 Emask 0x9 (media error) (Errors)
Oct 16 03:17:33 Tower kernel: ata11.01: status: { DRDY ERR } (Drive related)
Oct 16 03:17:33 Tower kernel: ata11.01: error: { UNC } (Errors)
Oct 16 03:17:33 Tower kernel: ata11.00: configured for UDMA/133 (Drive related)
Oct 16 03:17:33 Tower kernel: ata11.01: configured for UDMA/133 (Drive related)
Oct 16 03:17:33 Tower kernel: ata11: EH complete (Drive related)

syslog-2011-10-16.txt.zip


In my reading of the syslog, these errors all seem to be related to sdn, correct? Is it worth preclearing this drive, or are these errors the seal of death?
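One way to check that the ata11.01 errors and the sdn errors are the same failure is to decode the sector address from a `res` line and compare it with the sector in the `end_request` line. Here is a minimal Python sketch. It assumes the usual libata layout for the `res` bytes (status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device), and the function name `lba_from_res` is made up for illustration.

```python
import re

# Twelve two-digit hex bytes separated by "/" or ":" after the word "res".
RES_RE = re.compile(r"res ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")

def lba_from_res(line):
    """Return the 48-bit LBA reported in a libata 'res' line, or None."""
    match = RES_RE.search(line)
    if match is None:
        return None
    b = [int(x, 16) for x in re.split(r"[/:]", match.group(1))]
    # Assumed field order: status, error, nsect, lbal, lbam, lbah,
    # hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device.
    lbal, lbam, lbah = b[3], b[4], b[5]
    hob_lbal, hob_lbam, hob_lbah = b[8], b[9], b[10]
    return ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
            | (lbah << 16) | (lbam << 8) | lbal)

line = ("Oct 16 03:17:30 Tower kernel:          "
        "res 51/40:5f:98:81:f6/40:00:8c:00:00/f0 Emask 0x9 (media error)")
lba = lba_from_res(line)
print(lba, lba // 8)
```

For the `res` line logged at 03:17:30 this gives 2364965272, which is the sector in the `end_request` line for sdn. Dividing by 8 (4096-byte blocks over 512-byte sectors) gives 295620659, the first logical block in the Buffer I/O errors. So the ata11.01 media errors do point at sdn.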

 

Any ideas about the disparate preclear speeds? All are 2TB EARS drives with the same firmware.

We are not mind readers, and odds are you are not either. A SMART report of the disk will tell you a lot more about its health.

It could just be a few sectors, and there are typically several thousand available for re-allocation on a 2TB disk these days.  If it is just a few, no issue.  If the number (re-allocated or pending re-allocation) keeps climbing each time you pre-clear it, time to RMA it.
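The "keeps climbing" check can be sketched in Python. This assumes you saved the attribute table printed by `smartctl -A` for the drive to a text file before and after each preclear cycle, and that the table has the usual ten-column layout with the raw value in the last column. The function names are made up for illustration.

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")

def sector_counts(smart_text):
    """Return {attribute name: raw value} for the watched attributes."""
    counts = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Assumed row layout:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] in WATCHED and fields[9].isdigit():
            counts[fields[1]] = int(fields[9])
    return counts

def is_climbing(before_text, after_text):
    """True if either watched raw count grew between the two reports."""
    before = sector_counts(before_text)
    after = sector_counts(after_text)
    return any(after.get(name, 0) > before.get(name, 0) for name in WATCHED)
```

If `is_climbing` keeps returning True across successive preclear cycles, that is the RMA signal described above. A one-time jump that then holds steady is the "just a few sectors" case.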

 

As far as the disparate times: disks with issues are dealt with by re-reading and resetting the controller, and that takes time. It could be that, or it could be that one port is attached to a different disk controller chipset.


The media errors are likely the cause of any other problems. Try to pre-clear this drive and then rebuild.

 

Unfortunately, this was one of 2 simultaneous disk failures. No way to rebuild. I did have all data backed up on one of them. This one is just a full loss.

 

The 2 new drives and this one are well into the second cycle of preclear, hopefully done tonight.


