errors in syslog - exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen

P_K · March 16, 2011

Hi,

I just noticed several errors in my syslog. I did some research but am not exactly sure what they mean. I would appreciate it if somebody could have a look.

Syslog + SMART reports of drive1 and the cache drive:

http://dl.dropbox.com/u/3121169/UnraidLogs.zip

Some background: the server has been running fine for half a year or so (one disk failed some months ago, but I removed that one). Two days ago, I stopped and powered down my server to add a cache drive to it (a 150GB WD Raptor that was gathering dust in a closet). Maybe important, maybe not: before doing that, I tried to mount the disk but couldn't (it always said something about an incorrect filesystem; I tried ntfs, vfat, and msdos). I also tried connecting it to both a Windows PC and a Mac: both failed. I thought there had been something on the disk in the past (Windows Vista, if I remember correctly), but I'm not too sure about that. Anyway, I added it as a cache drive and it got formatted...

Now, since this operation, I notice several of the following in the syslog:

Mar 15 22:50:53 Tower kernel: ata6: link is slow to respond, please be patient (ready=0) (Drive related)

Mar 15 22:50:54 Tower kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)

Mar 15 22:50:54 Tower kernel: ata6.00: configured for UDMA/133 (Drive related)

Mar 15 22:50:54 Tower kernel: ata6: EH complete (Drive related)

Mar 16 00:14:30 Tower kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen (Errors)

Mar 16 00:14:30 Tower kernel: ata6: edma_err_cause=00000020 pp_flags=00000001, SError=00180000 (Errors)

Mar 16 00:14:30 Tower kernel: ata6: SError: { 10B8B Dispar } (Errors)

Mar 16 00:14:30 Tower kernel: ata6: hard resetting link (Minor Issues)

--> I think ata6 is my cache drive, but I'm not sure.

From a Google search, this might be related to a bad SATA cable. I'm thinking I should maybe open the server again and unplug/replug this disk's SATA cable.
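As a rough aid for reading the ata6 lines above, here is a minimal Python sketch that decodes the SErr and SStatus values from the log. The bit labels are the short names the kernel prints (such as 10B8B and Dispar); the function names `decode_serror` and `decode_sstatus` are made up for illustration, and this is only a sketch, not the kernel's own code.

```python
# Map of SError bit positions to the short labels the kernel prints.
# Bits not listed here are reported generically as "bitN".
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the labels of all bits set in a 32-bit SError value."""
    return [
        SERROR_BITS.get(bit, f"bit{bit}")
        for bit in range(32)
        if value & (1 << bit)
    ]


def decode_sstatus(value):
    """Split SStatus into its DET, SPD and IPM fields (4 bits each)."""
    return {
        "DET": value & 0xF,
        "SPD": (value >> 4) & 0xF,
        "IPM": (value >> 8) & 0xF,
    }


print(decode_serror(0x180000))  # ['10B8B', 'Dispar']
print(decode_sstatus(0x123))    # {'DET': 3, 'SPD': 2, 'IPM': 1}
```

Both bits set in 0x180000 (10b-to-8b decode error and disparity error) are signalling errors on the SATA link itself, which would fit the bad-cable theory. SPD=2 in SStatus corresponds to the 3.0 Gbps link speed shown in the log.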

Also, I now see the following (which worries me more):

Mar 16 04:01:14 Tower kernel: md: disk1 read error (Errors)

Mar 16 04:01:14 Tower kernel: handle_stripe read error: 24800/1, count: 1 (Errors)

I've included a SMART report for this disk in the file on Dropbox. I do see several errors in it, but I'm not entirely sure how to read the report.

I will hold off on doing _anything_ on the server for now. I would really appreciate it if somebody could have a look and tell me if there's anything I can/should do.

Thanks,

Peter.

P_K · March 17, 2011

It's 24h later now and no additional errors have appeared in the log. I'll keep watching it, but I'm wondering if those errors are something to be concerned about.
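For keeping an eye on the log, something like the following Python sketch could count the "exception Emask" lines per ATA port. It assumes the syslog line format shown earlier in this thread; the function name `count_ata_exceptions` is made up, and the file path in the comment is only an assumption about where the syslog lives.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... Tower kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x180000 ..."
# and also per-device variants like "ata6.00: exception Emask ...".
EXCEPTION_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: exception Emask")


def count_ata_exceptions(lines):
    """Count 'exception Emask' lines per ATA port (e.g. 'ata6')."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the syslog excerpt in this thread.
sample = [
    "Mar 15 22:50:54 Tower kernel: ata6: EH complete",
    "Mar 16 00:14:30 Tower kernel: ata6: exception Emask 0x10 SAct 0x0 "
    "SErr 0x180000 action 0x6 frozen",
    "Mar 16 00:14:30 Tower kernel: ata6: hard resetting link",
]
print(dict(count_ata_exceptions(sample)))  # {'ata6': 1}

# To scan a real log instead (path is an assumption):
#   with open("/var/log/syslog") as f:
#       print(dict(count_ata_exceptions(f)))
```

A count that keeps growing on the same port after reseating or swapping the cable would suggest the problem is not just a loose connector.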

P_K · March 21, 2011

Sorry to bump this, but the situation has gotten worse. This morning I noticed that disk1 got disabled. Checking the syslog, I again see a lot of these messages:

Mar 21 01:47:16 Tower kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen

Mar 21 01:47:16 Tower kernel: ata6: edma_err_cause=00000020 pp_flags=00000001, SError=00180000

Mar 21 01:47:16 Tower kernel: ata6: SError: { 10B8B Dispar }

Mar 21 01:47:16 Tower kernel: ata6: hard resetting link

Mar 21 01:47:22 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)

Mar 21 01:47:23 Tower kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Mar 21 01:47:23 Tower kernel: ata6.00: configured for UDMA/133

Mar 21 01:47:23 Tower kernel: ata6: EH complete

Followed by :

Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Unhandled error code

Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00

Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] CDB: cdb[0]=0x28: 28 00 00 00 83 df 00 00 28 00

Mar 21 03:09:01 Tower kernel: end_request: I/O error, dev sde, sector 33759
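To relate the CDB line to the reported sector, here is a minimal Python sketch that decodes the command bytes, assuming the standard READ(10) layout (opcode 0x28, a 4-byte big-endian block address in bytes 2 to 5, and a 2-byte transfer length in bytes 7 to 8). The function name `decode_read10` is made up for illustration.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (lba, blocks): the starting logical block address and
    the number of blocks requested.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) command")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


# CDB bytes copied from the syslog line above.
lba, blocks = decode_read10("28 00 00 00 83 df 00 00 28 00")
print(lba, blocks)  # 33759 40
```

So the failed command was a read of 40 blocks starting at block 33759, which matches the sector in the end_request line. The hostbyte=0x04 value corresponds to DID_BAD_TARGET in the kernel's SCSI headers, which as far as I can tell means the request could not be delivered to the drive at all rather than the drive reporting a bad sector.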