errors in syslog - exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen



Hi,

 

I just noticed several errors in my syslog. I did some research, but I'm not exactly sure what they mean. I would appreciate it if somebody could have a look.

 

Syslog and SMART reports for disk1 and the cache drive:

http://dl.dropbox.com/u/3121169/UnraidLogs.zip

 

Some background: the server has been running fine for half a year or so (one disk failed some months ago, but I removed that one). Two days ago, I stopped and powered down the server to add a cache drive to it (a 150 GB WD Raptor that was gathering dust in a closet). Maybe important, maybe not: before doing this, I tried to mount that disk but couldn't (it always said something about an incorrect filesystem; I tried ntfs, vfat, and msdos). I also tried connecting it to both a Windows PC and a Mac, and both failed. I thought there was something on the disk in the past (Windows Vista, if I remember well), but I'm not too sure about that. Anyway, I added it as a cache drive and it got formatted...

 

Since this operation, I have noticed several of the following in the syslog:

 

Mar 15 22:50:53 Tower kernel: ata6: link is slow to respond, please be patient (ready=0) (Drive related)

Mar 15 22:50:54 Tower kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)

Mar 15 22:50:54 Tower kernel: ata6.00: configured for UDMA/133 (Drive related)

Mar 15 22:50:54 Tower kernel: ata6: EH complete (Drive related)

Mar 16 00:14:30 Tower kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen (Errors)

Mar 16 00:14:30 Tower kernel: ata6: edma_err_cause=00000020 pp_flags=00000001, SError=00180000 (Errors)

Mar 16 00:14:30 Tower kernel: ata6: SError: { 10B8B Dispar } (Errors)

Mar 16 00:14:30 Tower kernel: ata6: hard resetting link (Minor Issues)

 

--> I think this ata6 is my cache drive, but I'm not sure.

From a Google search, this might be related to a bad SATA cable. I'm thinking I should maybe open the server again and unplug and replug the SATA cable of this disk.
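
As a reading aid for the SErr value in the lines above, here is a minimal Python sketch that expands the hex value into the flag names the kernel prints in the "SError: { ... }" line. The bit-to-name table is an assumption based on the labels the Linux libata driver prints, and the function name decode_serror is made up for illustration.

```python
# Assumed mapping of SError register bits to the short labels that the
# Linux libata driver prints inside "SError: { ... }".
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of all SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


# SErr value taken from the syslog excerpt above.
print(decode_serror(0x180000))
```

For 0x180000 this yields 10B8B and Dispar, the same two flags the kernel printed. Both are link-level encoding errors, which fits the cable suspicion but does not prove it.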

 

I also now see the following (which worries me more):

 

Mar 16 04:01:14 Tower kernel: md: disk1 read error (Errors)

Mar 16 04:01:14 Tower kernel: handle_stripe read error: 24800/1, count: 1 (Errors)

 

I've included a SMART report for this disk in the file on Dropbox. I do see several errors in it, but I'm not entirely sure how to read this report.

I will hold off on doing _anything_ on the server for now. I would really appreciate it if somebody could have a look and tell me if there's anything I can or should do.

 

Thanks,

Peter.


Sorry to bump this, but the situation has got worse. This morning I noticed that disk1 got disabled. Checking the syslog, I again see a lot of these messages:

 

Mar 21 01:47:16 Tower kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen

Mar 21 01:47:16 Tower kernel: ata6: edma_err_cause=00000020 pp_flags=00000001, SError=00180000

Mar 21 01:47:16 Tower kernel: ata6: SError: { 10B8B Dispar }

Mar 21 01:47:16 Tower kernel: ata6: hard resetting link

Mar 21 01:47:22 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)

Mar 21 01:47:23 Tower kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Mar 21 01:47:23 Tower kernel: ata6.00: configured for UDMA/133

Mar 21 01:47:23 Tower kernel: ata6: EH complete

 

Followed by :

 

Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Unhandled error code

Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00

Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] CDB: cdb[0]=0x28: 28 00 00 00 83 df 00 00 28 00

Mar 21 03:09:01 Tower kernel: end_request: I/O error, dev sde, sector 33759

Mar 21 03:09:01 Tower kernel: md: disk1 read error

Mar 21 03:09:01 Tower kernel: handle_stripe read error: 33696/1, count: 1

Mar 21 03:09:01 Tower kernel: md: disk1 read error

Mar 21 03:09:01 Tower kernel: handle_stripe read error: 33704/1, count: 1

Mar 21 03:09:01 Tower kernel: md: disk1 read error

Mar 21 03:09:01 Tower kernel: handle_stripe read error: 33712/1, count: 1

Mar 21 03:09:01 Tower kernel: md: disk1 read error

Mar 21 03:09:01 Tower kernel: handle_stripe read error: 33720/1, count: 1

Mar 21 03:09:01 Tower kernel: md: disk1 read error

Mar 21 03:09:01 Tower kernel: handle_stripe read error: 33728/1, count: 1

Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Synchronizing SCSI cache

Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00

Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Stopping disk

Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] START_STOP FAILED

Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00

Mar 21 03:09:11 Tower emhttp: disk_spinning: open: No such file or directory
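
To make the "CDB:" line above easier to read, here is a small Python sketch that decodes it, assuming the standard 10-byte SCSI READ(10) layout (opcode 0x28, 4-byte big-endian LBA in bytes 2 to 5, 2-byte transfer length in bytes 7 to 8). The function name decode_read10 is hypothetical.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        "opcode": "READ(10)",
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting sector
        "blocks": int.from_bytes(cdb[7:9], "big"),  # number of sectors requested
    }


# CDB bytes copied from the syslog excerpt above.
info = decode_read10("28 00 00 00 83 df 00 00 28 00")
print(info)
```

The decoded start sector is 33759, the same number as in the "I/O error, dev sde, sector 33759" line, and the request covers 40 sectors. The five handle_stripe errors that follow are 8 sectors apart (5 x 8 = 40) and start at 33696, which is 63 less than 33759. That would be consistent with a partition starting at sector 63, though that is an inference from the numbers and not something the log states.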

 

Full syslog here  :

 

http://dl.dropbox.com/u/3121169/NewUnraidIssue.zip

 

disk1 is now disabled, as you can see in the screenshot I also put in the zip file above.

The SMART report for that disk gives this:

 

---

smartctl -a -d ata /dev/sde

smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)

Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

Smartctl open device: /dev/sde failed: No such device

---

 

The unRAID version is 4.7. A SMART report from some days ago for both disk1 and the cache disk is in the file I uploaded previously:

 

http://dl.dropbox.com/u/3121169/UnraidLogs.zip

 

I have a feeling this is all being triggered in the first place by ata6 (which I think is the cache disk, but I'm not sure). Do I just try removing the cache disk and keep using disk1, or does disk1 have to be replaced? I would really appreciate some help with this.


Hi,

 

I have a very similar error in my syslog:

 

Mar 22 18:02:08 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen (Errors)

Mar 22 18:02:08 Tower kernel: ata6.00: failed command: CHECK POWER MODE (Minor Issues)

Mar 22 18:02:08 Tower kernel: ata6.00: cmd e5/00:00:00:00:00/00:00:00:00:00/00 tag 0 (Drive related)

Mar 22 18:02:08 Tower kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors)

Mar 22 18:02:08 Tower kernel: ata6.00: status: { DRDY } (Drive related)

Mar 22 18:02:08 Tower kernel: ata6: hard resetting link (Minor Issues)

Mar 22 18:02:09 Tower kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)

Mar 22 18:02:09 Tower kernel: ata6.00: configured for UDMA/133 (Drive related)

Mar 22 18:02:09 Tower kernel: ata6: EH complete (Drive related)

 

I can't make much sense of it, but I've seen this error ~6 or 7 times in the 21 days of my syslog.
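
One piece of the excerpt that can be read mechanically is the Emask value. Below is a minimal Python sketch that expands it into error classes. The bit table is an assumption based on the AC_ERR_* flags in the Linux libata driver, with descriptions paraphrased, and decode_emask is a made-up helper name.

```python
# Assumed meaning of the low Emask bits (libata AC_ERR_* flags, paraphrased).
EMASK_BITS = {
    0: "device error",
    1: "HSM violation",
    2: "timeout",
    3: "media error",
    4: "ATA bus error",
    5: "host bus error",
    6: "system error",
    7: "invalid argument",
}


def decode_emask(value):
    """Return the descriptions of all known Emask bits set in value."""
    return [name for bit, name in sorted(EMASK_BITS.items()) if value & (1 << bit)]


print(decode_emask(0x4))   # value on the "res ..." line of this excerpt
print(decode_emask(0x10))  # value on the "exception" lines earlier in the thread
```

Under that table, 0x4 is a timeout (the kernel prints the same word next to it), while the 0x10 seen in the first post is an ATA bus error. So the two logs look alike but may not describe the same kind of fault.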

 

My System:

 

Unraid 4.7

ASUS M4A78L-M

SuperMicro AOC-SASLP-MV8 PCIe4x


ata6 = sde = disk1

 

This disk has failed. RMA it and replace it ASAP. If another disk in the array fails, you will lose the data.

 

There is one more thing you can try. Shut down the server. Check all cables to disk1 and/or replace the SATA and power cables. Restart the server and see if sde returns (try a SMART query). Unassign disk1 and start the array; with disk1 unassigned, unRAID will forget about it. Then stop the array, reassign disk1, and restart it: you'll be given the option to rebuild the contents of disk1 onto disk1. I wouldn't trust this disk unless it passes another preclear before it is reassigned.

  • 7 years later...
13 minutes ago, megna22 said:

I'm having the same problem so what's the answer? Driving me crazy! Here's my log

movies-syslog-20180512-0159.zip

Since the errors are on 3 different ports, it's possibly a controller problem, but it can also be anything the disks share, like a power splitter cable or even the PSU itself:

 

May 12 00:00:32 Movies kernel: ata8: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
May 12 00:00:32 Movies kernel: ata8: irq_stat 0x00400000, PHY RDY changed
May 12 00:00:32 Movies kernel: ata8: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
May 12 00:00:32 Movies kernel: ata8: hard resetting link
May 12 00:00:32 Movies kernel: ata9: exception Emask 0x10 SAct 0x0 SErr 0x1910000 action 0xe frozen
May 12 00:00:32 Movies kernel: ata9: irq_stat 0x00400000, PHY RDY changed
May 12 00:00:32 Movies kernel: ata9: SError: { PHYRdyChg Dispar LinkSeq TrStaTrns }
May 12 00:00:32 Movies kernel: ata9: hard resetting link
May 12 00:00:32 Movies kernel: ata7: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
May 12 00:00:32 Movies kernel: ata7: irq_stat 0x00400000, PHY RDY changed
May 12 00:00:32 Movies kernel: ata7: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
May 12 00:00:32 Movies kernel: ata7: hard resetting link
May 12 00:00:38 Movies kernel: ata7: link is slow to respond, please be patient (ready=0)
May 12 00:00:38 Movies kernel: ata8: link is slow to respond, please be patient (ready=0)
May 12 00:00:38 Movies kernel: ata9: link is slow to respond, please be patient (ready=0)
May 12 00:00:42 Movies kernel: ata8: COMRESET failed (errno=-16)
May 12 00:00:42 Movies kernel: ata8: hard resetting link
May 12 00:00:42 Movies kernel: ata9: COMRESET failed (errno=-16)
May 12 00:00:42 Movies kernel: ata9: hard resetting link
May 12 00:00:42 Movies kernel: ata7: COMRESET failed (errno=-16)
May 12 00:00:42 Movies kernel: ata7: hard resetting link
May 12 00:00:42 Movies kernel: ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 12 00:00:43 Movies kernel: ata8.00: configured for UDMA/133
May 12 00:00:43 Movies kernel: ata8: EH complete
May 12 00:00:48 Movies kernel: ata9: link is slow to respond, please be patient (ready=0)
May 12 00:00:48 Movies kernel: ata7: link is slow to respond, please be patient (ready=0)
May 12 00:00:49 Movies kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 12 00:00:49 Movies kernel: ata9.00: configured for UDMA/133
May 12 00:00:49 Movies kernel: ata9: EH complete
May 12 00:00:51 Movies kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 12 00:00:51 Movies kernel: ata7.00: configured for UDMA/133
May 12 00:00:51 Movies kernel: ata7: EH complete
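
The reasoning above (several ports failing at the same moment points at something shared) can be checked against a full syslog with a short script. This is a rough Python sketch, assuming the classic "Mon DD HH:MM:SS host kernel: ataN: exception ..." line format seen in this thread; the names EXC and ports_by_second are illustrative only.

```python
import re
from collections import defaultdict

# Capture the timestamp and the ata port from kernel "exception" lines.
EXC = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (ata\d+)(?:\.\d+)?: exception "
)


def ports_by_second(lines):
    """Map each timestamp to the set of ata ports that logged an exception."""
    hits = defaultdict(set)
    for line in lines:
        match = EXC.match(line)
        if match:
            hits[match.group(1)].add(match.group(2))
    return dict(hits)


# Lines copied from the excerpt above.
sample = [
    "May 12 00:00:32 Movies kernel: ata8: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen",
    "May 12 00:00:32 Movies kernel: ata9: exception Emask 0x10 SAct 0x0 SErr 0x1910000 action 0xe frozen",
    "May 12 00:00:32 Movies kernel: ata7: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen",
    "May 12 00:00:42 Movies kernel: ata8: hard resetting link",
]

for stamp, ports in ports_by_second(sample).items():
    if len(ports) > 1:
        print(stamp, sorted(ports))
```

Timestamps where more than one port shows up are the ones worth a closer look. A single port failing on its own is more likely that disk or its cable.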

 


Johnnie, I know it's a controller problem. I have a PCI SATA controller card that was conflicting with the controller in the BIOS. I adjusted some settings and fixed most of the problem. I just haven't found the thing causing that damn error. The worst part is that sometimes it makes unRAID think a drive is bad and throw it out.
