P_K Posted March 16, 2011

Hi,

I just noticed several errors in my syslog. I did some research but I'm not sure exactly what they mean, and would appreciate it if somebody could have a look.

Syslog + SMART reports of disk1 and the cache drive: http://dl.dropbox.com/u/3121169/UnraidLogs.zip

Some background: the server has been running fine for half a year or so (one disk failed some months ago, but I removed that one). Two days ago I stopped and powered down the server to add a cache drive (a 150 GB WD Raptor that was gathering dust in a closet).

Maybe important, maybe not: before doing that I tried to mount the disk but couldn't (it always said something about an incorrect filesystem; I tried ntfs, vfat and msdos). I also tried connecting it to both a Windows PC and a Mac: both failed. I think there was something on the disk in the past (Windows Vista, if I remember correctly), but I'm not too sure about that. Anyway, I added it as a cache drive and it got formatted...

Since this operation, I've noticed several of the following in the syslog:

Mar 15 22:50:53 Tower kernel: ata6: link is slow to respond, please be patient (ready=0) (Drive related)
Mar 15 22:50:54 Tower kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Mar 15 22:50:54 Tower kernel: ata6.00: configured for UDMA/133 (Drive related)
Mar 15 22:50:54 Tower kernel: ata6: EH complete (Drive related)
Mar 16 00:14:30 Tower kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen (Errors)
Mar 16 00:14:30 Tower kernel: ata6: edma_err_cause=00000020 pp_flags=00000001, SError=00180000 (Errors)
Mar 16 00:14:30 Tower kernel: ata6: SError: { 10B8B Dispar } (Errors)
Mar 16 00:14:30 Tower kernel: ata6: hard resetting link (Minor Issues)

--> I think ata6 is my cache drive, but I'm not sure. From a Google search, this might be related to a bad SATA cable. I'm thinking I should maybe open the server again and unplug/replug the SATA cable of this disk.

Also, I now see the following (which worries me more):

Mar 16 04:01:14 Tower kernel: md: disk1 read error (Errors)
Mar 16 04:01:14 Tower kernel: handle_stripe read error: 24800/1, count: 1 (Errors)

I've included a SMART report for this disk in the file on Dropbox. I do see several errors in it, but I'm not entirely sure how to read the report.

I will hold off on doing _anything_ on the server for now. I would really appreciate it if somebody could have a look and tell me if there's anything I can/should do.

Thanks,
Peter.
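For anyone reading the `SErr 0x180000` value in the log above: the kernel prints the decoded bit names itself on the `SError: { ... }` line, but the hex value can also be decoded by hand. Below is a minimal Python sketch of that decoding. The bit table is my reading of the SATA SError register layout using the names the Linux libata error handler prints, and `decode_serror` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
# Bit positions of the SATA SError register, labelled with the names the
# Linux libata error handler prints in "SError: { ... }" lines.
# Assumption: table written from memory of the libata naming, not taken
# from this thread.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


# Value taken from the "SErr 0x180000" line in the post above.
print(decode_serror(0x180000))  # ['10B8B', 'Dispar']
```

Both bits (a 10-bit-to-8-bit decode error and a disparity error) are reported at the link level, which fits the bad-cable suspicion raised in the post, although on its own it does not prove it.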
P_K Posted March 17, 2011

24 hours later, no additional errors have appeared in the log. I'll keep watching it, but I'm wondering whether those errors are something to be concerned about.
P_K Posted March 21, 2011

Sorry to bump this, but the situation has got worse. This morning I noticed that disk1 got disabled. Checking the syslog, I again see a lot of these messages:

Mar 21 01:47:16 Tower kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen
Mar 21 01:47:16 Tower kernel: ata6: edma_err_cause=00000020 pp_flags=00000001, SError=00180000
Mar 21 01:47:16 Tower kernel: ata6: SError: { 10B8B Dispar }
Mar 21 01:47:16 Tower kernel: ata6: hard resetting link
Mar 21 01:47:22 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
Mar 21 01:47:23 Tower kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 21 01:47:23 Tower kernel: ata6.00: configured for UDMA/133
Mar 21 01:47:23 Tower kernel: ata6: EH complete

Followed by:

Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Unhandled error code
Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] CDB: cdb[0]=0x28: 28 00 00 00 83 df 00 00 28 00
Mar 21 03:09:01 Tower kernel: end_request: I/O error, dev sde, sector 33759
Mar 21 03:09:01 Tower kernel: md: disk1 read error
Mar 21 03:09:01 Tower kernel: handle_stripe read error: 33696/1, count: 1
Mar 21 03:09:01 Tower kernel: md: disk1 read error
Mar 21 03:09:01 Tower kernel: handle_stripe read error: 33704/1, count: 1
Mar 21 03:09:01 Tower kernel: md: disk1 read error
Mar 21 03:09:01 Tower kernel: handle_stripe read error: 33712/1, count: 1
Mar 21 03:09:01 Tower kernel: md: disk1 read error
Mar 21 03:09:01 Tower kernel: handle_stripe read error: 33720/1, count: 1
Mar 21 03:09:01 Tower kernel: md: disk1 read error
Mar 21 03:09:01 Tower kernel: handle_stripe read error: 33728/1, count: 1
Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Synchronizing SCSI cache
Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Stopping disk
Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] START_STOP FAILED
Mar 21 03:09:01 Tower kernel: sd 6:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Mar 21 03:09:11 Tower emhttp: disk_spinning: open: No such file or directory

Full syslog here: http://dl.dropbox.com/u/3121169/NewUnraidIssue.zip

disk1 is now disabled, as you can see in the screenshot I also put in the zip file above. The SMART report for that disk gives this:

---
smartctl -a -d ata /dev/sde
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl open device: /dev/sde failed: No such device
---

The unRAID version is 4.7. There is a SMART report from some days ago for both disk1 and the cache disk in the file I uploaded previously: http://dl.dropbox.com/u/3121169/UnraidLogs.zip

I have a feeling it is all being triggered in the first place by ata6 (which I think is the cache disk, but I'm not sure). Do I just remove the cache disk and keep using disk1, or does disk1 have to be replaced?

I would really appreciate some help with this.
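The `CDB:` line in the log above can be tied to the sector numbers that follow it. The first byte, `0x28`, is the SCSI READ(10) opcode, and the rest of the command carries the starting block and the block count. The following is a rough Python sketch of that decoding; `decode_read10` is a hypothetical helper name, and the partition offset of 63 sectors used in the last step is an assumption about this particular disk, not something stated in the thread.

```python
def decode_read10(cdb_hex):
    """Decode a 10-byte SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


# CDB copied from the "[sde] CDB:" syslog line above.
lba, blocks = decode_read10("28 00 00 00 83 df 00 00 28 00")
print(lba, blocks)  # 33759 40

# Assumption: the data partition starts at sector 63, so the md driver's
# sector numbers are the raw disk sectors minus 63, reported 8 sectors apart.
PARTITION_START = 63
print(list(range(lba - PARTITION_START, lba - PARTITION_START + blocks, 8)))
```

Under that assumption, the single failed 40-sector read at disk sector 33759 accounts for exactly the five `handle_stripe read error` entries (33696 through 33728), so the log shows one failed request to sde rather than five separate bad spots.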
dgaschk Posted March 21, 2011

Do not remove the cache disk while disk1 is disabled. The cache drive is required to reconstruct the contents of disk1.
Joe L. Posted March 21, 2011

dgaschk said: Do not remove the cache disk while disk1 is disabled. The cache drive is required to reconstruct the contents of disk1.

That is not an accurate statement. The cache drive is not part of the parity calculations at all.
3doubled Posted March 23, 2011

Hi,

I have a very similar error in my syslog:

Mar 22 18:02:08 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen (Errors)
Mar 22 18:02:08 Tower kernel: ata6.00: failed command: CHECK POWER MODE (Minor Issues)
Mar 22 18:02:08 Tower kernel: ata6.00: cmd e5/00:00:00:00:00/00:00:00:00:00/00 tag 0 (Drive related)
Mar 22 18:02:08 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors)
Mar 22 18:02:08 Tower kernel: ata6.00: status: { DRDY } (Drive related)
Mar 22 18:02:08 Tower kernel: ata6: hard resetting link (Minor Issues)
Mar 22 18:02:09 Tower kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Mar 22 18:02:09 Tower kernel: ata6.00: configured for UDMA/133 (Drive related)
Mar 22 18:02:09 Tower kernel: ata6: EH complete (Drive related)

I can't make much sense of it, but I've seen this error 6 or 7 times in the 21 days covered by my syslog.

My system:
Unraid 4.7
ASUS M4A78L-M
SuperMicro AOC-SASLP-MV8 PCIe4x
dgaschk Posted March 24, 2011

P_K said: [...] Do I just try to remove the cache disk and then keep using disk1 or does disk1 have to be replaced?

ata6 = sde = disk1

This disk has failed. RMA it and replace it ASAP. If another disk in the array fails, you will lose data.

There is one more thing you can try. Shut down the server. Check all cables to disk1 and/or replace the SATA and power connectors. Restart the server and see if sde returns (try a SMART query). Unassign disk1. If you start the array with disk1 unassigned, unRAID will forget about disk1. Stop the array, assign the disk as disk1 again, and restart the array; you'll be given the option to rebuild the contents of disk1 onto it. I would not trust this disk unless it passes another preclear before it is reassigned.
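The "see if sde returns (try a SMART query)" step can be scripted if that is more convenient than typing it each time. This is only a sketch: it reuses the `smartctl -a -d ata` invocation shown earlier in the thread, `smart_report` is a hypothetical helper name, and it assumes Python 3.7 or later with `smartctl` on the PATH.

```python
import os
import subprocess


def smart_report(device):
    """Return smartctl's text output for device, or None if the node is missing."""
    if not os.path.exists(device):
        # The kernel has dropped the disk, which is the situation behind the
        # "No such device" message quoted earlier in the thread.
        return None
    # No check=True here: smartctl uses non-zero exit codes to flag disk
    # problems, so a non-zero status does not mean the query itself failed.
    result = subprocess.run(
        ["smartctl", "-a", "-d", "ata", device],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `smart_report("/dev/sde")` after the restart returns `None` while the device node is still absent and the report text once the disk is detected again. Note that the drive letter may change across reboots, so it is worth confirming which device disk1 is before trusting the result.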
megna22 Posted May 12, 2018

I'm having the same problem, so what's the answer? It's driving me crazy!

Here's my log: movies-syslog-20180512-0159.zip
JorgeB Posted May 12, 2018

megna22 said: I'm having the same problem so what's the answer?

Since the errors are on three different ports, it's possibly a controller problem, but it could also be anything those disks share, such as a power splitter cable or even the PSU itself:

May 12 00:00:32 Movies kernel: ata8: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
May 12 00:00:32 Movies kernel: ata8: irq_stat 0x00400000, PHY RDY changed
May 12 00:00:32 Movies kernel: ata8: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
May 12 00:00:32 Movies kernel: ata8: hard resetting link
May 12 00:00:32 Movies kernel: ata9: exception Emask 0x10 SAct 0x0 SErr 0x1910000 action 0xe frozen
May 12 00:00:32 Movies kernel: ata9: irq_stat 0x00400000, PHY RDY changed
May 12 00:00:32 Movies kernel: ata9: SError: { PHYRdyChg Dispar LinkSeq TrStaTrns }
May 12 00:00:32 Movies kernel: ata9: hard resetting link
May 12 00:00:32 Movies kernel: ata7: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
May 12 00:00:32 Movies kernel: ata7: irq_stat 0x00400000, PHY RDY changed
May 12 00:00:32 Movies kernel: ata7: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
May 12 00:00:32 Movies kernel: ata7: hard resetting link
May 12 00:00:38 Movies kernel: ata7: link is slow to respond, please be patient (ready=0)
May 12 00:00:38 Movies kernel: ata8: link is slow to respond, please be patient (ready=0)
May 12 00:00:38 Movies kernel: ata9: link is slow to respond, please be patient (ready=0)
May 12 00:00:42 Movies kernel: ata8: COMRESET failed (errno=-16)
May 12 00:00:42 Movies kernel: ata8: hard resetting link
May 12 00:00:42 Movies kernel: ata9: COMRESET failed (errno=-16)
May 12 00:00:42 Movies kernel: ata9: hard resetting link
May 12 00:00:42 Movies kernel: ata7: COMRESET failed (errno=-16)
May 12 00:00:42 Movies kernel: ata7: hard resetting link
May 12 00:00:42 Movies kernel: ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 12 00:00:43 Movies kernel: ata8.00: configured for UDMA/133
May 12 00:00:43 Movies kernel: ata8: EH complete
May 12 00:00:48 Movies kernel: ata9: link is slow to respond, please be patient (ready=0)
May 12 00:00:48 Movies kernel: ata7: link is slow to respond, please be patient (ready=0)
May 12 00:00:49 Movies kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 12 00:00:49 Movies kernel: ata9.00: configured for UDMA/133
May 12 00:00:49 Movies kernel: ata9: EH complete
May 12 00:00:51 Movies kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 12 00:00:51 Movies kernel: ata7.00: configured for UDMA/133
May 12 00:00:51 Movies kernel: ata7: EH complete
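The reasoning above (several ports failing in the same second points at something they share) can be checked across a whole syslog rather than by eye. Here is a small Python sketch that groups `exception` lines by timestamp; the function name `exceptions_by_second` and the regular expression are my own and only assume the syslog line layout shown in this thread.

```python
import re
from collections import defaultdict

# Matches lines such as:
# "May 12 00:00:32 Movies kernel: ata8: exception Emask 0x10 ..."
# and also per-device forms such as "ata6.00: exception ...".
PATTERN = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: (ata\d+)(?:\.\d+)?: exception ",
    re.MULTILINE,
)


def exceptions_by_second(log_text):
    """Map each timestamp to the sorted list of ATA ports that raised an exception."""
    grouped = defaultdict(set)
    for stamp, port in PATTERN.findall(log_text):
        grouped[stamp].add(port)
    return {stamp: sorted(ports) for stamp, ports in grouped.items()}


# Three lines copied from the log excerpt above.
SAMPLE = """\
May 12 00:00:32 Movies kernel: ata8: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
May 12 00:00:32 Movies kernel: ata9: exception Emask 0x10 SAct 0x0 SErr 0x1910000 action 0xe frozen
May 12 00:00:32 Movies kernel: ata7: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
"""

print(exceptions_by_second(SAMPLE))  # {'May 12 00:00:32': ['ata7', 'ata8', 'ata9']}
```

Timestamps that list more than one port are the ones worth looking at for a shared cause (controller, power splitter, PSU); a timestamp with a single port looks more like a problem with one disk or its own cable.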
megna22 Posted May 13, 2018

Johnnie, I know it's a controller problem. I have a PCI SATA controller card that was conflicting with the controller in the BIOS. I adjusted some settings and fixed most of the problem; I just haven't found what is causing that damn error. The worst part is that it sometimes makes unRAID think a drive is bad and throw it out of the array.
This topic is now archived and is closed to further replies.