aspik Posted January 25, 2014
I have an error on one of my disks. The log shows this:
Jan 25 16:38:00 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 25 16:38:00 Tower kernel: ata3.00: irq_stat 0x40000001
Jan 25 16:38:00 Tower kernel: ata3.00: failed command: READ DMA EXT
Jan 25 16:38:00 Tower kernel: ata3.00: cmd 25/00:08:60:e2:03/00:00:d6:00:00/e0 tag 0 dma 4096 in
Jan 25 16:38:00 Tower kernel: res 51/40:08:60:e2:03/00:00:d6:00:00/e0 Emask 0x9 (media error)
Jan 25 16:38:00 Tower kernel: ata3.00: status: { DRDY ERR }
Jan 25 16:38:00 Tower kernel: ata3.00: error: { UNC }
Jan 25 16:38:00 Tower kernel: ata3.00: configured for UDMA/133
Jan 25 16:38:00 Tower kernel: sd 3:0:0:0: [sdd] Unhandled sense code
Jan 25 16:38:00 Tower kernel: sd 3:0:0:0: [sdd]
Jan 25 16:38:00 Tower kernel: Result: hostbyte=0x00 driverbyte=0x08
Jan 25 16:38:00 Tower kernel: sd 3:0:0:0: [sdd]
Jan 25 16:38:00 Tower kernel: Sense Key : 0x3 [current] [descriptor]
Jan 25 16:38:00 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Jan 25 16:38:00 Tower kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jan 25 16:38:00 Tower kernel: d6 03 e2 60
Jan 25 16:38:00 Tower kernel: sd 3:0:0:0: [sdd]
Jan 25 16:38:00 Tower kernel: ASC=0x11 ASCQ=0x4
Jan 25 16:38:00 Tower kernel: sd 3:0:0:0: [sdd] CDB:
Jan 25 16:38:00 Tower kernel: cdb[0]=0x28: 28 00 d6 03 e2 60 00 00 08 00
Jan 25 16:38:00 Tower kernel: end_request: I/O error, dev sdd, sector 3590578784
Jan 25 16:38:00 Tower kernel: ata3: EH complete
Jan 25 16:38:00 Tower kernel: md: disk1 read error, sector=3590578720
The SMART test results show no errors, but maybe I am reading them wrong. I can attach them if someone tells me which one is the right one. After I started the SMART test, the disk made some loud noises (something like buzzing); after a few seconds it went back to its normal noise (the buzzing stopped). Should I replace the disk? Thanks for the help!
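Editor's note: the numbers in the log above can be cross-checked against each other. The sketch below decodes the LBA from the "cmd" taskfile dump and compares it with the two sector numbers the kernel reports. It assumes the usual libata field order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device); the function name `taskfile_lba48` is made up for illustration.

```python
def taskfile_lba48(taskfile: str) -> int:
    """Return the 48-bit LBA encoded in a libata taskfile dump.

    Assumed layout (an assumption, not confirmed anywhere in this thread):
    command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    _command, low, high, _device = taskfile.split("/")
    _feature, _nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    _hfeature, _hnsect, hlbal, hlbam, hlbah = (int(x, 16) for x in high.split(":"))
    # The "hob" (high order byte) fields carry LBA bits 24..47.
    return (hlbah << 40) | (hlbam << 32) | (hlbal << 24) | (lbah << 16) | (lbam << 8) | lbal


# Values copied from the log in the first post.
failed_lba = taskfile_lba48("25/00:08:60:e2:03/00:00:d6:00:00/e0")
device_sector = 3590578784  # "end_request: I/O error, dev sdd, sector ..."
md_sector = 3590578720      # "md: disk1 read error, sector=..."

print(hex(failed_lba), failed_lba)
print(device_sector - md_sector)
```

The decoded LBA (0xd603e260) is the same value that appears in the sense data bytes "d6 03 e2 60" and in the CDB, and it equals the sector in the end_request line. The md line is 64 sectors lower, which would be consistent with md counting from the start of a partition that begins at sector 64; that partition offset is an assumption and is not stated in the thread.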
dirtysanchez Posted January 25, 2014
Post a SMART report for that drive.
aspik Posted January 25, 2014 Author
I have attached the SMART logs for the disk.
smart_sdd.txt
dirtysanchez Posted January 25, 2014
The SMART report looks fine. Reseat the power and SATA connections to the drive and see if the problem persists.
aspik Posted January 25, 2014 Author
I will do this later; I'm currently pre-clearing a disk, so I can't power down the server. Now I see the same error on a second disk, along with a lot of read errors (130). I have attached the SMART logs from the second disk; I suppose they also look OK. I will reseat the cables on that one too and see if it helps. Does this have an impact on parity? How do I handle these errors? Should I run a parity check?
smart_sdf.txt
aspik Posted January 30, 2014 Author
Hi dirtysanchez, I disconnected and reconnected the power and SATA cables, but unfortunately I still get this error. My array has 4 disks: 2x WD Red 3TB and 2x old RE4-GP 2TB. The error occurs only on the RE4 disks. Sometimes there is only 1 read error and sometimes as many as 150! I'm really afraid that something might go wrong on both disks and that I will lose the data on both of them... I've attached the syslog. Do you have any other ideas about what's wrong?
syslog.txt
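Editor's note: to see at a glance which array disks are affected and how often, the md read-error lines in a syslog can be tallied per disk. This is only a rough sketch; it assumes the lines look like the "md: disk1 read error, sector=..." line quoted in the first post, and the function name `count_md_read_errors` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines like "... kernel: md: disk1 read error, sector=3590578720"
MD_READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def count_md_read_errors(lines):
    """Count md read-error lines per array disk in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Two lines copied from the log in the first post; only the second one matches.
sample = [
    "Jan 25 16:38:00 Tower kernel: ata3: EH complete",
    "Jan 25 16:38:00 Tower kernel: md: disk1 read error, sector=3590578720",
]
counts = count_md_read_errors(sample)
print(dict(counts))
```

For a real run, the function could be fed an open file handle for the saved syslog.txt instead of the sample list.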
dirtysanchez Posted January 30, 2014
I'm not a syslog and drive error expert, but according to this that error is not a problem and can be ignored. Hopefully an expert can chime in.
aspik Posted January 31, 2014 Author
Thanks for the link! Which error do you mean? The log says:
Jan 30 20:27:50 Tower kernel: ata3.00: cmd 25/00:08:40:00:0c/00:00:6e:00:00/e0 tag 0 dma 4096 in
Jan 30 20:27:50 Tower kernel: res 51/40:08:40:00:0c/00:00:6e:00:00/e0 Emask 0x9 (media error)
Jan 30 20:27:50 Tower kernel: ata3.00: status: { DRDY ERR }
Jan 30 20:27:50 Tower kernel: ata3.00: error: { UNC }
And this looks like "Drive media issue #1", which is not good at all…
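Editor's note: the "status" and "error" lines are just the first two bytes of the "res" line spelled out. The sketch below decodes them under the assumption that the first field of "res" is the ATA status register and the second is the ATA error register. The bit names are the commonly documented ones and should be checked against the ATA specification; the function names are made up for illustration.

```python
# Commonly documented ATA register bits (an assumption to verify against the spec).
# Bits not listed here, such as 0x10 in the status register, are simply ignored.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT", 0x01: "AMNF"}


def decode_bits(value, names):
    """Return the names of all bits from `names` that are set in `value`."""
    return [name for bit, name in names.items() if value & bit]


def decode_res(res):
    """Split a libata 'res' dump into decoded status and error flag lists."""
    status_hex, rest = res.split("/", 1)
    error_hex = rest.split(":", 1)[0]
    status = decode_bits(int(status_hex, 16), STATUS_BITS)
    error = decode_bits(int(error_hex, 16), ERROR_BITS)
    return status, error


# Taskfile copied from the "res" line quoted above.
status, error = decode_res("51/40:08:40:00:0c/00:00:6e:00:00/e0")
print(status, error)
```

Read this way, DRDY and ERR only say that the drive was ready and that the command ended with an error; the specific cause is in the error register, here UNC (uncorrectable data).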
itimpi Posted January 31, 2014
The error indicated is not fatal per se, as the system recovered, but it would not occur on a well-behaved system. It does suggest there is an underlying problem, probably with the cabling or the power supply.
aspik Posted January 31, 2014 Author
This is strange... I use brand new Delock straight/straight 30cm cables (Item No. 82676). The PSU is also new, a Corsair CX430M. Everything is in a Lian Li Q25 case. Could it be something in the case's HDD cage? The cables are not connected directly to the disks, but to the HDD cage.
DaleWilliams Posted January 31, 2014
Sounds like a very nice build! I'd try changing the SATA cable routing first. Unless they're shielded (and shielded ones are hard to find), SATA cables can interfere with each other. Don't tie them together, even though it does make for a clean-looking build. Also swap the cables around. If the 'problem' moves from sdd to another drive (sdX), then I'd replace the cable.
dgaschk Posted January 31, 2014
Paste the SMART report for disk 1.
dirtysanchez Posted January 31, 2014
Quoting aspik:
Thanks for the link! Which error do you mean? The log says:
Jan 30 20:27:50 Tower kernel: ata3.00: cmd 25/00:08:40:00:0c/00:00:6e:00:00/e0 tag 0 dma 4096 in
Jan 30 20:27:50 Tower kernel: res 51/40:08:40:00:0c/00:00:6e:00:00/e0 Emask 0x9 (media error)
Jan 30 20:27:50 Tower kernel: ata3.00: status: { DRDY ERR }
Jan 30 20:27:50 Tower kernel: ata3.00: error: { UNC }
And this looks like "Drive media issue #1", which is not good at all…
End of quote.
I was referring to the DRDY ERR. Also, I have the same case, and as you say, it has a backplane. You can always try removing and reseating the drive.
aspik Posted February 10, 2014 Author
Unfortunately the error still occurs. What I've done already:
- reseated the power and data cables: no effect
- removed the disks from the backplane and reinserted them: no effect
- swapped the data cables for other new ones: no effect
Currently I have 5 disks (including parity) in the array, and the errors occur only on the RE4 disks. I have attached the syslog and the SMART report for disk1. Does anyone have an idea?
syslog.txt smart_disk1.txt
snowboardjoe Posted February 10, 2014
How often do you get the DRDY ERR? It sort of sounds like a condition where the disk is spun down and an event comes along that triggers it to spin up. Just a guess at this point. It could be that certain hardware just reports this. Do you see the error for other drives? Do you know if the disk was normally spun down before the error? Hopefully some others can chime in here with their experiences.
EDITED: Also found this... http://lime-technology.com/wiki/index.php/The_Analysis_of_Drive_Issues#Physical_Drive_Issues
That indicates it's a true error or an interface problem. A long SMART test could be in order here.
aspik Posted February 10, 2014 Author
Thanks for the reply, snowboardjoe.
"Do you see the error for other drives?"
As I said before, it occurs only on the WD RE4 disks. I have 2 of them in my array.
"Do you know if the disk was normally spun down before the error?"
Yes, indeed, it happens when the disk was spun down and I started to play a file on the HTPC.
"That indicates it's a true error or interface problem. Could be a long SMART test is in order here."
This is what worries me, a physical drive issue :( I have already run a long SMART test from the web GUI, and the result showed no errors.
aspik Posted April 19, 2014 Author
FYI: if anyone else stumbles in here with similar problems, I finally solved it. It turned out to be bad cable management and the wrong type of cables. I used straight/straight SATA cables; after I closed the side panel, the cables were squeezed too much and I was getting the errors. When I left the case open (without the side panel), the errors were gone. The solution is to buy down/straight cables and do better cable management...
DaleWilliams Posted April 19, 2014
There's a new one for the wiki!