
Hard drive - controller incompatibility?


shihchiun


I'm running an Intel D915GAG board with the Intel 915G chipset, and I'm currently using only the four onboard SATA ports. After I attached my WD Caviar Black drive (WDC WD1001FALS-00J7B0), unRAID began crashing during data transfers. Parity checks complete fine; only data transfers fail. This happens whether I use the drive as the parity drive or as a data drive. I've changed SATA cables and SATA ports, and even gone through the RMA process to get a replacement drive - it's still happening. I've got four other drives (on SATA as well as PATA ports) that have no issues.

 

I get the following over and over until unRAID hangs and I have to do a hard reboot (full syslog attached):

May 27 07:19:39 Donburi kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
May 27 07:19:39 Donburi kernel: ata2.00: BMDMA stat 0x66
May 27 07:19:39 Donburi kernel: ata2.00: failed command: WRITE DMA EXT
May 27 07:19:39 Donburi kernel: ata2.00: cmd 35/00:00:0f:59:73/00:04:68:00:00/e0 tag 0 dma 524288 out
May 27 07:19:39 Donburi kernel: res 51/84:0d:02:5b:73/84:02:68:00:00/e0 Emask 0x30 (host bus error)
May 27 07:19:39 Donburi kernel: ata2.00: status: { DRDY ERR }
May 27 07:19:39 Donburi kernel: ata2.00: error: { ICRC ABRT }
May 27 07:19:39 Donburi kernel: ata2: soft resetting link
May 27 07:19:40 Donburi kernel: ata2.00: configured for UDMA/33
May 27 07:19:40 Donburi kernel: ata2.01: configured for UDMA/133
May 27 07:19:40 Donburi kernel: ata2: EH complete

 

The only thing I can think of is some kind of random incompatibility between the hard drive and the controller. Any ideas?

syslog.txt

Link to comment

The error is a CRC error (the ICRC flag in your log), which usually points to a cabling or noise issue, but it could also be caused by a bad port on the controller or by the physical disk itself.

 

You might try a different power connection to the drive, since it sounds as if you've tried almost everything else. Noise on the power supply line could be causing the CRC failures while writing to the drive.

 

See this thread describing a similar error: http://www.gossamer-threads.com/lists/linux/kernel/932495

 

Have you gotten a SMART report on the drive to see if it shows anything?  (although you said you replaced the drive, so that should eliminate that as an issue)
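
If you do pull a SMART report, the attribute worth watching for this kind of error is 199 (UDMA_CRC_Error_Count). A rough sketch of picking its raw value out of the attribute table printed by `smartctl -A`; the helper name is mine, the sample table is an illustrative placeholder rather than output from the drive in this thread, and it assumes the usual ten-column layout with the raw value last.

```python
def crc_error_count(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if it is absent.

    Assumes the ten-column attribute table printed by `smartctl -A`:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None

# Illustrative placeholder text, NOT taken from the drive discussed here.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       120
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
"""
print(crc_error_count(sample))
```

A raw count that keeps climbing between reports suggests the link (cable, port, or controller) rather than the platters.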

 

The other possibility applies if the controller pairs this disk with another one, the way IDE channels pair a master and a slave: in that case the OTHER disk in the pair could be causing the error.

 

Looking in your syslog, I see that the drive is paired with your SAMSUNG drive.

May 26 19:27:20 Donburi emhttp: pci-0000:00:1f.1-ide-0:0 ide0 (hda) WDC_WD3000JB-00KFA0_WD-WMAMR1178167
May 26 19:27:20 Donburi emhttp: pci-0000:00:1f.1-ide-0:1 ide0 (hdb) HDS722540VLAT20_VN614PCADWB06C
May 26 19:27:20 Donburi emhttp: pci-0000:00:1f.2-scsi-0:0:0:0 host1 (sda) WDC_WD2500BEVS-00VAT0_WD-WXEY08V88960
May 26 19:27:20 Donburi emhttp: pci-0000:00:1f.2-scsi-1:0:0:0 host2 (sdb) WDC_WD1001FALS-00J7B0_WD-WMATV7366139
May 26 19:27:20 Donburi emhttp: pci-0000:00:1f.2-scsi-1:0:1:0 host2 (sdc) SAMSUNG_HD103SJ_S246J1KZ420985

 

You might try moving the SAMSUNG drive to a different port.
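
The pairing can be read off the emhttp lines mechanically. Here is a small sketch that groups drives by the host/channel name in the third-from-last field and reports any channel shared by more than one drive. The regular expression is my own guess at the line layout based only on the five lines quoted above, so treat it as an assumption.

```python
import re
from collections import defaultdict

# Captures the channel name (e.g. "host2"), the device node (e.g. "sdb")
# and the model/serial string from an emhttp device line.
LINE_RE = re.compile(r"emhttp: \S+ (\S+) \((\w+)\) (\S+)")

def drives_by_host(lines):
    """Return {channel: [(device, model), ...]} for the given syslog lines."""
    groups = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            host, dev, model = m.groups()
            groups[host].append((dev, model))
    return dict(groups)

# The five lines quoted from the syslog above.
sample = [
    "May 26 19:27:20 Donburi emhttp: pci-0000:00:1f.1-ide-0:0 ide0 (hda) WDC_WD3000JB-00KFA0_WD-WMAMR1178167",
    "May 26 19:27:20 Donburi emhttp: pci-0000:00:1f.1-ide-0:1 ide0 (hdb) HDS722540VLAT20_VN614PCADWB06C",
    "May 26 19:27:20 Donburi emhttp: pci-0000:00:1f.2-scsi-0:0:0:0 host1 (sda) WDC_WD2500BEVS-00VAT0_WD-WXEY08V88960",
    "May 26 19:27:20 Donburi emhttp: pci-0000:00:1f.2-scsi-1:0:0:0 host2 (sdb) WDC_WD1001FALS-00J7B0_WD-WMATV7366139",
    "May 26 19:27:20 Donburi emhttp: pci-0000:00:1f.2-scsi-1:0:1:0 host2 (sdc) SAMSUNG_HD103SJ_S246J1KZ420985",
]

# Print only the channels that carry more than one drive.
for host, drives in drives_by_host(sample).items():
    if len(drives) > 1:
        print(host, [dev for dev, _ in drives])
```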

 

Joe L.

Link to comment

Thanks for the reply.

 

It turns out the problem is related to my controller (Intel ICH6) not supporting SATA300. After adding a jumper to force the drive to SATA150, I'm not seeing any more errors after copying several GB of data to the drive. Hopefully I'm not just getting my hopes up.
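
One way to confirm what a jumper like this actually did is to look at the link speed the kernel reports at boot. A rough sketch, assuming the syslog contains libata's "SATA link up" messages; the sample line below is hypothetical (not from my syslog), and a controller running in legacy IDE mode may not print these lines at all.

```python
import re

# Matches lines such as "ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)".
LINK_RE = re.compile(r"(ata\d+): SATA link up ([\d.]+) Gbps")

def negotiated_speeds(lines):
    """Return {port: speed in Gbps} using the last link-up message per port."""
    speeds = {}
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            speeds[m.group(1)] = float(m.group(2))
    return speeds

# Hypothetical sample line for illustration only.
sample = [
    "kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)",
]
print(negotiated_speeds(sample))
```

A port showing 1.5 Gbps is running at SATA150; 3.0 Gbps would mean the jumper is not taking effect.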

 

I suppose it's irrelevant now, but I neglected to mention that I did recently switch power supplies (Antec Truepower 480 > Corsair HX620). It had no effect on the hard drive error.

Link to comment

Glad you figured it out.  I'll remember that for the next time a similar error report is posted.

 

Joe L.

Link to comment

I spoke too soon. The jumper didn't really change anything.

 

I don't think my Samsung drive is the issue. It is new, but this problem has been happening since before I installed it. I've been running the array without the WD and it's been fine.

 

I guess my only options at this point are to 1) ditch this drive and replace it with a different model, or 2) buy an add-on controller and attach it to that.

Link to comment

I would again try a completely different SATA cable, now that you've lowered the speed to SATA150.  That is still my first suspect. 

 

Joe L.

Link to comment

I can't even find that option in the BIOS. The board is pretty old - I bought it cheap on eBay.

 

I've ordered new cables and a new SATA controller. It's inevitable that I'll need it, so even if it doesn't help (though I think it will) I've lost nothing.

Link to comment

Quick update: I installed a PCI-E controller card (JMicron) and replaced the cables with some Monoprice cables. The WD Black drive is working without errors now.

 

I still think there's some kind of incompatibility between that drive and the onboard controller, but I can't really rule out a cable issue either.

Link to comment

Archived

This topic is now archived and is closed to further replies.
