<SOLVED> Syslog error on ata19: which drive is that?


Posted

Tower is currently checking parity, like every month.

There are some errors in the syslog, something about BadCRC, so it may be a cable (data or power) issue.

Dec 14 21:13:02 Tower kernel: ata19.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen (Errors)

Dec 14 21:13:02 Tower kernel: ata18.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen (Errors)

Dec 14 21:13:02 Tower kernel: ata18.00: irq_stat 0x08000000, interface fatal error (Errors)

Dec 14 21:13:02 Tower kernel: ata18: SError: { UnrecovData HostInt 10B8B BadCRC } (Errors)

Dec 14 21:13:02 Tower kernel: ata19.00: irq_stat 0x08000000, interface fatal error (Errors)

Dec 14 21:13:02 Tower kernel: ata18.00: failed command: READ DMA EXT (Minor Issues)

Dec 14 21:13:02 Tower kernel: ata18.00: cmd 25/00:a8:97:39:c9/00:01:07:00:00/e0 tag 0 dma 217088 in (Drive related)

Dec 14 21:13:02 Tower kernel:          res 50/00:00:96:39:c9/00:00:00:00:00/e7 Emask 0x50 (ATA bus error) (Errors)

Dec 14 21:13:02 Tower kernel: ata18.00: status: { DRDY } (Drive related)

Dec 14 21:13:02 Tower kernel: ata19: SError: { UnrecovData HostInt 10B8B BadCRC } (Errors)

Dec 14 21:13:02 Tower kernel: ata18: hard resetting link (Minor Issues)

Dec 14 21:13:02 Tower kernel: ata19.00: failed command: READ DMA EXT (Minor Issues)

Dec 14 21:13:02 Tower kernel: ata19.00: cmd 25/00:28:48:3a:c9/00:01:07:00:00/e0 tag 0 dma 151552 in (Drive related)

Dec 14 21:13:02 Tower kernel:          res 50/00:00:47:3a:c9/00:00:07:00:00/e7 Emask 0x50 (ATA bus error) (Errors)

 

My question: the errors seem to be related to ata18 and ata19. Which HDDs are on ata18 and ata19, and how can I find that out?

 

Complete Syslog attached.

 

Thanks

syslog-2011-12-15.txt

Posted

You can tell what types of drives these are:

 

ata19.00: ATA-8: SAMSUNG HD204UI, 1AQ10003, max UDMA/133
Dec 14 17:19:39 Tower kernel: ata19.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Dec 14 17:19:39 Tower kernel: ata19.00: configured for UDMA/133
Dec 14 17:19:39 Tower kernel: ata18.00: ATA-8: ST31500341AS, CC4H, max UDMA/133
Dec 14 17:19:39 Tower kernel: ata18.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
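
If you want to pull these identify lines out of a long syslog yourself, something like the following might help. It is only a rough sketch: the regex is an assumption based on the lines quoted in this thread, and `models_by_port` is a made-up helper name, not part of any tool.

```python
import re

# Matches libata identify lines such as
# "ata18.00: ATA-8: ST31500341AS, CC4H, max UDMA/133"
# (pattern guessed from the lines quoted in this thread)
IDENTIFY_RE = re.compile(r"\b(ata\d+)\.\d+: ATA-\d+: (.+?), (\S+), max ")


def models_by_port(lines):
    """Return {ata port: (model, firmware)} for every identify line found."""
    found = {}
    for line in lines:
        m = IDENTIFY_RE.search(line)
        if m:
            port, model, firmware = m.groups()
            found[port] = (model.strip(), firmware)
    return found


sample = [
    "ata19.00: ATA-8: SAMSUNG HD204UI, 1AQ10003, max UDMA/133",
    "Dec 14 17:19:39 Tower kernel: ata19.00: configured for UDMA/133",
    "Dec 14 17:19:39 Tower kernel: ata18.00: ATA-8: ST31500341AS, CC4H, max UDMA/133",
]

for port, (model, fw) in sorted(models_by_port(sample).items()):
    print(port, model, fw)
```

Note that this only gives model and firmware, not the serial number, so it cannot tell apart several drives of the same model.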

Posted

Also note that both those drives are throwing errors at exactly the same time. That seems unusual; maybe there's a clue for you ... cabling? power?  ???

 

--UhClem
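
To check for that kind of coincidence across a whole syslog, one could group the "exception" lines by timestamp and see which ports share a second. This is a minimal sketch that assumes the line layout shown in the first post; `ports_by_timestamp` is a hypothetical helper.

```python
import re
from collections import defaultdict

# "Dec 14 21:13:02 Tower kernel: ata19.00: exception Emask 0x50 ..."
# (layout assumed from the excerpt in the first post)
EXCEPTION_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (ata\d+)\.\d+: exception "
)


def ports_by_timestamp(lines):
    """Return {timestamp: set of ata ports that logged an exception then}."""
    grouped = defaultdict(set)
    for line in lines:
        m = EXCEPTION_RE.match(line)
        if m:
            stamp, port = m.groups()
            grouped[stamp].add(port)
    return dict(grouped)


sample = [
    "Dec 14 21:13:02 Tower kernel: ata19.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen",
    "Dec 14 21:13:02 Tower kernel: ata18.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen",
    "Dec 14 21:13:02 Tower kernel: ata18: hard resetting link",
]

for stamp, ports in ports_by_timestamp(sample).items():
    if len(ports) > 1:
        print(stamp, "shared by", sorted(ports))
```

Several ports failing in the same second would point more towards something they have in common (power, a cable bundle, the controller) than towards two disks going bad independently.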

 

Posted

Thanks a lot for your answers.

But I'm a little confused: for ata19, all my Samsung drives have firmware 1AQ10003, and I have 5 of them.

And for ata18, I have 7 ST31500341AS drives.

 

Is there any way to pinpoint the right disk by serial number, diskxx, or something else I can recognize?

 

Help, I have a problem with a disk (or two), but I don't know which one...

 

BTW, the parity check has finished without errors.

 

 

Posted

It all seems very confusing.

There are (at least) diskxx, ataxx, mdxx, sdx, scsi-x, sas-x, and plain x (as in spindown x)... it all seems very random.

Is there any method to make a table of what is connected to what, and through what?

Posted

ata19 is your parity disk (currently sds), attached to port 5 on the motherboard (or 6: in the syslog the ports are numbered 0 to 5, while on the MB they are probably labeled 1 to 6).

 

I believe ata18 is the Seagate 9VS2N6RH (sdr), attached to MB port 4 (probably labeled 5 on the motherboard).

Posted


Thanks a lot for your guess.

Can you explain the method that gave you this information?  ;)

Posted


It is easy on all versions up to 4.7, but not so in 5.

 

Anyway, start from the inventory:

Dec 14 17:19:40 Tower emhttp: Device inventory:

Dec 14 17:19:40 Tower emhttp: pci-0000:00:11.0-scsi-0:0:0:0 host3 (sdo) ST31500341AS_9VS2NKCP

Dec 14 17:19:40 Tower emhttp: pci-0000:00:11.0-scsi-1:0:0:0 host4 (sdp) ST31500341AS_9VS2B9ZW

Dec 14 17:19:40 Tower emhttp: pci-0000:00:11.0-scsi-2:0:0:0 host5 (sdq) ST31500341AS_9VS299T1

Dec 14 17:19:40 Tower emhttp: pci-0000:00:11.0-scsi-4:0:0:0 host7 (sdr) ST31500341AS_9VS2N6RH

Dec 14 17:19:40 Tower emhttp: pci-0000:00:11.0-scsi-5:0:0:0 host8 (sds) SAMSUNG_HD204UI_S2HFJ9FZA01378

Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy0:1-0x0000000000000000:0-lun0 host0 (sda) WDC_WD10EACS-00ZJB0_WD-WCASJ1699821

Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy1:1-0x0100000000000000:1-lun0 host0 (sdb) SAMSUNG_HD103SIS1VSJ90S573052

Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy2:1-0x0200000000000000:2-lun0 host0 (sdc) WDC_WD20EARS-00_WD-WCAZA2920888

Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy3:1-0x0300000000000000:3-lun0 host0 (sdd) ST31500341AS_9VS2N1C3

Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy4:1-0x0400000000000000:4-lun0 host0 (sde) SAMSUNG_HD204UIS2HGJ90Z906573

Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy5:1-0x0500000000000000:5-lun0 host0 (sdf) SAMSUNG_HD103SIS1VSJ90S573490

Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy6:1-0x0600000000000000:6-lun0 host0 (sdg) SAMSUNG_HD204UIS2HFJ9CZA00326

Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy7:1-0x0700000000000000:7-lun0 host0 (sdh) ST31500341AS_9VS2BC09

Dec 14 17:19:40 Tower emhttp: pci-0000:03:00.0-sas-phy0:1-0x0000000000000000:0-lun0 host2 (sdj) ST32000542AS_5XW0E6Z4

Dec 14 17:19:40 Tower emhttp: pci-0000:03:00.0-sas-phy4:1-0x0400000000000000:4-lun0 host2 (sdk) SAMSUNG_HD204UIS2HGJ9FZ900400

Dec 14 17:19:40 Tower emhttp: pci-0000:03:00.0-sas-phy5:1-0x0500000000000000:5-lun0 host2 (sdl) WDC_WD10EACS-00_WD-WCASJ1695946

Dec 14 17:19:40 Tower emhttp: pci-0000:03:00.0-sas-phy6:1-0x0600000000000000:6-lun0 host2 (sdm) SAMSUNG_HD204UIS2HFJ9BZA00817

Dec 14 17:19:40 Tower emhttp: pci-0000:03:00.0-sas-phy7:1-0x0700000000000000:7-lun0 host2 (sdn) ST31500341AS_9VS2NKHC
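
If you would rather not read the inventory by eye, a small script can split each line into controller, port path, host, sdX and model/serial. This is just a sketch: the regex is an assumption based on the emhttp lines above, and `Device` and `parse_inventory` are invented names.

```python
import re
from collections import namedtuple

# One row of the emhttp "Device inventory" (field names are my own)
Device = namedtuple("Device", "controller path host dev ident")

# "emhttp: pci-0000:00:11.0-scsi-0:0:0:0 host3 (sdo) ST31500341AS_9VS2NKCP"
INVENTORY_RE = re.compile(
    r"emhttp: pci-([0-9a-f:.]+)-(\S+) (host\d+) \((sd[a-z]+)\) (\S+)\s*$"
)


def parse_inventory(lines):
    """Return a list of Device tuples, one per inventory line."""
    devices = []
    for line in lines:
        m = INVENTORY_RE.search(line)
        if m:
            devices.append(Device(*m.groups()))
    return devices


sample = [
    "Dec 14 17:19:40 Tower emhttp: Device inventory:",
    "Dec 14 17:19:40 Tower emhttp: pci-0000:00:11.0-scsi-4:0:0:0 host7 (sdr) ST31500341AS_9VS2N6RH",
    "Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy0:1-0x0000000000000000:0-lun0 host0 (sda) WDC_WD10EACS-00ZJB0_WD-WCASJ1699821",
]

for d in parse_inventory(sample):
    print(d.controller, d.host, d.dev, d.ident)
```

Grouping the result by `controller` gives the three sets of drives (two cards plus the motherboard) that the rest of this post walks through.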

 

Then look at the Unraid slot assignments:

Dec 14 17:19:40 Tower kernel: md: unRAID driver 1.1.1 installed

Dec 14 17:19:40 Tower kernel: md: import disk0: [65,32] (sds) SAMSUNG HD204UI  S2HFJ9FZA01378       size: 1953514552

Dec 14 17:19:40 Tower kernel: md: import disk1: [8,160] (sdk) SAMSUNG HD204UI  S2HGJ9FZ900400       size: 1953514552

Dec 14 17:19:40 Tower kernel: md: import disk2: [8,96] (sdg) SAMSUNG HD204UI  S2HFJ9CZA00326       size: 1953514552

Dec 14 17:19:40 Tower kernel: md: import disk3: [8,176] (sdl) WDC WD10EACS-00Z WD-WCASJ1695946 size: 976762552

Dec 14 17:19:40 Tower kernel: md: import disk4: [8,192] (sdm) SAMSUNG HD204UI  S2HFJ9BZA00817       size: 1953514552

Dec 14 17:19:40 Tower kernel: md: import disk5: [8,64] (sde) SAMSUNG HD204UI  S2HGJ90Z906573       size: 1953514552

Dec 14 17:19:40 Tower kernel: md: import disk6: [8,0] (sda) WDC WD10EACS-00Z WD-WCASJ1699821 size: 976762552

Dec 14 17:19:40 Tower kernel: md: import disk7: [8,32] (sdc) WDC WD20EARS-00M WD-WCAZA2920888 size: 1953514552

Dec 14 17:19:40 Tower kernel: md: import disk8: [8,16] (sdb) SAMSUNG HD103SI  S1VSJ90S573052       size: 976762552

Dec 14 17:19:40 Tower kernel: md: import disk9: [8,80] (sdf) SAMSUNG HD103SI  S1VSJ90S573490       size: 976762552

Dec 14 17:19:40 Tower kernel: md: import disk10: [8,112] (sdh) ST31500341AS     9VS2BC09 size: 1465138552

Dec 14 17:19:40 Tower kernel: md: import disk11: [65,0] (sdq) ST31500341AS     9VS299T1 size: 1465138552

Dec 14 17:19:40 Tower kernel: md: import disk12: [8,240] (sdp) ST31500341AS     9VS2B9ZW size: 1465138552

Dec 14 17:19:40 Tower kernel: md: import disk13: [8,208] (sdn) ST31500341AS     9VS2NKHC size: 1465138552

Dec 14 17:19:40 Tower kernel: md: import disk14: [8,48] (sdd) ST31500341AS     9VS2N1C3 size: 1465138552

Dec 14 17:19:40 Tower kernel: md: import disk15: [65,16] (sdr) ST31500341AS     9VS2N6RH size: 1465138552

Dec 14 17:19:40 Tower kernel: md: import disk16: [8,224] (sdo) ST31500341AS     9VS2NKCP size: 1465138552

Dec 14 17:19:40 Tower kernel: md: import disk17: [8,144] (sdj) ST32000542AS     5XW0E6Z4 size: 1953514552
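
The same idea works for the slot assignments. The sketch below turns the "md: import" lines into a slot -> (sdX, serial) lookup. It assumes the serial is always the last whitespace-separated token before "size:", which holds for the lines above but is not guaranteed in general; `parse_md_imports` is a made-up name.

```python
import re

# "md: import disk15: [65,16] (sdr) ST31500341AS     9VS2N6RH size: 1465138552"
IMPORT_RE = re.compile(
    r"md: import disk(\d+): \[\d+,\d+\] \((sd[a-z]+)\) (.+?)\s+size: \d+"
)


def parse_md_imports(lines):
    """Return {slot number: (sdX, serial)} from unRAID md import lines."""
    slots = {}
    for line in lines:
        m = IMPORT_RE.search(line)
        if m:
            slot, dev, ident = m.groups()
            # Assumption: the serial is the last token of the model/serial field
            slots[int(slot)] = (dev, ident.split()[-1])
    return slots


sample = [
    "Dec 14 17:19:40 Tower kernel: md: import disk0: [65,32] (sds) SAMSUNG HD204UI  S2HFJ9FZA01378       size: 1953514552",
    "Dec 14 17:19:40 Tower kernel: md: import disk3: [8,176] (sdl) WDC WD10EACS-00Z WD-WCASJ1695946 size: 976762552",
    "Dec 14 17:19:40 Tower kernel: md: import disk15: [65,16] (sdr) ST31500341AS     9VS2N6RH size: 1465138552",
]

for slot, (dev, serial) in sorted(parse_md_imports(sample).items()):
    print(f"disk{slot}", dev, serial)
```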

 

You can already spot some correlations there.

Then you search the syslog.

I will start with the devices attached to PCI bus pci-0000:02:00.0. In your case these are sda to sdh (this may change if you change your configuration, especially if you add a new controller). From the inventory you can see which physical port each drive is connected to (the cables should have labels on them). These drives are all on host0. Once you search the syslog for pci-0000:02:00.0 you will find their ataXX connections.

In the syslog it starts from here:

Dec 14 17:19:39 Tower kernel: mvsas 0000:02:00.0: mvsas: driver version 0.8.4

all the way to:

Dec 14 17:19:39 Tower kernel: sd 0:0:7:0: [sdh] Attached SCSI disk

These are your ata1 to ata8

 

Then we move to the HDs attached to pci-0000:03:00.0. These are on your second SM SASLP card: one on the primary channel and 4 on the secondary. These are sdj to sdn, and they are all on host2.

The interesting entries for pci-0000:03:00.0 start here:

Dec 14 17:19:39 Tower kernel: mvsas 0000:03:00.0: mvsas: driver version 0.8.4

and go down to:

Dec 14 17:19:39 Tower kernel: sd 2:0:4:0: [sdn] Attached SCSI disk

You will see that these are ata9 to ata13

 

The devices attached to pci-0000:00:11.0 are on your motherboard SATA ports. These are sdo to sds. Each one is its own host (host3 to host8 in the inventory).

You have used your MB sata1 to sata3 ports; sata4 is not used, and then sata5 and sata6 are used again.

These correspond to your ata14 to ata19; ata17 is unused.

 

Hope that clears it up a bit for you.

 

in your case

sda is ata1 is md6 is disk6 is 6

 

All these "assignments" from the current syslog can change, especially if you add a new controller.
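
Putting the pieces together, the table asked for earlier could be built roughly like this. Treat it as a sketch for this particular syslog only. The numbers in `SHARED_HOST_BASE` and `MB_HOST_OFFSET` are read off the ranges described above (ata1 to ata8, ata9 to ata13, ata14 to ata19), and I am assuming that disks on one card take consecutive ata numbers in inventory order, which the syslog suggests but which I have not verified port by port. All names here are hypothetical.

```python
# First ata number for each host that carries several disks (the two SASLP cards)
SHARED_HOST_BASE = {0: 1, 2: 9}
# Motherboard ports: one host per port, host3..host8 -> ata14..ata19
MB_HOST_OFFSET = 11


def assign_ata(inventory):
    """Map sdX -> ata number from (host, sdX) pairs given in inventory order."""
    position = {}  # how many disks have been seen so far on each shared host
    table = {}
    for host, dev in inventory:
        if host in SHARED_HOST_BASE:
            idx = position.get(host, 0)
            position[host] = idx + 1
            table[dev] = SHARED_HOST_BASE[host] + idx
        else:
            table[dev] = host + MB_HOST_OFFSET
    return table


# (host, sdX) pairs copied from the device inventory in this thread
inventory = (
    [(0, d) for d in "sda sdb sdc sdd sde sdf sdg sdh".split()]
    + [(2, d) for d in "sdj sdk sdl sdm sdn".split()]
    + [(3, "sdo"), (4, "sdp"), (5, "sdq"), (7, "sdr"), (8, "sds")]
)
# A few slot assignments copied from the md import lines
slots = {"sda": 6, "sdr": 15, "sds": 0}

for dev, ata in sorted(assign_ata(inventory).items()):
    if dev in slots:
        print(f"{dev} ata{ata} disk{slots[dev]}")
```

As said above, the constants are only good for this boot of this machine; after adding or moving a controller they would have to be read off the new syslog again.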

Posted

Wow.

Now this is an explanation, thanks a lot.

If you hadn't explained it, I would never have had a chance to find it by myself.

 

Some dark areas remain, of course, like the two lists scsi3-scsi4-scsi5-scsi6-scsi7-scsi8 and ata14-ata15-ata16-ata17-ata18-ata19, which (if I understood correctly) simply have to be accepted as paired: the first of the first list goes with the first of the second list, and so on. If that's the way it works, why not?

 

Now it's time to see what's the matter with those two drives, ata18 and ata19: the cables, etc.

 

I will soon make some changes in my tower, and right afterwards I will write up a complete table of the drives (sdx - ataxx - mdx - diskx - x - last 4 of the ID).

 

 

And again, thanks a lot.

 

 

Posted

In your case:

scsi0 are the drives on the first SASLP card

 

scsi1 is your flash drive (or to be exact in your case a flash reader...)

 

scsi2 are the drives on the second SASLP

 

scsi3 to scsi8 are the drives attached to the motherboard SATA ports

===========================================================

 

Do not sweat it too much, as Unraid 5 will mask some of these, so that once you add a HD you can change motherboards, controllers, etc. and your system will still know which slot the drive belongs in.

Posted

As long as version 5 includes better syslog error documentation so we can find which drive is kicking out errors. I've never had a problem matching drive serial numbers to unraid slot assignments, it's matching serial numbers to bus errors in the syslog that tends to be a pain.

