642 Posted December 15, 2011

Tower is currently running its monthly parity check. There are some errors in the syslog mentioning BadCRC, so it may be a cable (data or power) issue.

Dec 14 21:13:02 Tower kernel: ata19.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen (Errors)
Dec 14 21:13:02 Tower kernel: ata18.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen (Errors)
Dec 14 21:13:02 Tower kernel: ata18.00: irq_stat 0x08000000, interface fatal error (Errors)
Dec 14 21:13:02 Tower kernel: ata18: SError: { UnrecovData HostInt 10B8B BadCRC } (Errors)
Dec 14 21:13:02 Tower kernel: ata19.00: irq_stat 0x08000000, interface fatal error (Errors)
Dec 14 21:13:02 Tower kernel: ata18.00: failed command: READ DMA EXT (Minor Issues)
Dec 14 21:13:02 Tower kernel: ata18.00: cmd 25/00:a8:97:39:c9/00:01:07:00:00/e0 tag 0 dma 217088 in (Drive related)
Dec 14 21:13:02 Tower kernel: res 50/00:00:96:39:c9/00:00:00:00:00/e7 Emask 0x50 (ATA bus error) (Errors)
Dec 14 21:13:02 Tower kernel: ata18.00: status: { DRDY } (Drive related)
Dec 14 21:13:02 Tower kernel: ata19: SError: { UnrecovData HostInt 10B8B BadCRC } (Errors)
Dec 14 21:13:02 Tower kernel: ata18: hard resetting link (Minor Issues)
Dec 14 21:13:02 Tower kernel: ata19.00: failed command: READ DMA EXT (Minor Issues)
Dec 14 21:13:02 Tower kernel: ata19.00: cmd 25/00:28:48:3a:c9/00:01:07:00:00/e0 tag 0 dma 151552 in (Drive related)
Dec 14 21:13:02 Tower kernel: res 50/00:00:47:3a:c9/00:00:07:00:00/e7 Emask 0x50 (ATA bus error) (Errors)

My question: the errors seem to be related to ata18 and ata19. Which HDDs are on ata18 and ata19, and how can I find out?

Complete syslog attached (syslog-2011-12-15.txt). Thanks
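If the syslog is long, a small script can do the first pass. This is a minimal Python sketch, assuming the syslog was saved as a plain-text file with kernel lines in the format shown above: it counts the error-looking kernel lines per ataN port. The list of error markers is my own assumption, not something unRAID or unMENU defines, so adjust it as needed.

```python
import re
from collections import Counter

# Port tag right after "kernel:", e.g. "ata18:" or "ata18.00:".
PORT_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?:")

# Substrings treated as signs of a link problem (an assumption; extend as needed).
ERROR_MARKERS = ("exception", "SError", "failed command", "hard resetting link")


def errors_per_port(syslog_text: str) -> Counter:
    """Count error-looking kernel lines for each ataN port."""
    counts = Counter()
    for line in syslog_text.splitlines():
        m = PORT_RE.search(line)
        if m and any(marker in line for marker in ERROR_MARKERS):
            counts[m.group(1)] += 1
    return counts
```

Usage would be something like errors_per_port(open("syslog-2011-12-15.txt").read()).most_common(). Lines such as "kernel: res 50/..." carry no port tag, so they are simply not counted.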
dgaschk Posted December 15, 2011

You can tell what types of drives these are:

ata19.00: ATA-8: SAMSUNG HD204UI, 1AQ10003, max UDMA/133
Dec 14 17:19:39 Tower kernel: ata19.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Dec 14 17:19:39 Tower kernel: ata19.00: configured for UDMA/133
Dec 14 17:19:39 Tower kernel: ata18.00: ATA-8: ST31500341AS, CC4H, max UDMA/133
Dec 14 17:19:39 Tower kernel: ata18.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
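Pulling those identify lines out for every port can be scripted too. A minimal Python sketch, assuming kernel lines of the form "ataN.00: ATA-8: MODEL, FIRMWARE, ..." as in the excerpt above (the ATA version number is matched loosely). Model and firmware only narrow things down to a drive type; with several identical drives installed you still need the serial number.

```python
import re

# e.g. "ata19.00: ATA-8: SAMSUNG HD204UI, 1AQ10003, max UDMA/133"
IDENT_RE = re.compile(r"(ata\d+)\.\d+: ATA-\d+: ([^,]+), ([^,]+),")


def port_identities(syslog_text: str) -> dict:
    """Map each ataN port to the (model, firmware) pair it reported at boot."""
    idents = {}
    for line in syslog_text.splitlines():
        m = IDENT_RE.search(line)
        if m:
            idents[m.group(1)] = (m.group(2).strip(), m.group(3).strip())
    return idents
```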
UhClem Posted December 15, 2011

Also note that both of those drives are throwing errors at exactly the same time. That seems unusual; maybe there's a clue for you there... cabling? power?

--UhClem
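The "same exact time" observation can be checked across the whole log. A minimal Python sketch, assuming standard syslog timestamps like "Dec 14 21:13:02" and treating only "exception" and "SError" lines as error events (my assumption): it reports every second in which two or more ports logged errors, the pattern UhClem suggests may point at shared cabling or power rather than at one bad disk.

```python
import re
from collections import defaultdict

# Timestamp at line start, then the ataN port, then an error keyword.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) .*?kernel: (ata\d+)(?:\.\d+)?: .*?(?:exception|SError)"
)


def simultaneous_errors(syslog_text: str) -> dict:
    """Return {timestamp: sorted ports} for seconds where 2+ ports logged errors."""
    ports_by_time = defaultdict(set)
    for line in syslog_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            ports_by_time[m.group(1)].add(m.group(2))
    return {ts: sorted(p) for ts, p in ports_by_time.items() if len(p) > 1}
```

Grouping by whole seconds is a deliberately coarse choice; two unrelated failures could still land in the same second, so treat a hit as a hint, not proof.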
642 (Author) Posted December 15, 2011

Thanks a lot for your answers. But I'm a little confused: for ata19, all of my Samsung drives have firmware 1AQ10003, and I have 5 of them. And for ata18, I have 7 ST31500341AS drives. Is there any way to pinpoint the correct disk, by serial number, diskxx, or something else I can recognize? Help: I have a problem on one disk (or two), but I don't know which one...

BTW, the parity check finished with no errors.
642 (Author) Posted December 15, 2011

It's very confusing. There are (at least) diskxx, ataxx, mdxx, sdx, scsi-x, sas-x, and plain x (as in "spindown x")... all of this looks very random. Is there any way to build a table of what is connected to what, and through what?
bcbgboy13 Posted December 15, 2011

ata19 is (currently) your parity disk, sds, attached to port 5 on the motherboard (or port 6: in the syslog the ports are numbered 0 to 5, while on the motherboard they are probably labeled 1 to 6).

I believe ata18 is the Seagate 9VS2N6RH, sdr, attached to motherboard port 4 (probably labeled 5 on the motherboard).
642 (Author) Posted December 15, 2011

Thanks a lot for your guess. Can you explain the methodology that gave you this information?
bcbgboy13 Posted December 15, 2011

It is easy on all versions up to 4.7, but not so easy in 5. Anyway, start from the device inventory:

Dec 14 17:19:40 Tower emhttp: Device inventory:
Dec 14 17:19:40 Tower emhttp: pci-0000:00:11.0-scsi-0:0:0:0 host3 (sdo) ST31500341AS_9VS2NKCP
Dec 14 17:19:40 Tower emhttp: pci-0000:00:11.0-scsi-1:0:0:0 host4 (sdp) ST31500341AS_9VS2B9ZW
Dec 14 17:19:40 Tower emhttp: pci-0000:00:11.0-scsi-2:0:0:0 host5 (sdq) ST31500341AS_9VS299T1
Dec 14 17:19:40 Tower emhttp: pci-0000:00:11.0-scsi-4:0:0:0 host7 (sdr) ST31500341AS_9VS2N6RH
Dec 14 17:19:40 Tower emhttp: pci-0000:00:11.0-scsi-5:0:0:0 host8 (sds) SAMSUNG_HD204UI_S2HFJ9FZA01378
Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy0:1-0x0000000000000000:0-lun0 host0 (sda) WDC_WD10EACS-00ZJB0_WD-WCASJ1699821
Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy1:1-0x0100000000000000:1-lun0 host0 (sdb) SAMSUNG_HD103SIS1VSJ90S573052
Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy2:1-0x0200000000000000:2-lun0 host0 (sdc) WDC_WD20EARS-00_WD-WCAZA2920888
Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy3:1-0x0300000000000000:3-lun0 host0 (sdd) ST31500341AS_9VS2N1C3
Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy4:1-0x0400000000000000:4-lun0 host0 (sde) SAMSUNG_HD204UIS2HGJ90Z906573
Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy5:1-0x0500000000000000:5-lun0 host0 (sdf) SAMSUNG_HD103SIS1VSJ90S573490
Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy6:1-0x0600000000000000:6-lun0 host0 (sdg) SAMSUNG_HD204UIS2HFJ9CZA00326
Dec 14 17:19:40 Tower emhttp: pci-0000:02:00.0-sas-phy7:1-0x0700000000000000:7-lun0 host0 (sdh) ST31500341AS_9VS2BC09
Dec 14 17:19:40 Tower emhttp: pci-0000:03:00.0-sas-phy0:1-0x0000000000000000:0-lun0 host2 (sdj) ST32000542AS_5XW0E6Z4
Dec 14 17:19:40 Tower emhttp: pci-0000:03:00.0-sas-phy4:1-0x0400000000000000:4-lun0 host2 (sdk) SAMSUNG_HD204UIS2HGJ9FZ900400
Dec 14 17:19:40 Tower emhttp: pci-0000:03:00.0-sas-phy5:1-0x0500000000000000:5-lun0 host2 (sdl) WDC_WD10EACS-00_WD-WCASJ1695946
Dec 14 17:19:40 Tower emhttp: pci-0000:03:00.0-sas-phy6:1-0x0600000000000000:6-lun0 host2 (sdm) SAMSUNG_HD204UIS2HFJ9BZA00817
Dec 14 17:19:40 Tower emhttp: pci-0000:03:00.0-sas-phy7:1-0x0700000000000000:7-lun0 host2 (sdn) ST31500341AS_9VS2NKHC

and then the unRAID slot assignments:

Dec 14 17:19:40 Tower kernel: md: unRAID driver 1.1.1 installed
Dec 14 17:19:40 Tower kernel: md: import disk0: [65,32] (sds) SAMSUNG HD204UI S2HFJ9FZA01378 size: 1953514552
Dec 14 17:19:40 Tower kernel: md: import disk1: [8,160] (sdk) SAMSUNG HD204UI S2HGJ9FZ900400 size: 1953514552
Dec 14 17:19:40 Tower kernel: md: import disk2: [8,96] (sdg) SAMSUNG HD204UI S2HFJ9CZA00326 size: 1953514552
Dec 14 17:19:40 Tower kernel: md: import disk3: [8,176] (sdl) WDC WD10EACS-00Z WD-WCASJ1695946 size: 976762552
Dec 14 17:19:40 Tower kernel: md: import disk4: [8,192] (sdm) SAMSUNG HD204UI S2HFJ9BZA00817 size: 1953514552
Dec 14 17:19:40 Tower kernel: md: import disk5: [8,64] (sde) SAMSUNG HD204UI S2HGJ90Z906573 size: 1953514552
Dec 14 17:19:40 Tower kernel: md: import disk6: [8,0] (sda) WDC WD10EACS-00Z WD-WCASJ1699821 size: 976762552
Dec 14 17:19:40 Tower kernel: md: import disk7: [8,32] (sdc) WDC WD20EARS-00M WD-WCAZA2920888 size: 1953514552
Dec 14 17:19:40 Tower kernel: md: import disk8: [8,16] (sdb) SAMSUNG HD103SI S1VSJ90S573052 size: 976762552
Dec 14 17:19:40 Tower kernel: md: import disk9: [8,80] (sdf) SAMSUNG HD103SI S1VSJ90S573490 size: 976762552
Dec 14 17:19:40 Tower kernel: md: import disk10: [8,112] (sdh) ST31500341AS 9VS2BC09 size: 1465138552
Dec 14 17:19:40 Tower kernel: md: import disk11: [65,0] (sdq) ST31500341AS 9VS299T1 size: 1465138552
Dec 14 17:19:40 Tower kernel: md: import disk12: [8,240] (sdp) ST31500341AS 9VS2B9ZW size: 1465138552
Dec 14 17:19:40 Tower kernel: md: import disk13: [8,208] (sdn) ST31500341AS 9VS2NKHC size: 1465138552
Dec 14 17:19:40 Tower kernel: md: import disk14: [8,48] (sdd) ST31500341AS 9VS2N1C3 size: 1465138552
Dec 14 17:19:40 Tower kernel: md: import disk15: [65,16] (sdr) ST31500341AS 9VS2N6RH size: 1465138552
Dec 14 17:19:40 Tower kernel: md: import disk16: [8,224] (sdo) ST31500341AS 9VS2NKCP size: 1465138552
Dec 14 17:19:40 Tower kernel: md: import disk17: [8,144] (sdj) ST32000542AS 5XW0E6Z4 size: 1953514552

You can spot some correlations there yourself. Then search the syslog.

I will start with the devices attached to PCI bus pci-0000:02:00.0. In your case these are sda to sdh (this may change if you change your configuration, especially if you add a new controller). From the inventory you can see where each drive is connected (the physical ports; the cables should have labels on them). These are your host0. If you search the syslog for pci-0000:02:00.0 you will find their ataXX connection. In the syslog it starts here:

Dec 14 17:19:39 Tower kernel: mvsas 0000:02:00.0: mvsas: driver version 0.8.4

and goes all the way to:

Dec 14 17:19:39 Tower kernel: sd 0:0:7:0: [sdh] Attached SCSI disk

These are your ata1 to ata8.

Then we move to the HDs attached to pci-0000:03:00.0. These are on your second SM SASLP card: one on the primary channel and 4 on the secondary. These are sdj to sdn, and they are also host2. The interesting entries for pci-0000:03:00.0 start here:

Dec 14 17:19:39 Tower kernel: mvsas 0000:03:00.0: mvsas: driver version 0.8.4

and go down to:

Dec 14 17:19:39 Tower kernel: sd 2:0:4:0: [sdn] Attached SCSI disk

You will see that these are ata9 to ata13.

The devices attached to pci-0000:00:11.0 are on your motherboard SATA ports. These are sdo to sds, and each one is its own host. You have used motherboard ports SATA1 to SATA3, SATA4 is not used, and then SATA5 and SATA6 are used again. These correspond to your ata14 to ata19; ata17 is unused.

Hope that clears it up a bit for you.
In your case, sda is ata1 is md6 is disk6 is 6. All these "assignments" from the current syslog can change, especially if you add a new controller.
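The cross-referencing described above (inventory line to sdX, sdX to md slot) is mechanical enough to script. A minimal Python sketch, assuming the emhttp "Device inventory" and "md: import" line formats shown in this post; other unRAID versions may print them differently. It joins both lists on the sdX name so each device gets its PCI path, host, model_serial string, and slot. It does not resolve the ataN number; that step still needs the per-controller reading of the syslog described above.

```python
import re

# emhttp inventory: "<pci path> hostN (sdX) MODEL_SERIAL"
INVENTORY_RE = re.compile(r"emhttp: (pci-\S+) (host\d+) \((sd[a-z]+)\) (\S+)")
# md driver: "md: import diskN: [major,minor] (sdX) ..."
MD_IMPORT_RE = re.compile(r"md: import (disk\d+): \[\d+,\d+\] \((sd[a-z]+)\)")


def drive_table(syslog_text: str) -> dict:
    """Join inventory and md import lines on the sdX device name."""
    table = {}
    for line in syslog_text.splitlines():
        m = INVENTORY_RE.search(line)
        if m:
            path, host, dev, ident = m.groups()
            table.setdefault(dev, {}).update(path=path, host=host, id=ident)
            continue
        m = MD_IMPORT_RE.search(line)
        if m:
            slot, dev = m.groups()
            table.setdefault(dev, {})["slot"] = slot
    return table
```

Keying on sdX is convenient here because both lists print it, but sdX names are only valid for that one boot, so the output should be regenerated from each new syslog.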
642 (Author) Posted December 15, 2011

Wow. Now that is an explanation, thanks a lot. Without it I would never have found this by myself. Of course some dark areas remain, like the two lists scsi3-scsi4-scsi5-scsi6-scsi7-scsi8 and ata14-ata15-ata16-ata17-ata18-ata19, which (if I understood correctly) just have to be accepted as paired in order: the first of one list with the first of the other, and so on. But if that's the way it works... why not?

Now it's time to find out what's wrong with those two drives, ata18 and ata19: the cables, etc. I will make some changes to my tower soon, and right after that I will write a complete table of the drives (sdx - ataxx - mdx - diskx - x - last 4 characters of the serial).

And again, thanks a lot.
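For the table planned above (sdx - ataxx - mdx - diskx - x - last 4 of the serial), a small helper can do the formatting. A minimal Python sketch with hypothetical function names; the in-order pairing of motherboard scsi hosts with ata ports only encodes the reading given in this thread for this particular boot (scsi3..scsi8 with ata14..ata19), and is an assumption that can break when controllers are added or moved.

```python
def pair_hosts_with_ports(hosts, first_port):
    """Pair scsi host numbers with ataN ports in order (host3 -> ata14, ...).

    Encodes this thread's reading for one specific boot; not a general rule.
    """
    return {f"host{h}": f"ata{first_port + i}" for i, h in enumerate(hosts)}


def table_row(dev, ata, slot, serial):
    """Format one row as 'sdX - ataN - mdN - diskN - N - last4'."""
    if not slot.startswith("disk"):
        raise ValueError(f"unexpected slot name: {slot!r}")
    n = slot[len("disk"):]
    return f"{dev} - {ata} - md{n} - {slot} - {n} - {serial[-4:]}"


# Motherboard controller on this boot: hosts 3..8 (host6 unused) -> ata14..ata19.
MB_PORTS = pair_hosts_with_ports(range(3, 9), 14)
```

For example, table_row("sdr", MB_PORTS["host7"], "disk15", "9VS2N6RH") gives "sdr - ata18 - md15 - disk15 - 15 - N6RH", and table_row("sda", "ata1", "disk6", "WD-WCASJ1699821") reproduces the sda chain from the previous post.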
bcbgboy13 Posted December 15, 2011

In your case:
scsi0: the drives on the first SASLP card
scsi1: your flash drive (or, to be exact in your case, a flash reader...)
scsi2: the drives on the second SASLP card
scsi3 to scsi8: the drives attached to the motherboard SATA ports

===========================================================

Don't sweat it too much: unRAID 5 will hide some of these details, so that once you add a HD you can change motherboards, controllers, etc., and your system will still know which slot that drive belongs in.
JonathanM Posted December 15, 2011

As long as version 5 also brings better error logging in the syslog, so we can tell which drive is kicking out errors. I've never had a problem matching drive serial numbers to unRAID slot assignments; it's matching serial numbers to bus errors in the syslog that tends to be a pain.
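On newer Linux kernels there is a more direct route from a bus error to a drive: the resolved sysfs path of each disk includes the ata port and scsi host it hangs off, using the same ataN numbering as the kernel messages. A minimal Python sketch with hypothetical function names, assuming a kernel that exposes ataN nodes in sysfs (the 2011-era kernels discussed here may not) and, for the serial lookup, that udev populates /dev/disk/by-id with ata-MODEL_SERIAL links. Drives behind the SASLP cards (mvsas/libsas) may use a different sysfs layout and would simply be skipped.

```python
import os
import re

# e.g. /sys/devices/pci0000:00/0000:00:11.0/ata18/host7/target7:0:0/7:0:0:0/block/sdr
SYSFS_RE = re.compile(r"/(ata\d+)/(host\d+)/.*/block/(sd[a-z]+)$")


def parse_sysfs_path(path):
    """Return (ataN, hostM, sdX) from a resolved sysfs block path, or None."""
    m = SYSFS_RE.search(path)
    return m.groups() if m else None


def ata_to_disk(sys_block="/sys/block"):
    """Map ataN -> (hostM, sdX) for every sd* disk whose sysfs path shows a port."""
    result = {}
    if not os.path.isdir(sys_block):
        return result
    for dev in sorted(os.listdir(sys_block)):
        if dev.startswith("sd"):
            parsed = parse_sysfs_path(os.path.realpath(os.path.join(sys_block, dev)))
            if parsed:
                ata, host, name = parsed
                result[ata] = (host, name)
    return result


def serial_links(by_id="/dev/disk/by-id"):
    """Map sdX -> its udev 'ata-MODEL_SERIAL' link name (whole disks only)."""
    names = {}
    if not os.path.isdir(by_id):
        return names
    for entry in sorted(os.listdir(by_id)):
        if entry.startswith("ata-") and "-part" not in entry:
            target = os.path.basename(os.path.realpath(os.path.join(by_id, entry)))
            names[target] = entry
    return names
```

Joining the two results on sdX puts ataN, sdX and the model_serial string side by side, which is the bus-error-to-serial lookup asked for above.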
This topic is now archived and is closed to further replies.