ATA errors. How can I tell which drive/controller is the cause?


KentBrockman

Hi All,

 

I'm running Unraid 6.3.5 and have noticed the following errors when I start a parity check:

 

Aug 14 20:15:42 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 20:15:42 Tower kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Aug 14 20:15:42 Tower kernel: sas: ata10: end_device-1:3: cmd error handler
Aug 14 20:15:42 Tower kernel: sas: ata7: end_device-1:0: dev error handler
Aug 14 20:15:42 Tower kernel: sas: ata8: end_device-1:1: dev error handler
Aug 14 20:15:42 Tower kernel: sas: ata9: end_device-1:2: dev error handler
Aug 14 20:15:42 Tower kernel: sas: ata10: end_device-1:3: dev error handler
Aug 14 20:15:42 Tower kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Aug 14 20:15:42 Tower kernel: ata10.00: cmd 25/00:00:68:e7:04/00:04:00:00:00/e0 tag 28 dma 524288 in
Aug 14 20:15:42 Tower kernel:         res 01/04:00:f7:b1:48/00:00:00:00:00/e0 Emask 0x12 (ATA bus error)
Aug 14 20:15:42 Tower kernel: ata10.00: status: { ERR }
Aug 14 20:15:42 Tower kernel: ata10.00: error: { ABRT }

 

How can I tell which drive or controller is the cause?

 

I have attached diagnostics.

 

Any help would be appreciated.

 

Cheers

tower-diagnostics-20170814-2020.zip

Link to comment

Here's one of the first lines, showing which drive is connected to ata10:

 

Aug 14 15:59:49 Tower kernel: ata10.00: ATA-8: ST3000DM001-1CH166,             W1F2LKRJ, CC24, max UDMA/133
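If you'd rather not scroll through the syslog by hand, here is a minimal Python sketch that pulls the model and serial for every ata port out of identification lines like the one above. The function name `map_ports_to_drives` and the regex are my own, written against the line format shown in this thread; other kernels or controllers may print the line differently.

```python
import re

# Matches kernel identification lines such as:
#   ata10.00: ATA-8: ST3000DM001-1CH166,   W1F2LKRJ, CC24, max UDMA/133
# The pattern is an assumption based on the line format seen in this syslog.
IDENT_RE = re.compile(
    r"kernel: (ata\d+)\.\d+: ATA-\d+: "
    r"(?P<model>[^,]+),\s*(?P<serial>[^,]+),\s*(?P<fw>[^,]+),"
)


def map_ports_to_drives(lines):
    """Return {ata port: (model, serial)} for every identification line found."""
    ports = {}
    for line in lines:
        m = IDENT_RE.search(line)
        if m:
            ports[m.group(1)] = (m.group("model").strip(), m.group("serial").strip())
    return ports


sample = [
    "Aug 14 15:59:49 Tower kernel: ata10.00: ATA-8: ST3000DM001-1CH166,"
    "             W1F2LKRJ, CC24, max UDMA/133",
]
print(map_ports_to_drives(sample))
```

To run it against a real log, pass the open file instead of `sample`, for example `map_ports_to_drives(open("syslog.txt"))`, where the path is whatever syslog file you extracted from the diagnostics zip.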

 

And here's one of the first troublesome entries:

 

Aug 14 19:58:16 Tower kernel: ------------[ cut here ]------------
Aug 14 19:58:16 Tower kernel: WARNING: CPU: 0 PID: 32254 at drivers/ata/libata-core.c:5015 __ata_qc_complete+0x59/0xe2
Aug 14 19:58:16 Tower kernel: Modules linked in: xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod coretemp ata_piix e1000 sata_sil24 mvsas intel_agp intel_gtt libsas agpgart scsi_transport_sas asus_atk0110 acpi_cpufreq
Aug 14 19:58:16 Tower kernel: CPU: 0 PID: 32254 Comm: kworker/u8:3 Not tainted 4.9.30-unRAID #1
Aug 14 19:58:16 Tower kernel: Hardware name: System manufacturer System Product Name/P5QPL-AM, BIOS 0317    09/21/2009
Aug 14 19:58:16 Tower kernel: Workqueue: events_unbound async_run_entry_fn
Aug 14 19:58:16 Tower kernel: ffffc90003bb3970 ffffffff813a4a1b 0000000000000000 ffffffff819928cd
Aug 14 19:58:16 Tower kernel: ffffc90003bb39b0 ffffffff8104d0d9 0000139717161d70 ffff880117161d50
Aug 14 19:58:16 Tower kernel: ffff8801171622c0 ffff880117161e80 ffff880117160000 ffff88011a7030a0
Aug 14 19:58:16 Tower kernel: Call Trace:
Aug 14 19:58:16 Tower kernel: [<ffffffff813a4a1b>] dump_stack+0x61/0x7e
Aug 14 19:58:16 Tower kernel: [<ffffffff8104d0d9>] __warn+0xb8/0xd3
Aug 14 19:58:16 Tower kernel: [<ffffffff8104d1a1>] warn_slowpath_null+0x18/0x1a
Aug 14 19:58:16 Tower kernel: [<ffffffff814c75e8>] __ata_qc_complete+0x59/0xe2
Aug 14 19:58:16 Tower kernel: [<ffffffff814c7793>] ata_qc_complete+0x122/0x14f
Aug 14 19:58:16 Tower kernel: [<ffffffff814c7acc>] ata_qc_issue+0x251/0x26b
Aug 14 19:58:16 Tower kernel: [<ffffffff814c7df9>] ata_exec_internal_sg+0x313/0x541
Aug 14 19:58:16 Tower kernel: [<ffffffff814c80a0>] ata_exec_internal+0x79/0x86
Aug 14 19:58:16 Tower kernel: [<ffffffff8167bd33>] ? io_schedule_timeout+0xd3/0xfd
Aug 14 19:58:16 Tower kernel: [<ffffffff814d1411>] ata_read_log_page+0xf2/0x142
Aug 14 19:58:16 Tower kernel: [<ffffffff814d163a>] ata_eh_analyze_ncq_error+0xa3/0x26d
Aug 14 19:58:16 Tower kernel: [<ffffffff814d1911>] ata_eh_link_autopsy+0x10d/0x780
Aug 14 19:58:16 Tower kernel: [<ffffffff81082bc0>] ? vprintk_emit+0x344/0x355
Aug 14 19:58:16 Tower kernel: [<ffffffff814d1fac>] ata_eh_autopsy+0x28/0xca
Aug 14 19:58:16 Tower kernel: [<ffffffff814d48f6>] ata_do_eh+0x23/0x93
Aug 14 19:58:16 Tower kernel: [<ffffffff814ca7b8>] ? ata_phys_link_offline+0x26/0x26
Aug 14 19:58:16 Tower kernel: [<ffffffffa0039067>] ? sas_ata_printk+0x70/0x70 [libsas]
Aug 14 19:58:16 Tower kernel: [<ffffffff814ca68e>] ? ata_phys_link_online+0x26/0x26
Aug 14 19:58:16 Tower kernel: [<ffffffffa0039067>] ? sas_ata_printk+0x70/0x70 [libsas]
Aug 14 19:58:16 Tower kernel: [<ffffffff814d49bb>] ata_std_error_handler+0x55/0x5c
Aug 14 19:58:16 Tower kernel: [<ffffffff814d44e2>] ata_scsi_port_error_handler+0x21f/0x55a
Aug 14 19:58:16 Tower kernel: [<ffffffffa0039613>] async_sas_ata_eh+0x43/0x62 [libsas]
Aug 14 19:58:16 Tower kernel: [<ffffffff81065d3e>] async_run_entry_fn+0x32/0xc8
Aug 14 19:58:16 Tower kernel: [<ffffffff8105ed53>] process_one_work+0x192/0x295
Aug 14 19:58:16 Tower kernel: [<ffffffff8105f752>] worker_thread+0x27d/0x369
Aug 14 19:58:16 Tower kernel: [<ffffffff8105f4d5>] ? rescuer_thread+0x2b1/0x2b1
Aug 14 19:58:16 Tower kernel: [<ffffffff81063939>] kthread+0xdb/0xe3
Aug 14 19:58:16 Tower kernel: [<ffffffff8106385e>] ? kthread_park+0x52/0x52
Aug 14 19:58:16 Tower kernel: [<ffffffff8167f785>] ret_from_fork+0x25/0x30
Aug 14 19:58:16 Tower kernel: ---[ end trace f9d4e6597bd93bd2 ]---
Aug 14 19:58:16 Tower kernel: ata10.00: READ LOG DMA EXT failed, trying unqueued
Aug 14 19:58:16 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:16 Tower kernel: ata10: failed to read log page 10h (errno=-5)
Aug 14 19:58:16 Tower kernel: ata10.00: exception Emask 0x1 SAct 0x3e000000 SErr 0x0 action 0x6
Aug 14 19:58:16 Tower kernel: ata10.00: failed command: READ FPDMA QUEUED
Aug 14 19:58:16 Tower kernel: ata10.00: cmd 60/00:00:40:fc:03/04:00:00:00:00/40 tag 25 ncq dma 524288 in
Aug 14 19:58:16 Tower kernel:         res 01/04:e8:40:0c:04/00:00:00:00:00/40 Emask 0x3 (HSM violation)
Aug 14 19:58:16 Tower kernel: ata10.00: status: { ERR }
Aug 14 19:58:16 Tower kernel: ata10.00: error: { ABRT }
Aug 14 19:58:16 Tower kernel: ata10.00: failed command: READ FPDMA QUEUED
Aug 14 19:58:16 Tower kernel: ata10.00: cmd 60/00:00:40:00:04/04:00:00:00:00/40 tag 26 ncq dma 524288 in
Aug 14 19:58:16 Tower kernel:         res 01/04:e8:40:0c:04/00:00:00:00:00/40 Emask 0x3 (HSM violation)
Aug 14 19:58:16 Tower kernel: ata10.00: status: { ERR }
Aug 14 19:58:16 Tower kernel: ata10.00: error: { ABRT }
Aug 14 19:58:16 Tower kernel: ata10.00: failed command: READ FPDMA QUEUED
Aug 14 19:58:16 Tower kernel: ata10.00: cmd 60/00:00:40:04:04/04:00:00:00:00/40 tag 27 ncq dma 524288 in
Aug 14 19:58:16 Tower kernel:         res 01/04:e8:40:0c:04/00:00:00:00:00/40 Emask 0x3 (HSM violation)
Aug 14 19:58:16 Tower kernel: ata10.00: status: { ERR }
Aug 14 19:58:16 Tower kernel: ata10.00: error: { ABRT }
Aug 14 19:58:16 Tower kernel: ata10.00: failed command: READ FPDMA QUEUED
Aug 14 19:58:16 Tower kernel: ata10.00: cmd 60/00:00:40:08:04/04:00:00:00:00/40 tag 28 ncq dma 524288 in
Aug 14 19:58:16 Tower kernel:         res 01/04:e8:40:0c:04/00:00:00:00:00/40 Emask 0x3 (HSM violation)
Aug 14 19:58:16 Tower kernel: ata10.00: status: { ERR }
Aug 14 19:58:16 Tower kernel: ata10.00: error: { ABRT }
Aug 14 19:58:16 Tower kernel: ata10.00: failed command: READ FPDMA QUEUED
Aug 14 19:58:16 Tower kernel: ata10.00: cmd 60/00:00:40:0c:04/04:00:00:00:00/40 tag 29 ncq dma 524288 in
Aug 14 19:58:16 Tower kernel:         res 01/04:e8:40:0c:04/00:00:00:00:00/40 Emask 0x3 (HSM violation)
Aug 14 19:58:16 Tower kernel: ata10.00: status: { ERR }
Aug 14 19:58:16 Tower kernel: ata10.00: error: { ABRT }
Aug 14 19:58:16 Tower kernel: ata10: hard resetting link
Aug 14 19:58:16 Tower kernel: sas: ata11: end_device-1:4: dev error handler
Aug 14 19:58:16 Tower kernel: sas: ata12: end_device-1:5: dev error handler
Aug 14 19:58:16 Tower kernel: sas: ata13: end_device-1:6: dev error handler
Aug 14 19:58:16 Tower kernel: sas: ata14: end_device-1:7: dev error handler
Aug 14 19:58:16 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:16 Tower kernel: ata10.00: failed to IDENTIFY (I/O error, err_mask=0x11)
Aug 14 19:58:16 Tower kernel: ata10.00: revalidation failed (errno=-5)
Aug 14 19:58:17 Tower kernel: mvsas 0000:01:00.0: Phy3 : No sig fis
Aug 14 19:58:20 Tower kernel: sas: sas_form_port: phy3 belongs to port3 already(1)!
Aug 14 19:58:21 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
Aug 14 19:58:21 Tower kernel: ata10: hard resetting link
Aug 14 19:58:22 Tower kernel: ata10.00: configured for UDMA/133
Aug 14 19:58:22 Tower kernel: ata10: EH complete
Aug 14 19:58:22 Tower kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 5 tries: 1
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [f] tag[f], task [ffff8800871b0200]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03FF8000,  slot [f].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [10] tag[10], task [ffff8800871b0700]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03FF0000,  slot [10].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [11] tag[11], task [ffff8800871b0800]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03FE0000,  slot [11].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [12] tag[12], task [ffff8800871b0d00]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03FC0000,  slot [12].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [13] tag[13], task [ffff88006111e800]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03F80000,  slot [13].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [14] tag[14], task [ffff88006111e700]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03F00000,  slot [14].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [15] tag[15], task [ffff88006111e600]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03E00000,  slot [15].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [16] tag[16], task [ffff88006111e900]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03C00000,  slot [16].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [17] tag[17], task [ffff88006111e100]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03800000,  slot [17].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [18] tag[18], task [ffff88006111e400]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03000000,  slot [18].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [19] tag[19], task [ffff88006111e300]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 02000000,  slot [19].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a

 

Link to comment
1 hour ago, BRiT said:

If you look earlier in the syslog you will be able to see which /dev/sd# device is mapped to which ata#.

Thank you guys.  Looks like I have a drive to replace.

 

If Unraid knows this is related to a specific drive, why aren't these errors reported in the disk log?

Should I make this a feature request or is there a reason this is the way it is?

 

Cheers

Link to comment
21 hours ago, BRiT said:

If you look earlier in the syslog you will be able to see which /dev/sd# device is mapped to which ata#.

Even easier is to click on the little drive icon next to each drive on the Main tab. That brings up the log entries relating specifically to that drive.

Link to comment
7 hours ago, KentBrockman said:

If it's not the drive, what do you suggest I do?

 

If it's the SAS2LP, the best solution would be to replace it with an LSI, but try that disk on a different controller first. Your onboard controller is IDE only; a board with AHCI support would be much better.

Link to comment

It looks like it might be the controller.  My parity drive just red balled and is showing similar errors.

An extended SMART test seems to indicate the drive is OK.
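One rough way to check the "drive or controller" question from the syslog is to tally error lines per ata port: errors piling up on a single port point toward that drive or its cable, while errors spread across several ports on the same card point toward the controller. Below is a minimal Python sketch of that idea. The function name `count_errors_by_port` and the list of error phrases are my own choices, taken from the messages seen in this thread, so treat them as assumptions rather than a complete list. The `sas: ataN: ... dev error handler` lines are deliberately not counted, since in these logs they appear for every port on the card during recovery.

```python
import re
from collections import Counter

# Only lines where the kernel message itself starts with ataN or ataN.00.
PORT_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: ")

# Phrases treated as errors; taken from this thread's syslog, not exhaustive.
ERROR_HINTS = (
    "exception Emask",
    "failed command",
    "hard resetting link",
    "failed to IDENTIFY",
)


def count_errors_by_port(lines):
    """Count error-looking kernel lines for each ata port."""
    counts = Counter()
    for line in lines:
        m = PORT_RE.search(line)
        if m and any(hint in line for hint in ERROR_HINTS):
            counts[m.group(1)] += 1
    return counts


sample = [
    "Aug 14 19:58:16 Tower kernel: ata10.00: exception Emask 0x1 SAct 0x3e000000 SErr 0x0 action 0x6",
    "Aug 14 19:58:16 Tower kernel: ata10.00: failed command: READ FPDMA QUEUED",
    "Aug 14 19:58:16 Tower kernel: ata10: hard resetting link",
    "Aug 14 19:58:16 Tower kernel: sas: ata11: end_device-1:4: dev error handler",
    "Aug 14 19:58:22 Tower kernel: ata10.00: configured for UDMA/133",
]
for port, n in count_errors_by_port(sample).most_common():
    print(port, n)
```

A count like this is only a hint. Swapping the suspect disk to a different controller or cable is still the more reliable test.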

 

I had not heard of issues with the SAS2LP, but it looks like the consensus is that an LSI-based card is the way to go.

I am looking at this one:  http://www.ebay.ca/itm/291641245650

 

Any last advice before I make the jump to the new controller?

 

tower-diagnostics-20170819-1044.zip

Link to comment
