KentBrockman Posted August 15, 2017
Hi All,
Running Unraid 6.3.5 and I have noticed the following errors when I start a parity check:
Aug 14 20:15:42 Tower kernel: sas: sas_ata_task_done: SAS error 8a (Errors)
Aug 14 20:15:42 Tower kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1 (Drive related)
Aug 14 20:15:42 Tower kernel: sas: ata10: end_device-1:3: cmd error handler (Errors)
Aug 14 20:15:42 Tower kernel: sas: ata7: end_device-1:0: dev error handler (Drive related)
Aug 14 20:15:42 Tower kernel: sas: ata8: end_device-1:1: dev error handler (Drive related)
Aug 14 20:15:42 Tower kernel: sas: ata9: end_device-1:2: dev error handler (Drive related)
Aug 14 20:15:42 Tower kernel: sas: ata10: end_device-1:3: dev error handler (Drive related)
Aug 14 20:15:42 Tower kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors)
Aug 14 20:15:42 Tower kernel: ata10.00: cmd 25/00:00:68:e7:04/00:04:00:00:00/e0 tag 28 dma 524288 in (Drive related)
Aug 14 20:15:42 Tower kernel: res 01/04:00:f7:b1:48/00:00:00:00:00/e0 Emask 0x12 (ATA bus error) (Errors)
Aug 14 20:15:42 Tower kernel: ata10.00: status: { ERR } (Drive related)
Aug 14 20:15:42 Tower kernel: ata10.00: error: { ABRT } (Errors)
How can I tell what drive or controller is the cause? I have attached diagnostics. Any help would be appreciated.
Cheers
tower-diagnostics-20170814-2020.zip
BRiT Posted August 15, 2017
If you look earlier in the syslog you will be able to see which /dev/sd# device is mapped to which ata#.
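The lookup described here can be scripted. The following is only a minimal sketch in Python: it assumes the identification lines look like the one quoted in this thread (`ataN.00: ATA-8: <model>, <serial>, <firmware>, max UDMA/133`), and the function name `map_ata_ports` and the syslog path in the usage comment are illustrative assumptions, not part of Unraid.

```python
import re

# Matches libata identification lines in the format seen in this thread, e.g.
#   "... kernel: ata10.00: ATA-8: ST3000DM001-1CH166, W1F2LKRJ, CC24, max UDMA/133"
# The field order (model, serial, firmware) is assumed from that one example.
IDENT_RE = re.compile(
    r"kernel: (?P<port>ata\d+)\.\d+: ATA-\d+: (?P<model>[^,]+), (?P<serial>[^,]+),"
)


def map_ata_ports(syslog_lines):
    """Return {ata port: (model, serial)} for every identification line found."""
    ports = {}
    for line in syslog_lines:
        match = IDENT_RE.search(line)
        if match:
            ports[match.group("port")] = (
                match.group("model").strip(),
                match.group("serial").strip(),
            )
    return ports


# Hypothetical usage (the path is an assumption; adjust to where your syslog lives):
# with open("/var/log/syslog") as fh:
#     for port, (model, serial) in sorted(map_ata_ports(fh).items()):
#         print(port, model, serial)
```

The serial number can then be matched against the drive list on the Main tab, or against the symlink names under /dev/disk/by-id, to find the corresponding /dev/sd# device.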
BRiT Posted August 15, 2017
Here's one of the first lines showing which drive is connected to ata10:
Aug 14 15:59:49 Tower kernel: ata10.00: ATA-8: ST3000DM001-1CH166, W1F2LKRJ, CC24, max UDMA/133
And here's one of the first troublesome items:
Aug 14 19:58:16 Tower kernel: ------------[ cut here ]------------
Aug 14 19:58:16 Tower kernel: WARNING: CPU: 0 PID: 32254 at drivers/ata/libata-core.c:5015 __ata_qc_complete+0x59/0xe2
Aug 14 19:58:16 Tower kernel: Modules linked in: xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod coretemp ata_piix e1000 sata_sil24 mvsas intel_agp intel_gtt libsas agpgart scsi_transport_sas asus_atk0110 acpi_cpufreq
Aug 14 19:58:16 Tower kernel: CPU: 0 PID: 32254 Comm: kworker/u8:3 Not tainted 4.9.30-unRAID #1
Aug 14 19:58:16 Tower kernel: Hardware name: System manufacturer System Product Name/P5QPL-AM, BIOS 0317 09/21/2009
Aug 14 19:58:16 Tower kernel: Workqueue: events_unbound async_run_entry_fn
Aug 14 19:58:16 Tower kernel: ffffc90003bb3970 ffffffff813a4a1b 0000000000000000 ffffffff819928cd
Aug 14 19:58:16 Tower kernel: ffffc90003bb39b0 ffffffff8104d0d9 0000139717161d70 ffff880117161d50
Aug 14 19:58:16 Tower kernel: ffff8801171622c0 ffff880117161e80 ffff880117160000 ffff88011a7030a0
Aug 14 19:58:16 Tower kernel: Call Trace:
Aug 14 19:58:16 Tower kernel: [<ffffffff813a4a1b>] dump_stack+0x61/0x7e
Aug 14 19:58:16 Tower kernel: [<ffffffff8104d0d9>] __warn+0xb8/0xd3
Aug 14 19:58:16 Tower kernel: [<ffffffff8104d1a1>] warn_slowpath_null+0x18/0x1a
Aug 14 19:58:16 Tower kernel: [<ffffffff814c75e8>] __ata_qc_complete+0x59/0xe2
Aug 14 19:58:16 Tower kernel: [<ffffffff814c7793>] ata_qc_complete+0x122/0x14f
Aug 14 19:58:16 Tower kernel: [<ffffffff814c7acc>] ata_qc_issue+0x251/0x26b
Aug 14 19:58:16 Tower kernel: [<ffffffff814c7df9>] ata_exec_internal_sg+0x313/0x541
Aug 14 19:58:16 Tower kernel: [<ffffffff814c80a0>] ata_exec_internal+0x79/0x86
Aug 14 19:58:16 Tower kernel: [<ffffffff8167bd33>] ? io_schedule_timeout+0xd3/0xfd
Aug 14 19:58:16 Tower kernel: [<ffffffff814d1411>] ata_read_log_page+0xf2/0x142
Aug 14 19:58:16 Tower kernel: [<ffffffff814d163a>] ata_eh_analyze_ncq_error+0xa3/0x26d
Aug 14 19:58:16 Tower kernel: [<ffffffff814d1911>] ata_eh_link_autopsy+0x10d/0x780
Aug 14 19:58:16 Tower kernel: [<ffffffff81082bc0>] ? vprintk_emit+0x344/0x355
Aug 14 19:58:16 Tower kernel: [<ffffffff814d1fac>] ata_eh_autopsy+0x28/0xca
Aug 14 19:58:16 Tower kernel: [<ffffffff814d48f6>] ata_do_eh+0x23/0x93
Aug 14 19:58:16 Tower kernel: [<ffffffff814ca7b8>] ? ata_phys_link_offline+0x26/0x26
Aug 14 19:58:16 Tower kernel: [<ffffffffa0039067>] ? sas_ata_printk+0x70/0x70 [libsas]
Aug 14 19:58:16 Tower kernel: [<ffffffff814ca68e>] ? ata_phys_link_online+0x26/0x26
Aug 14 19:58:16 Tower kernel: [<ffffffffa0039067>] ? sas_ata_printk+0x70/0x70 [libsas]
Aug 14 19:58:16 Tower kernel: [<ffffffff814d49bb>] ata_std_error_handler+0x55/0x5c
Aug 14 19:58:16 Tower kernel: [<ffffffff814d44e2>] ata_scsi_port_error_handler+0x21f/0x55a
Aug 14 19:58:16 Tower kernel: [<ffffffffa0039613>] async_sas_ata_eh+0x43/0x62 [libsas]
Aug 14 19:58:16 Tower kernel: [<ffffffff81065d3e>] async_run_entry_fn+0x32/0xc8
Aug 14 19:58:16 Tower kernel: [<ffffffff8105ed53>] process_one_work+0x192/0x295
Aug 14 19:58:16 Tower kernel: [<ffffffff8105f752>] worker_thread+0x27d/0x369
Aug 14 19:58:16 Tower kernel: [<ffffffff8105f4d5>] ? rescuer_thread+0x2b1/0x2b1
Aug 14 19:58:16 Tower kernel: [<ffffffff81063939>] kthread+0xdb/0xe3
Aug 14 19:58:16 Tower kernel: [<ffffffff8106385e>] ? kthread_park+0x52/0x52
Aug 14 19:58:16 Tower kernel: [<ffffffff8167f785>] ret_from_fork+0x25/0x30
Aug 14 19:58:16 Tower kernel: ---[ end trace f9d4e6597bd93bd2 ]---
Aug 14 19:58:16 Tower kernel: ata10.00: READ LOG DMA EXT failed, trying unqueued
Aug 14 19:58:16 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:16 Tower kernel: ata10: failed to read log page 10h (errno=-5)
Aug 14 19:58:16 Tower kernel: ata10.00: exception Emask 0x1 SAct 0x3e000000 SErr 0x0 action 0x6
Aug 14 19:58:16 Tower kernel: ata10.00: failed command: READ FPDMA QUEUED
Aug 14 19:58:16 Tower kernel: ata10.00: cmd 60/00:00:40:fc:03/04:00:00:00:00/40 tag 25 ncq dma 524288 in
Aug 14 19:58:16 Tower kernel: res 01/04:e8:40:0c:04/00:00:00:00:00/40 Emask 0x3 (HSM violation)
Aug 14 19:58:16 Tower kernel: ata10.00: status: { ERR }
Aug 14 19:58:16 Tower kernel: ata10.00: error: { ABRT }
Aug 14 19:58:16 Tower kernel: ata10.00: failed command: READ FPDMA QUEUED
Aug 14 19:58:16 Tower kernel: ata10.00: cmd 60/00:00:40:00:04/04:00:00:00:00/40 tag 26 ncq dma 524288 in
Aug 14 19:58:16 Tower kernel: res 01/04:e8:40:0c:04/00:00:00:00:00/40 Emask 0x3 (HSM violation)
Aug 14 19:58:16 Tower kernel: ata10.00: status: { ERR }
Aug 14 19:58:16 Tower kernel: ata10.00: error: { ABRT }
Aug 14 19:58:16 Tower kernel: ata10.00: failed command: READ FPDMA QUEUED
Aug 14 19:58:16 Tower kernel: ata10.00: cmd 60/00:00:40:04:04/04:00:00:00:00/40 tag 27 ncq dma 524288 in
Aug 14 19:58:16 Tower kernel: res 01/04:e8:40:0c:04/00:00:00:00:00/40 Emask 0x3 (HSM violation)
Aug 14 19:58:16 Tower kernel: ata10.00: status: { ERR }
Aug 14 19:58:16 Tower kernel: ata10.00: error: { ABRT }
Aug 14 19:58:16 Tower kernel: ata10.00: failed command: READ FPDMA QUEUED
Aug 14 19:58:16 Tower kernel: ata10.00: cmd 60/00:00:40:08:04/04:00:00:00:00/40 tag 28 ncq dma 524288 in
Aug 14 19:58:16 Tower kernel: res 01/04:e8:40:0c:04/00:00:00:00:00/40 Emask 0x3 (HSM violation)
Aug 14 19:58:16 Tower kernel: ata10.00: status: { ERR }
Aug 14 19:58:16 Tower kernel: ata10.00: error: { ABRT }
Aug 14 19:58:16 Tower kernel: ata10.00: failed command: READ FPDMA QUEUED
Aug 14 19:58:16 Tower kernel: ata10.00: cmd 60/00:00:40:0c:04/04:00:00:00:00/40 tag 29 ncq dma 524288 in
Aug 14 19:58:16 Tower kernel: res 01/04:e8:40:0c:04/00:00:00:00:00/40 Emask 0x3 (HSM violation)
Aug 14 19:58:16 Tower kernel: ata10.00: status: { ERR }
Aug 14 19:58:16 Tower kernel: ata10.00: error: { ABRT }
Aug 14 19:58:16 Tower kernel: ata10: hard resetting link
Aug 14 19:58:16 Tower kernel: sas: ata11: end_device-1:4: dev error handler
Aug 14 19:58:16 Tower kernel: sas: ata12: end_device-1:5: dev error handler
Aug 14 19:58:16 Tower kernel: sas: ata13: end_device-1:6: dev error handler
Aug 14 19:58:16 Tower kernel: sas: ata14: end_device-1:7: dev error handler
Aug 14 19:58:16 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:16 Tower kernel: ata10.00: failed to IDENTIFY (I/O error, err_mask=0x11)
Aug 14 19:58:16 Tower kernel: ata10.00: revalidation failed (errno=-5)
Aug 14 19:58:17 Tower kernel: mvsas 0000:01:00.0: Phy3 : No sig fis
Aug 14 19:58:20 Tower kernel: sas: sas_form_port: phy3 belongs to port3 already(1)!
Aug 14 19:58:21 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
Aug 14 19:58:21 Tower kernel: ata10: hard resetting link
Aug 14 19:58:22 Tower kernel: ata10.00: configured for UDMA/133
Aug 14 19:58:22 Tower kernel: ata10: EH complete
Aug 14 19:58:22 Tower kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 5 tries: 1
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [f] tag[f], task [ffff8800871b0200]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03FF8000, slot [f].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [10] tag[10], task [ffff8800871b0700]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03FF0000, slot [10].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [11] tag[11], task [ffff8800871b0800]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03FE0000, slot [11].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [12] tag[12], task [ffff8800871b0d00]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03FC0000, slot [12].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [13] tag[13], task [ffff88006111e800]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03F80000, slot [13].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [14] tag[14], task [ffff88006111e700]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03F00000, slot [14].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [15] tag[15], task [ffff88006111e600]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03E00000, slot [15].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [16] tag[16], task [ffff88006111e900]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03C00000, slot [16].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [17] tag[17], task [ffff88006111e100]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03800000, slot [17].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [18] tag[18], task [ffff88006111e400]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 03000000, slot [18].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [19] tag[19], task [ffff88006111e300]:
Aug 14 19:58:22 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 672:command active 02000000, slot [19].
Aug 14 19:58:22 Tower kernel: sas: sas_ata_task_done: SAS error 8a
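With this many lines it can help to tally which ata ports are actually failing. The sketch below counts `failed command` and `hard resetting link` entries per port. It is a rough heuristic written against the message formats quoted in this thread; the function name `summarize_ata_errors` is hypothetical.

```python
import re
from collections import Counter

# Patterns taken from the kernel messages quoted in this thread.
FAILED_RE = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+)$")
RESET_RE = re.compile(r"kernel: (ata\d+): hard resetting link")


def summarize_ata_errors(syslog_lines):
    """Count failed commands per (port, command) and hard link resets per port."""
    failed = Counter()
    resets = Counter()
    for line in syslog_lines:
        line = line.rstrip()
        match = FAILED_RE.search(line)
        if match:
            failed[(match.group(1), match.group(2))] += 1
            continue
        match = RESET_RE.search(line)
        if match:
            resets[match.group(1)] += 1
    return failed, resets
```

As a rule of thumb only: errors confined to one port are consistent with a single disk or cable, while errors spread across several ports on the same card point more toward the controller. The counts do not prove either.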
BRiT Posted August 15, 2017
You also have a stale browser session or another program open, since you're getting a lot of wrong csrf_token errors in the log too.
Aug 14 19:25:44 Tower root: error: webGui/include/DeviceList.php: wrong csrf_token
KentBrockman Posted August 15, 2017
1 hour ago, BRiT said: If you look earlier in the syslog you will be able to see which /dev/sd# device is mapped to which ata#.
Thank you guys. Looks like I have a drive to replace. If Unraid knows this is related to a specific drive, why aren't these errors reported in the disk log? Should I make this a feature request, or is there a reason this is the way it is?
Cheers
JorgeB Posted August 15, 2017
Run an extended SMART test on the disk before replacing it. SMART looks fine and the problem could be the SAS2LP; it's a common issue with them.
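For anyone who prefers the command line to the GUI, an extended test can also be started with smartctl from smartmontools. This is a minimal sketch, assuming smartctl is installed and run as root; `/dev/sdX` is a placeholder for the affected disk, and the helper names are hypothetical.

```python
import subprocess


def smart_long_test_command(device):
    """Command that starts an extended (long) SMART self-test on the device."""
    return ["smartctl", "-t", "long", device]


def smart_selftest_log_command(device):
    """Command that prints the device's SMART self-test log."""
    return ["smartctl", "-l", "selftest", device]


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


# Hypothetical usage, with /dev/sdX replaced by the real device:
# print(run(smart_long_test_command("/dev/sdX")))
# ...wait for the test to finish (it can take hours on a 3 TB disk), then:
# print(run(smart_selftest_log_command("/dev/sdX")))
```

The test runs inside the drive itself, so the first command returns immediately and the result only appears in the self-test log once the drive has finished.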
KentBrockman Posted August 15, 2017
Attached is the result of an extended SMART test. If it's not the drive, what do you suggest I do?
tower-smart-20170815-1733.zip
Squid Posted August 16, 2017
21 hours ago, BRiT said: If you look earlier in the syslog you will be able to see which /dev/sd# device is mapped to which ata#.
Even easier is to click on the little drive icon next to each drive on the Main tab. That brings up the log relating specifically to that drive.
brando56894 Posted August 16, 2017
5 hours ago, KentBrockman said: Attached is the result of an extended Smart test. If it's not the drive, what do you suggest I do? tower-smart-20170815-1733.zip
Try different cables and/or different SATA ports on your motherboard. If it has multiple SATA controllers, try it on each controller.
JorgeB Posted August 16, 2017
7 hours ago, KentBrockman said: If it's not the drive, what do you suggest I do?
If it's the SAS2LP, the best solution would be to replace it with an LSI, but try that disk on a different controller first. Your onboard controller is IDE only; a board with AHCI support would be much better.
KentBrockman Posted August 19, 2017
It looks like it might be the controller. My parity drive just red-balled and is showing similar errors. An extended SMART test seems to indicate the drive is OK. I had not heard of issues with the SAS2LP, but it looks like the consensus is that an LSI-based card is the way to go. I am looking at this one:
http://www.ebay.ca/itm/291641245650
Any last advice before I make the jump to the new controller?
tower-diagnostics-20170819-1044.zip
SSD Posted August 19, 2017
You might look at this one ...
http://www.ebay.ca/itm/IBM-LSI-SAS9201-8i-6Gbps-SAS-PCIe2-x8-Express-RAID-controller-card-LP-/162631925041?hash=item25dd9e3d31:g:9vsAAOSwdGFY0pJe
It is a 9201-8i, which does not need to be flashed. It is also cheaper. This is the one I typically recommend.
This topic is now archived and is closed to further replies.