tucansam Posted July 26, 2017
Never seen errors like these before:
Jul 25 06:54:49 ffs2 kernel: ata11.00: cmd 60/08:00:90:15:7b/00:00:d2:01:00/40 tag 3 ncq dma 4096 in
Jul 25 06:54:49 ffs2 kernel: res 01/04:10:e8:7d:49/00:00:2f:03:00/40 Emask 0x3 (HSM violation)
Jul 25 06:54:49 ffs2 kernel: ata11.00: status: { ERR }
Jul 25 06:54:49 ffs2 kernel: ata11.00: error: { ABRT }
Jul 25 06:54:49 ffs2 kernel: ata11: hard resetting link
Jul 25 06:54:49 ffs2 kernel: sas: ata12: end_device-1:3: dev error handler
Jul 25 06:54:49 ffs2 kernel: sas: ata13: end_device-1:4: dev error handler
Jul 25 06:54:49 ffs2 kernel: sas: ata14: end_device-1:5: dev error handler
Jul 25 06:54:49 ffs2 kernel: sas: ata15: end_device-1:6: dev error handler
Jul 25 06:54:49 ffs2 kernel: sas: sas_ata_task_done: SAS error 8a
Jul 25 06:54:49 ffs2 kernel: ata11.00: failed to IDENTIFY (I/O error, err_mask=0x11)
Jul 25 06:54:49 ffs2 kernel: ata11.00: revalidation failed (errno=-5)
Jul 25 06:54:50 ffs2 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis
Jul 25 06:54:54 ffs2 kernel: ata11: hard resetting link
Jul 25 06:54:56 ffs2 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis
Jul 25 06:55:00 ffs2 kernel: ata11.00: qc timeout (cmd 0xec)
Jul 25 06:55:00 ffs2 kernel: ata11.00: failed to IDENTIFY (I/O error, err_mask=0x5)
Jul 25 06:55:00 ffs2 kernel: ata11.00: revalidation failed (errno=-5)
Jul 25 06:55:00 ffs2 kernel: ata11: hard resetting link
Jul 25 06:55:00 ffs2 kernel: check SRS 0 00000001.
Jul 25 06:55:02 ffs2 kernel: drivers/scsi/mvsas/mv_sas.c 1435:mvs_I_T_nexus_reset for device[2]:rc= 0
Jul 25 06:55:02 ffs2 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis
Jul 25 06:55:08 ffs2 kernel: sas: sas_form_port: phy3 belongs to port2 already(1)!
Jul 25 06:55:08 ffs2 kernel: ata11.00: configured for UDMA/133
Jul 25 06:55:08 ffs2 kernel: ata11: EH complete
Jul 25 06:55:08 ffs2 kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1
Jul 25 07:03:48 ffs2 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis
Jul 25 07:03:54 ffs2 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis
Jul 25 07:04:04 ffs2 kernel: XFS (md1): Metadata corruption detected at xfs_dir3_data_reada_verify+0x73/0x76, xfs_dir3_data_reada block 0x32f497da8
Jul 25 07:04:04 ffs2 kernel: XFS (md1): Unmount and run xfs_repair
Jul 25 07:04:04 ffs2 kernel: XFS (md1): First 64 bytes of corrupted metadata buffer:
Jul 25 07:04:04 ffs2 kernel: ffff880297d15000: 49 4e 00 00 03 02 00 00 00 00 00 63 00 00 00 64 IN.........c...d
Jul 25 07:04:04 ffs2 kernel: ffff880297d15010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jul 25 07:04:04 ffs2 kernel: ffff880297d15020: 58 86 fe 0e 0a 6b 02 7d 59 71 bf bd 24 0a d5 00 X....k.}Yq..$...
Jul 25 07:04:04 ffs2 kernel: ffff880297d15030: 59 71 bf bd 24 0a d5 00 00 00 00 00 00 00 00 06 Yq..$...........
Jul 25 07:04:06 ffs2 kernel: sas: sas_form_port: phy3 belongs to port2 already(1)!
Jul 25 07:04:06 ffs2 kernel: XFS (md1): Metadata CRC error detected at xfs_dir3_block_read_verify+0xa1/0xa9, xfs_dir3_block block 0x32f497da8
Jul 25 07:04:06 ffs2 kernel: XFS (md1): Unmount and run xfs_repair
Jul 25 07:04:06 ffs2 kernel: XFS (md1): First 64 bytes of corrupted metadata buffer:
Jul 25 07:04:06 ffs2 kernel: ffff880297d15000: 46 00 00 00 96 63 d8 8c 02 d9 a8 40 d1 cd 2c 35 F....c.....@..,5
Jul 25 07:04:06 ffs2 kernel: ffff880297d15010: 30 38 03 98 18 bf fe 06 a0 01 b8 8a 68 9b 00 1d 08..........h...
Jul 25 07:04:06 ffs2 kernel: ffff880297d15020: 18 3f 0f 80 06 f5 70 63 00 11 d0 0b 8d 18 bc 1c .?....pc........
Jul 25 07:04:06 ffs2 kernel: ffff880297d15030: 3a 01 94 02 fb 66 62 88 98 06 e0 6b ff fb 82 ba :....fb....k....
Jul 25 07:04:06 ffs2 kernel: XFS (md1): metadata I/O error: block 0x32f497da8 ("xfs_trans_read_buf_map") error 74 numblks 8
Jul 25 07:04:34 ffs2 kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Jul 25 07:04:34 ffs2 kernel: sas: trying to find task 0xffff88026a123400
Jul 25 07:04:34 ffs2 kernel: sas: sas_scsi_find_task: aborting task 0xffff88026a123400
Jul 25 07:04:34 ffs2 kernel: sas: sas_scsi_find_task: task 0xffff88026a123400 is aborted
Jul 25 07:04:34 ffs2 kernel: sas: sas_eh_handle_sas_errors: task 0xffff88026a123400 is aborted
Jul 25 07:04:34 ffs2 kernel: sas: ata15: end_device-1:6: cmd error handler
Jul 25 07:04:34 ffs2 kernel: sas: ata9: end_device-1:0: dev error handler
Jul 25 07:04:34 ffs2 kernel: sas: ata10: end_device-1:1: dev error handler
Jul 25 07:04:34 ffs2 kernel: sas: ata11: end_device-1:2: dev error handler
Jul 25 07:04:34 ffs2 kernel: sas: ata12: end_device-1:3: dev error handler
Jul 25 07:04:34 ffs2 kernel: sas: ata13: end_device-1:4: dev error handler
Jul 25 07:04:34 ffs2 kernel: sas: ata14: end_device-1:5: dev error handler
Jul 25 07:04:34 ffs2 kernel: sas: ata15: end_device-1:6: dev error handler
Jul 25 07:04:34 ffs2 kernel: ata15.00: exception Emask 0x0 SAct 0x200 SErr 0x0 action 0x6 frozen
Jul 25 07:04:34 ffs2 kernel: ata15.00: failed command: READ FPDMA QUEUED
Jul 25 07:04:34 ffs2 kernel: ata15.00: cmd 60/00:00:20:cb:52/02:00:c3:01:00/40 tag 9 ncq dma 262144 in
Jul 25 07:04:34 ffs2 kernel: res 40/00:ff:ff:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jul 25 07:04:34 ffs2 kernel: ata15.00: status: { DRDY }
Jul 25 07:04:34 ffs2 kernel: ata15: hard resetting link
Jul 25 07:04:35 ffs2 kernel: drivers/scsi/mvsas/mv_sas.c 1435:mvs_I_T_nexus_reset for device[6]:rc= 0
Jul 25 07:04:35 ffs2 kernel: sas: sas_ata_task_done: SAS error 8a
Jul 25 07:04:35 ffs2 kernel: ata15.00: failed to IDENTIFY (I/O error, err_mask=0x11)
Jul 25 07:04:35 ffs2 kernel: ata15.00: revalidation failed (errno=-5)
Jul 25 07:04:36 ffs2 kernel: mvsas 0000:01:00.0: Phy7 : No sig fis
Jul 25 07:04:40 ffs2 kernel: sas: sas_form_port: phy7 belongs to port6 already(1)!
Jul 25 07:04:41 ffs2 kernel: ata15: hard resetting link
Jul 25 07:04:41 ffs2 kernel: ata15.00: configured for UDMA/133
Jul 25 07:04:41 ffs2 kernel: ata15: EH complete
Jul 25 07:04:41 ffs2 kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1
Jul 25 13:56:50 ffs2 kernel: drivers/scsi/mvsas/mv_sas.c 1870:Release slot [4] tag[4], task [ffff8802b8363700]:
Jul 25 13:56:50 ffs2 kernel: sas: sas_ata_task_done: SAS error 8a
Jul 25 13:56:51 ffs2 kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Jul 25 13:56:51 ffs2 kernel: sas: ata11: end_device-1:2: cmd error handler
Jul 25 13:56:51 ffs2 kernel: sas: ata9: end_device-1:0: dev error handler
Jul 25 13:56:51 ffs2 kernel: sas: ata10: end_device-1:1: dev error handler
Jul 25 13:56:51 ffs2 kernel: sas: ata11: end_device-1:2: dev error handler
Jul 25 13:56:51 ffs2 kernel: sas: ata12: end_device-1:3: dev error handler
Jul 25 13:56:51 ffs2 kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Jul 25 13:56:51 ffs2 kernel: sas: ata13: end_device-1:4: dev error handler
Jul 25 13:56:51 ffs2 kernel: ata11.00: failed command: SMART
Jul 25 13:56:51 ffs2 kernel: sas: ata14: end_device-1:5: dev error handler
Jul 25 13:56:51 ffs2 kernel: ata11.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 1 pio 512 in
Jul 25 13:56:51 ffs2 kernel: res 01/04:81:82:00:00/00:00:00:00:00/40 Emask 0x12 (ATA bus error)
Jul 25 13:56:51 ffs2 kernel: sas: ata15: end_device-1:6: dev error handler
Jul 25 13:56:51 ffs2 kernel: ata11.00: status: { ERR }
Jul 25 13:56:51 ffs2 kernel: ata11.00: error: { ABRT }
Jul 25 13:56:51 ffs2 kernel: ata11: hard resetting link
Jul 25 13:56:51 ffs2 kernel: sas: sas_ata_task_done: SAS error 8a
Jul 25 13:56:51 ffs2 kernel: ata11.00: failed to IDENTIFY (I/O error, err_mask=0x11)
Jul 25 13:56:51 ffs2 kernel: ata11.00: revalidation failed (errno=-5)
Jul 25 13:56:52 ffs2 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis
Jul 25 13:56:56 ffs2 kernel: ata11: hard resetting link
Jul 25 13:56:59 ffs2 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis
Jul 25 13:57:02 ffs2 kernel: ata11.00: qc timeout (cmd 0xec)
Jul 25 13:57:02 ffs2 kernel: ata11.00: failed to IDENTIFY (I/O error, err_mask=0x5)
Jul 25 13:57:02 ffs2 kernel: ata11.00: revalidation failed (errno=-5)
Jul 25 13:57:02 ffs2 kernel: ata11: hard resetting link
Jul 25 13:57:04 ffs2 kernel: drivers/scsi/mvsas/mv_sas.c 1435:mvs_I_T_nexus_reset for device[2]:rc= 0
Jul 25 13:57:04 ffs2 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis
Jul 25 13:57:09 ffs2 kernel: ata11.00: qc timeout (cmd 0x27)
Jul 25 13:57:09 ffs2 kernel: ata11.00: failed to read native max address (err_mask=0x4)
Jul 25 13:57:09 ffs2 kernel: ata11.00: HPA support seems broken, skipping HPA handling
Jul 25 13:57:09 ffs2 kernel: ata11.00: revalidation failed (errno=-5)
Jul 25 13:57:09 ffs2 kernel: ata11.00: disabled
Jul 25 13:57:09 ffs2 kernel: ata11: hard resetting link
Jul 25 13:57:11 ffs2 kernel: sas: sas_form_port: phy3 belongs to port2 already(1)!
Jul 25 13:57:12 ffs2 kernel: drivers/scsi/mvsas/mv_sas.c 1435:mvs_I_T_nexus_reset for device[2]:rc= 0
Jul 25 13:57:12 ffs2 kernel: ata11: EH complete
Jul 25 13:57:12 ffs2 kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1
Jul 25 13:57:12 ffs2 kernel: sd 1:0:2:0: [sdn] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jul 25 13:57:12 ffs2 kernel: sd 1:0:2:0: [sdn] tag#2 CDB: opcode=0x88 88 00 00 00 00 01 d2 7b 15 90 00 00 00 08 00 00
Lots of metadata errors and various hard link resets. The array was running tip-top until now; now I have a red X'd disk and all this crazy stuff in the syslog.
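When a syslog excerpt like the one above runs to hundreds of lines, it can help to tally which ATA ports keep getting reset and which commands failed on them. The following is a minimal sketch, not an unRAID tool: the function name `summarize` is hypothetical, and the two patterns only assume the libata message wording that appears in this log ("hard resetting link" and "failed command: ...").

```python
import re
from collections import Counter

# Matches libata messages such as "kernel: ata11: hard resetting link".
RESET_RE = re.compile(r"kernel: (ata\d+): hard resetting link")
# Matches messages such as "kernel: ata15.00: failed command: READ FPDMA QUEUED".
FAILED_RE = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+)$")


def summarize(lines):
    """Count link resets and failed commands per ATA port.

    `lines` is any iterable of syslog lines, one kernel message per line.
    Returns (resets, failed): resets maps port -> count, and failed maps
    (port, command name) -> count.
    """
    resets = Counter()
    failed = Counter()
    for line in lines:
        line = line.rstrip()
        match = RESET_RE.search(line)
        if match:
            resets[match.group(1)] += 1
            continue
        match = FAILED_RE.search(line)
        if match:
            failed[(match.group(1), match.group(2))] += 1
    return resets, failed
```

Passing an open syslog file to `summarize` and printing `resets.most_common()` gives a quick view of whether the resets cluster on one port or are spread across every port behind the same controller.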
JorgeB Posted July 26, 2017
I'd prefer to see the complete diagnostics, but those look like the typical SASLP/SAS2LP issues. You can try disabling VT-D or using the controller in a different slot, but your best bet is to get an LSI controller.
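The log above shows the affected disks sitting behind the `mvsas` driver at PCI address 0000:01:00.0. As a rough sketch for confirming which PCI devices a given driver has claimed, the snippet below lists the entries in the driver's sysfs directory. It assumes the usual Linux layout of `/sys/bus/pci/drivers/<driver>/`, where bound devices appear as entries named by PCI address; the function name `devices_bound_to` is hypothetical.

```python
import re
from pathlib import Path

# PCI addresses look like "0000:01:00.0" (domain:bus:device.function).
PCI_ADDR_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def devices_bound_to(driver, sysfs_root="/sys"):
    """Return the sorted PCI addresses currently bound to `driver`.

    Assumes the standard sysfs layout; returns an empty list if the
    driver directory does not exist (driver not loaded).
    """
    driver_dir = Path(sysfs_root) / "bus" / "pci" / "drivers" / driver
    if not driver_dir.is_dir():
        return []
    # Skip control files such as "bind" and "unbind"; keep only addresses.
    return sorted(
        entry.name for entry in driver_dir.iterdir()
        if PCI_ADDR_RE.match(entry.name)
    )
```

Calling `devices_bound_to("mvsas")` on the server would be expected to return the address seen in the log if the Marvell-based card is still installed, and an empty list once it has been removed.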
aspdend Posted July 26, 2017
I would have a look at my thread and others: https://forums.lime-technology.com/topic/57384-sync-errors-on-parity-check/ That looks exactly like the issues you are having, I'm afraid.
tucansam Posted July 26, 2017
Thanks to you both. I have an LSI on order. I guess my biggest curiosity is the randomness of the errors. My old server ran for over a year with that controller and never had any issues. The MB was too old to support VT-D; it wasn't even an option in the BIOS. On two other motherboards, VT-D was supported but disabled. Both of those systems (one of which is my present one giving me trouble) had all kinds of issues. Still, sometimes I would get two red balls in a month, and sometimes I would go six or more months with no issues at all. Trying to wrap my head around the exact cause, and why it doesn't happen with any regularity.
aspdend Posted July 26, 2017
Mine ran for a few years without issue until I updated my MB, although at the same time I updated the version of UnRaid I was running, and I would presume that the UnRaid version is the most likely culprit. I suppose the best part is that I have learned a lot more (thanks @Johnnie.black) and the new controller is significantly cheaper.
JorgeB Posted July 26, 2017
That's common. They ran fine on v5 and still run fine on v6 for some users, but any hardware or software change can trigger an issue. IMO they are like a ticking time bomb, though I'm still using two SAS2LPs on a backup server without issues so far. I had a problem once with a SASLP and retired them all.
tucansam Posted July 26, 2017
Had a rebuild of Disk 1 going overnight. Woke up to Disk 1 emulated (again), and Disk 2 has 72,000,000 errors. Diags attached. ffs2-diagnostics-20170726-1128.zip
tucansam Posted July 26, 2017
I have enough ports on the MB and a spare (Marvell.....) controller that I can run 100% of my unRAID disks on non-Marvell ports, with a mostly unused scratch disk outside the array on a Marvell until my LSI arrives (Aug 2). Can I attempt to rebuild Disk 1? Is Disk 2 totally toast? From the command line I am getting I/O errors; I tried 'ls /mnt/disk2' to see what data I fear I am going to lose. How bad is this? I stopped the array to power down and swap hardware, and now Disk 2 is missing and Disk 1 is emulated. I assume this means I am going to lose data.
JorgeB Posted July 26, 2017
Reboot and Disk 2 should come back online, but I'd wait for the LSI before attempting another rebuild (or do it now if you can connect all disks without the SASLP).
tucansam Posted July 26, 2017
Thanks Johnnie. Disk 2 came back up and Disk 1 is rebuilding. All array devices are on my existing LSI and the Asmedia/Intel ports on the MB. My spare scratch disk is on an old PCI Promise Tech SATA1 controller, but speed is not necessary on that drive. Fingers are crossed. As an aside, are there any four-port PCIe SATA controllers that DON'T use the Marvell chipset? I have a spare server with 8 drives, and the only other four-port controller I have is a Marvell; I would very much like to stay away from this chipset, forever. I need something to add ports. I could go with another LSI, I suppose, but it seems like a waste as I have five ports on the MB, and this is a backup server that will come on once a month.
JorgeB Posted July 26, 2017
There are some 4-port LSI controllers, but they usually cost almost the same as the 8-port version. Other than that, there's the older Adaptec 1430SA, which still works well. It does use a Marvell controller, but with a specific driver, and AFAIK there have never been any issues with it and unRAID.