tucansam Posted August 4, 2016

6.0-rc5

Have been running it for a while now. The automatic parity check on 1 Aug found 5 parity errors. I know this is minimal, but it's the first time I've had errors on this system.

Syslog entries that piqued my interest:

--
Jul 27 12:32:27 ffs2 kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 27 12:32:27 ffs2 kernel: ata7.00: failed command: IDENTIFY DEVICE
Jul 27 12:32:27 ffs2 kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 19 pio 512 in
Jul 27 12:32:27 ffs2 kernel: res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Jul 27 12:32:27 ffs2 kernel: ata7.00: status: { DRDY }
Jul 27 12:32:27 ffs2 kernel: ata7: hard resetting link
Jul 27 12:32:27 ffs2 kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 27 12:32:27 ffs2 kernel: ata7.00: configured for UDMA/133
Jul 27 12:32:27 ffs2 kernel: ata7: EH complete
Jul 27 12:49:52 ffs2 kernel: mdcmd (446): spindown 3
Jul 27 13:44:23 ffs2 kernel: mdcmd (447): spindown 3
Jul 27 14:06:51 ffs2 kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 27 14:06:51 ffs2 kernel: ata7.00: failed command: IDENTIFY DEVICE
Jul 27 14:06:51 ffs2 kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 24 pio 512 in
Jul 27 14:06:51 ffs2 kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 27 14:06:51 ffs2 kernel: ata7.00: status: { DRDY }
Jul 27 14:06:51 ffs2 kernel: ata7: hard resetting link
Jul 27 14:06:51 ffs2 kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 27 14:06:51 ffs2 kernel: ata7.00: configured for UDMA/133
Jul 27 14:06:51 ffs2 kernel: ata7: EH complete
Aug 1 00:00:01 ffs2 kernel: mdcmd (578): check NOCORRECT
Aug 1 00:00:01 ffs2 kernel:
Aug 1 00:00:01 ffs2 kernel: md: recovery thread woken up ...
Aug 1 00:00:01 ffs2 kernel: md: recovery thread checking parity...
Aug 1 00:00:01 ffs2 kernel: md: using 2048k window, over a total of 2930266532 blocks.
Aug 1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565768
Aug 1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565776
Aug 1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565784
Aug 1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565792
Aug 1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565800
Aug 1 05:00:48 ffs2 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 1 05:00:48 ffs2 kernel: ata9.00: failed command: IDENTIFY DEVICE
Aug 1 05:00:48 ffs2 kernel: ata9.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 22 pio 512 in
Aug 1 05:00:48 ffs2 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 1 05:00:48 ffs2 kernel: ata9.00: status: { DRDY }
Aug 1 05:00:48 ffs2 kernel: ata9: hard resetting link
Aug 1 05:00:48 ffs2 kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 1 05:00:48 ffs2 kernel: ata9.00: configured for UDMA/133
Aug 1 05:00:48 ffs2 kernel: ata9: EH complete
Aug 1 06:52:14 ffs2 kernel: mdcmd (579): spindown 2
Aug 1 06:52:15 ffs2 kernel: mdcmd (580): spindown 4
Aug 1 06:52:15 ffs2 kernel: mdcmd (581): spindown 5
Aug 1 06:52:15 ffs2 kernel: mdcmd (582): spindown 7
Aug 1 06:58:37 ffs2 kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 1 06:58:37 ffs2 kernel: ata7.00: failed command: IDENTIFY DEVICE
Aug 1 06:58:37 ffs2 kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 6 pio 512 in
Aug 1 06:58:37 ffs2 kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Aug 1 06:58:37 ffs2 kernel: ata7.00: status: { DRDY }
Aug 1 06:58:37 ffs2 kernel: ata7: hard resetting link
Aug 1 06:58:37 ffs2 kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 1 06:58:37 ffs2 kernel: ata7.00: configured for UDMA/133
Aug 1 06:58:37 ffs2 kernel: ata7: EH complete
Aug 1 10:04:54 ffs2 kernel: md: sync done. time=36292sec
Aug 1 10:04:54 ffs2 kernel: md: recovery thread sync completion status: 0
Aug 2 11:35:26 ffs2 kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 2 11:35:26 ffs2 kernel: ata7.00: failed command: SMART
Aug 2 11:35:26 ffs2 kernel: ata7.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 1 pio 512 in
Aug 2 11:35:26 ffs2 kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 2 11:35:26 ffs2 kernel: ata7.00: status: { DRDY }
Aug 2 11:35:26 ffs2 kernel: ata7: hard resetting link
Aug 2 11:35:26 ffs2 kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 2 11:35:26 ffs2 kernel: ata7.00: configured for UDMA/133
Aug 2 11:35:26 ffs2 kernel: ata7: EH complete
--

Not sure what to make of this. Advice welcome. Thanks.
trurl Posted August 4, 2016

You shouldn't be running such an old rc, especially since the stable branch is already on 6.1.9.

Also, the notion that only 5 parity errors is minimal is completely wrong. Any parity errors at all can corrupt a data disk rebuild.

Syslog snippets are seldom sufficient. Always post the complete diagnostics zip.

Was this a correcting parity check? If so, you should run another to see if the parity errors have all been corrected.
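To illustrate the point about rebuilds, here is a minimal sketch, assuming a simple single-parity scheme in which the parity block is the XOR of the data blocks (the thread does not spell out the array's parity scheme, so treat that as an assumption). The disk contents and the `xor_blocks` helper are hypothetical and exist only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, one 4-byte block each (illustrative values).
data = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Rebuild disk 1 from the surviving disks plus correct parity.
rebuilt_ok = xor_blocks([data[0], data[2], parity])

# Flip one bit in the parity block to mimic a single uncorrected parity error.
bad_parity = bytes([parity[0] ^ 0x01]) + parity[1:]
rebuilt_bad = xor_blocks([data[0], data[2], bad_parity])

print(rebuilt_ok == data[1])   # rebuild matches the original block
print(rebuilt_bad == data[1])  # rebuild silently differs from the original
```

Under this assumption, every wrong parity bit becomes a wrong bit at the same position on whichever disk is rebuilt, which is why a small error count still matters.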
John_M Posted August 4, 2016

The controllers seem to be losing communication with the disks and resetting the links. I'd power down and check the relevant SATA cables before doing another parity check.
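To see which ports are affected, the link resets can be tallied per ATA port. The following is a rough sketch, not an official tool: the `summarize` helper and both regular expressions are assumptions modelled on the log lines quoted in the first post, and the sample contains only a few of those lines.

```python
import re
from collections import Counter

# Patterns modelled on the kernel messages quoted in this thread.
RESET_RE = re.compile(r"kernel: (ata\d+): hard resetting link")
PARITY_RE = re.compile(r"md: parity incorrect, sector=(\d+)")


def summarize(syslog_text):
    """Return (resets per ATA port, list of sectors with incorrect parity)."""
    resets = Counter(RESET_RE.findall(syslog_text))
    sectors = [int(s) for s in PARITY_RE.findall(syslog_text)]
    return resets, sectors


# A few lines copied from the syslog excerpt above.
sample = """\
Jul 27 12:32:27 ffs2 kernel: ata7: hard resetting link
Aug 1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565768
Aug 1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565776
Aug 1 05:00:48 ffs2 kernel: ata9: hard resetting link
Aug 1 06:58:37 ffs2 kernel: ata7: hard resetting link
"""

resets, sectors = summarize(sample)
for port, count in resets.most_common():
    print(port, count)
print(sectors)
```

Read by eye, the full excerpt shows four resets on ata7 and one on ata9, so the cables on those two ports are the ones to check first.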
tucansam Posted August 5, 2016

Quoting trurl: "You shouldn't be running such an old rc, especially since the stable branch is already on 6.1.9. Also, the notion that only 5 parity errors is minimal is completely wrong. If you have any parity errors at all it can corrupt a data disk rebuild. Syslog snippets are seldom sufficient. Always post complete diagnostics zip. Was this a correcting parity check? If so, you should run another to see if the parity errors have all been corrected."

Literally the entire rest of the syslog was spindown entries.
tucansam Posted August 5, 2016

Quoting John_M: "The controllers seem to be losing communication with the disks and resetting the links. I'd power down and check the relevant SATA cables before doing another parity check."

I pulled the server apart last month to dispose of dust bunnies; I probably bumped something. Good call.

Thanks to all.