
First parity errors on v6


tucansam


6.0-rc5

 

I have been running it for a while now. The automatic parity check on 1 Aug found 5 parity errors. I know this is minimal, but it's the first time I've had errors on this system.

 

Syslog entries that piqued my interest:

 

--

 

Jul 27 12:32:27 ffs2 kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Jul 27 12:32:27 ffs2 kernel: ata7.00: failed command: IDENTIFY DEVICE

Jul 27 12:32:27 ffs2 kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 19 pio 512 in

Jul 27 12:32:27 ffs2 kernel:        res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)

Jul 27 12:32:27 ffs2 kernel: ata7.00: status: { DRDY }

Jul 27 12:32:27 ffs2 kernel: ata7: hard resetting link

Jul 27 12:32:27 ffs2 kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Jul 27 12:32:27 ffs2 kernel: ata7.00: configured for UDMA/133

Jul 27 12:32:27 ffs2 kernel: ata7: EH complete

Jul 27 12:49:52 ffs2 kernel: mdcmd (446): spindown 3

Jul 27 13:44:23 ffs2 kernel: mdcmd (447): spindown 3

Jul 27 14:06:51 ffs2 kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Jul 27 14:06:51 ffs2 kernel: ata7.00: failed command: IDENTIFY DEVICE

Jul 27 14:06:51 ffs2 kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 24 pio 512 in

Jul 27 14:06:51 ffs2 kernel:        res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

Jul 27 14:06:51 ffs2 kernel: ata7.00: status: { DRDY }

Jul 27 14:06:51 ffs2 kernel: ata7: hard resetting link

Jul 27 14:06:51 ffs2 kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Jul 27 14:06:51 ffs2 kernel: ata7.00: configured for UDMA/133

Jul 27 14:06:51 ffs2 kernel: ata7: EH complete

 

 

Aug  1 00:00:01 ffs2 kernel: mdcmd (578): check NOCORRECT

Aug  1 00:00:01 ffs2 kernel:

Aug  1 00:00:01 ffs2 kernel: md: recovery thread woken up ...

Aug  1 00:00:01 ffs2 kernel: md: recovery thread checking parity...

Aug  1 00:00:01 ffs2 kernel: md: using 2048k window, over a total of 2930266532 blocks.

Aug  1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565768

Aug  1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565776

Aug  1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565784

Aug  1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565792

Aug  1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565800

Aug  1 05:00:48 ffs2 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Aug  1 05:00:48 ffs2 kernel: ata9.00: failed command: IDENTIFY DEVICE

Aug  1 05:00:48 ffs2 kernel: ata9.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 22 pio 512 in

Aug  1 05:00:48 ffs2 kernel:        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

Aug  1 05:00:48 ffs2 kernel: ata9.00: status: { DRDY }

Aug  1 05:00:48 ffs2 kernel: ata9: hard resetting link

Aug  1 05:00:48 ffs2 kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

Aug  1 05:00:48 ffs2 kernel: ata9.00: configured for UDMA/133

Aug  1 05:00:48 ffs2 kernel: ata9: EH complete

Aug  1 06:52:14 ffs2 kernel: mdcmd (579): spindown 2

Aug  1 06:52:15 ffs2 kernel: mdcmd (580): spindown 4

Aug  1 06:52:15 ffs2 kernel: mdcmd (581): spindown 5

Aug  1 06:52:15 ffs2 kernel: mdcmd (582): spindown 7

Aug  1 06:58:37 ffs2 kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Aug  1 06:58:37 ffs2 kernel: ata7.00: failed command: IDENTIFY DEVICE

Aug  1 06:58:37 ffs2 kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 6 pio 512 in

Aug  1 06:58:37 ffs2 kernel:        res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)

Aug  1 06:58:37 ffs2 kernel: ata7.00: status: { DRDY }

Aug  1 06:58:37 ffs2 kernel: ata7: hard resetting link

Aug  1 06:58:37 ffs2 kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Aug  1 06:58:37 ffs2 kernel: ata7.00: configured for UDMA/133

Aug  1 06:58:37 ffs2 kernel: ata7: EH complete

Aug  1 10:04:54 ffs2 kernel: md: sync done. time=36292sec

Aug  1 10:04:54 ffs2 kernel: md: recovery thread sync completion status: 0

 

 

Aug  2 11:35:26 ffs2 kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Aug  2 11:35:26 ffs2 kernel: ata7.00: failed command: SMART

Aug  2 11:35:26 ffs2 kernel: ata7.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 1 pio 512 in

Aug  2 11:35:26 ffs2 kernel:        res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

Aug  2 11:35:26 ffs2 kernel: ata7.00: status: { DRDY }

Aug  2 11:35:26 ffs2 kernel: ata7: hard resetting link

Aug  2 11:35:26 ffs2 kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Aug  2 11:35:26 ffs2 kernel: ata7.00: configured for UDMA/133

Aug  2 11:35:26 ffs2 kernel: ata7: EH complete

 

--

 

Not sure what to make of this.  Advice welcome. 

 

Thanks.

Link to comment

You shouldn't be running such an old rc, especially since the stable branch is already on 6.1.9.

 

Also, the notion that 5 parity errors is minimal is completely wrong. Any parity error at all can corrupt a data disk rebuild.
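
To see why even one bad parity block matters, here is a minimal sketch. It assumes single parity computed as a bytewise XOR across the data disks. The three tiny "disks" and the helper name `xor_blocks` are made up for illustration and are not taken from this system.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, one 4-byte block each (illustrative values).
disk1 = bytes([0x11, 0x22, 0x33, 0x44])
disk2 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
disk3 = bytes([0x0F, 0x0E, 0x0D, 0x0C])

# Parity is the XOR of all data blocks.
parity = xor_blocks([disk1, disk2, disk3])

# With correct parity, a lost disk2 is rebuilt exactly from the survivors.
rebuilt_ok = xor_blocks([disk1, disk3, parity])

# Flip a single bit in the parity block to simulate one parity error.
bad_parity = bytes([parity[0] ^ 0x01]) + parity[1:]
rebuilt_bad = xor_blocks([disk1, disk3, bad_parity])

print(rebuilt_ok == disk2)   # rebuild matches the lost data
print(rebuilt_bad == disk2)  # rebuild silently differs in the affected byte
```

The rebuilt data is wrong exactly where the parity was wrong, and nothing in the rebuild itself flags it.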

 

Syslog snippets are seldom sufficient. ;D Always post the complete diagnostics zip.

 

Was this a correcting parity check? If so, you should run another to see if the parity errors have all been corrected.
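
The syslog in the first post may already answer this. The check was started with `mdcmd (578): check NOCORRECT`, which reads as a non-correcting check, though that interpretation of the argument is an assumption. The sketch below pulls the check argument and the reported sectors out of those lines. The regular expressions and the function name `summarize_check` are illustrative, and the 512-byte sector size is assumed rather than confirmed.

```python
import re

# Lines copied from the syslog excerpt in the first post.
SYSLOG = """\
Aug  1 00:00:01 ffs2 kernel: mdcmd (578): check NOCORRECT
Aug  1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565768
Aug  1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565776
Aug  1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565784
Aug  1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565792
Aug  1 02:11:41 ffs2 kernel: md: parity incorrect, sector=1565565800
Aug  1 10:04:54 ffs2 kernel: md: sync done. time=36292sec
"""

CHECK_RE = re.compile(r"mdcmd \(\d+\): check\s*(\S*)")
SECTOR_RE = re.compile(r"md: parity incorrect, sector=(\d+)")

def summarize_check(text):
    """Return (check argument, list of sectors reported as parity-incorrect)."""
    mode = None
    sectors = []
    for line in text.splitlines():
        m = CHECK_RE.search(line)
        if m:
            mode = m.group(1) or "(no argument)"
        m = SECTOR_RE.search(line)
        if m:
            sectors.append(int(m.group(1)))
    return mode, sectors

mode, sectors = summarize_check(SYSLOG)

# Distinct gaps between consecutive reported sectors.
strides = sorted({b - a for a, b in zip(sectors, sectors[1:])})

# Byte offset of the first bad sector, ASSUMING 512-byte sectors.
first_offset = sectors[0] * 512

print(mode, len(sectors), strides, first_offset)
```

The five sectors are spaced 8 apart, so under the 512-byte assumption they form one contiguous 20 KiB region (five adjacent 4 KiB blocks) rather than errors scattered across the disk.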

Link to comment


Literally the entire rest of the syslog was spindown entries.

Link to comment

The controllers seem to be losing communication with the disks and resetting the links. I'd power down and check the relevant SATA cables before doing another parity check.
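
A quick way to see which ports are affected is to count the link resets and failed commands per ATA port. This is a rough sketch over lines copied from the excerpt in the first post. The regular expressions and the name `count_events` are illustrative, and the excerpt does not show which disk sits behind `ata7` or `ata9`.

```python
import re
from collections import Counter

# Reset and failed-command lines copied from the syslog excerpt in the first post.
SYSLOG = """\
Jul 27 12:32:27 ffs2 kernel: ata7.00: failed command: IDENTIFY DEVICE
Jul 27 12:32:27 ffs2 kernel: ata7: hard resetting link
Jul 27 14:06:51 ffs2 kernel: ata7.00: failed command: IDENTIFY DEVICE
Jul 27 14:06:51 ffs2 kernel: ata7: hard resetting link
Aug  1 05:00:48 ffs2 kernel: ata9.00: failed command: IDENTIFY DEVICE
Aug  1 05:00:48 ffs2 kernel: ata9: hard resetting link
Aug  1 06:58:37 ffs2 kernel: ata7.00: failed command: IDENTIFY DEVICE
Aug  1 06:58:37 ffs2 kernel: ata7: hard resetting link
Aug  2 11:35:26 ffs2 kernel: ata7.00: failed command: SMART
Aug  2 11:35:26 ffs2 kernel: ata7: hard resetting link
"""

RESET_RE = re.compile(r"kernel: (ata\d+): hard resetting link")
FAILED_RE = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+)$")

def count_events(text):
    """Count hard link resets per port and failed commands per (port, command)."""
    resets = Counter()
    failed = Counter()
    for line in text.splitlines():
        m = RESET_RE.search(line)
        if m:
            resets[m.group(1)] += 1
        m = FAILED_RE.search(line)
        if m:
            failed[(m.group(1), m.group(2).strip())] += 1
    return resets, failed

resets, failed = count_events(SYSLOG)
for port, n in sorted(resets.items()):
    print(port, "hard resets:", n)
for (port, cmd), n in sorted(failed.items()):
    print(port, cmd, "failures:", n)
```

In this excerpt `ata7` accounts for four of the five resets, so the cable and connector on that port are the first ones worth reseating.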

 

I pulled the server apart last month to get rid of dust bunnies and probably bumped something. Good call.

 

Thanks to all.

 

