Troubleshooting parity sync errors


My monthly parity check is currently running and has reported 165 errors so far. This is the first time I've ever had any errors reported, and there have been no shutdowns, clean or otherwise, in several months.

Can anyone give me some pointers on how to evaluate the cause and severity of the issues? This is unfamiliar territory for me.

Thanks.

P.S. I'm happy to provide the diagnostics too if that helps.

Link to comment

So far only parity2 is out of sync. This suggests it's not a data disk or RAM issue, since in that case both parity drives would be out of sync. I would start by replacing the Asmedia controller that uses port multipliers: it's generating some ATA errors, it's bad for performance, and, coincidentally or not, parity2 is connected to it. A recommended controller should also improve performance. The last check took almost 3 days, which is a lot longer than it should take, though you were using the array at the time.

P.S. Please disable mover logging unless you need it to troubleshoot the mover; otherwise it generates a lot of log spam.

Link to comment
Nov  2 22:03:59 Tower kernel: md: recovery thread: Q corrected, sector=7038653368
Nov  2 22:03:59 Tower kernel: md: recovery thread: Q corrected, sector=7038653376

Looks like you have correcting parity checks scheduled. It is recommended that you only correct parity once you have determined that it needs correcting. You don't want a problem with another disk to corrupt parity.
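If you want a quick overview of what the check has corrected so far, here is a minimal Python sketch that tallies the "corrected" lines per parity label. It only assumes the line format shown in the two syslog lines above; the function name `summarize_corrections` is made up for illustration, and the assumption that other labels (for example a P label for the first parity drive) follow the same pattern is mine, not something confirmed by these logs.

```python
import re

# Matches lines like: "md: recovery thread: Q corrected, sector=7038653368"
# The label before " corrected" is captured generically (assumption: other
# labels use the same layout as the Q lines quoted in this thread).
CORRECTED = re.compile(r"md: recovery thread: (\w+) corrected, sector=(\d+)")


def summarize_corrections(lines):
    """Return {label: (count, lowest_sector, highest_sector)}."""
    sectors = {}
    for line in lines:
        match = CORRECTED.search(line)
        if match:
            sectors.setdefault(match.group(1), []).append(int(match.group(2)))
    return {label: (len(s), min(s), max(s)) for label, s in sectors.items()}


sample = [
    "Nov  2 22:03:59 Tower kernel: md: recovery thread: Q corrected, sector=7038653368",
    "Nov  2 22:03:59 Tower kernel: md: recovery thread: Q corrected, sector=7038653376",
]
print(summarize_corrections(sample))
```

To run it against a real log, pass an open file instead of `sample`, e.g. `summarize_corrections(open("/var/log/syslog", errors="replace"))`. If only one label shows up, that matches the observation that only one parity drive is out of sync.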

And you do have a problem with a disk:

Nov  2 19:21:52 Tower kernel: ata5.01: failed command: WRITE FPDMA QUEUED
Nov  2 19:21:52 Tower kernel: ata5.01: cmd 61/20:a8:30:1d:ce/00:00:bc:02:00/40 tag 21 ncq dma 16384 out
Nov  2 19:21:52 Tower kernel:         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov  2 19:21:52 Tower kernel: ata5.01: status: { DRDY }
Nov  2 19:21:53 Tower kernel: ata5.15: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Nov  2 19:21:53 Tower kernel: ata5.00: SATA link up 6.0 Gbps (SStatus 133 SControl 330)
Nov  2 19:21:53 Tower kernel: ata5.01: hard resetting link

Looks like a connection issue. Unfortunately I can't tell which disk it is because you haven't rebooted in a long time and that information has "rotated" off the syslogs included in diagnostics. Probably you have some older syslogs in /var/log (syslog.3, syslog.4...) that might tell.
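A rough way to go through the current and older syslogs for these errors is sketched below in Python. It counts "failed command" lines per ATA port, based only on the line format quoted above. The helper names and the `/var/log/syslog*` glob are assumptions for illustration; it also assumes the rotated files are plain text.

```python
import glob
import re
from collections import Counter

# Matches lines like: "ata5.01: failed command: WRITE FPDMA QUEUED"
FAILED = re.compile(r"(ata\d+(?:\.\d+)?): failed command: (.+)$")


def failed_commands(lines):
    """Count failed ATA commands as {(port, command): occurrences}."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line.rstrip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return dict(counts)


def scan_files(pattern="/var/log/syslog*"):
    """Sum the counts over every file matching the (assumed) glob pattern."""
    totals = Counter()
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            totals.update(failed_commands(handle))
    return dict(totals)


sample = [
    "Nov  2 19:21:52 Tower kernel: ata5.01: failed command: WRITE FPDMA QUEUED",
    "Nov  2 19:21:52 Tower kernel: ata5.01: status: { DRDY }",
    "Nov  2 19:21:53 Tower kernel: ata5.01: hard resetting link",
]
print(failed_commands(sample))
```

This only tells you which port (here ata5.01) keeps failing. Mapping that port to a specific disk still needs the boot-time lines from an older syslog or the lsscsi listing in the diagnostics.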

 

You should disable mover logging unless you are trying to diagnose an issue with mover since those entries in syslog are not anonymized.

Link to comment
11 minutes ago, trurl said:

How can you tell?

lsscsi in the diags lists all the devices, and you can cross-check the device letters against, for example, the SMART reports. parity2 is sdh:

[6:0:0:0]    disk    ATA      WDC WD40EFRX-68W 0A80  /dev/sdg   /dev/sg7
  state=running queue_depth=32 scsi_level=6 type=0 device_blocked=0 timeout=30
  dir: /sys/bus/scsi/devices/6:0:0:0  [/sys/devices/pci0000:00/0000:00:1c.7/0000:08:00.0/ata5/host6/target6:0:0/6:0:0:0]
[6:1:0:0]    disk    ATA      WDC WD120EMFZ-11 0A81  /dev/sdh   /dev/sg8
  state=running queue_depth=32 scsi_level=6 type=0 device_blocked=1 timeout=30
  dir: /sys/bus/scsi/devices/6:1:0:0  [/sys/devices/pci0000:00/0000:00:1c.7/0000:08:00.0/ata5/host6/target6:1:0/6:1:0:0]
[6:2:0:0]    disk    ATA      WDC WD120EDAZ-11 0A81  /dev/sdi   /dev/sg9
  state=running queue_depth=32 scsi_level=6 type=0 device_blocked=0 timeout=30
  dir: /sys/bus/scsi/devices/6:2:0:0  [/sys/devices/pci0000:00/0000:00:1c.7/0000:08:00.0/ata5/host6/target6:2:0/6:2:0:0]
[7:0:0:0]    disk    ATA      WDC WD120EMFZ-11 0A81  /dev/sdj   /dev/sg10
  state=running queue_depth=32 scsi_level=6 type=0 device_blocked=0 timeout=30
  dir: /sys/bus/scsi/devices/7:0:0:0  [/sys/devices/pci0000:00/0000:00:1c.7/0000:08:00.0/ata6/host7/target7:0:0/7:0:0:0]

These are all the disks connected to the Asmedia controller. parity2 shares the same SATA port (and multiplier) with two other disks; you can see that because all three are under ata5. The one under ata6 is using the other port.
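The same cross-check can be done mechanically. The Python sketch below groups the devices from the lsscsi listing above by the ataN component of their sysfs path. The function name `group_by_ata_port` is made up for illustration, and the parsing assumes the exact three-lines-per-device layout shown in this thread.

```python
import re
from collections import defaultdict

# Header line of each device, e.g. "[6:1:0:0]  disk  ATA  ...  /dev/sdh  /dev/sg8"
DEVICE = re.compile(r"^\[\d+:\d+:\d+:\d+\]\s+disk\s+.*?(/dev/sd[a-z]+)")
# The ataN component inside the sysfs path on the "dir:" line
ATA = re.compile(r"/(ata\d+)/host\d+/")


def group_by_ata_port(text):
    """Return {ata_port: [block devices]} from verbose lsscsi output."""
    groups = defaultdict(list)
    current = None
    for line in text.splitlines():
        match = DEVICE.match(line)
        if match:
            current = match.group(1)
            continue
        match = ATA.search(line)
        if match and current:
            groups[match.group(1)].append(current)
            current = None
    return dict(groups)


listing = """\
[6:0:0:0]    disk    ATA      WDC WD40EFRX-68W 0A80  /dev/sdg   /dev/sg7
  state=running queue_depth=32 scsi_level=6 type=0 device_blocked=0 timeout=30
  dir: /sys/bus/scsi/devices/6:0:0:0  [/sys/devices/pci0000:00/0000:00:1c.7/0000:08:00.0/ata5/host6/target6:0:0/6:0:0:0]
[6:1:0:0]    disk    ATA      WDC WD120EMFZ-11 0A81  /dev/sdh   /dev/sg8
  state=running queue_depth=32 scsi_level=6 type=0 device_blocked=1 timeout=30
  dir: /sys/bus/scsi/devices/6:1:0:0  [/sys/devices/pci0000:00/0000:00:1c.7/0000:08:00.0/ata5/host6/target6:1:0/6:1:0:0]
[6:2:0:0]    disk    ATA      WDC WD120EDAZ-11 0A81  /dev/sdi   /dev/sg9
  state=running queue_depth=32 scsi_level=6 type=0 device_blocked=0 timeout=30
  dir: /sys/bus/scsi/devices/6:2:0:0  [/sys/devices/pci0000:00/0000:00:1c.7/0000:08:00.0/ata5/host6/target6:2:0/6:2:0:0]
[7:0:0:0]    disk    ATA      WDC WD120EMFZ-11 0A81  /dev/sdj   /dev/sg10
  state=running queue_depth=32 scsi_level=6 type=0 device_blocked=0 timeout=30
  dir: /sys/bus/scsi/devices/7:0:0:0  [/sys/devices/pci0000:00/0000:00:1c.7/0000:08:00.0/ata6/host7/target7:0:0/7:0:0:0]
"""
print(group_by_ata_port(listing))
```

Three devices on one ataN entry means they sit behind the same port multiplier, which is the situation described for sdg, sdh and sdi here.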

Link to comment
