July 7, 20187 yr Good morning, We had a strange failure on one of our boxes last night and can't pinpoint exactly what is going on. We have an array of 12x 4tb drives of assorted disks, two parity setup. After a reboot of the machine and a scheduled parity check of the system we now have millions of Sync errors. Drives show no sign of failures, no smart problems and nothing but healthy on the dashboard. Attaching the diagnostics for reference. Any assistance would be greatly appreciated! sjc-nas-diagnostics-20180706-2131.zip
July 8, 20187 yr Community Expert I haven't looked at the diagnostics, but typically if you have a very large number of parity sync errors, then it means you didn't actually have valid parity to begin with.
July 8, 20187 yr Author @Trurl, The specific node has been running for over 2 years on the same hardware without a problem. Scheduled parity Checks every sunday evening. This is the first time it's shown any kind of errors. Any other ideas?
July 9, 20187 yr Maybe one drive or drive controller suddenly starting to feed garbage not matching the data used when parity was originally built. Or some software crash or rogue software that as resulted in a wild disk write directly to one of the physical disks, bypassing the parity calculation. /dev/md<id> are the device names that will catch writes and recompute parity. While writes to /dev/sd<id> will bypass parity.
July 9, 20187 yr Author Checked out the ram, all good there. Ran disk checks on all drives in the system and found some errors on drive #7, which were fixed. Ran two parity checks with zero errors after that. There was apparently a bad shutdown I was unaware of when someone pulled the power from the server.
Archived
This topic is now archived and is closed to further replies.