quinchu Posted July 7, 2018
Good morning, We had a strange failure on one of our boxes last night and can't pinpoint exactly what is going on. We have an array of 12 assorted 4 TB drives in a two-parity setup. After a reboot of the machine and a scheduled parity check, we now have millions of sync errors. The drives show no sign of failure: no SMART problems, and everything reports healthy on the dashboard. Attaching the diagnostics for reference. Any assistance would be greatly appreciated! Attachment: sjc-nas-diagnostics-20180706-2131.zip
quinchu (Author) Posted July 8, 2018
Anyone have any ideas why there are so many sync errors?
trurl Posted July 8, 2018
I haven't looked at the diagnostics, but typically a very large number of parity sync errors means you didn't actually have valid parity to begin with.
quinchu (Author) Posted July 8, 2018
@Trurl, The specific node has been running for over 2 years on the same hardware without a problem, with scheduled parity checks every Sunday evening. This is the first time it has shown any kind of errors. Any other ideas?
pwm Posted July 9, 2018
Maybe one drive or drive controller suddenly started feeding garbage that doesn't match the data used when parity was originally built. Or a software crash or rogue software has resulted in a wild disk write directly to one of the physical disks, bypassing the parity calculation. The /dev/md<id> device names are the ones that catch writes and recompute parity, while writes to /dev/sd<id> bypass parity.
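To illustrate the point about /dev/md<id> versus /dev/sd<id>, here is a minimal toy sketch in Python. It is not how the real driver is implemented: the `ToyArray` class and its method names are hypothetical, disks are just small lists of integers, and it models only a single XOR parity rather than the two-parity setup described in the first post. It only shows why a write that skips the parity update surfaces later as sync errors during a parity check.

```python
from functools import reduce
from operator import xor


class ToyArray:
    """Toy single-parity array: parity[b] is the XOR of every data disk's block b."""

    def __init__(self, num_disks, blocks=8):
        self.disks = [[0] * blocks for _ in range(num_disks)]
        self.parity = [0] * blocks  # all-zero parity is valid for all-zero disks

    def write_via_md(self, disk, block, value):
        """Parity-aware write (stands in for a write through /dev/md<id>)."""
        old = self.disks[disk][block]
        self.parity[block] ^= old ^ value  # remove old contribution, add new one
        self.disks[disk][block] = value

    def write_raw(self, disk, block, value):
        """Direct write (stands in for /dev/sd<id>): parity is NOT updated."""
        self.disks[disk][block] = value

    def parity_check(self):
        """Count blocks whose stored parity differs from the recomputed XOR."""
        return sum(
            1
            for b, stored in enumerate(self.parity)
            if reduce(xor, (d[b] for d in self.disks), 0) != stored
        )


if __name__ == "__main__":
    array = ToyArray(num_disks=3)
    for b in range(8):
        array.write_via_md(0, b, b + 1)  # normal writes keep parity in sync
    print(array.parity_check())  # 0

    array.write_raw(1, 2, 0xFF)  # "wild" writes that bypass parity
    array.write_raw(1, 5, 0x0F)
    print(array.parity_check())  # 2
```

In this toy model every block touched by a raw write shows up as one sync error, so a large burst of writes that bypass parity (or a drive returning different data than was written) would produce a correspondingly large error count.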
quinchu (Author) Posted July 9, 2018
Checked out the RAM; all good there. Ran disk checks on all drives in the system and found some errors on drive #7, which were fixed. Ran two parity checks with zero errors after that. Apparently there was a bad shutdown I was unaware of: someone pulled the power from the server.