August 26, 2013

I recently had a drive die, replaced it, and rebuilt the array. Now, less than 24 hours later, I'm running a non-correcting parity check as a sanity check before migrating to v5, and I'm getting four parity errors. They occur very early in the scan, literally within seconds of starting it, which makes me suspect the affected areas are in system data rather than my user data. I'm also puzzled about where they came from, as the array was reconstructed from parity only yesterday.

Is there a how-to that can help me identify the affected files given the addresses reported in the log? If I determine that the affected files are corrupt, how does one force a rebuild of the data from parity, instead of the default of correcting parity from the data?

Aug 26 13:02:59 Tower kernel: md: parity incorrect: 18144
Aug 26 13:02:59 Tower kernel: md: parity incorrect: 18152
Aug 26 13:02:59 Tower kernel: md: parity incorrect: 18168
Aug 26 13:02:59 Tower kernel: md: parity incorrect: 18176
August 26, 2013

There's apparently a known bug in v4.7 that can cause this behavior (see the last few posts here: http://lime-technology.com/forum/index.php?topic=29017.msg259721#msg259721). If the disks don't show any errors in the "error" column, I'd simply run a correcting parity check, then run a second one to confirm zero errors, and then do your v5 upgrade.
August 26, 2013 (Author)

"Gary, this is a known issue with 4.7, but it has never been shown to cause data loss. Apparently the rebuild process on 4.7 doesn't keep parity in sync while writing to some non-data area of the drive, and almost always causes a few sync errors at the beginning of the run."

That sounds like exactly it. I'll run a couple of regular parity checks, one to fix and one to verify, before moving on. Thanks, Gary.
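For anyone wanting to see how far into the disk the reported addresses sit, here is a rough Python sketch that pulls the numbers out of the syslog lines quoted in the first post and converts them to byte offsets. It assumes each reported value is a 512-byte sector number, which the thread does not confirm; the function names and the `SECTOR_BYTES` constant are made up for illustration. It does not map an address to a file.

```python
import re

# Matches the kernel messages quoted in the first post.
PARITY_RE = re.compile(r"md: parity incorrect: (\d+)")

# Assumption (not confirmed in the thread): addresses are 512-byte sectors.
SECTOR_BYTES = 512


def parity_error_addresses(log_text):
    """Return the reported addresses in order of appearance."""
    return [int(m.group(1)) for m in PARITY_RE.finditer(log_text)]


def byte_offsets(addresses, sector_bytes=SECTOR_BYTES):
    """Convert addresses to byte offsets under the sector-size assumption."""
    return [a * sector_bytes for a in addresses]


sample = """\
Aug 26 13:02:59 Tower kernel: md: parity incorrect: 18144
Aug 26 13:02:59 Tower kernel: md: parity incorrect: 18152
Aug 26 13:02:59 Tower kernel: md: parity incorrect: 18168
Aug 26 13:02:59 Tower kernel: md: parity incorrect: 18176
"""

addresses = parity_error_addresses(sample)
for addr, offset in zip(addresses, byte_offsets(addresses)):
    print(f"address {addr} -> byte offset {offset} ({offset / 2**20:.2f} MiB)")
```

If the 512-byte assumption holds, all four addresses fall within the first 9 MiB of the device, which would fit with the errors showing up within seconds of starting the check.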