[SOLVED] High Number of Parity Sync Errors on 2 consecutive parity checks



I am getting ready to upgrade my parity drive from 3TB to 8TB. Another thread recommended running a parity check or three to make sure there are no errors before swapping in the new parity drive. I have precleared the new drive, and last night around 7pm I started a parity check with the old 3TB parity drive. Watching the check, I could see the number of corrected errors climbing, and when it completed this morning it had corrected 148 errors. Figuring I should run another check to make sure there were no more errors, I started a second one right after the first completed. It has been running for about 2 hours and 45 minutes so far (about 36% done), and it has already corrected 146 errors. I am pretty sure I had the box checked to correct errors. I also looked at the syslog: the first few errors matched the previous parity check, but most of the subsequent errors are on different sectors.

We did have a power blip on June 6th that caused an unclean shutdown, and on July 15th the server rebooted on its own for a reason I couldn't determine, because I didn't have Nerd Tools installed to capture it. It hasn't rebooted on its own since then, and the parity checks that followed both of those shutdowns found only 1 or 2 errors. Up until April my monthly parity check had zero errors, but starting in April it began reporting 1 error each month. I also saw the UDMA CRC error count increasing on one disk, but that was because I had my SATA cables tied together, which I have since undone.

I'm attaching my diagnostics from this morning, taken while the second consecutive parity check is running. Does anyone see anything hinting at a possible problem? I am planning on running a third check after this one completes, but I wanted to see whether something is majorly wrong here or whether this seems normal to anyone.

unraid-diagnostics-20180801-0656.zip

trurl said:

Have you done a memtest lately?

I have not.  Guess that should be the next test.  The second parity check completed and it found 326 sync errors.  I just started a third one and it's up to 30 sync errors already and we're only 10 minutes in.  Should I let this sync run and finish or should I cancel it before running the memtest?

trurl said:

I don't think we're learning anything from another parity check that can't wait until after memtest.

Alright, thanks.  I've cancelled the currently running parity check.  I'm at work right now and won't be home for another 3.5 hours or so.  Once I get home I will start the memtest and will report back with any findings.  Thanks for the responses.


It looks like the new 8TB disk is hosted on the onboard SATA controller (which is in IDE mode), and other disks are on the add-on SiI controller.

I suggest stopping the current sync and trying to change the controller to AHCI mode.

There is no need to wait for a full sync. Just run it until it has corrected a few errors, e.g. 3 (note down the time and how far it had got), then stop and restart it and check whether the errors occur again. The aim is to make sure the root problem is solved first.
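To make that check concrete, here is a minimal Python sketch for comparing the corrected sectors from two runs. It assumes (hypothetically; the exact wording varies between unRAID versions, so adjust `SECTOR_RE` to match your own syslog) that each corrected block is logged on a line containing `corrected, sector=<number>`. The function names and the sample log lines are illustrative only, not part of unRAID.

```python
from __future__ import annotations

import re

# ASSUMPTION: corrected parity blocks appear in the syslog as
# "... corrected, sector=<number>". Adjust this pattern to your syslog format.
SECTOR_RE = re.compile(r"corrected, sector=(\d+)")


def corrected_sectors(syslog_text: str) -> set[int]:
    """Return the set of sector numbers reported as corrected in a syslog excerpt."""
    return {int(m.group(1)) for m in SECTOR_RE.finditer(syslog_text)}


def compare_runs(first_run: str, second_run: str) -> dict[str, set[int]]:
    """Split sectors into those reported in both runs and those reported in only one."""
    a = corrected_sectors(first_run)
    b = corrected_sectors(second_run)
    return {
        "repeated": a & b,      # same blocks wrong again after a correcting pass
        "only_first": a - b,    # fixed by the first pass and not seen again
        "only_second": b - a,   # new locations that were fine in the first pass
    }


# Hypothetical syslog excerpts from two partial runs.
run1 = (
    "md: recovery thread: P corrected, sector=1000\n"
    "md: recovery thread: P corrected, sector=2048\n"
)
run2 = (
    "md: recovery thread: P corrected, sector=1000\n"
    "md: recovery thread: P corrected, sector=4096\n"
)
result = compare_runs(run1, run2)
```

After a correcting pass, neither the same sectors nor new ones should keep showing up. Repeats suggest a reproducible fault at fixed locations, while a stream of new sectors suggests data being corrupted in transit (RAM, controller, or cabling).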

Benson said:

It looks like the new 8TB disk is hosted on the onboard SATA controller (which is in IDE mode), and other disks are on the add-on SiI controller.

I suggest stopping the current sync and trying to change it to AHCI.

There is no need to wait for a full sync. Just run it until it has corrected a few errors, e.g. 3 (note down the time and how far it had got), then stop and restart it and check whether the errors occur again. The aim is to make sure the root problem is solved first.

I added the SiI PCI controller and moved my cache drives to it because the motherboard only has 6 SATA ports. I then plugged the new 8TB drive into one of the motherboard ports the cache drives had been using. The original 3TB parity drive is still connected to the same motherboard port as before. I can change the controller from IDE to AHCI, although I don't think that will correct this issue, since the 8TB drive is not part of the array yet. Unless you're seeing something in the diagnostics that I'm missing?


OK, so the 8TB is not in production yet. Did you have the same problem (inconsistent parity checks) before, or had you never examined this?

Or the problem may come from the newly added SiI controller. Anyway, memtest can be performed first.

BTW, any plain PCI controller should be avoided; the shared PCI bus will limit array speed to roughly 100MB/s at most.


So I didn't have memtest in my boot menu, and had to add it back.  Not sure if I removed it, but whatever.  It is running now and has gone through one iteration and passed.  I am going to let it run at least through the night (7:57pm EDT here now) and see what we get in the morning.  If it still looks good I will keep letting it run until tomorrow evening to see what we get.  If all looks good, I will check SMART on the drives per the FAQ in the unRAID wiki, but I will continue to update this thread.


So this morning at 8:34am the memtest was still running, 13+ hours after I started it. It had gone through 5 iterations of the test with no errors. Attached is a screenshot of the memtest. I am going to let it run until tonight so it will have run for 24 hours.

IMG_5567.JPG


So I let the memtest run for almost 48 hours, and there were no errors.  I also looked through the SMART reports, and did not see any errors there either.  The next thing according to the FAQs is to run reiserfsck on the data drives.  Is this still the case and is that truly my next action?


reiserfsck isn't going to have any effect on parity errors.

There were other users in the past who had recurring parity errors, but theirs seemed to be repeatable: the same blocks would still report errors even after a correcting parity check. I don't remember whether they ever resolved that.

Maybe revisit Benson's posts above.


So I've switched the controller from IDE mode to AHCI, removed the PCI card, unplugged the 8TB drive, and plugged the cache drives back into the motherboard. So basically it's the configuration I had before making any changes, except now in AHCI mode. I'm running a sync now; however, it seems to be running slower than it did in IDE mode. It has found and corrected 35 errors in the first 4.5% of the sync. Guess we'll see what we get out of this, and then run a second sync after it completes. I am getting some UDMA CRC errors again on drive 1, so maybe the issue is the SATA cables. Maybe I just need to scrap all of my cables and get new ones. I'm kind of at a loss, since my system had been running for so long without any issues.


So the first parity check completed and found and corrected 206 errors. I ran a second one and it completed with no errors. The UDMA CRC error count increased during both checks, but that seems to be the only remaining issue, and only on disk 1. I did have my SATA cables bundled before doing anything, and when I installed the 8TB drive I unbundled them. The UDMA CRC errors I was getting before doing anything were also on disk 1, so it seems there is either a bad connection or the cable is going bad.

I am still kind of mystified why the sync errors went crazy when I installed the PCI SATA controller, since I had my cache drives connected to it.  Am I correct in thinking that cache drives aren't taken into account in a parity sync?  Maybe it was just the bad cable that was causing all of the issues and it presents itself differently depending on the hardware configuration?
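For tracking the disk 1 CRC count across checks: `smartctl -A /dev/sdX` (from smartmontools; `/dev/sdX` is a placeholder for the disk) prints the SMART attribute table, and attribute 199 (UDMA_CRC_Error_Count) counts transfer errors on the link between drive and controller, which is why it usually points at cabling rather than the disk surface. The minimal sketch below assumes the standard ATA attribute table layout, where RAW_VALUE is the tenth column; the helper names and the sample values are hypothetical.

```python
from __future__ import annotations


def udma_crc_count(smartctl_attrs: str) -> int | None:
    """Return the raw value of SMART attribute 199 from `smartctl -A` text, or None."""
    for line in smartctl_attrs.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_errors_added(before: str, after: str) -> int:
    """How many CRC errors accumulated between two `smartctl -A` snapshots."""
    b = udma_crc_count(before)
    a = udma_crc_count(after)
    if a is None or b is None:
        raise ValueError("attribute 199 not found in one of the reports")
    return a - b


# Hypothetical snapshots taken before and after a parity check.
header = "ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE\n"
before_check = header + "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 12\n"
after_check = header + "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 15\n"
added = crc_errors_added(before_check, after_check)
```

A count that keeps rising after a cable swap would point back at the port or controller; a count that stops rising confirms the cable was the culprit. The raw value never resets, so only the difference between snapshots matters.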

  • mlounsbury changed the title to [SOLVED] High Number of Parity Sync Errors on 2 consecutive parity checks
