
200k Sync Errors Corrected Help



I've been using Unraid for probably 15+ years, if not longer, and never had a real problem. Mainly I let it do its thing, meaning I don't know how to use it very well. Then last night a file I know was good was not playing correctly in VLC. I hadn't done a parity check in almost a year, so I started one.

 

Using 2 Parity Drives, 10 Disks, 40 TB, Unraid Pro Version 6.8.3

 

It's now 50% done and has more than 180,000 sync errors corrected. Should I be worried, and if so, what will be the consequences? And what should I do once the parity check is over?


Seems likely that you didn't actually have valid parity. No idea what you might have done in the past to get to that point.

 

  

2 hours ago, Cartierusm said:

maybe a bad or going bad HDD?

 

 

3 hours ago, ChatNoir said:

Your diagnostics might provide information?

 


Sorry for being dim, I'm not a power user. I know very little about Unraid. As I said, over the last 15 years I've never had a problem.

 

So what should I expect when the parity check is over, rebuilt files or corrupt files? Or something new that we can't figure out because Stephen Hawking is among the stars now?

1 minute ago, Cartierusm said:

I'll figure out how to get diagnostics when the check is over. It has about an hour left.

No need to wait until then

 

If possible, before rebooting and preferably with the array started, go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.

21 minutes ago, Cartierusm said:

There is on Parity 1

Those are just CRC errors; they indicate a connection problem rather than a disk problem. You can acknowledge them by clicking on the warning in the Dashboard, and it will warn you again if the count increases.

 

I can see some connection issues for that disk in syslog back in January:

Jan 19 10:26:29 Tower kernel: ata6.00: ATA-9: WDC WD80EFAX-68KNBN0, VAKBJVVL, 81.00A81, max UDMA/133
Jan 19 18:06:38 Tower kernel: ata6.00: irq_stat 0x08000000, interface fatal error
Jan 19 18:06:38 Tower kernel: ata6: SError: { UnrecovData 10B8B BadCRC }
Jan 19 18:06:38 Tower kernel: ata6.00: failed command: READ DMA EXT
Jan 19 18:06:38 Tower kernel: ata6.00: cmd 25/00:00:a8:5d:e8/00:01:e8:00:00/e0 tag 27 dma 131072 in
Jan 19 18:06:38 Tower kernel:         res 50/00:00:a7:5d:e8/00:00:e8:00:00/40 Emask 0x10 (ATA bus error)
Jan 19 18:06:38 Tower kernel: ata6.00: status: { DRDY }
Jan 19 18:06:38 Tower kernel: ata6: hard resetting link

and more like that.

 

Also some connection issues on disk3 in March:

Jan 19 10:26:29 Tower kernel: ata10.00: ATA-10: WDC WD40EFRX-68N32N0,      WD-WCC7K7JKSNZC, 82.00A82, max UDMA/133
Mar 16 08:02:03 Tower kernel: ata10.00: exception Emask 0x10 SAct 0x0 SErr 0x400000 action 0x6 frozen
Mar 16 08:02:03 Tower kernel: ata10.00: irq_stat 0x08000000, interface fatal error
Mar 16 08:02:03 Tower kernel: ata10: SError: { Handshk }
Mar 16 08:02:03 Tower kernel: ata10.00: failed command: WRITE DMA EXT
Mar 16 08:02:03 Tower kernel: ata10.00: cmd 35/00:e8:30:e3:00/00:02:00:00:00/e0 tag 3 dma 380928 out
Mar 16 08:02:03 Tower kernel:         res 50/00:00:18:e6:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Mar 16 08:02:03 Tower kernel: ata10.00: status: { DRDY }
Mar 16 08:02:03 Tower kernel: ata10: hard resetting link

and more like that.
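If you want to scan a syslog for this kind of message yourself, here is a rough Python sketch. It assumes you have the syslog as a plain text file (for example the one inside the diagnostics ZIP). The message patterns are copied only from the excerpts above and are not a complete list of libata link errors, and count_link_errors and drive_models are hypothetical helpers, not Unraid tools. It tallies link-error lines per ataN port and maps each port to the drive model the kernel reported at boot, so you can tell which disk's cables to check.

```python
import re
from collections import Counter

# Hypothetical pattern list, copied from the syslog excerpts in this
# thread. It is NOT an exhaustive list of libata link errors.
LINK_ERROR_PATTERNS = (
    "BadCRC",
    "Handshk",
    "interface fatal error",
    "hard resetting link",
)

# "ataN:" or "ataN.00:" right after "kernel: " identifies the port.
PORT_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?:")
# Boot-time identify line, e.g. "ata6.00: ATA-9: WDC WD80EFAX-68KNBN0, ..."
MODEL_RE = re.compile(r"kernel: (ata\d+)\.\d+: ATA-\d+: ([^,]+),")


def count_link_errors(lines):
    """Count link-error messages per ATA port.

    Continuation lines such as "res 50/00:..." carry no port tag and
    are skipped, so only the tagged lines of each burst are counted.
    """
    counts = Counter()
    for line in lines:
        match = PORT_RE.search(line)
        if match and any(pattern in line for pattern in LINK_ERROR_PATTERNS):
            counts[match.group(1)] += 1
    return counts


def drive_models(lines):
    """Map each ATA port to the drive model the kernel reported for it."""
    models = {}
    for line in lines:
        match = MODEL_RE.search(line)
        if match:
            models[match.group(1)] = match.group(2).strip()
    return models
```

For example, with a placeholder file name, `count_link_errors(open("syslog.txt").read().splitlines())` gives a count per port. A port whose count keeps growing over time points at that disk's SATA or power connection.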

 

I still think the most likely cause of your large number of parity errors (and it is correcting both parity disks) is that your parity wasn't valid to begin with. Did you ever do New Config and tell it to trust parity?

 

Another possibility would be bad RAM.

 

1 hour ago, Cartierusm said:

what should I expect when the parity check is over, rebuilt files or corrupt files?

Parity check doesn't do anything at all to any of your data disks, and in fact, your data disks seem to all be mountable and none disabled.

 

After this correcting parity check finishes correcting parity, and without rebooting, you should run a non-correcting parity check to confirm that you have no parity errors. Exactly zero sync errors is the only acceptable result, and until you get there you still have work to do. If it turns out after that check that you still have parity errors, we will want new diagnostics (without rebooting) so we can compare the parity checks in syslog.
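To illustrate why a correcting check never touches the data disks, here is a toy Python model of single (XOR) parity. It is an illustration, not Unraid's actual code: xor_parity and parity_check are hypothetical names, mismatches are counted per byte rather than in whatever units Unraid reports, and the second parity disk uses a different calculation that this sketch does not model. The data disks are only read; the stored parity is the only thing a correcting pass rewrites.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR across equally sized data blocks (single parity only)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(data_disks, parity, correct=False):
    """Compare stored parity with parity recomputed from the data disks.

    Returns (sync_errors, parity). data_disks is only read, never
    modified; with correct=True the recomputed parity replaces the
    stored parity, which is the only thing that changes.
    """
    expected = xor_parity(data_disks)
    sync_errors = sum(1 for want, have in zip(expected, parity) if want != have)
    if correct:
        parity = expected
    return sync_errors, parity
```

Running it once with `correct=True` and then again with `correct=False` mirrors the sequence above: the first pass reports and fixes the mismatches, and the second pass should report 0.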

 

Why is your disk3 ReiserFS?

 

 


What do you mean by connection issues? Do you mean physically, like a bad cable? Or something in the software?

 

Honestly, I don't remember, but I upgraded the whole system back in January 2020, replacing all my old 1 TB and 2 TB disks with new 4 TB ones and two new 8 TB parity drives, so I'm assuming I built new parity from scratch and trusted it.

 

Bad RAM? Should I run memtest when I do all the other stuff you suggest?

 

Why is disc3 Paul Reiser FS? No idea. Should I change it? 

2 minutes ago, Cartierusm said:

do you mean physically, like a bad cable?

Yes, or just a bad connection at the plug/connector. It could be the SATA connection or the power connection, at either end, including splitters.

 

9 minutes ago, Cartierusm said:

don't remember

I guess it's not relevant at this point why your parity isn't valid.

 

We need a continuous syslog if we can get it, and a reboot will reset it, so don't reboot.

 

Finish this correcting parity check, post diagnostics when it is done.

 

Then start a non-correcting parity check so we can see if you still have parity errors.

 

10 minutes ago, Cartierusm said:

Why is disc3 Paul Reiser FS? No idea. Should I change it?

You would have to format the disk to change it. We can deal with that later.


Sweet. I appreciate the help!! I'll do the rest when the check is over.

 

As far as the bad connection goes, I'm using one of those 20-bay enclosures with all the cables hooked to 4 or so breakout boards. I keep the parity drives together on the same breakout board. So once this is all done, should I move that parity drive to a different location? How can you tell if a connection is kind of bad, or test whether it's stable?


OK, second parity check done. Zero errors. See the final diagnostics attached.

 

Couple questions still:

1. You said, "Parity check doesn't do anything at all to any of your data disks, and in fact, your data disks seem to all be mountable and none disabled." So does that mean a correcting parity check corrects my parity drives and not the data disks? Meaning, that the couple of files I found that don't play correctly, corrupted in some way, but still kind of playable, that's a result of something else?

 

2. You gave me lots of tasks. Now that I've got a parity check with zero errors, what do I want to do next? You asked whether I told it to trust parity, or whether I have valid parity? What do I do about that, or did I do that already by running the two parity checks?

 

3. Do we investigate further about the 183,337 parity errors I got the first time? Bad HDD? Will I notice any corrupted files?

 

4. What should we do about disc3 Paul Reiser FS?

 

Thanks a bunch!!

 

tower-diagnostics-20210404-1623.zip

39 minutes ago, Cartierusm said:

Meaning, that the couple of files I found that don't play correctly, corrupted in some way, but still kind of playable, that's a result of something else?

 

Parity check can't fix that, since it doesn't change any of your data.

 

46 minutes ago, Cartierusm said:

valid parity? What do I do about that, or did I do that already by running the two parity checks?

 

 

Yes, already done.

 

47 minutes ago, Cartierusm said:

investigate further about the 183,337 parity errors I got the first time?

No clues can be found about that. As I said, I suspect you did something that invalidated parity, such as setting a new disk configuration and telling it that parity was already valid when it really needed to rebuild parity.

 

49 minutes ago, Cartierusm said:

disc3 Paul Reiser FS

 

ReiserFS is included in Unraid for backwards compatibility with older versions of Unraid. It isn't named for Paul Reiser; it is named for this murderer:

 

https://en.wikipedia.org/wiki/Hans_Reiser

 

Changing that disk to XFS like your other disks will format the disk, so you would have to copy its data elsewhere.

 

If you plan to really start using dockers, you might increase docker.img to 20G. I often see people with it much larger than needed, sometimes because they have misconfigured an application so that it writes its data into the docker.img. But in your case you have it smaller than recommended if you want to run a lot of dockers.

8 minutes ago, trurl said:

I suspect you did something that invalidated parity

Another possibility is you actually accumulated that many parity errors by never correcting them and having a lot of unclean shutdowns over the years. Unclean shutdowns could also explain corrupt files.


Umm, cough. Was it really named after him? I didn't read the wiki, that dude was too creepy looking.

 

Will move data and reformat, then rebuild parity? Or does it do that on the fly when reformatting the drive?

 

The only thing I have in dockers is binhex krusader, which I use every once in a while to move files between disks. I'll use it when moving disk 3's data so I can reformat it.

 

I hardly ever do a hard shutdown, and I have a battery backup. I might have lost power and the battery backup once or twice in the past year.

 

Again, thanks for the help.

 

 

So I take it I can update Unraid (it's telling me 6.9.1 is available) and then restart?

33 minutes ago, Cartierusm said:

Was it really named after him?

No, he wrote it.

34 minutes ago, Cartierusm said:

Will move data and reformat, then rebuild parity? Or does it do that on the fly when reformatting the drive?

Parity is updated for all writes, including formatting to a new file system. I'd copy the data instead of moving it, much faster, and you can verify the copy is complete and intact before you format the drive.
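One way to do that verification is sketched below in Python, under assumptions: the paths in the usage note are placeholders (Unraid mounts data disks as /mnt/disk1, /mnt/disk2 and so on, but the destination folder is made up), and tree_digests and compare_trees are hypothetical helpers, not part of Unraid or Krusader. It hashes every file on both sides with SHA-256 and lists files missing from the copy or whose contents differ.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large media files don't fill RAM.
                for chunk in iter(lambda: f.read(1024 * 1024), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(source, copy):
    """Return (missing, mismatched): files absent from the copy, and files
    present in both trees whose contents differ. Extra files that exist
    only in the copy are ignored."""
    src = tree_digests(source)
    dst = tree_digests(copy)
    missing = sorted(set(src) - set(dst))
    mismatched = sorted(p for p in src.keys() & dst.keys() if src[p] != dst[p])
    return missing, mismatched
```

For example, `missing, mismatched = compare_trees("/mnt/disk3", "/mnt/disk5/disk3-copy")` (placeholder paths); both lists should be empty before you format the source disk. Hashing reads every byte twice, so expect it to take a while on a full 4 TB disk.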

 

Before you update you may want to run for a little while to be sure everything is stable. It's not a good idea to make too many changes at once.

