Pjhal

Parity check finished (521769 errors)


Something is very wrong with my system.

 

I am using five 8TB 5400 rpm HDDs. They are now installed in two backplanes with 4 SATA slots each in a Silverstone C381 case.
A Silverstone-branded Mini-SAS HD SFF8643 cable connects these to
an LSI/Broadcom SAS 9300-8i Host Bus Adapter, which is inserted in a
PCIe slot of an ASRock Rack X470D4U motherboard.

My CPU is a 3700X and I am running a single stick of 16 GB ECC (unregistered) RAM, set to auto (ECC on). (I want to run more than one RAM stick, but the rest hasn't been delivered yet.)

The system has had issues previously, including Unraid reporting 8,900,000 errors on the parity drive.

No important data is stored on the system. I'm just starting out with new hardware and am new to Unraid.

I have had multiple XFS corruptions so far. I suspect one of my new devices is broken, but I don't know how to narrow this down properly.

 

tower-diagnostics-20191202-1526.zip

Edited by Pjhal
clean up


I remember reading that the kernel on current Unraid, even with v6.8rc, doesn't support ECC with Ryzen 3000, and sync errors together with frequent filesystem corruption make me suspicious of the RAM. Start by running a memtest.

Another good test is to run a couple of non-correcting parity checks. If the errors are not the same, it's again likely bad RAM.

4 hours ago, johnnie.black said:

I remember reading that the kernel on current Unraid, even with v6.8rc, doesn't support ECC with Ryzen 3000, and sync errors together with frequent filesystem corruption make me suspicious of the RAM. Start by running a memtest.

Another good test is to run a couple of non-correcting parity checks. If the errors are not the same, it's again likely bad RAM.

Thank you very much for your advice! I am running Unraid v6.7.2, by the way; I forgot to mention that.

I am currently letting memtest86 run.

So far no errors, but I will probably just let it run overnight.

2019-12-02.20.52.memtest86.CaptureScreen.jpeg

Edited by Pjhal

19 hours ago, johnnie.black said:

I remember reading that the kernel on current Unraid, even with v6.8rc, doesn't support ECC with Ryzen 3000, and sync errors together with frequent filesystem corruption make me suspicious of the RAM. Start by running a memtest.

Another good test is to run a couple of non-correcting parity checks. If the errors are not the same, it's again likely bad RAM.

OK, so I ran memtest86 for almost 16 hours on my single stick of 16 GB ECC unregistered RAM, and in all that time it did not report a single error.

Had the system been doing a parity check over the same period, it would have reported hundreds of thousands of errors, if not millions.

Can RAM now be ruled out?

 

2019-12-03.11.18.memtest86.CaptureScreen.jpeg

1 minute ago, Pjhal said:

Can RAM now be ruled out?

Not yet. There have been Ryzen users before with overclocked memory that would give sync errors despite memtest not detecting anything.

Run a couple of consecutive non-correcting parity checks and compare the errors, or better yet, post diags after they finish.

5 minutes ago, johnnie.black said:

Not yet. There have been Ryzen users before with overclocked memory that would give sync errors despite memtest not detecting anything.

Run a couple of consecutive non-correcting parity checks and compare the errors, or better yet, post diags after they finish.

Thank you, I am running the test right now. My memory is running at default settings, by the way. How will we know that it's the RAM and not, say, a SAS cable, the HBA, or the backplane?

When I had the 8,900,000 errors, 4 discs were in the right backplane and 1 was external via USB. I moved the discs into the left backplane and left the SAS cables where they were (meaning the discs are now connected to a different SAS cable, backplane, and port on the HBA), and it recovered to zero errors.

I have now put the external disc into the right backplane.

And I did get the 500,000 errors on a rebuild after that.

So I am worried that it might be the right backplane, its SAS cable, or that part (port?) of the HBA.


Possibly parity is just invalid and needs syncing. The parity checks can confirm this too, if the errors are exactly the same on both runs.
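The comparison suggested in this thread (identical errors on both runs point to parity simply being out of sync, differing errors point to bad RAM) could be scripted roughly as below. This is only a sketch: it assumes each logged parity error carries a `sector=<number>` field, and the sample log lines, `error_sectors` and `compare_checks` are made up for illustration, not taken from the posted diagnostics.

```python
import re

# Assumption: every logged parity error contains "sector=<number>".
# Check the syslog in your own diagnostics for the real line format.
SECTOR_RE = re.compile(r"sector=(\d+)")


def error_sectors(log_text):
    """Return the set of sector numbers found in one check's log text."""
    return {int(number) for number in SECTOR_RE.findall(log_text)}


def compare_checks(first_log, second_log):
    """Compare the error sectors of two non-correcting parity checks."""
    first = error_sectors(first_log)
    second = error_sectors(second_log)
    if not first and not second:
        return "no errors in either run"
    if first == second:
        # Same sectors both times: the data is read consistently and
        # parity is most likely just out of sync.
        return "same errors in both runs: parity is probably just invalid"
    # Different sectors each time: reads are not repeatable.
    return "errors differ between runs: bad RAM is likely"


# Made-up sample log excerpts, for illustration only.
run_a = "md: P incorrect, sector=1024\nmd: P incorrect, sector=2048\n"
run_b = "md: P incorrect, sector=1024\nmd: P incorrect, sector=4096\n"

print(compare_checks(run_a, run_a))
print(compare_checks(run_a, run_b))
```

If the log only records a limited number of errors per check, the comparison covers just those entries, so treat a match as a hint and not as proof.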

On 12/3/2019 at 1:06 PM, johnnie.black said:

Possibly parity is just invalid and needs syncing. The parity checks can confirm this too, if the errors are exactly the same on both runs.

So I completed one parity check with the "Write corrections to parity" box unchecked. And no errors!

Which is good but also scary, because I still don't understand where the old errors came from.

Should I run it again?

tower-diagnostics-20191204-1405.Anonymized.zip


The previous check was also correct. Do another one; if errors are still zero, then most likely parity was just invalid during the first check.

 


Should be fine. A few sync errors are normal after an unclean shutdown; other than that, there should be 0 errors every time. See how it goes for the next checks.
