(Mostly) Different errors in consecutive parity checks


papnikol


Hi everyone,

I have a problem with my unRAID server. I started a parity check and noticed it found quite a few errors (about 10 errors at 10%), so I stopped it and restarted. It still finds errors, some at the same positions and some at different ones. The number of errors does not seem to grow from run to run.

 

- I performed a memcheck but there does not seem to be any problem.

- SMART status seems OK for all disks.

 

I am starting to fear it might be a controller problem (I have an AOC-SAS2LP-MV8 on an Asus P5Q Deluxe motherboard), although I would expect more errors in that case.

What else should I check in order to pinpoint the problem?

 

Thanks for any help.

 

PS1: I attached the results of Tools/diagnostics

PS2: I notice that the syslog does not mention the parity errors, probably because I ran the check without writing corrections to the parity disk. But here are the sector errors from 2 consecutive runs, each up to 10%, that I ran some time ago (red font highlights sectors that appear in both runs):

 

1ST RUN

sector=227271416

sector=326803560

sector=870691376

sector=1254335696

sector=1635813392

sector=2133668016

sector=2361685768

sector=2571393240

sector=2628717368

sector=2763282288

sector=3294123952

sector=3680661280

sector=4450802440

sector=5136242464

sector=5705459328

sector=6185627984

sector=8193815688

sector=9479063848

sector=9653427048

sector=1050839046

 

2ND RUN

sector=187728488

sector=227271416

sector=247795216

sector=326803560

sector=747245168

sector=870691376

sector=247795216

sector=747245168

sector=949971664

sector=978378680

sector=999114088

sector=1034802856

sector=1142471208

sector=1170450440

sector=1328714912
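Since the red highlighting may not survive in a plain-text copy of the post, the overlap between the two runs can be recomputed. The following is only a minimal sketch: the sector numbers are copied verbatim from the two lists above, and the variable names are made up for illustration.

```python
# Sector numbers copied verbatim from the "1ST RUN" and "2ND RUN" lists above.
run1 = [
    227271416, 326803560, 870691376, 1254335696, 1635813392,
    2133668016, 2361685768, 2571393240, 2628717368, 2763282288,
    3294123952, 3680661280, 4450802440, 5136242464, 5705459328,
    6185627984, 8193815688, 9479063848, 9653427048, 1050839046,
]
run2 = [
    187728488, 227271416, 247795216, 326803560, 747245168,
    870691376, 247795216, 747245168, 949971664, 978378680,
    999114088, 1034802856, 1142471208, 1170450440, 1328714912,
]

# Sectors reported in both runs (candidates for a persistent mismatch).
common = sorted(set(run1) & set(run2))

# Sectors reported in only one of the runs (candidates for transient errors).
only_run1 = sorted(set(run1) - set(run2))
only_run2 = sorted(set(run2) - set(run1))

print("in both runs:", common)
print("only in run 1:", len(only_run1), "sectors")
print("only in run 2:", len(only_run2), "sectors")
```

Sectors that show up in both runs point to a mismatch that is really stored on disk, while sectors that show up only once are more consistent with an intermittent read problem.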

 

towerp-diagnostics-20150902-0015.zip

  • 2 weeks later...

 

Thanks for the info, I had never seen this.

 

I am trying it now, although I think it is not the perfect choice for my case, since the errors appear in different places on the HDDs.

If the errors are in different places each time, it is more likely to be a memory problem, disk controller problem, or a power supply problem.

 

The very first thing to do is run a memory test, preferably overnight (or at least several full passes). As often as not, a bad memory strip is the issue.

 

Joe L.


It took me some time, but I am back. I tried a memory test (although the unRAID memtest allows only one pass, for some reason) and there were no errors. Just for good measure, I swapped the fairly recent SAS2LP-MV8 (the only expansion card) for an old SASLP-MV8 and ran a non-correcting parity check. I let it get to around 7% twice and I still get errors. A very strange thing I noticed is that the 2nd run has only 2 errors, and both of them are also among the 4 errors of the first run:

 

run 1:

Sep 20 17:34:51 towerP kernel: md: parity incorrect, sector=188020848 (Errors)

Sep 20 17:49:47 towerP kernel: md: parity incorrect, sector=311953656 (Errors)

Sep 20 17:54:02 towerP kernel: md: parity incorrect, sector=358760056 (Errors)

Sep 20 17:59:23 towerP kernel: md: parity incorrect, sector=420290960 (Errors)

 

run 2:

Sep 20 19:01:26 towerP kernel: md: parity incorrect, sector=311953656 (Errors)

Sep 20 19:10:50 towerP kernel: md: parity incorrect, sector=420290960 (Errors)
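The same comparison can be done directly on syslog excerpts instead of by eye. This is a rough sketch, assuming the lines have the exact `md: parity incorrect, sector=N` form shown above; the helper name `parity_error_sectors` is hypothetical and not part of unRAID.

```python
import re

# Matches the kernel message format seen in the excerpts above.
SECTOR_RE = re.compile(r"md: parity incorrect, sector=(\d+)")


def parity_error_sectors(log_text):
    """Return the set of sector numbers reported as parity errors in log_text."""
    return {int(m.group(1)) for m in SECTOR_RE.finditer(log_text)}


run1_log = """\
Sep 20 17:34:51 towerP kernel: md: parity incorrect, sector=188020848 (Errors)
Sep 20 17:49:47 towerP kernel: md: parity incorrect, sector=311953656 (Errors)
Sep 20 17:54:02 towerP kernel: md: parity incorrect, sector=358760056 (Errors)
Sep 20 17:59:23 towerP kernel: md: parity incorrect, sector=420290960 (Errors)
"""

run2_log = """\
Sep 20 19:01:26 towerP kernel: md: parity incorrect, sector=311953656 (Errors)
Sep 20 19:10:50 towerP kernel: md: parity incorrect, sector=420290960 (Errors)
"""

run1 = parity_error_sectors(run1_log)
run2 = parity_error_sectors(run2_log)

repeated = sorted(run1 & run2)   # reported in both runs
transient = sorted(run1 ^ run2)  # reported in exactly one run

print("repeated:", repeated)
print("transient:", transient)
```

To compare more than two runs, the same function could be applied to each run's excerpt and the resulting sets intersected in turn.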

 

This is REALLY strange, because if the cause of the problem were the RAM, the controller or the PSU, I would expect the errors to be erratic.

 

UPDATE: I ran the check a 3rd time and the aforementioned 2 errors persist, while other seemingly random errors appear. I am thinking that, whatever the underlying fault might be, I probably ran a parity check once without disabling parity correction. This would mean that "wrong" corrections were written to the parity drive and are now being detected as errors.

 

Of course the problem of parity errors persists and I have yet to pinpoint the reason.
