papnikol, September 1, 2015

Hi everyone, I have a problem with my unRAID server. I started a parity check and noticed it found quite a few errors (about 10 errors by 10%). So I stopped it and restarted it. It still finds errors, some in the same positions and some in different ones. The number of errors does not seem to grow with each run.
- I ran a memtest, but there does not seem to be any problem.
- SMART status seems OK for all disks.
I am starting to fear it might be a controller problem (I have an AOC-SAS2LP-MV8 on an Asus P5Q Deluxe motherboard), although in that case I would expect more errors. What else should I check to pinpoint the problem? Thanks for any help.

PS1: I attached the results of Tools/Diagnostics.
PS2: I notice that the syslog does not mention the parity errors, probably because I run the check without writing corrections to the parity disk. But here are the sector errors from 2 consecutive runs (each stopped at about 10%) that I ran some time ago (in the original post, red font highlighted the sectors that appear in both runs):

1ST RUN
sector=227271416
sector=326803560
sector=870691376
sector=1254335696
sector=1635813392
sector=2133668016
sector=2361685768
sector=2571393240
sector=2628717368
sector=2763282288
sector=3294123952
sector=3680661280
sector=4450802440
sector=5136242464
sector=5705459328
sector=6185627984
sector=8193815688
sector=9479063848
sector=9653427048
sector=1050839046

2ND RUN
sector=187728488
sector=227271416
sector=247795216
sector=326803560
sector=747245168
sector=870691376
sector=247795216
sector=747245168
sector=949971664
sector=978378680
sector=999114088
sector=1034802856
sector=1142471208
sector=1170450440
sector=1328714912

Attachment: towerp-diagnostics-20150902-0015.zip
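Since the red highlighting did not survive, the overlap between the two runs can be recomputed from the lists above. This is a small Python sketch, not an unRAID tool: it just pulls the numbers out of the pasted `sector=N` tokens and intersects them. Note that the 2nd run lists two sectors twice; using sets collapses those duplicates.

```python
import re

# Sector lists pasted from the two non-correcting runs above (each stopped near 10%).
RUN_1 = """sector=227271416 sector=326803560 sector=870691376 sector=1254335696
sector=1635813392 sector=2133668016 sector=2361685768 sector=2571393240
sector=2628717368 sector=2763282288 sector=3294123952 sector=3680661280
sector=4450802440 sector=5136242464 sector=5705459328 sector=6185627984
sector=8193815688 sector=9479063848 sector=9653427048 sector=1050839046"""

RUN_2 = """sector=187728488 sector=227271416 sector=247795216 sector=326803560
sector=747245168 sector=870691376 sector=247795216 sector=747245168
sector=949971664 sector=978378680 sector=999114088 sector=1034802856
sector=1142471208 sector=1170450440 sector=1328714912"""

def sectors(text):
    """Return the set of sector numbers found in 'sector=N' tokens (duplicates collapse)."""
    return {int(n) for n in re.findall(r"sector=(\d+)", text)}

# Sectors flagged in both runs, in ascending order.
common = sorted(sectors(RUN_1) & sectors(RUN_2))
print("in both runs:", common)
```

With the lists above this reports three shared sectors: 227271416, 326803560 and 870691376.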
dgaschk, September 10, 2015

See here: http://lime-technology.com/wiki/index.php?title=FAQ#How_To_Troubleshoot_Recurring_Parity_Errors
papnikol, September 12, 2015

Thanks for the info, I had never seen this. I am trying it now, although I don't think it is a perfect fit for my case, since the errors appear in different places on the HDDs.
Joe L., September 12, 2015

If the errors are in different places each time, it is more likely to be a memory problem, a disk controller problem, or a power supply problem. The very first thing to check is memory: run a memory test, preferably overnight (or at least for several full passes). As often as not, a bad memory stick is the issue.

Joe L.
papnikol, September 20, 2015

It took me some time, but I am back. I ran a memory test (although the unRAID memtest allows only one pass, for some reason) and there were no errors. Just for good measure, I went back to an old SASLP-MV8 in place of the fairly recent SAS2LP-MV8 (the only expansion card) and ran a non-correcting parity check. I let it get to around 7% twice and I still get errors. But there is something very strange: the 2nd run shows only 2 errors, and they are the same as 2 of the 4 errors from the 1st run:

run 1:
Sep 20 17:34:51 towerP kernel: md: parity incorrect, sector=188020848 (Errors)
Sep 20 17:49:47 towerP kernel: md: parity incorrect, sector=311953656 (Errors)
Sep 20 17:54:02 towerP kernel: md: parity incorrect, sector=358760056 (Errors)
Sep 20 17:59:23 towerP kernel: md: parity incorrect, sector=420290960 (Errors)

run 2:
Sep 20 19:01:26 towerP kernel: md: parity incorrect, sector=311953656 (Errors)
Sep 20 19:10:50 towerP kernel: md: parity incorrect, sector=420290960 (Errors)

This is REALLY strange, because if the cause were the RAM, the controller, or the PSU, I would expect the errors to be erratic.

UPDATE: I ran the check a 3rd time. The 2 errors mentioned above persist, while other errors appear at random positions.
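One way to separate recurring sectors from one-off ones is to tally the `md: parity incorrect` lines across runs. This is a hypothetical helper, not part of unRAID: it assumes each run's lines have been copied out of the syslog by hand, one excerpt per run, and it uses the two runs quoted above. Sectors flagged in more than one run are more consistent with something fixed at that location (for example, parity written wrong earlier), while sectors seen only once fit the transient memory/controller/PSU explanation better.

```python
import re
from collections import Counter

# Matches the md driver messages quoted above, e.g.
# "Sep 20 17:34:51 towerP kernel: md: parity incorrect, sector=188020848 (Errors)"
PARITY_RE = re.compile(r"md: parity incorrect, sector=(\d+)")

RUN_1 = """\
Sep 20 17:34:51 towerP kernel: md: parity incorrect, sector=188020848 (Errors)
Sep 20 17:49:47 towerP kernel: md: parity incorrect, sector=311953656 (Errors)
Sep 20 17:54:02 towerP kernel: md: parity incorrect, sector=358760056 (Errors)
Sep 20 17:59:23 towerP kernel: md: parity incorrect, sector=420290960 (Errors)
"""

RUN_2 = """\
Sep 20 19:01:26 towerP kernel: md: parity incorrect, sector=311953656 (Errors)
Sep 20 19:10:50 towerP kernel: md: parity incorrect, sector=420290960 (Errors)
"""

def flagged_sectors(log_text):
    """Set of sectors reported as 'parity incorrect' in one run's syslog excerpt."""
    return {int(s) for s in PARITY_RE.findall(log_text)}

def tally(runs):
    """Map each flagged sector to the number of runs in which it appeared."""
    counts = Counter()
    for log_text in runs:
        counts.update(flagged_sectors(log_text))
    return counts

counts = tally([RUN_1, RUN_2])
recurring = sorted(s for s, n in counts.items() if n > 1)
one_off = sorted(s for s, n in counts.items() if n == 1)
print("recurring:", recurring)
print("one-off:", one_off)
```

A 3rd run's excerpt can simply be appended to the list passed to `tally`.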
I am thinking that, whatever the cause might be, I probably ran a parity check once without disabling parity correction. That would mean "wrong" corrections were written to the parity drive and are now being found on every check. Of course, the problem of new parity errors persists, and I have yet to pinpoint its cause.
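The theory above can be illustrated with a simplified model. Assuming unRAID's single parity behaves like plain XOR parity across the data disks, a correcting check that reads one data byte through a faulty path (a bit flip in RAM or on the controller) would write parity that matches the corrupted read. Every later check that reads the data correctly would then flag that same position. The byte values are made up for illustration.

```python
from functools import reduce
from operator import xor

def stripe_parity(data_bytes):
    """XOR parity of one byte position across all data disks (simplified model)."""
    return reduce(xor, data_bytes, 0)

# Hypothetical contents of one byte position on three data disks.
data = [0b10110010, 0b01101100, 0b11100001]
parity = stripe_parity(data)  # parity written while everything was healthy

# Correcting check: disk 1's byte suffers a bit flip in transit (bad RAM, controller...),
# so the check computes parity from the corrupted read and overwrites the good parity.
corrupted_read = list(data)
corrupted_read[1] ^= 0b00000100
if stripe_parity(corrupted_read) != parity:
    parity = stripe_parity(corrupted_read)

# Later non-correcting checks read the data correctly, but the stored parity is now
# stale, so this position is reported as a mismatch on every run.
later_checks = [stripe_parity(data) != parity for _ in range(3)]
print(later_checks)  # [True, True, True]
```

In this model, a transient fault during a non-correcting check produces a one-off error, while the same fault during a correcting check leaves a mismatch that recurs until parity is rewritten from good reads.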
dgaschk, September 20, 2015

Memtest needs to run at least overnight; the longer the better. If it will only run a single pass, that indicates a RAM issue.
papnikol, September 22, 2015

OK, I ran 9 passes of memtest with no errors (I wish it were the memory; the solution would have been easy).