burnaby_boy, posted May 2, 2011:
Greetings, I ran a parity check last night and it finished with 20 errors. I started another check, assuming that the errors would have been fixed during the first check. However, I'm halfway through the second parity check and 20 errors have popped up again. I am running unRAID 4.7 and my syslog is attached. I wondered if it might have been caused by the automatic movement of 15 files from the cache drive to the array in the middle of the check. Any ideas as to what might be going on? Cheers
[Attachment: syslog.txt]

Joe L., posted May 2, 2011:
File movement has absolutely nothing to do with parity errors. Parity is maintained while the moves are made. Are you sure you are not doing NOCORRECT parity checks, where it detects but does not correct the errors? (The web-management interface ALWAYS says it corrected them, even when in NOCORRECT mode. It never got updated to say something different.) It sure looks like you are doing parity checks in CORRECT mode. I'd say you have either bad memory, or memory set up with the wrong voltage, timing, or clock speed, or a bad motherboard chipset, or a bad disk drive returning random data, or possibly a bad power supply. Start with SMART reports on all your disks, since bad sectors could cause this class of error, then a memory test, preferably overnight. Then you can go about isolating the cause if those are not the issue. Unfortunately, when you have this class of error, it could be almost anything.

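Joe's point that parity is maintained while files are moved can be illustrated with a toy model. This is a minimal sketch, assuming plain byte-wise XOR parity across the data disks; the function names and the tiny in-memory "disks" are hypothetical and are not unRAID's actual code.

```python
from functools import reduce
from operator import xor


def compute_parity(disks):
    """Return the byte-wise XOR of all data disks (equal-length bytearrays)."""
    return bytearray(reduce(xor, column) for column in zip(*disks))


def write_byte(disks, parity, disk_index, offset, new_byte):
    """Write one byte to a data disk and update parity in the same step."""
    old_byte = disks[disk_index][offset]
    disks[disk_index][offset] = new_byte
    # XOR out the old data and XOR in the new data.
    parity[offset] ^= old_byte ^ new_byte


def count_sync_errors(disks, parity):
    """Count addresses where stored parity differs from recomputed parity."""
    recomputed = compute_parity(disks)
    return sum(1 for stored, fresh in zip(parity, recomputed) if stored != fresh)


# Three hypothetical 16-byte data disks and their parity.
disks = [bytearray(range(16)), bytearray(b"x" * 16), bytearray(16)]
parity = compute_parity(disks)

# Simulate files arriving on disk 1 (as a mover would write them).
for offset, value in enumerate(b"15 moved files!"):
    write_byte(disks, parity, 1, offset, value)

sync_errors_after_move = count_sync_errors(disks, parity)
print(sync_errors_after_move)
```

In this model every write adjusts parity as part of the write itself, so a check that runs before, during, or after the move sees consistent parity.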
burnaby_boy (Author), posted May 2, 2011:
Hi Joe, Thanks for your input. The only thing that has changed in the array recently is the addition of a second SATA card, an Adaptec 1430SA, that I installed yesterday. I put it in the second PCI-e x16 slot. In the first slot I have a Supermicro AOC-SASLP-MV8 that has been in the server for the past year. I'll remove the Adaptec and run another check. Could that be the cause? Prior to that I have had few issues with the server since I built it last year. Thanks again.

Joe L., posted May 2, 2011:
Looking closer, since the addresses of the parity errors are the same in both checks, it is possible the first check flipped some bits and the second check flipped them back. A third parity check might be in order, but this time do it in NOCORRECT mode. Joe L.

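The "flipped, then flipped back" scenario Joe describes can be sketched as follows. This is only an illustration of one possible explanation, assuming byte-wise XOR parity and a one-time hardware glitch during the first check; the addresses, data, and function names are hypothetical.

```python
def parity_check(disks_as_read, parity, correct):
    """Compare stored parity with parity computed from what the disks return.

    Returns the list of mismatching addresses. In correcting mode the stored
    parity is overwritten at every mismatching address.
    """
    errors = []
    for addr in range(len(parity)):
        computed = 0
        for disk in disks_as_read:
            computed ^= disk[addr]
        if computed != parity[addr]:
            errors.append(addr)
            if correct:
                parity[addr] = computed
    return errors


# True contents of two hypothetical data disks, plus correct parity.
data = [bytearray(b"disk-one"), bytearray(b"disk-two")]
parity = bytearray(a ^ b for a, b in zip(*data))

# During the first check, faulty hardware returns wrong bytes at two addresses.
glitched = [bytearray(d) for d in data]
glitched[1][3] ^= 0x01
glitched[1][7] ^= 0x80

first_errors = parity_check(glitched, parity, correct=True)   # "fixes" good parity
second_errors = parity_check(data, parity, correct=True)      # flips it back
third_errors = parity_check(data, parity, correct=False)      # NOCORRECT run

print(first_errors, second_errors, third_errors)
```

Under these assumptions the first two checks report the same addresses, and a later NOCORRECT check comes back clean, which matches the pattern reported in this thread.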
burnaby_boy (Author), posted May 2, 2011:
How do I run the check in nocorrect mode? I'm using the default interface and I don't see where to set that. Thanks

burnaby_boy (Author), posted May 2, 2011:
I ran SMART reports on all the disks, and all reported no errors except for disk 9, which had a UDMA_CRC_ERROR_COUNT of 10. Does that have to do with cable issues?
[Attachment: smart_sdd.txt]

Joe L., posted May 3, 2011:
burnaby_boy wrote: "How do I run the check in nocorrect mode? I'm using the default interface and I don't see where to set that."
In your version of unRAID, you must do it on the command line (or from the button on unMENU's array management page). Log in and type:
/root/mdcmd check NOCORRECT

burnaby_boy (Author), posted May 3, 2011:
Thanks, Joe. Yes I found it, and it's about halfway through the check. So far no errors. If the check completes without errors, is everything then OK with the array? Cheers

burnaby_boy (Author), posted May 3, 2011:
Hi Joe, In nocorrect mode, the parity check completed with no sync errors. Does that mean that the problem is solved, or do I need to do further tests? Cheers

dgaschk, posted May 3, 2011:
Run one more regular correcting check to make sure.

Joe L., posted May 3, 2011:
That is BAD advice. If there is a problem with one of your disks or other hardware, a correcting check would write BAD information to the parity disk. (If your system was working properly, it would do no harm at all.) The reason the NOCORRECT parity check exists is that we unRAID users requested it to assist in our tests in exactly the situation you had: random parity errors. The more correct advice would be to run several more NOCORRECT parity checks. If they all are without error, then you are fine. If they detect anything, at least parity is correct (as best as it is right now) and you can still use it in the event another disk fails.

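Joe's reasoning about why a correcting check is risky on suspect hardware can be made concrete with a small simulation. This is a hedged sketch, assuming byte-wise XOR parity and a single bad read from one disk during the check; the names `parity_check` and `rebuild` and the sample data are hypothetical and do not reflect unRAID internals.

```python
from functools import reduce
from operator import xor


def xor_columns(disks):
    """Byte-wise XOR across a list of equal-length bytearrays."""
    return bytearray(reduce(xor, column) for column in zip(*disks))


def parity_check(disks_as_read, parity, correct):
    """Return mismatching addresses; overwrite parity there if correct=True."""
    computed = xor_columns(disks_as_read)
    errors = [i for i, (p, c) in enumerate(zip(parity, computed)) if p != c]
    if correct:
        for i in errors:
            parity[i] = computed[i]
    return errors


def rebuild(surviving_disks, parity):
    """Reconstruct one missing data disk from the survivors plus parity."""
    return xor_columns(surviving_disks + [parity])


data = [bytearray(b"abcd"), bytearray(b"efgh"), bytearray(b"ijkl")]
good_parity = xor_columns(data)

# Disk 1 returns one wrong byte at address 2 while the check is running.
bad_read = [bytearray(d) for d in data]
bad_read[1][2] ^= 0xFF

# NOCORRECT: the error is reported, parity is left alone.
parity_nocorrect = bytearray(good_parity)
nocorrect_errors = parity_check(bad_read, parity_nocorrect, correct=False)

# CORRECT: parity is rewritten to match the bad read.
parity_corrected = bytearray(good_parity)
parity_check(bad_read, parity_corrected, correct=True)

# Later, disk 0 fails and must be rebuilt from disks 1 and 2 plus parity.
rebuilt_ok = rebuild([data[1], data[2]], parity_nocorrect)
rebuilt_bad = rebuild([data[1], data[2]], parity_corrected)

print(nocorrect_errors, rebuilt_ok == data[0], rebuilt_bad == data[0])
```

In this model the NOCORRECT run still flags the problem but leaves parity usable for a later rebuild, while the correcting run bakes the bad read into parity and the rebuilt disk no longer matches the original.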
burnaby_boy (Author), posted May 3, 2011:
Thanks, Joe. I've started another NOCORRECT parity check. Cheers

burnaby_boy (Author), posted May 4, 2011:
Hi Joe, I have run 3 successive NOCORRECT parity checks with no errors. Is there any need to run a regular parity check before resuming use of the array and adding new, precleared drives?

Joe L., posted May 4, 2011:
No need.

burnaby_boy (Author), posted May 4, 2011:
Thanks so much for your help with this issue, Joe. Very much appreciated!

lionelhutz, posted May 4, 2011:
A lot of members here will never normally run correcting parity checks. I always run nocorrect checks. I don't want the parity drive changing if a data drive is acting up and feeding bad info to the OS. Peter

dgaschk, posted May 4, 2011:
I know my advice was wrong before, but now that you're confident in the system, wouldn't it be prudent to run a correcting check? Just to come full circle. That is what revealed the problem in the first place, and to be absolutely sure that it has been resolved, a successful completion of a correcting check is required.

lionelhutz, posted May 5, 2011:
A correcting check will not find errors that a nocorrect check will miss. It makes no difference as long as there are no parity errors when done. Peter

dgaschk, posted May 5, 2011:
This is true in theory; however, a correcting parity check not fixing parity correctly is the basis of this thread.

Archived
This topic is now archived and is closed to further replies.