October 7, 2013 Hi All - using v5, and am receiving puzzling results with parity checks. As can be seen in the syslog, I conducted three NOCORRECT parity checks: 1) 64 sync errors, 78 read errors; 2) no errors; and 3) 64 sync errors, 29 read errors. The errors were not in the same blocks. The parity drive is on a backplane that connects directly to the mobo, not to a controller card. I checked and reseated all power and data cables, and ran a memtest without error. A SMART test also doesn't show any critical errors. The array was not being written to during the tests. For now, I have swapped the locations of the cache and parity drives, which are on the same backplane, to see if I can get any repeatable errors. Could the drive be dying even though SMART tests indicate otherwise? Any feedback would be appreciated. Thanks! Attachments: syslog.txt, smart.txt
October 8, 2013 Yes, of course it can. Do a correcting check and then another, and see if the errors continue. Since the disk has moved, this test will separate the drive itself from its physical position (slot, cable, port).
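For readers unfamiliar with the terms: a "sync error" is a parity block that does not match the data it protects. The sketch below assumes single XOR parity, where the parity disk holds the bytewise XOR of all data disks at the same offset; the functions `xor_blocks` and `parity_check` are hypothetical illustrations, not part of any server tool. A non-correcting check only counts mismatches, while a correcting check also rewrites the mismatched parity block.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings (one block per data disk)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(data_disks, parity, correct=False):
    """Return the indices of blocks whose parity disagrees with the data disks.

    data_disks: list of disks, each a list of equal-length bytes blocks.
    parity: list of parity blocks (a list, so it can be updated in place).
    If correct is True, each mismatched parity block is rewritten with the
    XOR of the data blocks, as a correcting check would do.
    """
    sync_errors = []
    for i, parity_block in enumerate(parity):
        expected = xor_blocks([disk[i] for disk in data_disks])
        if expected != parity_block:
            sync_errors.append(i)
            if correct:
                parity[i] = expected
    return sync_errors


# Hypothetical two-disk example; the second parity block is deliberately wrong.
disk1 = [b"\x01\x02", b"\x10\x20"]
disk2 = [b"\x04\x08", b"\x40\x80"]
parity = [b"\x05\x0a", b"\x50\x00"]
print(parity_check([disk1, disk2], parity))  # [1]
```

This model only sees what the disks return. A flaky cable or port that intermittently corrupts reads can produce mismatches that land in different blocks on each run, and a correcting check run while such a fault is active could even "fix" parity to match bad data.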
October 8, 2013 Author Hi dgaschk, thanks for the assist. Even after the swap, there were still sync errors with a non-correcting check. Following your advice, I ran a correcting parity check when I got home, with the disks in the same positions, and it completed its first run without any sync errors (?). I'm into my second correcting check now, and there are still no errors so far. I also ran some dd scripts, and all the checksums matched. Shouldn't a correcting parity check have reported some sync errors being corrected? Is this a case of a dodgy disk, and should it be replaced? Thanks for helping to interpret these perplexing results.
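The dd scripts mentioned above aren't shown, so the following is not a reconstruction of them. It is a minimal sketch, assuming the idea is to read the same region of a device several times and compare checksums; `region_checksum` and `repeated_reads_match` are hypothetical names. On an idle array, differing checksums between passes would suggest the read path (disk, cable, port, controller) is returning inconsistent data. One caveat: on Linux, repeated reads of the same region may be served from the page cache instead of the disk, so a real test would need to drop caches between passes (for example, writing 3 to /proc/sys/vm/drop_caches as root) or read a region larger than RAM.

```python
import hashlib


def region_checksum(path, offset, length, chunk_size=1 << 20):
    """MD5 of `length` bytes starting at `offset` in a file or block device."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # reached the end of the file or device
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()


def repeated_reads_match(path, offset, length, passes=3):
    """Read the same region `passes` times; report whether all checksums agree."""
    sums = [region_checksum(path, offset, length) for _ in range(passes)]
    return len(set(sums)) == 1, sums
```

Usage would look like `ok, sums = repeated_reads_match("/dev/sdX", 0, 1 << 30)`, where `/dev/sdX` is a placeholder for the disk under test and the script runs as root.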
October 8, 2013 Yes, if the disk were the problem, there should have been errors reported as corrected. This indicates that the drive is OK and that the physical position has an issue with the connection, cable, port, etc. The cache drive may show errors now. Put a data disk where the cache drive is and do a parity check. If errors are reported, move the data disk back and do another check. The second check should have no errors. This will confirm that the issue is not a disk.
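The swap procedure above boils down to asking whether the errors follow the disk or the slot. A rough sketch of that bookkeeping, using a hypothetical `suspects` function and made-up labels: a disk or slot stays suspect only if it took part in a check that reported errors and never in one that came back clean. Because the errors in this thread are intermittent, a single clean run is weak evidence, so this rule can clear a component too early; treat it as a summary of the runs, not a diagnosis.

```python
def suspects(runs):
    """runs: iterable of (disk, slot, sync_errors) tuples, one per parity check,
    recording which disk sat in the slot under test.

    Returns (suspect_disks, suspect_slots): components seen in at least one
    erroring check and in no clean check.
    """
    erroring_disks, clean_disks = set(), set()
    erroring_slots, clean_slots = set(), set()
    for disk, slot, errors in runs:
        if errors > 0:
            erroring_disks.add(disk)
            erroring_slots.add(slot)
        else:
            clean_disks.add(disk)
            clean_slots.add(slot)
    return erroring_disks - clean_disks, erroring_slots - clean_slots


# Hypothetical runs: errors appear whenever any disk sits in slot "A".
runs = [
    ("parity", "A", 64),
    ("parity", "B", 0),
    ("data", "A", 30),
    ("data", "C", 0),
]
print(suspects(runs))  # (set(), {'A'})
```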
October 9, 2013 Author Hi dgaschk, the second parity check with the disks in their original positions completed with no errors. I swapped a data drive into the parity drive position and ran another parity check. That completed with no sync or read errors, and a second one is in progress, still with no errors so far. So now I'm completely confused as to what caused the original errors across multiple parity checks with the parity drive. I presume bad cabling would also cause read errors on the data drives, even if the array wasn't being written to?
October 9, 2013 Author Quote: "It could also be related to a faulty power supply or power cables." Hmm... I've considered that. But there was a consistent number of sync errors in the original runs. And if a faulty cable caused the read errors on the parity drive, those errors should have shown up again when I swapped the drives. And if the power supply were to blame, shouldn't problems have appeared in all of these consecutive parity checks with all drives spinning?
October 9, 2013 Author Quote: "Has there been a power failure?" No, there hasn't, and the server is also on a UPS.
October 21, 2013 Author Just a quick update on the situation. Last night, I completed two NOCORRECT parity checks (both returned without any errors) before deciding to move the parity disk back to its original position. I stopped the array, shut down cleanly, put the parity disk back in its original slot, and started another NOCORRECT check. So far, it has reported 96 parity errors, the same number that I received when I originally started this thread, but in different sectors. There are also some read errors in the log, and they show up on the main status page. The strange thing is that I received no disk errors from the data disk that sat in the current parity disk slot during the swap. Things were OK while the disks were swapped, but these strange parity errors make me wonder whether the parity disk is failing, even though the SMART reports do not show it.