July 7, 2010
I keep getting sync errors from parity check after one power outage. How do I get this fixed?
Attachment: syslog-2010-07-07.txt
July 8, 2010
"I keep getting sync errors from parity check after one power outage. How do I get this fixed?"

Your syslog is very interesting. As the excerpt below shows, a parity check should correct all inconsistent parity once it finishes without problems, yet some sectors still come up as incorrect on the following parity check. I think you might want to:

(a) Run memtest to make sure the memory is fine.
(b) Run a SMART test to make sure all disks are fine.
(c) Reboot your system, then kill cache_dir and any add-on that might contribute disk I/O during a parity check.
(d) Manually start a parity check again and see how it goes.
(e) If the problem persists, re-assign the parity disk to a different SATA port and start a parity rebuild manually (you will NOT have parity protection during the rebuild).
(f) If the problem still persists, replace the parity disk.

Jul 6 17:25:15 Tower kernel: md: parity incorrect: 72497704
Jul 6 17:53:50 Tower kernel: md: parity incorrect: 327194792
Jul 6 18:00:56 Tower kernel: md: parity incorrect: 390371112
Jul 6 18:33:12 Tower kernel: md: parity incorrect: 679134568
Jul 6 20:16:47 Tower kernel: md: parity incorrect: 1546485272
Jul 6 21:22:00 Tower kernel: md: sync done. time=14748sec rate=66230K/sec
Jul 6 21:22:00 Tower kernel: md: recovery thread sync completion status: 0
Jul 6 22:30:58 Tower kernel: md: parity incorrect: 72497704
Jul 6 22:41:09 Tower kernel: md: parity incorrect: 163771768
Jul 6 22:58:58 Tower kernel: md: parity incorrect: 323455104
Jul 6 22:59:24 Tower kernel: md: parity incorrect: 327194792
Jul 6 23:06:27 Tower kernel: md: parity incorrect: 390371112
Jul 6 23:38:34 Tower kernel: md: parity incorrect: 679134568
Jul 7 01:22:09 Tower kernel: md: parity incorrect: 1546485272
Jul 7 02:27:31 Tower kernel: md: sync done. time=14672sec rate=66573K/sec
Jul 7 02:27:31 Tower kernel: md: recovery thread sync completion status: 0
Jul 7 02:47:38 Tower kernel: md: parity incorrect: 163771768
Jul 7 03:05:27 Tower kernel: md: parity incorrect: 323455104
Jul 7 06:36:04 Tower kernel: md: sync done. time=14795sec rate=66019K/sec
Jul 7 06:36:04 Tower kernel: md: recovery thread sync completion status: 0
Jul 7 10:18:39 Tower kernel: md: parity incorrect: 361838360
Jul 7 11:09:32 Tower kernel: md: parity incorrect: 800490048
Jul 7 11:13:21 Tower kernel: md: parity incorrect: 835349664
Jul 7 12:24:01 Tower kernel: md: parity incorrect: 1427268040
Jul 7 13:44:56 Tower kernel: md: sync done. time=15123sec rate=64587K/sec
Jul 7 13:44:56 Tower kernel: md: recovery thread sync completion status: 0
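To make the pattern in that excerpt easier to see, here is a minimal Python sketch that groups the "parity incorrect" lines into passes (one pass ends at each "sync done" line) and counts how many sectors were already reported in an earlier pass. The log text is copied from the post above; the function name `split_passes` and the grouping rule are assumptions made for illustration, not part of unRAID.

```python
import re

# "parity incorrect" and "sync done" lines copied from the syslog excerpt above.
LOG = """\
Jul 6 17:25:15 Tower kernel: md: parity incorrect: 72497704
Jul 6 17:53:50 Tower kernel: md: parity incorrect: 327194792
Jul 6 18:00:56 Tower kernel: md: parity incorrect: 390371112
Jul 6 18:33:12 Tower kernel: md: parity incorrect: 679134568
Jul 6 20:16:47 Tower kernel: md: parity incorrect: 1546485272
Jul 6 21:22:00 Tower kernel: md: sync done. time=14748sec rate=66230K/sec
Jul 6 22:30:58 Tower kernel: md: parity incorrect: 72497704
Jul 6 22:41:09 Tower kernel: md: parity incorrect: 163771768
Jul 6 22:58:58 Tower kernel: md: parity incorrect: 323455104
Jul 6 22:59:24 Tower kernel: md: parity incorrect: 327194792
Jul 6 23:06:27 Tower kernel: md: parity incorrect: 390371112
Jul 6 23:38:34 Tower kernel: md: parity incorrect: 679134568
Jul 7 01:22:09 Tower kernel: md: parity incorrect: 1546485272
Jul 7 02:27:31 Tower kernel: md: sync done. time=14672sec rate=66573K/sec
Jul 7 02:47:38 Tower kernel: md: parity incorrect: 163771768
Jul 7 03:05:27 Tower kernel: md: parity incorrect: 323455104
Jul 7 06:36:04 Tower kernel: md: sync done. time=14795sec rate=66019K/sec
Jul 7 10:18:39 Tower kernel: md: parity incorrect: 361838360
Jul 7 11:09:32 Tower kernel: md: parity incorrect: 800490048
Jul 7 11:13:21 Tower kernel: md: parity incorrect: 835349664
Jul 7 12:24:01 Tower kernel: md: parity incorrect: 1427268040
Jul 7 13:44:56 Tower kernel: md: sync done. time=15123sec rate=64587K/sec
"""


def split_passes(log_text):
    """Group reported sector numbers by pass; a 'sync done' line closes a pass."""
    passes, current = [], []
    for line in log_text.splitlines():
        match = re.search(r"md: parity incorrect: (\d+)", line)
        if match:
            current.append(int(match.group(1)))
        elif "md: sync done" in line:
            passes.append(current)
            current = []
    return passes


passes = split_passes(LOG)
for number, sectors in enumerate(passes, start=1):
    # Every sector reported by any pass before this one.
    earlier = set().union(*passes[: number - 1])
    repeated = [s for s in sectors if s in earlier]
    print(f"pass {number}: {len(sectors)} incorrect, "
          f"{len(repeated)} already reported in an earlier pass")
```

On this excerpt the second pass repeats all five sectors from the first pass and adds two new ones, the third pass repeats only those two, and the fourth pass reports four sectors not seen before.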
July 8, 2010
GK20 offers good advice. The only thing I would add is that if sync errors persist, you don't know which disk(s) are bad. It is not necessarily the parity disk; it could be any of them. What is your power supply model? That could also be the culprit.
July 8, 2010 Author
Here's what I did since the power outage:
- Parity check returned numerous errors
- Performed more parity checks, still returned errors
- Lowest count ever recorded was 2 errors
- Ran reiserfsck on every drive, found no corruption
- Ran SMART tests, nothing out of the ordinary
- Rebooted, unassigned parity, and rebuilt parity
- Parity build completed
- Rechecked parity
- unRAID froze

I'm running reiserfsck to check all drives again. Then I will kill cache_dir, remove it from the go script, unassign parity, and rebuild parity.

This is not likely a PSU issue. Previously I had an issue with a 10EARS drive constantly giving me 2 parity sync errors. I replaced that and have never gotten any errors until now. The parity drive is 2 weeks old, with no issues in the SMART report either.

If it's hardware, I'd say it's the mobo or RAM. Although I'd think it's cache_dir causing trouble, now that you mention it.
July 8, 2010
Cache_dirs cannot cause trouble. It merely triggers existing trouble. If you stop cache_dirs, it does not make the trouble go away. Feel free to close your eyes and pretend the trouble doesn't exist, though.
July 8, 2010
In the boot menu you have "Memtest86+". Choose that, and let it run for a few hours.
July 8, 2010
"If it's hardware, I'd say it's the mobo or ram. Although I'd think it's the cache_dir causing troubles now that you mentioned it."

cache_dir should NOT cause a problem. It just slows down your parity check a little, since it competes with the parity check process for disk I/O. For now, you want to remove it to simplify your troubleshooting.
July 9, 2010
Oh boy, coming late to the thread here. It's times like this I see the value of the ustatdb/locate tools I'm working on. I'll put more effort into them this weekend; I got sidetracked with updating the powerdown package.

What SMART tests have you done? Short or long? Are there pending sectors?
July 9, 2010 Author
"If it's hardware, I'd say it's the mobo or ram. Although I'd think it's the cache_dir causing troubles now that you mentioned it."

"cache_dir should NOT cause problem, it just slow down your parity check a little bit since this process is competing with parity check process in disk I/O. At this moment, you want to remove it to simplify your troubleshooting."

I completely disabled cache_dir and rebooted. Rechecked parity twice: 0 errors.
July 9, 2010
"I completely disabled cache_dir, rebooted. Recheck parity, twice, 0 errors."

If possible, for the benefit of many, can you:

(a) Put cache_dir back into the go script.
(b) Reboot the system.
(c) Manually restart a parity check as soon as the system is ready.

I am still not convinced cache_dir is the issue, because:

(a) cache_dir should be a read-only process. It scans your disks in order to load data into memory. Disk reads will not conflict with the parity calculation and comparison done by the parity check process.
(b) Even if cache_dir did by any chance write something to unRAID while a parity check was in progress, those disk writes should be serialized at the driver level unless something is really broken. You can still write data to a RAID while a parity check is in progress; this is a basic design rule in RAID.
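The point about reads not disturbing the parity comparison can be illustrated with a small Python sketch. It assumes a single parity block computed as the byte-wise XOR of the data blocks; the function names and the sample bytes are made up for illustration and are not taken from the unRAID driver.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR across equally sized blocks (assumed single-parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_mismatches(data_blocks, parity_block):
    """Return the byte offsets where stored parity differs from recomputed parity."""
    expected = xor_parity(data_blocks)
    return [i for i, (stored, want) in enumerate(zip(parity_block, expected))
            if stored != want]


# Hypothetical contents of the same stripe on three data disks.
disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_parity(disks)

# Reading the data any number of times leaves the comparison clean.
assert parity_mismatches(disks, parity) == []
assert parity_mismatches(disks, parity) == []

# A single bit flipped somewhere between disk and CPU (for example by bad RAM)
# makes the recomputed parity disagree with the stored parity.
as_read = [disks[0], b"\x10\x21\x30\x40", disks[2]]
print(parity_mismatches(as_read, parity))
```

In this model a mismatch appears only when the bytes that reach the comparison differ from what was written, which is why the thread keeps pointing at memory, controllers, disks, and power rather than at a process that only reads.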
July 9, 2010
"I completely disabled cache_dir, rebooted. Recheck parity, twice, 0 errors."

None of your conclusions are very useful until you tell us that you ran memtest overnight without any errors.
July 9, 2010
Cache_dirs does not write anything. It does the equivalent of you listing the files in the directories. That just gets their directory data into the disk buffer cache, exactly as if you did a directory listing. Cache_dirs just repeats that process every few seconds, far more frequently than you would be listing the directories otherwise. This keeps their information more recently accessed than other data, so media players can get to it quicker, usually without spinning up a disk to read it from the physical disk.

As everybody else has said, it will not cause intermittent parity errors. Those are an indication of some kind of intermittent hardware fault. The most likely suspect is the memory, but it could be the disk controller, or even one of the disks, or potentially the power supply if it is marginal.

It will be very difficult to find the issue, but you should eventually be able to determine the culprit by process of elimination. The very first suspect is memory (and it is the easiest to test). If you cannot pass multiple memory test passes, run overnight, then all bets are off.

Joe L.
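As a rough illustration of the behaviour Joe L. describes, the Python sketch below walks a directory tree, listing entries only, and repeats the walk at a fixed interval. This is not the actual cache_dirs script; the names `scan_tree` and `keep_warm`, the interval, and the example paths are assumptions made for the sketch.

```python
import os
import time


def scan_tree(root):
    """List every directory under root and count its entries.

    Only directory listings are read; no file is opened and nothing is written.
    """
    entries = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        entries += len(dirnames) + len(filenames)
    return entries


def keep_warm(roots, interval_seconds=10.0, rounds=None):
    """Rescan each root every interval_seconds; rounds=None repeats forever."""
    completed = 0
    while rounds is None or completed < rounds:
        for root in roots:
            scan_tree(root)
        completed += 1
        time.sleep(interval_seconds)
    return completed


# Example call (hypothetical paths, not executed here):
# keep_warm(["/mnt/disk1", "/mnt/disk2"], interval_seconds=10.0)
```

Because the loop only lists directories, the most it can do to a parity check is compete for disk I/O, which matches the replies above.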