jk2587 Posted April 30, 2009

I have been using an Abit IC7 MAX3 for a long time now, and it has been stable but slow. The last parity check was about a week ago and took around 23 hours. I just upgraded to an Asus P5Q WS board, and after the first boot I started a parity check, which ran for a while before the server rebooted. I started a parity check again. This time it completed, but not before a few segfaults and a couple thousand lines in the syslog that looked like this:

Apr 29 13:29:49 media kernel: ReiserFS: md9: warning: vs-5150: search_by_key: invalid format found in block 5696591. Fsck?
Apr 29 13:29:49 media kernel: ReiserFS: warning: is_tree_node: node level 133 does not match to the expected one 1

The web GUI is working fine, but the Samba shares are not accessible over the network. I can still access the server via telnet, though. I'm not sure what the best way to troubleshoot this is. I suspect I need to run an fsck, but I don't know how to go about doing that. Any other suggestions?

Asus P5Q WS
Intel e7400
4GB OCZ RAM
Corsair 750TX
SUPERMICRO AOC-SAT2-MV8
12 mixed SATA disks

Thanks for your help.
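With a couple thousand warnings of this kind, it can help to tally them per device before deciding which disks to look at first. The sketch below is only an illustration: the regular expression is inferred from the two sample lines quoted in the post (other kernels may format these messages differently), and tally_reiserfs_warnings is a hypothetical helper name, not part of unRAID.

```python
import re
from collections import Counter

# Pattern inferred from the two sample lines in the post (assumption):
# an optional "mdN: " device prefix, then "warning: <id>:".
REISERFS_WARNING = re.compile(
    r"kernel: ReiserFS: (?:(md\d+): )?warning: ([\w-]+):"
)


def tally_reiserfs_warnings(lines):
    """Count ReiserFS warnings per (device, warning id).

    Lines that name no device are counted under "unknown".
    """
    counts = Counter()
    for line in lines:
        match = REISERFS_WARNING.search(line)
        if match:
            device = match.group(1) or "unknown"
            counts[(device, match.group(2))] += 1
    return counts


# The two lines quoted in the post, used here as sample input.
sample = [
    "Apr 29 13:29:49 media kernel: ReiserFS: md9: warning: vs-5150: "
    "search_by_key: invalid format found in block 5696591. Fsck?",
    "Apr 29 13:29:49 media kernel: ReiserFS: warning: is_tree_node: "
    "node level 133 does not match to the expected one 1",
]

for (device, warning_id), count in sorted(tally_reiserfs_warnings(sample).items()):
    print(device, warning_id, count)
```

To run it against a real log, pass an open file object instead of the sample list. The tally only shows where the warnings cluster; it does not say whether the cause is the disk, the file system, or (as it turned out here) the memory.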
RobJ Posted April 30, 2009

Wow! Take a look at your syslog. You won't have to go very far before you see some serious memory corruption! Scattered but clustered random letters have been modified. Although you will need to run reiserfsck later (see the Check Disk File systems page), plus a parity check to correct any parity corruption, there is no point in doing either until you can completely trust your machine.

Either the memory is bad or the timings are off. Check your memory specs and compare them with the BIOS settings. I'm guessing you will need to adjust the memory settings in the BIOS to more conservative numbers. You may also need to adjust the memory voltage up or down. There is a chance your memory sticks are not supported on that board. Test it thoroughly, overnight, with the MemTest on the unRAID boot menu.

It is also possible that this is not memory corruption but an unstable bus. There's no evidence that you are overclocking, but something is really wrong, hopefully just a memory stick or the memory configuration. I note that in some cases it looks like just a single bit has flipped. That would implicate a hardware issue. But in other cases it looks more like a wild character pointer writing to the wrong block. That would point to a program bug somewhere, not hardware, but no one else has ever had this.

I would under no conditions write to your data disks, including saving files, for now. They could be corrupted, and further file system corruption could also occur.

JoeL is better at advice concerning checking and correcting your memory setup. It would also be good to hear from anyone else with that motherboard about what memory they are using and what BIOS settings they found to work best. By the way, please indicate what memory you have installed.
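The single-bit observation can be checked mechanically: XOR the expected byte with the corrupted one and see whether exactly one bit is set. The following is a minimal sketch of that idea, not a diagnostic tool. The function names are hypothetical, and the example strings are made up for illustration because the attached syslog is not reproduced in this thread.

```python
def differing_bytes(expected: bytes, observed: bytes):
    """Return (offset, xor) for every position where the bytes differ."""
    if len(expected) != len(observed):
        raise ValueError("inputs must have the same length")
    return [
        (offset, e ^ o)
        for offset, (e, o) in enumerate(zip(expected, observed))
        if e != o
    ]


def is_single_bit_flip(xor_value: int) -> bool:
    """True if exactly one bit is set, i.e. the value is a power of two."""
    return xor_value != 0 and xor_value & (xor_value - 1) == 0


def classify(expected: bytes, observed: bytes) -> str:
    """Describe the damage between an expected and an observed string."""
    diffs = differing_bytes(expected, observed)
    if not diffs:
        return "identical"
    if all(is_single_bit_flip(xor) for _, xor in diffs):
        return "single-bit flips only"
    return "multi-bit damage"


# Made-up examples: 'e' (0x65) and 'a' (0x61) differ in one bit (0x04),
# while 'e' (0x65) and 'z' (0x7a) differ in five bits (0x1f).
print(classify(b"kernel", b"kernal"))
print(classify(b"kernel", b"kernzl"))
```

A result of "single-bit flips only" is consistent with the hardware explanation above but does not prove it, and "multi-bit damage" does not rule hardware out either. An overnight memory test remains the real check.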
jk2587 Posted April 30, 2009

Thanks for the advice! I'll get the memory test running tomorrow morning and report back with my findings. The RAM is OCZ PC2-6400 Value Series (OCZ2V8004GK) - it was cheap. I haven't overclocked anything, but I never bothered to check whether the voltages the motherboard supplies are OK for the RAM. I guess I was a little too excited and didn't follow best practices, like running a memtest before starting everything up.

Thanks again for your help here. I've been lurking around these forums for a while and am always amazed at how helpful people are.
Joe L. Posted April 30, 2009

Not just voltage, but also timing and clock frequency need to be set properly.

Let us know how the memory test works out. (Both before and AFTER you change any BIOS settings)

Joe L.
jk2587 Posted April 30, 2009

I had originally run one pass of the memtest before starting up unRAID, and it passed. I ran it again this morning and it was immediately full of errors. I guess that really goes to show the importance of running it overnight in the first place. Unfortunately I don't have time to make any changes right now, but when I get home from work tonight I'll have some time to spend with it.
Joe L. Posted April 30, 2009

The BIOS settings might have changed, or the memory has actually failed, or it worked better at the temperature you originally tested at... but I'm glad to see it is probably going to be easy to fix. It might be a single bad RAM stick, or it could be voltage/timing/clock speed set wrong.

Joe L.
jk2587 Posted April 30, 2009

It turns out that the timings were incorrect, so I adjusted them. I still had errors, but I was able to isolate them to one stick of RAM. Anyway, I got new RAM and am currently on the 4th pass of memtest, and all is well so far. I'll let it run for a few more hours before booting unRAID.

I assume that I should run a parity check when I load up unRAID and then work out which disk needs the fsck. Is there anything else I should be aware of?

Thanks
RobJ Posted May 1, 2009

I would run reiserfsck on all of the data disks, but not the parity disk. See the Check Disk File systems page.
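As a rough sketch of what "all of the data disks, but not the parity disk" means in practice, the snippet below only builds and prints one read-only check command per data disk. It assumes that data disk N is exposed as /dev/mdN, which matches the md9 seen in the syslog earlier in the thread but should be confirmed against the Check Disk File systems page. The disk numbers and the helper name reiserfsck_check_commands are illustrative, and the parity disk and cache drive are not covered.

```python
def reiserfsck_check_commands(data_disk_numbers):
    """Build one read-only reiserfsck command per data disk.

    Assumption: data disk N appears as /dev/mdN. The parity disk is
    deliberately never included.
    """
    commands = []
    for number in sorted(set(data_disk_numbers)):
        if number < 1:
            raise ValueError("data disk numbers start at 1")
        # --check only reports problems; it does not try to repair them.
        commands.append(["reiserfsck", "--check", f"/dev/md{number}"])
    return commands


# Illustrative disk numbers; substitute the data disks in your own array.
for command in reiserfsck_check_commands([1, 2, 3]):
    print(" ".join(command))
```

Nothing is executed here on purpose. Run each printed command by hand, read its report, and follow the wiki page for any repair step it recommends.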
jk2587 Posted May 1, 2009

It looks like the problems are resolved. I ran reiserfsck on all the data drives, and two had issues: disk 1 and the cache drive. Before I even knew there was a problem with the build, I had transferred some data from the cache drive to disk 1, so it was no surprise that both of them needed fixing.

Other than that, everything seems to be running smoothly. I'm currently running the preclear script on two 1.5 TB drives, and I can't believe how much faster it is than my old system!

I'll attach another syslog here. I'd really appreciate it if RobJ could take a quick look to make sure everything is OK. I think it is, but I'm far from an expert when it comes to reading the logs.

Thanks again for all the help. It made the process very painless and greatly reduced my chance of data loss.
RobJ Posted May 2, 2009

Looks fine to me. I did not take a long look at the addon-related stuff, as I don't use most of them.