December 16, 2008 I seem to be having a problem: the array is online and reports that parity is valid, yet a parity check returns many errors. I read that this check is supposed to fix these errors, but when I reboot and run another parity check, there are still plenty of errors. I have 3x 1.5TB drives. All have been checked with SeaTools and SpinRite and are good to go, so I don't think it is my hardware. I don't see how data is being changed in a way that invalidates the parity drive.
Sequence:
-Boot unraid
-Run parity check (errors)
-Stop array
-Restart
-Run parity check (errors)
I'm really leaning towards unraid as a NAS solution, but this is somewhat of a roadblock. Yikes, forgot to say I'm running version 4.4 (but there seems to be no forum for it yet?). I'm going to repeat this process and post a syslog if that helps.
December 16, 2008 In the past, when symptoms like yours were reported, the cause was random read errors from bad memory, or it was tracked down to specific hardware or disk cabling. Please post a syslog for analysis; it might help isolate a pattern in the errors. It will not matter which version of unRAID you are using, at least not for this type of error. 4.4 should be fine. When you are stopping the array, are you using the maintenance screen to stop the array first and then powering down, or just hitting the switch? You should be stopping the array first. The wiki has instructions on how to post a syslog. Please capture a copy after the errors have occurred, but before you reboot. Zip it up and attach it to your next post. Joe L.
December 17, 2008 Author Here is my syslog. I have not rebooted yet, but I am thinking of doing a manual shutdown (from the console) to make sure each step happens and to see whether there are any problems. WHOA! Never mind... I'm just running another check (no reboot) and it is finding errors... Nothing has been changed, I'm just doing a second pass and it is finding errors. I think it may be because I have improperly formatted disks... I had issues setting up the server which resulted in incomplete formatting. Some of the disks were labeled as formatted when they weren't fully. I did a reiserfsck with the fix command (which then allowed me to write to shares), but I think I'll do a complete file system rebuild.
December 17, 2008 It sounds as if you are on your way to fixing things. When you "fixed" the file system using reiserfsck, did you do it on the /dev/md? device, or on the /dev/hd?1 or /dev/sd?1 device? If you did it on the /dev/[hs]d?1 partition, you yourself caused the parity errors: you wrote directly to the physical disk without the parity "md" driver being involved. Now it is getting itself back in sync and the errors are to be expected. For future reference, the wiki describes how to stop Samba, unmount an "md" device, and then run reiserfsck on it. That way, parity is maintained. Joe L.
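Joe's point about bypassing the "md" driver can be illustrated with a toy model. This is only a sketch, assuming single parity computed as a byte-wise XOR across the data disks; the helper names (`write_through_md`, `write_raw`) and the byte values are hypothetical and are not unRAID's driver code.

```python
from functools import reduce

def compute_parity(data_disks):
    """XOR the corresponding bytes of every data disk to get the parity bytes."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_disks))

def parity_check(data_disks, parity):
    """Return the offsets where the stored parity disagrees with the data."""
    expected = compute_parity(data_disks)
    return [i for i, (p, e) in enumerate(zip(parity, expected)) if p != e]

def write_through_md(data_disks, parity, disk, offset, value):
    """Model a write via the parity-aware device: data and parity change together."""
    old = data_disks[disk][offset]
    data_disks[disk][offset] = value
    parity[offset] ^= old ^ value

def write_raw(data_disks, disk, offset, value):
    """Model a write to the raw partition: only the data disk changes."""
    data_disks[disk][offset] = value

# Two tiny "data disks" and their parity (illustrative values only).
disks = [bytearray(b"\x10\x20\x30\x40"), bytearray(b"\x01\x02\x03\x04")]
parity = bytearray(compute_parity(disks))

write_through_md(disks, parity, 0, 1, 0xFF)
errors_after_md_write = parity_check(disks, parity)

write_raw(disks, 1, 2, 0xAA)
errors_after_raw_write = parity_check(disks, parity)

print(errors_after_md_write, errors_after_raw_write)
```

In this model the parity-aware write leaves no mismatches, while the raw write leaves one at the offset that was touched. A file system repair run against the raw partition would behave like many such raw writes.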
December 17, 2008 Author I ran it on /dev/md1 and /dev/md2, as described by some unraid forum post I ran across (can't seem to find it now, but it is pretty much what is in the wiki). So Samba was down and the disks were unmounted. The initial scans fixed stuff (--fix-fixable). I then ran a parity check, which fixed stuff; ran the file system scan again, which found no problems this time; and ran another parity check, which found more errors. (This was a day or two ago, before I even noticed the parity not "sticking".) Oh yeah, and I ruled myself out as the cause... kind of. I ran a parity check right after finishing a parity check and it started to find errors right away. So now I have just finished a file system rebuild with reiserfsck --rebuild-tree (at least I think this goes through everything and takes care of it). I'm doing one more parity check, which is fixing sync errors. Thanks for sticking with me!
December 17, 2008 Author Well, I ran the file system rebuild on both drives, then ran a parity check. Woke up this morning and the check was done, so I ran another and it started throwing errors. Here is the syslog. Maybe I do need to run that RAM check... well, next time I get a monitor. But I was able to run SpinRite and SeaTools and they completed without error... and they should be using the RAM to check the hard drives. Is there any way to do a complete format of my drives, or was what I did pretty much good enough? (I'm still wary of them...)
December 17, 2008 The only odd thing I see is these lines:
ata1: softreset failed (device not ready)
ata1: failed due to HW bug, retry pmp=0
However, it seems to recognize the problem and overcome it? I've got a SIL3132-based PCIe card that is doing something similar. Every parity check comes up with errors. It doesn't put any errors or faults into the syslog at all. I moved a half-full drive to the card to test it, and anything read from the card is corrupt. Glad I wasn't writing to the drive on that port. Peter
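A test like the one Peter describes (reading known data back through a suspect controller) can be approximated by hashing the same file several times and comparing the digests. This is a minimal sketch; `digest_of` and `reads_are_stable` are hypothetical helper names, and the temporary file only stands in for a large file on the drive under test. One caveat: the operating system may serve repeat reads from its page cache, so a meaningful run needs a file larger than RAM, or the cache dropped between passes.

```python
import hashlib
import os
import tempfile

def digest_of(path, chunk_size=1024 * 1024):
    """Read the whole file in chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def reads_are_stable(path, passes=3):
    """True if every pass over the file produced the same digest."""
    digests = {digest_of(path) for _ in range(passes)}
    return len(digests) == 1

# Demo on a small temporary file; on a real server, point this at a
# large file stored on the disk attached to the suspect port.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4096))
demo_path = tmp.name
stable = reads_are_stable(demo_path)
os.remove(demo_path)
print("reads stable:", stable)
```

If the digests differ between passes while nothing is writing to the file, the data path (controller, cable, memory) is returning different bytes for the same sectors, which matches the "no errors in the syslog, but corrupt reads" symptom.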
December 17, 2008 Not sure what motherboard you have, but it has just been found that the AMD SB700 southbridge is causing some trouble, and this could be your problem. See this post and the response from Rob: http://lime-technology.com/forum/index.php?topic=2826.15 If you have an SB700, then try the 4.3 version and see what happens. Peter
December 17, 2008 Author Yes, I do have the AMD SB700. I'll give 4.3.3 a try. Whew, at least the new drives are getting a good stress test with all of these parity checks.
December 18, 2008 Author All right... the compressed syslog does not fit (1.7 megs), so I'm going to cut out repetitive lines and mark each cut with this symbol: [. . .]. The cuts are all just lines regarding sync errors. Each time, it is always "parity incorrect". Doesn't look like this version helped either. Any tests I should do? Just finished finals today, so I'm back home to tinker (maybe even get FreeNAS to work with my hardware finally).
December 18, 2008 It appears as if you have a hardware issue, not one related to a specific version of unRAID. The first thing I'd try is different cables to the disk drives... make sure they are the 80-conductor type, not 40-conductor. Make sure they are flat, not round, and as short as possible, not 36 inches long. Make sure your power supply is a quality one... many are labeled as "high wattage", but that exists only in the marketing team's fantasy world. Make sure you have the memory voltage set correctly for your memory. Many BIOSes do not set it correctly, even when set to AUTO. Same with CPU voltages. These exact same hardware issues will cause grief regardless of what you load on the server... Some OSes will silently ignore errors until you cannot reboot (MS-Windows comes to mind here). At least unRAID lets you know something is not right. Good luck... Until your hardware is up to the task, no server software will work for you. Joe L.
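Why a correcting parity check never "sticks" on flaky hardware can be sketched with a small simulation. Everything here is an assumption made for illustration: parity is modeled as a byte-wise XOR, and the fault is a hypothetical `FlakyReader` that flips one bit on every Nth read. Real faults are far less regular, and this is not how unRAID is implemented.

```python
class FlakyReader:
    """Returns the stored byte, but flips one bit on every Nth read.

    every_nth=0 models healthy hardware (never corrupts).
    """
    def __init__(self, every_nth):
        self.every_nth = every_nth
        self.reads = 0

    def read(self, disk, offset):
        self.reads += 1
        value = disk[offset]
        if self.every_nth and self.reads % self.every_nth == 0:
            value ^= 0x01  # simulated bit flip on the data path
        return value

def correcting_check(data_disks, parity, reader):
    """Recompute parity from what the reader returns; rewrite parity on mismatch.

    Returns the number of mismatches found (and "corrected") in this pass.
    """
    errors = 0
    for offset in range(len(parity)):
        expected = 0
        for disk in data_disks:
            expected ^= reader.read(disk, offset)
        if reader.read(parity, offset) != expected:
            parity[offset] = expected  # may store a value built from a bad read
            errors += 1
    return errors

def make_array(size=100):
    """Build two data disks and a parity disk that starts out correct."""
    d1 = bytearray((i * 3) % 256 for i in range(size))
    d2 = bytearray((i * 7 + 1) % 256 for i in range(size))
    parity = bytearray(a ^ b for a, b in zip(d1, d2))
    return [d1, d2], parity

disks, parity = make_array()
healthy = FlakyReader(every_nth=0)
healthy_passes = [correcting_check(disks, parity, healthy) for _ in range(2)]

disks, parity = make_array()
flaky = FlakyReader(every_nth=7)
flaky_passes = [correcting_check(disks, parity, flaky) for _ in range(3)]

print("healthy:", healthy_passes)
print("flaky:  ", flaky_passes)
```

With the healthy reader every pass reports zero errors. With the flaky reader every pass reports errors, even though nothing on the disks changed between passes: each "correction" is itself computed from unreliable reads, so the next pass finds a fresh set of mismatches.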
December 18, 2008 I'd stick with v4.3.3, not that the version matters much, until you have some stability and data integrity. I don't think you have drive problems. Try running a memory test all night, as well as Joe's suggestions.
December 18, 2008 Author Here is my setup, a brand new healthy beast for <$200 (thanks to rebates and other such deals):
-Motherboard: GIGABYTE GA-MA74GM-S2 AM2+/AM2 AMD 740G Micro ATX AMD Motherboard (http://www.newegg.com/Product/Product.aspx?Item=N82E16813128342)
-Processor: AMD Athlon X2 4850e 2.5GHz Socket AM2 45W (http://www.newegg.com/Product/Product.aspx?Item=N82E16819103255)
-RAM: OCZ SLI-Ready 2GB (2 x 1GB) 240-Pin DDR2 SDRAM DDR2 800 (PC2 6400) (http://www.newegg.com/Product/Product.aspx?Item=N82E16820227198)
-PSU: Antec earthwatts EA380 (380W 80+ Certified) (http://www.newegg.com/Product/Product.aspx?Item=N82E16817371005)
-HDDs: 3x 1.5TB SATA Seagates (came with new firmware already... didn't have to update, SD37 or something)
And like I said, I was able to run SpinRite level 5 on all three drives as well as a full SeaTools check. So unless these hard drive checkers are avoiding the rest of my system, my system must be stable (though I guess I will do a RAM test as soon as I'm able). Though you may have a point with the RAM voltage. I know I set the CPU voltage, but I never checked the RAM (so I'm shutting up until I check this and do a full test). Haha, thanks RobJ, just saw you chime in. RAM test, here I come.
December 18, 2008 Not bad... especially if the < $200 figure included the three disk drives. The flat vs. round cables comment was directed at a person with an IDE-based disk, so it does not apply. The memory voltage and timings certainly do. Best of luck with your new server. Joe L. (The UPS truck just pulled into my driveway. I now have a new MB, CPU, memory, and power supply as of 5 minutes ago. I'll be going through the same initial setup as you, but with my second unRAID server. My first is mostly IDE disks, and a bit over 3 years old now. I did not make it under the $200 mark... closer to $300 actually. You hit better sales than me... but I did go for a 650 Watt power supply, an Intel dual-core CPU and 4 Gig of RAM... they pushed the price up a tiny bit.)
December 21, 2008 I have the same chipset (ECS motherboard) and similar memory with a BE-2350 processor, and it's been working great (except for that SiL-based SATA card I mentioned before). Definitely get back to us after a 24-hour memory test. I think that OCZ memory requires a slight voltage boost, like maybe 1.9V or 2.0V. Peter
December 21, 2008 Author Unfortunately my BIOS doesn't have an option to select RAM voltage other than +0.1, +0.2, etc. (so I can't tell from there what voltage it defaults to). Running Memtest86+ 2.10 reveals no errors. If anyone wants me to run any further tests, let me know, but I am at a total loss. FreeNAS troubleshooting, here I come! (I guess this is why prebuilt NASs cost so much: guaranteed hardware/software compatibility, hah!)
December 28, 2008 Author Sorry to bring this up again, but I truly believe this is a hardware issue now, though a very strange one at that.
-Hard drives check out: SeaTools, SpinRite 6
-CPU checks out: CPU Burn-in 1.00, Prime95
-RAM checks out: Memtest86+ 2.10
I also checked voltages, and UNDERclocked my CPU, HT Link, and RAM speed to their lowest values, and still received errors. So I am concluding that everything BUT my motherboard is OK. Maybe the disk controller is broken, or some components are not communicating correctly with each other, because each component does just fine in isolation (though I did run mixed mode in Prime95 without error, so the CPU and RAM can talk OK). If anyone has any other troubleshooting ideas, let me know... otherwise it looks like I'll have to return this mobo or make a warranty claim on it.
...........
!!! It works now... or seems to. It was the mobo; I got a new one (same make and model) and it worked! So guys, if there seems to be no explanation (every other component checks out), it is probably the mobo. [Case Closed] Lock the thread if you need.