May 5, 2008 (Author)
unRAID server: Gigabyte GA-EP35-DS3R
HTPC: Gigabyte GA-MA78GM-2SH
Main PC: DFI Lanparty UT-P35-T2R
May 5, 2008 (Author)
Here's a comparison from my main PC to the unRAID server through the switch WITHOUT the Intel NIC (using the onboard LAN on the unRAID motherboard):

Comparing files NA.ISO and Z:\NA.ISO
BAE10607: 8D 85
0000000142590607: B4 B0
000000014B6EB587: F4 F0
000000019226F687: 2F 0F
00000001B3ECA40F: EF EE
00000001CC7F2607: CC 8C
0000000203089787: 17 13
000000023794D987: E7 67
000000023FFEB607: AC A4
000000028E6D0687: 7A 6A
0000000320CD3687: B8 38
0000000325A2F687: 1C 18
000000034BBB6587: D5 C5
000000037EBCA687: 88 80
000000039C4AC587: 0E 0A
00000003EF1B2587: C8 88
00000003FF0AB687: E8 C0
0000000422EDE607: D6 D2
00000004C09EB787: 54 44
00000004E96B5587: BC B8
0000000545D2B807: 17 13

The same setup, except with the Intel NIC added, gave me these results:

Comparing files NA.ISO and Z:\NA.ISO
BFC7F507: 4A 0A
000000010B8D0607: 3E 2E
000000016FB81707: EC CC
00000001EC97E707: F9 79
000000043BD7C687: C1 40
00000004F8201607: 06 02

Comparing files NA.ISO and Z:\NA.ISO
D1886907: 6F 67
000000019A683907: 83 03
0000000240983887: 10 00
00000004F2981887: E3 63
000000056FC3C587: B5 35

Comparing files NA.ISO and Z:\NA.ISO
7107F507: F6 F2
000000047627F787: 42 02

Fewer errors using the Intel NIC.

I copied the movie from my disk1 to disk4, which is a different brand of SATA disk. I ran FCB 3 times and got this:

Comparing files NA.ISO and X:\NA.ISO
E9180987: AC 2C
00000003DBC1F587: 35 31
000000045A528607: AB 83
000000046F835887: ED 6D

Comparing files NA.ISO and X:\NA.ISO
E9180987: AC 2C
000000039FC80687: 35 21
00000003DBC1F587: 35 31
000000045A528607: AB 83
000000046F835887: ED 6D

Comparing files NA.ISO and X:\NA.ISO
E9180987: AC 2C
00000003DBC1F587: 35 31
0000000411785507: 50 40
000000045A528607: AB 83
000000046F835887: ED 6D

That gave some interesting results. Some errors are the same across all 3 FCB tests, but some are new and/or slight variations of other errors. Thoughts?
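Editor's aside: one way to read the byte pairs in the listings above is to XOR the two values and see which bits changed. The sketch below is only an illustration, not something posted in the thread; `flipped_bits` and `SAMPLE_PAIRS` are hypothetical names, and the three pairs are copied from the first listing.

```python
def flipped_bits(local, remote):
    """Return the bit positions (0 = least significant) that differ between two bytes."""
    diff = local ^ remote
    return [bit for bit in range(8) if diff & (1 << bit)]


# Three (local, remote) pairs copied from the first fc /b listing in the post above.
SAMPLE_PAIRS = [("8D", "85"), ("B4", "B0"), ("E8", "C0")]


def describe(pairs):
    """Build one human-readable line per pair, listing the bits that changed."""
    lines = []
    for local_hex, remote_hex in pairs:
        bits = flipped_bits(int(local_hex, 16), int(remote_hex, 16))
        lines.append(f"{local_hex} -> {remote_hex}: bits {bits}")
    return lines


if __name__ == "__main__":
    for line in describe(SAMPLE_PAIRS):
        print(line)
```

A difference that comes down to one or two bits per byte looks more like bit-level corruption somewhere in the data path than like a block of wrong data, which is consistent with where the thread later ends up looking.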
May 5, 2008 (Author)
I just ran FCB from my main PC to my HTPC. This goes through the switch using the onboard LAN on both machines. The main PC runs Vista x64 and the HTPC runs Vista x86. No errors on any of my 3 tests:

Comparing files NA.ISO and M:\NA.ISO
FC: no differences encountered

Comparing files NA.ISO and M:\NA.ISO
FC: no differences encountered

Comparing files NA.ISO and M:\NA.ISO
FC: no differences encountered

So... comparing my file with FCB says the error is definitely on the unRAID server somewhere. Going by the different hard drive tests, it seems disk1 is throwing out entirely random errors. Copying from disk1 to disk4 cut down on the errors and made them more static. But there are still some random ones occurring on disk4 as well. Thoughts?
May 5, 2008
> HTPC: Gigabyte GA-MA78GM-2SH

Ran across this, about a BIOS update for your HTPC, starts here at post #2059: http://www.avsforum.com/avs-vb/showthread.php?t=992503&page=69. Probably unrelated to your issues though...
May 6, 2008 (Author)
>> HTPC: Gigabyte GA-MA78GM-2SH
> Ran across this, about a BIOS update for your HTPC, starts here at post #2059: http://www.avsforum.com/avs-vb/showthread.php?t=992503&page=69. Probably unrelated to your issues though...

I'm using the latest F3 BIOS. But it appears that my HTPC is not related to the issue. Running FCB from my main PC to the HTPC is returning 0 errors.
May 6, 2008 (Author)
I just copied the movie from disk 4 (results posted above) to disk 3. It returned the following:

Comparing files NA.ISO and V:\NA.ISO
E9180987: AC 2C
00000002D7987607: 61 60
00000003C28DA107: E6 66
00000003DBC1F587: 35 31
000000042D095787: B9 A1
000000045A528607: AB 83
000000046F835887: ED 6D
000000048F3FA787: C9 49

So... it appears the movie is corrupt here:

E9180987: AC 2C
00000003DBC1F587: 35 31
000000045A528607: AB 83
000000046F835887: ED 6D

I say that because those errors always appear whether reading from disk4 or disk3. But on top of those errors, I'm also getting the following read errors:

00000002D7987607: 61 60
00000003C28DA107: E6 66
000000042D095787: B9 A1
000000048F3FA787: C9 49
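Editor's aside: the reasoning above (offsets that show up in every run are baked into the stored copy, offsets that come and go happen at read time) can be checked mechanically. This is a minimal sketch under that assumption; `split_offsets` and `DISK4_RUNS` are hypothetical names, and the offsets are the ones from the three disk4 runs posted earlier.

```python
def split_offsets(runs):
    """Split differing offsets into those seen in every run and those seen in only some."""
    persistent = set.intersection(*runs)
    transient = set.union(*runs) - persistent
    return persistent, transient


# Offsets reported by the three fc /b runs against disk4 (X:) earlier in the thread.
DISK4_RUNS = [
    {0xE9180987, 0x3DBC1F587, 0x45A528607, 0x46F835887},
    {0xE9180987, 0x39FC80687, 0x3DBC1F587, 0x45A528607, 0x46F835887},
    {0xE9180987, 0x3DBC1F587, 0x411785507, 0x45A528607, 0x46F835887},
]

if __name__ == "__main__":
    persistent, transient = split_offsets(DISK4_RUNS)
    print("in every run:  ", sorted(hex(o) for o in persistent))
    print("only sometimes:", sorted(hex(o) for o in transient))
```

With those three runs, the four offsets the author calls corrupt land in the persistent set and the two one-off offsets land in the transient set.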
May 6, 2008
You might need to do some research on those motherboards. Check them all out for BIOS updates and such. Look for hot components. I did a quick search on one board and this popped up: http://forums.whirlpool.net.au/forum-replies-archive.cfm/950450.html

> I don't know about the GA-MA78GM-2SH anymore. I've been reading on other forums (www.avsforum.com/avs-vb/...ead.php?t=992503) and here how ppl have been having a lot of trouble with it (from the NB being too hot to the HDMI not working). Some say Gigabyte have screwed up on the NB heatsink.

Also, the Gigabyte board you are using for the unRAID server is used by others here. Again, downloading the kernel via wget directly from the net showed something. Perhaps the download wasn't fast enough, so it didn't tax the chips enough. When you get a chance, use FTP from a good workstation directly to unRAID and see. A northbridge overheating a little could be the culprit for spurious errors like this.
May 6, 2008 (Author)
> You might need to do some research on those motherboards. Check them all out for BIOS updates and such. Look for hot components. I did a quick search on one board and this popped up: http://forums.whirlpool.net.au/forum-replies-archive.cfm/950450.html
>> I don't know about the GA-MA78GM-2SH anymore. I've been reading on other forums (www.avsforum.com/avs-vb/...ead.php?t=992503) and here how ppl have been having a lot of trouble with it (from the NB being too hot to the HDMI not working). Some say Gigabyte have screwed up on the NB heatsink.
> Also, the Gigabyte board you are using for the unRAID server is used by others here. Again, downloading the kernel via wget directly from the net showed something. Perhaps the download wasn't fast enough, so it didn't tax the chips enough. When you get a chance, use FTP from a good workstation directly to unRAID and see. A northbridge overheating a little could be the culprit for spurious errors like this.

I'll run the FTP test in a second to see if that does anything. I'm thinking about ordering the ABIT AB9 Pro board and giving that a shot. Maybe there's a problem with the Gigabyte boards (or at least the two I've tried) when transferring large files at gigabit speeds.
May 6, 2008 (Author)
> When you get a chance, use FTP from a good workstation directly to unRAID and see.

I'm transferring now and my FTP program (CuteFTP) is showing a steady transfer speed of 192 mbs.
May 6, 2008 (Author)
I tested my upload via FTP using FCB and this was my output:

Comparing files NA.ISO and V:\NA.ISO
0000000157282687: DD DC

Comparing files NA.ISO and V:\NA.ISO
A77F6687: 5D 58
00000003A5881607: 94 84
00000003A7187587: 8F 8B
00000003B3032707: AC A8
000000047AC80707: 5B 1B
00000004B10E3707: 68 28

Looks like the file may have written fine. Looks like a bunch more read errors though.
May 6, 2008 (Author)
> A northbridge overheating a little could be the culprit for spurious errors like this.

The NB is pretty hot... I'll put a nice fan on top of it with the case opened and run a few tests.
May 6, 2008 (Author)
> A northbridge overheating a little could be the culprit for spurious errors like this.

I think you just figured it out! I put a fan on the NB heatsink and let it sit for about 10 minutes. I have cool air coming in from the outside right near the computer now too. I've run 4 tests reading from disk3 under this setup and all 4 have returned with no errors. I'm going to try testing this on the other disks to see what I can find out... but so far it's looking positive.
May 6, 2008 (Author)
How accurate is this FCB script I've been running?

I disconnected all the optical and hard drives from my main PC and converted it into a test unRAID server (reconfigured the BIOS, installed the USB flash drive, connected one unRAID hard drive). So from boot I can see one drive in my unRAID array. It has the movie file I've been using all along on it. I got on my HTPC and ran some of the FCB commands, and this is what came back:

Comparing files NA.ISO and Z:\NA.ISO
FC: no differences encountered

Comparing files NA.ISO and Z:\NA.ISO
0000000204D10A7F: C2 C0

Comparing files NA.ISO and Z:\NA.ISO
FC: no differences encountered

Comparing files NA.ISO and Z:\NA.ISO
FC: no differences encountered

Comparing files NA.ISO and Z:\NA.ISO
FC: no differences encountered

Nice, right? 5 tests and 4 come back with no differences found. But one came back with this error: 0000000204D10A7F: C2 C0.

So... what could make that happen? 4 tests work perfectly but one throws one error. Is fc /b not 100% perfect? Maybe it has issues with files this large, and/or comparing across a network, and/or processing across a gigabit network. Thoughts?
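Editor's aside: one way to cross-check fc /b is to run an independent byte-for-byte comparison and see whether it reports the same kind of differences. The sketch below is a hedged illustration, not the script used in the thread; `compare_binary` is a hypothetical helper, and it only reports differences inside the common length of the two files.

```python
def compare_binary(path_a, path_b, chunk_size=1 << 20):
    """Yield (offset, byte_a, byte_b) for every differing byte, similar in spirit to fc /b.

    Bytes past the end of the shorter file are not reported.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                return
            if a != b:
                # Only walk byte by byte inside chunks that actually differ.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        yield offset + i, x, y
            offset += max(len(a), len(b))


if __name__ == "__main__":
    import sys

    # Usage (paths are placeholders): python compare.py NA.ISO Z:\NA.ISO
    for off, x, y in compare_binary(sys.argv[1], sys.argv[2]):
        print(f"{off:016X}: {x:02X} {y:02X}")
```

If two different tools both report an occasional single-byte difference on the same hardware, the comparison tool itself becomes a less likely suspect than the machines it runs on.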
May 6, 2008
Which test is this? From what you said above, you were going to continue testing the motherboard with the northbridge that was getting hot. Which motherboard to which motherboard? It seems you may have two brands that are subject to the same issue?
May 6, 2008 (Author)
> Which test is this? From what you said above, you were going to continue testing the motherboard with the northbridge that was getting hot. Which motherboard to which motherboard? It seems you may have two brands that are subject to the same issue?

Sorry. This test was my main PC (DFI Lanparty motherboard) set up as the unRAID server and my HTPC (running the Gigabyte 78GM motherboard). Basically, instead of using the hardware of my current unRAID server that's been giving me issues, I used the hardware of my main PC. With the above setup I passed 4 out of 5 tests... but 1 failed and I'd like to see if anyone knows why that could have happened.

As for the old hardware that was giving me issues (overheating NB), I ran tests on each hard drive while a fan blew towards the NB. Each hard drive either passed 100% or failed with only 1 error (1 error per scan at most). I attributed the occasional failure to the NB not being cooled enough. But the fact that cooling it lessened and mostly stopped the errors was a sure sign the NB was the problem. I even removed the fan, let the PC sit for 20 minutes with the case closed up, and then ran tests again, and it started failing a bunch of times just like it always has.

Any idea why my last test (using my DFI motherboard set up as the unRAID server) passed 4 times but failed once? What could account for that one failure?
May 6, 2008
Seeing that you can create an environment which fails, then succeeds, then fails again... I would re-create the same tests as before with the new hardware. Check the NB on the HTPC and the main PC, opening first the one that was not part of the previous test.
May 6, 2008 (Author)
> Seeing that you can create an environment which fails, then succeeds, then fails again... I would re-create the same tests as before with the new hardware. Check the NB on the HTPC and the main PC, opening first the one that was not part of the previous test.

I'll go ahead and try that on the HTPC. Maybe that NB is overheating too... maybe this is a common Gigabyte issue with their latest motherboards.

My DFI board seems rock solid. I've now run the FCB script about 10 times on this machine (the DFI setup turned back into my main PC) against the Windows Home Server and it's returned no errors whatsoever. I'll try the same thing from the HTPC to the Windows Home Server and see if it errors there. If it does... I'll try cooling the NB.
May 6, 2008
Looks like you're making good progress. As suspected, it looks like your problems have nothing to do with the switch or your LAN, but I guess the jury is still out.

I still recommend trying to recreate your problem on a single machine to reduce the number of possible problems. As it is, your CAT5 cables, LAN hardware, and a number of "between the machines" environmental factors are still in play, in addition to your MB, memory, and everything happening inside the computer case(s).

Overheating of your components can cause all kinds of problems. My experience is, though, that a non-overclocked motherboard in a decent computer case with at least one fan in and one fan out (+PSU) and without high-end graphics cards will operate reliably without extraordinary cooling. If your NB is truly failing under these normal conditions, I would return the MB. Note that NBs notoriously run hot, so it is not an instant warning sign.

Note that adding cooling to your NB may just as likely have cooled your memory better and reduced the incidence of data corruption. It could also be that you jiggled something inside the case and made some connection better.

The highest probability right now IMO is that you are having a memory error. Are you using the same brand / model of memory in multiple machines? Some higher-end memory requires a bit of a voltage boost to run reliably. (My memory is rated at 1.9-2.0v, rather than the 1.8v standard.) Are you sure that your RAM voltage settings in BIOS are correct? I'd recommend running memtest on each of your machines to rule memory failure in or out. Memory failure is relatively common compared to something like a NB failure.

Although encouraging relative to the number of compare errors you have been getting, one failure in 5 is nowhere near good enough.

Picky technical correction: the kinds of compare errors you are getting are not "read errors". A "read error" means that you tried to read a sector on the disk and got an OS-level error (sometimes you'll see an "Abort/Retry/Ignore" message or similar from the OS GUI). Similarly, a "write error" means an extremely nasty condition usually indicative of drive death. Your kind of "errors" I'd classify as "data corruption" or "compare errors", not "read errors".

Good luck. I think you're close.
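Editor's aside: the suggestion to reproduce the problem on a single machine could be tried by reading the same local file several times and checking that every pass produces the same digest. This is a minimal sketch, not something run in the thread; `sha256_of`, `repeated_digests` and `is_stable` are hypothetical names. Note the assumption that the file is large enough (here several GB) that the operating system cannot serve every pass from its RAM cache; with a small file the repeated reads may never touch the disk again.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_digests(path, runs=5):
    """Read the same file several times and collect one digest per pass."""
    return [sha256_of(path) for _ in range(runs)]


def is_stable(digests):
    """True when every pass produced the same digest."""
    return len(set(digests)) == 1


if __name__ == "__main__":
    import sys

    results = repeated_digests(sys.argv[1])
    print("stable" if is_stable(results) else "digests differ between passes")
```

Differing digests for an unchanged local file would point at the machine itself (memory, chipset, disk path) with the network out of the picture.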
May 6, 2008
> As it is, your CAT5 cables, LAN hardware, and a number of "between the machines" environmental factors are still in play, in addition to your MB, memory, and everything happening inside the computer case(s).

I disagree with part of this. With this particular problem I don't believe the cables and LAN hardware are the issue. If they were, we would be seeing stalled transmissions or many errors/retries.

Do the following:
telnet to your unRAID server
log in as root
type ifconfig
See if you see errors, dropped, overruns or frame errors.

rcotrone@gatekeeper ~> telnet media
Trying 192.168.1.179...
Connected to media.
Escape character is '^]'.
Media login: root
Linux 2.6.24.4-unRAID.
root@Media:~# ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:8D:9D:7B:AA
     inet addr:192.168.1.179 Bcast:192.168.1.255 Mask:255.255.255.0
     UP BROADCAST NOTRAILERS RUNNING MULTICAST MTU:1500 Metric:1
     RX packets:624209 errors:0 dropped:0 overruns:0 frame:0
     TX packets:12291 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:1000
     RX bytes:40677870 (38.7 MiB) TX bytes:2254642 (2.1 MiB)
     Interrupt:16 Base address:0x6000

> Overheating of your components can cause all kinds of problems. My experience is, though, that a non-overclocked motherboard in a decent computer case with at least one fan in and one fan out (+PSU) and without high-end graphics cards will operate reliably without extraordinary cooling. If your NB is truly failing under these normal conditions, I would return the MB. Note that NBs notoriously run hot, so it is not an instant warning sign.
> Note that adding cooling to your NB may just as likely have cooled your memory better and reduced the incidence of data corruption. It could also be that you jiggled something inside the case and made some connection better.
> The highest probability right now IMO is that you are having a memory error. Are you using the same brand / model of memory in multiple machines? Some higher-end memory requires a bit of a voltage boost to run reliably. (My memory is rated at 1.9-2.0v, rather than the 1.8v standard.) Are you sure that your RAM voltage settings in BIOS are correct? I'd recommend running memtest on each of your machines to rule memory failure in or out. Memory failure is relatively common compared to something like a NB failure.
> Although encouraging relative to the number of compare errors you have been getting, one failure in 5 is nowhere near good enough.

I agree here for the most part. Does your memory have heatsinks? Is there airflow past it? Still, I have to reiterate that in the link I previously posted, another person complained about an overheating northbridge. I'll repost the point.

> I don't know about the GA-MA78GM-2SH anymore. I've been reading on other forums (www.avsforum.com/avs-vb/...ead.php?t=992503) and here how ppl have been having a lot of trouble with it (from the NB being too hot to the HDMI not working). Some say Gigabyte have screwed up on the NB heatsink.

I do agree running memtest on each machine would be wise. If it were me, I would run it on all the machines simultaneously overnight.

> Although encouraging relative to the number of compare errors you have been getting, one failure in 5 is nowhere near good enough.

Any of these bit-flip errors is unacceptable. How can you possibly rely on any information if you know bits are flipping? Imagine how that would affect your checkbook/bank account or something of that nature.
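Editor's aside: the ifconfig check above can be automated by pulling the RX and TX counters out of the output text. The sketch below assumes the older net-tools layout shown in the post (`RX packets:... errors:...`); newer ifconfig versions print these fields differently. `parse_counters`, `has_link_errors` and `SAMPLE` are hypothetical names, and `SAMPLE` is copied from the posted output.

```python
import re

# Matches the RX/TX counter lines of the old net-tools ifconfig format shown in the post.
COUNTER_LINE = re.compile(
    r"(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+) (frame|carrier):(\d+)"
)

SAMPLE = """RX packets:624209 errors:0 dropped:0 overruns:0 frame:0
TX packets:12291 errors:0 dropped:0 overruns:0 carrier:0"""


def parse_counters(ifconfig_text):
    """Return {'RX': {...}, 'TX': {...}} with integer counters taken from ifconfig output."""
    counters = {}
    for match in COUNTER_LINE.finditer(ifconfig_text):
        direction, packets, errors, dropped, overruns, last_name, last_value = match.groups()
        counters[direction] = {
            "packets": int(packets),
            "errors": int(errors),
            "dropped": int(dropped),
            "overruns": int(overruns),
            last_name: int(last_value),
        }
    return counters


def has_link_errors(counters):
    """True if any counter other than the packet count is non-zero."""
    return any(
        value
        for stats in counters.values()
        for name, value in stats.items()
        if name != "packets"
    )


if __name__ == "__main__":
    print(has_link_errors(parse_counters(SAMPLE)))
```

All-zero counters, as in the posted sample, support the argument that the link layer is not seeing problems, although they cannot rule out corruption that happens before or after the NIC.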
May 6, 2008
> I disagree with part of this. With this particular problem I don't believe the cables and LAN hardware are the issue. If they were, we would be seeing stalled transmissions or many errors/retries.

There is nothing quite like getting it out of the signal path completely to eliminate all doubt. I've been burned by "extremely unlikely" once or twice. In this situation it would seem easy to remove the network from the test scenarios.

> Still, I have to reiterate that in the link I previously posted, another person complained about an overheating northbridge. I'll repost the point.
>> I don't know about the GA-MA78GM-2SH anymore. I've been reading on other forums (www.avsforum.com/avs-vb/...ead.php?t=992503) and here how ppl have been having a lot of trouble with it (from the NB being too hot to the HDMI not working). Some say Gigabyte have screwed up on the NB heatsink.

Sorry, didn't see this. If this MB has specific problems with the NB, it certainly is worth considering.

On my primary workstation, I overclocked. In the process I replaced the thermal compound on my NB and SB with Arctic Silver 5, as well as adding a small fan to the NB. I even lapped the NB heatsink. This was probably overkill, but it has been running flawlessly for about a year. (Asus P5W DH) I'd recommend something like this (or even replacing the NB heatsink with an aftermarket HSF) if you want to get serious about cooling the NB.

I'd be very nervous depending on a MB with a known weakness like this. After this problem gets solved, I'd recommend testing the h3ll out of it! One compare error and I'd junk it!

> I do agree running memtest on each machine would be wise. If it were me, I would run it on all the machines simultaneously overnight.

We're in agreement here. My guess is that the memory test will fail within a few minutes, but I'd run the tests overnight if not longer to be as sure as possible that the memory is good. (Don't forget to check the memory specifications vs the BIOS settings for RAM voltage.)
May 6, 2008
The NB has traditionally been a component that mobo designers don't give enough consideration to. To them, it's a black box... throw some cooling on it if you need to. When you compare overall power usage of mobos with the same CPU, the NB is the main difference... some are roasting hot, and some stay cool. A notable performer in this regard is the Biostar TA690G AM2. Lowest in overall power consumption for the CPU, and the NB is passively cooled and the heatsink stays cool to the touch.
May 6, 2008 (Author)
Thanks for the replies everyone.

I did test all my PCs and the server a few nights ago with memtest, and all of them passed after 10 hours of running overnight. I know some RAM needs more volts. I manually set the timings in the BIOS and checked what the correct voltage should be. For my Corsair RAM it's 1.8V. For my Gskill RAM it's 1.9V (at least while running in dual channel). I've tried different sticks in different machines just to rule memory out of the equation.

It's not the memory. All my sticks passed memtest, and I've moved sticks between machines as well as added new sticks to the mix.

As of now I have a DFI UT-P35-T2R shipping to me today. It's the same motherboard I use in my main PC, and it has a nice massive NB heatsink that's connected to other heatsinks to help cool it even more.

I believe the only issue I may still have on my plate is my HTPC's motherboard. I'm going to be testing that today to see how it performs. Memory on that machine, again, has passed memtest for over 10 hours. I'm going to run the FCB script multiple times, and if it throws any errors I'm going to put a fan on the NB, let it sit for about 15 minutes, and then run a bunch of tests with the NB being air cooled.

I've done over 15 tests from my main PC (DFI motherboard) to my Windows Home Server (HP MediaSmart Server). Every test has passed with no errors. This shows that my main PC hardware is working correctly and has no issues. The other day when I passed 4 but failed 1, I believe my main PC (which I had running as a temporary unRAID server) was working correctly. I believe the HTPC was the device that caused that 1 error.

I'm happy, though, that at least this is moving in a positive direction.
May 6, 2008 (Author)
With this FCB script (batch file) I've been running... is there a way to automate it to run, say, 10 times? Can I add something to the batch file or to my command prompt to repeat the test a set number of times? This would be useful so I can run multiple tests without needing to log back onto the computer every 15-20 minutes.
May 6, 2008
I think you can use the FOR command.

C:\Documents and Settings\rcotrone>help for
Runs a specified command for each file in a set of files.

FOR %variable IN (set) DO command [command-parameters]

  %variable  Specifies a single letter replaceable parameter.
  (set)      Specifies a set of one or more files. Wildcards may be used.
  command    Specifies the command to carry out for each file.
  command-parameters
             Specifies parameters or switches for the specified command.

To use the FOR command in a batch program, specify %%variable instead
of %variable. Variable names are case sensitive, so %i is different
from %I.

If Command Extensions are enabled, the following additional
forms of the FOR command are supported:

FOR /D %variable IN (set) DO command [command-parameters]

    If set contains wildcards, then specifies to match against directory
    names instead of file names.

FOR /R [[drive:]path] %variable IN (set) DO command [command-parameters]

Press any key to continue . . .

Here's my fortest.cmd file and the output of a run. (Just take out the echo and replace file1 file2 with your files.)

C:\DOCUME~1\rcotrone>type fortest.cmd
for %%v in ( 1 2 3 4 5 6 7 8 9 10 ) do echo fc file1 file2

C:\DOCUME~1\rcotrone>fortest

C:\DOCUME~1\rcotrone>for %v in (1 2 3 4 5 6 7 8 9 10) do echo fc file1 file2

C:\DOCUME~1\rcotrone>echo fc file1 file2
fc file1 file2

C:\DOCUME~1\rcotrone>echo fc file1 file2
fc file1 file2

C:\DOCUME~1\rcotrone>echo fc file1 file2
fc file1 file2

C:\DOCUME~1\rcotrone>echo fc file1 file2
fc file1 file2

C:\DOCUME~1\rcotrone>echo fc file1 file2
fc file1 file2

C:\DOCUME~1\rcotrone>echo fc file1 file2
fc file1 file2

C:\DOCUME~1\rcotrone>echo fc file1 file2
fc file1 file2

C:\DOCUME~1\rcotrone>echo fc file1 file2
fc file1 file2

C:\DOCUME~1\rcotrone>echo fc file1 file2
fc file1 file2

C:\DOCUME~1\rcotrone>echo fc file1 file2
fc file1 file2
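Editor's aside: the same "repeat the compare N times" idea can be sketched outside of batch files. The example below is an assumption-laden alternative to the FOR loop above, not a replacement anyone in the thread used; `repeat_compare` is a hypothetical helper built on Python's standard `filecmp` module. `filecmp.cmp` remembers earlier results, so the cache is cleared before each pass to force the files to be read again.

```python
import filecmp


def repeat_compare(path_a, path_b, runs=10):
    """Compare two files byte for byte `runs` times and return one True/False per pass."""
    results = []
    for _ in range(runs):
        # Without this, filecmp may return the remembered result instead of re-reading.
        filecmp.clear_cache()
        results.append(filecmp.cmp(path_a, path_b, shallow=False))
    return results


if __name__ == "__main__":
    import sys

    # Usage (paths are placeholders): python repeat.py NA.ISO Z:\NA.ISO
    outcome = repeat_compare(sys.argv[1], sys.argv[2])
    print(f"{outcome.count(True)} of {len(outcome)} passes matched")
```

Unlike fc /b, this only says whether each pass matched, not where the differing bytes are, so it suits unattended pass/fail counting rather than diagnosis.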