Questionable build



Do memtest memory errors sometimes point to problems other than the memory itself?

 

I have an Intel i5 workstation that has seen 4 years of heavy use under Windows 7.  It would sometimes complain about a memory error at shutdown, but otherwise seemed to work fine.  I have repurposed it as an unRaid 6 test box, and ran memtest.  It failed, with 1 error after 6 hours.  I swapped its 16 GB of memory with the 16 GB from another unRaid server and ran memtest on both machines.  The error reappeared on the original machine, but at a different address, so the error did not follow the memory.  Bad motherboard, bad CPU?

 

M/B: BIOSTAR Group - TH67+

CPU: Intel® Core™ i5-2500K CPU @ 3.30GHz

 

Link to comment

Memtest isn't just testing memory; it's testing comms between memory and CPU. It's also not a complete and thorough test.

 

Let's say there are scratches on the PCB on the actual lanes between a PCI device and the memory. These don't get tested during a memtest. So memtest could say everything is fine, but in use, when traffic goes through paths other than the ones the test exercises, other problems may appear.

 

In your case, I doubt this is the CPU, but would definitely be suspicious of the motherboard.

 

Disclaimer: this is information I gathered from various sources, and I'm not a hardware engineer. Someone smarter than I may be able to better explain.

Link to comment

Not a hardware or software engineer myself, but I'll echo jonp's comments about it testing your system.  I'll also throw in the notion that it's testing your PSU too.  I've paid the wages of sin for buying cheap power supplies in the past, so that's usually the first suspect I have when errors like this pop up.  If heat is an issue, you can rule that out by testing with the case open and a fan blowing on the motherboard. 

Link to comment

A few possibilities, in decreasing order of likelihood ...

 

(1)  dirty/corroded contacts in the memory slot(s).  By far the most likely.  If you have some electronic contact cleaner, you can remove the memory modules, CAREFULLY spray contact cleaner in the socket; let it dry; then insert/remove the memory modules a few times.    If you don't have any contact cleaner, then just insert/remove the memory modules a few times ... cleaning the contacts on the modules in-between insertions.

 

(2)  power instability (i.e. cheap PSU with poor regulation).  Possible, but unless it's fairly heavily loaded not likely.

 

(3)  a memory controller issue.    Since the memory controller is embedded in the CPU, I doubt this is the case.

 

(4)  degraded motherboard traces.  VERY unlikely unless you've flexed your board lately.  With modern multi-layer boards most of these traces are "inside" the board, and simply don't degrade unless the board is flexed or an over-current condition is encountered ... which will kill the connection -- not just cause intermittent errors.

 

By far the most likely cause of this (given that it's not "following the memory") is #1
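
The reasoning behind "not following the memory" can be written down as a small decision rule. This is only a sketch of the swap test described in the first post, not a diagnostic tool; the function name and return strings are made up for illustration, and it assumes the two machines traded their full sets of modules before memtest was rerun on both.

```python
def suspect_after_swap(error_on_original_machine: bool,
                       error_on_other_machine: bool) -> str:
    """Interpret memtest results after two machines have traded memory.

    After the swap, the original machine holds the other machine's modules
    and the other machine holds the original modules.
    """
    if error_on_original_machine and not error_on_other_machine:
        # The error stayed with the machine, not the modules:
        # look at slots/contacts, PSU, motherboard, or CPU.
        return "machine"
    if error_on_other_machine and not error_on_original_machine:
        # The error moved along with the modules.
        return "modules"
    if error_on_original_machine and error_on_other_machine:
        return "inconclusive: errors on both machines"
    return "inconclusive: no errors reproduced"


# The case from this thread: the error reappeared on the original machine only.
print(suspect_after_swap(True, False))
```

A single pass with no errors does not prove much when the original failure took 6 hours to appear, so the "inconclusive" branches are the honest answer unless both machines were tested for a comparable time.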

 

 

Link to comment

By far the most likely cause of this (given that it's not "following the memory") is #1

 

Very interesting.  I'll give it a few connect/reconnect cycles.  What about dropping to only 2 x 4 GB sticks of RAM?

 

Here is the error that the Win10 VM gives as it shuts down (and Win 7 did the same when running bare metal on this same MB).  This is the only time you are aware that there is any issue at all.  Runs normal otherwise.

 

[Screenshot: LQPEuiR.png]

Link to comment

I've seen that exact same error [different instruction address, but always a reference to address 0] in several Windows 10 VMs ... all running under VMware.    If that's the only error you're seeing, I wouldn't be concerned.

 

I've never, however, seen it on Windows 7 ... either bare metal or virtual.

 

And MemTest doesn't show any errors on any of the machines I've seen this on -- first time I saw it I ran MemTest for about 48 hours and all was well.

 

Link to comment

Well, I do get memtest errors, and the shutdown error shows up in Win7 too. I don't recall the address under Win7.

 

The other weird thing about this workstation is that the power supply continues to run after shutdown, using 20 watts, and the lights on the MB stay lit up. Maybe I'll grab a different power supply and test that too.

Link to comment

Microsoft's Answer: https://support.microsoft.com/en-us/kb/2929203

 

Insightful!  ::)

 

Actually it DID provide an interesting result.    The suggestion to shut down via the Charms bar isn't useful in Windows 10 (since there's no Charms bar).    But you can send the Windows 10 VM a Ctrl-Alt-Del (via the VMware controls) and then click the ShutDown icon at the lower-right; then click ShutDown; and the memory access error does NOT appear.    Just tried this on 2 different PCs that had Win10 VMs, and it works perfectly -- but both show the memory error if you use the Start - Shutdown process.

 

Assuming there's a way to send a Ctrl-Alt-Del to a KVM VM, I suspect tr0910 will see the same thing in his VM.
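
For the KVM side, one possible route is libvirt's `virsh send-key` command, which can deliver a key combination to a guest. The sketch below assumes `virsh` is available on the host and that the VM is managed by libvirt; the domain name "Windows10" is a placeholder, not something taken from this thread.

```python
import subprocess
from typing import List


def ctrl_alt_del_command(domain: str) -> List[str]:
    """Build the virsh command line that sends Ctrl-Alt-Del to a guest."""
    return ["virsh", "send-key", domain,
            "KEY_LEFTCTRL", "KEY_LEFTALT", "KEY_DELETE"]


def send_ctrl_alt_del(domain: str) -> None:
    """Run the command; raises CalledProcessError if virsh reports failure."""
    subprocess.run(ctrl_alt_del_command(domain), check=True)


# "Windows10" is a hypothetical domain name; `virsh list` shows the real ones.
print(" ".join(ctrl_alt_del_command("Windows10")))
```

The same command can be typed directly at the host's shell; the Python wrapper is only there to make the argument order explicit.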

 

Link to comment

Well, I do get memtest errors, and the shutdown error shows up in Win7 too. I don't recall the address under Win7.

 

The other weird thing about this workstation is that the power supply continues to run after shutdown, using 20 watts, and the lights on the MB stay lit up. Maybe I'll grab a different power supply and test that too.

 

Clearly this is more than just the Windows 10 memory error -- which I suspect will not be a problem if you shut it down as I just noted.

 

But with MemTest showing errors, and errors running bare metal Windows 7, you clearly have a "real" problem.    I'd try cleaning the memory sockets (as I suggested earlier) and see if that helps.  It's not likely the actual memory modules, since the problem did not "follow the memory".

 

The "power supply keeps running" issue could be a PSU issue OR a motherboard issue.    One very simple thing to try is to replace the BIOS battery, just in case the CMOS isn't retaining the settings.  Not likely, but a simple and inexpensive (~ $2) thing to try.

 

 

 

Link to comment
  • 2 weeks later...

This could also be a memory voltage / timing issue. If the DIMMs are high-performance units they might want a higher voltage to operate properly.

 

Certainly possible if the memory modules weren't standard 1.5 V DDR3 modules.    However, it seems unlikely, since this happened with two different sets of modules [i.e. "... The error did not follow the memory ..."].

 

Link to comment

He did say 16 GB modules though, which to me would suggest a high-performance kit that was split?

 

Not 16GB modules  ...  16GB of memory  (likely a pair of 8GB modules, or possibly even four 4GB modules).    It's certainly true that if they are high-performance modules that require a non-standard voltage, that could have been the issue ... but I think that's VERY unlikely, especially since the machine has worked fine "... for 4 years with Windows 7 seeing heavy use ...".

Link to comment

The 20 W being used while shut down could be standby power for WOL and various other standby services.

 

No, that's FAR outside of the ATX standby power specification, which is 12.5 W with a potential 3-second long peak of 17.5 W if a USB device triggers a wake-up event.  Typical standby power draws are in the 4-6 watt range.
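
To put the numbers side by side, here is a rough check of a measured standby draw against the limits quoted above. The 12.5 W and 17.5 W figures come from the post; the 5 V standby rail voltage used to convert watts to amps is an assumption on my part, as are the function names.

```python
STANDBY_RAIL_VOLTS = 5.0    # assumption: standby power is supplied by a 5 V rail
CONTINUOUS_LIMIT_W = 12.5   # continuous standby limit quoted in the post
PEAK_LIMIT_W = 17.5         # short (3-second) peak limit quoted in the post


def standby_current_amps(watts: float) -> float:
    """Convert a standby power figure to current on the assumed 5 V rail."""
    return watts / STANDBY_RAIL_VOLTS


def classify_standby_draw(watts: float) -> str:
    """Compare a measured standby draw against the quoted limits."""
    if watts <= CONTINUOUS_LIMIT_W:
        return "within continuous limit"
    if watts <= PEAK_LIMIT_W:
        return "acceptable only as a brief peak"
    return "exceeds standby limits"


print(classify_standby_draw(20.0))   # the 20 W reported in this thread
print(classify_standby_draw(5.0))    # a typical 4-6 W standby draw
```

If the 20 W figure was read from a meter at the wall socket, it also includes the power supply's own conversion losses, so the comparison is approximate rather than exact.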

 

Link to comment
