cpthook Posted January 10, 2017
Hello. Can someone help explain what this call trace means? I woke up this morning to an unresponsive server: both the GUI and TS were down, which forced me to do an unclean shutdown and a painful parity sync. The sync came back clean, but this is the third time in the last week that my server has crashed. Any help would be appreciated. See the attached screenshot of the call trace.
RobJ Posted January 12, 2017
These aren't easy to figure out, and there's little info evident in that trace. The first thing I'd try is to check for a motherboard BIOS update. Is this ECC RAM? If not, run a Memtest, although I think you would be better off trying the newer PassMark Memtest. It's less convenient, since you have to download it and create bootable media for it. After that, you may want to try different kernels, going both forward and backward.
cpthook Posted January 12, 2017
Thanks for the response, RobJ.

"First thing I'd try is to check for a motherboard BIOS update."
OK. My MOBO is fairly new (SUPERMICRO MBD-X11SSL-F-O), but I will check for updates.

"Is this ECC RAM?"
But of course! The moderators' and developers' recommendations insist on it. However, I'm only using one RAM slot on the MOBO, with this 16GB stick: Kingston 16GB DDR4 2133 ECC DIMM. Would you still recommend a Memtest? Based on the trace, it looked to me like some type of RAM error, so I ordered a duplicate stick, which will be here today.

"After that, you may want to try different kernels, go forward and go backward."
OK, you got me! At the risk of embarrassing myself in this post, I'm assuming you mean trying different unRAID versions: rolling back to a previous release or updating to a beta? From what little I do know, I assumed the unRAID OS was using the latest Linux kernel. If I'm way off, please correct me!
JorgeB Posted January 12, 2017
You can check the event log in the BIOS or with IPMIView to see if there are any ECC errors.
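If the event log has been saved to a text file, scanning it for memory-related entries can be scripted. The following is only a rough sketch: it assumes the export has one event per line, and the keyword list, the function names `filter_events` and `filter_event_file`, and the file layout are all illustrative assumptions rather than anything IPMIView or the BIOS guarantees.

```python
from pathlib import Path
from typing import Iterable, List, Sequence

# Assumption: memory/ECC events contain one of these words.
# Adjust to the wording your BMC actually uses.
DEFAULT_KEYWORDS = ("ecc", "memory")


def filter_events(lines: Iterable[str],
                  keywords: Sequence[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Return the lines that mention any keyword (case-insensitive)."""
    lowered = tuple(k.lower() for k in keywords)
    return [
        line.rstrip("\n")
        for line in lines
        if any(k in line.lower() for k in lowered)
    ]


def filter_event_file(path: str,
                      keywords: Sequence[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Read an exported event log (assumed one event per line) and filter it."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return filter_events(text.splitlines(), keywords)
```

Calling `filter_event_file` on the saved log file would list any lines that mention ECC or memory. An empty result only means no line matched the keywords, not that the RAM is healthy.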
RobJ Posted January 13, 2017
"Ok... My MOBO is fairly new (SUPERMICRO MBD-X11SSL-F-O) but I will check for updates"
Yours is from late 2015, not terribly old but not new either, and motherboard BIOS updates are more common when boards are new. Once a board has been out for a while, manufacturers usually aren't interested in updating it; they would rather you buy their latest.

"Would you still recommend a memtest? Based upon the trace, it looked to me like some type of RAM error so I ordered a duplicate stick which will be here today."
I was too lazy to look the board up. I don't think there's much point in a Memtest; go with johnnie.black's advice instead.

"From what little I do know, I assumed the unRAID OS was using the latest Linux Kernel."
As a NAS, where stability is vital, unRAID stays a little behind the latest kernel, but the beta versions are often close. Right now, the best version to try is the latest, 6.3.0-rc6. If that doesn't help, an older release sometimes works better on some systems; I would try 6.1.9.
cpthook Posted January 13, 2017
"You can check the event log in the bios or with ipmiview, see if there are any ecc errors."
Thanks for the tip. See the attachment. I did recently install an NVIDIA GeForce GT 730 card in an attempt to migrate my desktop PC to KVM; I like the idea that virtualizing my main PC lets me change its specifications dynamically. I had a feeling the GPU might be causing some issues, but I also tried out a SUPERMICRO AOC-SAS2LP-MV8, so I wasn't quite sure. I have no idea what is triggering these events, though. I'm hoping they're just related to shutdowns and reboots of the VM or my box.

"Once they've been out there for awhile, they usually aren't interested in updating them, they would rather you buy their latest."
OK, which is probably my case. According to the SuperMicro support site, I'm running the latest BIOS and IPMI versions.

"Right now, the best version to try is the latest - 6.3.0-rc6"
I will give it a shot. Right now I'm on the latest stable, 6.2.4.

Attachment: ipmi_eventlog.txt
This topic is now archived and is closed to further replies.