June 8, 2016

Hi,

I've been running V5 for some time and decided to build a new machine and upgrade to V6. Since the build and the upgrade to 6.1.9, I have had random hard locks. The first one happened after 24 hours, and another after the machine had been on for 4 days.

The machine is hung again right now. The console shows the login prompt, but I can't type anything. I can't ping the machine or telnet to it, and I can't reach the web UI either. I'm not sure how to pull logs off it at this point, or even where to start figuring out what's going on.

I ran Memtest for 36 hours after the initial build with no errors. I'm getting close to the end of the return window on the memory, and dealing with a Samsung memory RMA doesn't look like fun. IPMI is working fine, and I don't see any errors in its log. My first thought is memory, since I'm not sure what else would cause a hard lock.

Any advice would be appreciated. Thanks.

Specs:
Motherboard: Supermicro MBD-X11SSH-LN4F
CPU: Intel Xeon E3-1230 V5
Memory: 2x Samsung M391A2K43BB1-CPB 16GB (from the approved memory list)
PSU: Corsair 760W
Controller: LSI SAS9211-8i flashed to P20
Case: Fractal R5
Flash drive: Old SanDisk 2GB plugged into the internal USB 3 port
June 9, 2016 (Author)
So I ordered new memory. I'll test it and post an update. Is there anything else I should or could be testing?
June 21, 2016 (Author)
Update: I replaced the memory. The machine stayed up for 5 days, and then the same issue happened again. Can anyone suggest something else to test or try? I'm stumped. Here is the image from console redirection (note that I can't type, since it's hung).

Edit: I just realized that when it's hung, the hardware reset button does not work; I have to hold down the power button. Normally, though, the reset button reboots the machine.
June 21, 2016
It does *seem* like a hardware issue, though it's hard to tell without any errors at all. There are several things to try.

V6 is 64-bit, which uses memory differently, and it runs newer Linux kernels that make use of newer CPU and motherboard features. So the first thing I would do is make sure you have the latest BIOS for your system and set the BIOS to its default settings. You may also want to update the firmware on any add-on cards, if possible.

There's a better Memtest, with more tests for more kinds of memory, free to download from PassMark (just search for memtest86). You have to build the bootable CD or USB flash drive yourself, but it sometimes finds issues that the Memtest built into unRAID doesn't.

If you install the NerdPack, there's a new tool you can install from it called mcelog. Keep that installed; it may catch an issue (CPU or memory).

I certainly don't know everything, but I've never heard of the reset button failing during a hang before. I wonder if you have a defective power supply. You might also want to make sure the reset button works at other times; perhaps it has become disconnected.
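For anyone following along, here is a rough way to check a system log for machine-check (CPU or memory) errors between hangs. This is a minimal sketch, not part of mcelog or unRAID. It assumes the log is at /var/log/syslog (adjust the path as needed), and it assumes the kernel reports these errors with text like "[Hardware Error]", "Machine check" or "mce:". That is common on Linux, but the pattern list is a guess, not an exhaustive one. Run it while the server is up, or against a copy of the log saved before a hang.

```python
import re
import sys
from pathlib import Path

# Assumed log location; pass a different path as the first argument if needed.
DEFAULT_LOG = Path("/var/log/syslog")

# Assumed patterns for kernel machine-check / hardware error reports (not exhaustive).
MCE_PATTERN = re.compile(r"\[Hardware Error\]|Machine check|\bmce:", re.IGNORECASE)


def find_mce_lines(lines):
    """Return the lines that look like machine-check / hardware error reports."""
    return [line.rstrip("\n") for line in lines if MCE_PATTERN.search(line)]


def main(argv):
    path = Path(argv[1]) if len(argv) > 1 else DEFAULT_LOG
    try:
        with path.open(errors="replace") as fh:
            hits = find_mce_lines(fh)
    except FileNotFoundError:
        print(f"log not found: {path}")
        return 1
    if hits:
        print(f"{len(hits)} possible hardware error line(s) in {path}:")
        for line in hits:
            print("  " + line)
    else:
        print(f"no machine-check messages found in {path}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

An empty result doesn't prove the hardware is fine. A hard lock can happen before anything reaches the log.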
June 21, 2016 (Author)
Thanks for the reply. I already swapped the memory. It's possible that both sets had issues, but I think that's highly unlikely. My LSI card is on the newest firmware (though I can always try removing the card). My BIOS is up to date, but it's still on Rev 1 since it's a pretty new motherboard. I know there is at least one other user running the same motherboard and CPU as I am (https://lime-technology.com/forum/index.php?topic=46871.msg448015#msg448015).

Right now I just want to make sure it's not software related, so tonight I'm going to install Windows 10 on a spare hard drive and see if the machine hangs in Windows. That should give me a better picture.

The reset button does work in other situations. Is a reset button that stops working a symptom of the motherboard or the PSU?
I'll keep updating this thread just in case someone else can relate to it. Thanks.
July 9, 2016 (Author)
I wanted to give what is hopefully a final update: I think my issue is resolved. A BIOS update came out recently that fixed a whole bunch of issues, and I think the freeze was one of them. I've had 14 days of uptime with no issues. If you're running this board, make sure your BIOS is fully up to date!
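If you want to confirm which BIOS a Linux box is actually running before and after flashing, the kernel exposes the DMI data under /sys/class/dmi/id/. Below is a minimal sketch that reads a few of those files. It assumes a typical x86 Linux system where these files exist. Comparing the result against the latest release still has to be done by hand on the board vendor's site.

```python
from pathlib import Path

# Standard DMI location on x86 Linux; the field list is just a useful subset.
DMI_DIR = Path("/sys/class/dmi/id")
FIELDS = ("board_vendor", "board_name", "bios_vendor", "bios_version", "bios_date")


def read_dmi(dmi_dir=DMI_DIR, fields=FIELDS):
    """Read selected DMI fields; missing or unreadable files map to None."""
    info = {}
    for name in fields:
        try:
            info[name] = (Path(dmi_dir) / name).read_text().strip()
        except OSError:  # covers FileNotFoundError and PermissionError
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in read_dmi().items():
        print(f"{key:13} {value if value is not None else '(unavailable)'}")
```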
July 9, 2016
That's great! It used to be that a BIOS update was something you did only as a last resort, because it carried a small risk and rarely did much good. But lately, BIOS updates have been the answer a surprising number of times!