chalkdust Posted January 5, 2021
Hi all, for the past few nights, at what seems to be about the same time, my Unraid server goes down and reboots. After the first few times, I decided to set up the log server and capture the errors. Can anyone help me interpret the logs? Attached are logs and diagnostics.
Attachments: tower-diagnostics-20210104-2246.zip, syslog-192.168.1.4 (1).log
trurl Posted January 5, 2021
Nothing obvious. Do you have UPS? Have you done memtest?
chalkdust (Author) Posted January 5, 2021
If it matters, I've recently upgraded all my hardware and kept the same install of Unraid. Before installing Unraid I did a mem test via the BIOS. There were a few lines in the logs about samba sending reboot or something right around the time it went down (9:40 pm), if that helps.
trurl Posted January 5, 2021
10 hours ago, trurl said: "Do you have UPS?"
10 hours ago, chalkdust said: "upgraded all my hardware"
New power supply too? Are you sure CPU cooling is OK?
chalkdust (Author) Posted January 5, 2021
It's not currently connected to a UPS. I've got it temporarily set up in my workshop after the build, but I do have a UPS in my server room. I wasn't going to move it until it was stable, but I can try if you suggest it. I actually reused my case (Enthoo Pro) and my 2-year-old PSU (EVGA GQ 650W 80+G SM ATX PSU). The rest of the hardware is new: an Intel i9-9900K on an Asus Z390-P motherboard with two 8 GB sticks of Crucial Ballistix DDR4-3200, two NVMe drives, and an LSI SAS controller holding 6 spinners. The CPU is cooled by an Arctic Freezer II 240mm AIO liquid cooler. The motherboard and CPU temps are reporting quite low, and I'm pretty sure the CPU cooling is OK.
chalkdust (Author) Posted January 5, 2021 (edited)
So I went ahead and moved the machine to my server room to hook it up to the UPS. It was up for about 3 hours, then became unresponsive, crashed, and rebooted. Attached is the latest log. I'm at a loss. I guess I can revert to my old server setup for stability, but I was looking forward to using this new hardware.
I brought the machine up for more testing and reset the BIOS to defaults. However, I can't get Unraid to boot unless I enable CSM (Compatibility Support Module) in the BIOS. Is that normal? Could this be memory stick related?
Attachment: syslog-192.168.1.4 (2).log
Edited January 5, 2021 by chalkdust
trurl Posted January 5, 2021
17 hours ago, chalkdust said: "a mem test via the bios."
I'm not familiar with that. Does it actually go through and test all bits of all addresses?
chalkdust (Author) Posted January 5, 2021
I'm not sure. Is there a recommended memtest process? I can do it again for sure.
itimpi Posted January 5, 2021
5 minutes ago, chalkdust said: "im not sure. Is there a recommended memtest process? i can do it again for sure"
There is the one supplied with UnRAID if you are using legacy boot mode. If you are using UEFI boot then you should download and use the version from the memtest86 site.
chalkdust (Author) Posted January 5, 2021
Running it from the built-in Unraid boot menu option now. I will report back when it's done. Thanks.
chalkdust (Author) Posted January 6, 2021
Not done yet, but one pass is complete with no errors.
chalkdust (Author) Posted January 6, 2021
Memtest passed twice. Then it crashed again, twice in a row within 10 minutes of each other. I'm at a loss. Any other suggestions?
trurl Posted January 6, 2021
Have you tried SAFE mode with no dockers or VMs?
chalkdust (Author) Posted January 6, 2021
Not yet. I'll try.
ElBurrito Posted January 6, 2021
7 hours ago, chalkdust said: "[...] i cant get unraid to boot unless i enable CSM (Compatibility Support Module) in the BIOS. is that normal? could this be memory stick related?"
I just rebuilt mine with new hardware. I am running AMD, though, but I saw something in one of the guides saying I had to enable CSM for it to work. Not sure if that was just for Ryzen devices or not.
I had an issue like yours after my previous upgrade. I was running dual Xeons on a Supermicro motherboard, and a couple of sticks of memory turned out to be the problem. It was running something like 96 GB and failed pretty far into the memtest. I finally isolated it down to the sticks and replaced them to resolve the issue.
Side note: I have also had some weird behavior like that running a PCIe SATA expansion card. If I remember correctly, it didn't like my drives over 2 TB being on there and caused some weird crashing.
chalkdust (Author) Posted January 6, 2021
I've got an LSI SAS controller card. I can pull it and see.
itimpi Posted January 6, 2021
26 minutes ago, chalkdust said: "passed twice. and it crashed again twice in a row within 10 minutes of eachother. im at a loss. any other suggestions?"
Are you saying that the system crashed running memtest? If so, you definitely have a hardware problem that is not Unraid related. Obvious candidates would be cooling and power supply. The other option is something wrong at the motherboard/CPU/RAM level, but hopefully not, as that has no easy fix.
chalkdust (Author) Posted January 6, 2021
No, sorry. It crashed after I booted back to Unraid.
chalkdust (Author) Posted January 6, 2021 (edited)
Quick update: in my effort to troubleshoot my Unraid crashing, I've installed a stress tester docker app ( https://hub.docker.com/r/progrium/stress/ ), and when I run it, my CPUs hit near 100% and my motherboard lets out a long beep. It doesn't crash under the test settings ( $ docker run --rm -it progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 10s ), but the temp spikes from ~30 deg to 68 deg. I actually think I may have heard that same beep before a crash at some point recently. My issue is clearly hardware based. Does anyone have any ideas for what to look at first? My cooler? My CPU?
EDIT: So it was not actually my motherboard sending out the long, loud beep; it was the UPS I had connected it to! Under load, my server took down every device connected to the UPS. Something seems up with my power draw. Should I swap in a higher quality power supply, or does something seem wrong with the hardware?
Edited January 6, 2021 by chalkdust
JonathanM Posted January 6, 2021
12 minutes ago, chalkdust said: "it was the UPS i had connected it to!"
What model?
chalkdust (Author) Posted January 6, 2021
It was connected to a CyberPower 425VA with an iMac 27" and a secondary monitor attached (only the Mac is on battery backup, and that lost power too). The server has crashed like this both in my server room (on a different power circuit) and in my office. I've moved it to its own plug now and will test again.
chalkdust (Author) Posted January 6, 2021
I've moved it off the UPS and ran docker run --rm -it progrium/stress --cpu 16 --io 1 --vm 2 --vm-bytes 128M --timeout 10s which pushed all my CPU cores to 100% for 10 seconds. It did not crash. What does this mean? Do I need a higher powered UPS to run this on? I'm not 100% sure this is the cause of my crashing, though, as I THINK I had it on its own plug and it still crashed yesterday.
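A 10-second run is short for surfacing heat- or power-related faults, so a longer run of the same test may be more telling. Below is a minimal sketch that only builds and prints the command; it reuses the flags from the post above. The helper name `build_stress_command` and the `10m` duration are illustrative assumptions, not something recommended in this thread.

```python
import os
import shlex


def build_stress_command(cpu_workers=None, timeout="10m"):
    """Build the progrium/stress command used in this thread with a longer timeout.

    cpu_workers defaults to the number of logical CPUs Python can see.
    The "10m" default is an arbitrary assumption for illustration.
    """
    if cpu_workers is None:
        cpu_workers = os.cpu_count() or 1
    return [
        "docker", "run", "--rm", "-it", "progrium/stress",
        "--cpu", str(cpu_workers),
        "--io", "1",
        "--vm", "2",
        "--vm-bytes", "128M",
        "--timeout", timeout,
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into a terminal on the server.
    print(shlex.join(build_stress_command(cpu_workers=16)))
```

Watching temperatures and the UPS while this runs would help separate a cooling problem from a power problem.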
trurl Posted January 6, 2021
2 hours ago, chalkdust said: "CyberPower 425VA"
I wouldn't try to run 2 computers on one that small.
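To put the sizing concern in rough numbers, here is a small sketch. Only the 425 VA rating comes from the thread. The 0.6 power factor and both wattage figures are placeholder assumptions, not measurements; the real watt rating should be on the UPS label, and the real loads would need a power meter.

```python
def ups_watt_capacity(va_rating, power_factor):
    """Approximate real-power capacity in watts: VA rating times power factor."""
    return va_rating * power_factor


def load_fraction(load_watts, va_rating, power_factor):
    """Fraction of the UPS watt capacity consumed by the listed loads."""
    return sum(load_watts) / ups_watt_capacity(va_rating, power_factor)


if __name__ == "__main__":
    VA_RATING = 425            # from the thread
    POWER_FACTOR = 0.6         # assumption: check the UPS label for its watt rating
    SERVER_UNDER_LOAD_W = 250  # assumption: hypothetical draw during a stress test
    IMAC_W = 150               # assumption: hypothetical draw

    capacity = ups_watt_capacity(VA_RATING, POWER_FACTOR)
    fraction = load_fraction([SERVER_UNDER_LOAD_W, IMAC_W], VA_RATING, POWER_FACTOR)
    print(f"Estimated capacity: {capacity:.0f} W")
    print(f"Estimated load: {fraction:.0%} of capacity")
```

With these assumed numbers the combined load exceeds the estimated capacity, which would fit the UPS alarming and dropping its outputs under stress, but it says nothing about the crashes that happened off the UPS.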
chalkdust (Author) Posted January 6, 2021
OK, right, so it crashed again a little while ago, while OFF the UPS, just during normal use. I'm at a total loss. I may put my old board back together in this case to see if it's maybe the power supply.
F3nris Posted March 16, 2021 (edited)
I ran across this looking for something else, but out of curiosity, did you find the problem? It would appear the memory modules you are using (BL8G32C16U4B.M8FE) are not qualified on the MB.
Edit: Also, you are running a release candidate. I would suggest staying on stable versions.
Edited March 16, 2021 by F3nris