cowgill Posted February 1, 2021 (edited)

So, a few weeks ago my server started crashing/rebooting every few minutes to few hours, never lasting more than 6 hours or so at a time. I do not expect it is a hardware issue related to CPU/MB/RAM/Power Supply, as I've taken troubleshooting this as an opportunity to perform some upgrades and these are now new, yet the . If anyone sees any clues as to what may be going on or what to try it would be greatly appreciated.

tower-diagnostics-20210131-0726.zip

Unraid Version 6.8.3
CPU: Intel Core i7-9700K Coffee Lake 3.6GHz Eight-Core
MB: Gigabyte Z390 Aorus Pro WiFi Intel LGA 1151 ATX
RAM: Crucial Ballistix Gaming 32GB (2 x 16GB) DDR4-3600 PC4-28800 (not running it anywhere near that speed...2666 I believe right now)
PSU: SEASONIC FOCUS+ 750W 80+G FM ATX

Hardware items which were not upgraded:
LSI Logic SAS9211-8I 8PORT Int 6GB Sata+SAS Pcie 2.0 (appears to have latest firmware installed, v20.00.07.00)
Old (fanless) ASUS Video Card for booting prior to setting up iGPU as default, but still installed
(2) 1TB SSD cache drives
(1) 8TB parity drive
(9) data drives (various sizes)

Edited March 10, 2021 by cowgill
Squid Posted February 2, 2021

3 hours ago, cowgill said:
I do not expect it is a hardware issue related to CPU/MB/RAM/Power Supply, as I've taken troubleshooting this as an opportunity to perform some upgrades and these are now new, yet the .

Have you at least run a MEMTEST from the boot menu for at least a pass or two (ideally a day)? Are you overclocking the CPU? XMP enabled on the RAM?
cowgill (Author) Posted February 2, 2021

20 hours ago, Squid said:
Have you at least run a MEMTEST from the boot menu for at least a pass or two (ideally a day)? Are you overclocking the CPU? XMP enabled on the RAM?

Appreciate the feedback! No overclocking. I did not have XMP enabled, and I lowered the RAM speed manually to 2666 from whatever it defaulted to. I did not mess with any timings. I had not run MEMTEST, as you may have suspected given my faulty logic above.

I started MEMTEST, and about 5.5 hours later I got an email from my server saying it had started a parity check. I guess MEMTEST crashed and the system rebooted? When MEMTEST fails in this fashion (at least I think that means it failed), does that typically imply memory issues? I am not too familiar with digging into RAM issues.

I checked the RAM / MB settings and the RAM was at 1.20 V when Crucial recommends 1.35 V. I changed that, set all the timings to Crucial's recommended settings, and am running MEMTEST again now. Is that a reasonable course of action, or should I loosen some of the timings while keeping the voltage at the recommended value in order to gain more stability? Will report back on MEMTEST results.
ChatNoir Posted February 3, 2021

Are you sure that you don't have power issues? Are you using a UPS?
cowgill (Author) Posted February 3, 2021 (edited)

9 hours ago, ChatNoir said:
Are you sure that you don't have power issues ? Are you using a UPS ?

I have a UPS but disconnected it when I first started having issues. Your reply inspired me to check the PSU cables...the 2x2 motherboard power cable was not connected. 😭 Apparently that is optional unless you need extra juice though? Anyway I connected it.

Edited February 3, 2021 by cowgill
cowgill (Author) Posted February 3, 2021

8 minutes ago, cowgill said:
the 2x2 motherboard power cable was not connected

and it is still rebooting during MEMTEST 😔
cowgill (Author) Posted February 7, 2021 (edited)

I found a bad stick of memory and made it through 24 hours of MEMTEST with no errors, but that doesn't solve the issue of the server rebooting itself while in Unraid. I have the same issue with two completely separate sets of PSU/MB/CPU/RAM. Both sets of hardware make it through MEMTEST fine now that I have eliminated the bad stick of RAM. Unraid safe mode is not safe; it still reboots randomly there as well.

Potential problems (things I haven't tried): SATA cables, LSI SAS card, hard drives, USB boot drive.

Any ideas on how best to troubleshoot? The USB drive checked out fine in my Windows machine; is it worth just trying a new one? I don't have enough SATA slots without the SAS card or I would pull that. I also don't have enough spare cables to swap them all out. Is there anything else that I'm missing?

Edited February 7, 2021 by cowgill
itimpi Posted February 7, 2021

Safe mode just stops plugins from running. Have you tried disabling the Docker and VM services so that you are running as a basic NAS? I have always wondered if those services should also be stopped from running in Safe Mode.
cowgill (Author) Posted February 15, 2021 (edited)

On 2/7/2021 at 11:25 AM, itimpi said:
Safe mode just stops plugins from running. Have you tried disabling the Docker and VM services so that you are running as a basic NAS? I have always wondered if those services should also be stopped from running in Safe Mode.

I had no VMs running and that service was disabled. I tried disabling Docker on your advice, but that did not make a difference for me. I have since tried replacing my flash boot drive, and that did not help either.

Since the start of the thread I've ditched all the new hardware and gone back to my old stuff (first-gen i7 with Gigabyte GA-X58A-UD3R). As far as I know, the only things I have left to try are reseating or replacing the SATA cables and a new SATA expansion card. I currently have the SAS9211-8I. My motherboard has 10x onboard SATA and I could get away with a 4x SATA expansion card, so I ordered an ASMedia ASM1064 (IO CREST SI-PEX40156). I figure even if that isn't the issue, that card is flexible for use under other circumstances, being 1x slot friendly. Also, I guess I'll order a bunch of SATA cables to swap out just in case.

If that doesn't work I'm out of ideas.

Edited February 15, 2021 by cowgill
cowgill (Author) Posted March 10, 2021

Okay...ready to mark this solved. ...

The reset button on my case was shorting out. When I disconnected it from the motherboard case header, my issues were resolved. 🥴

Other minor item: with the IO CREST SI-PEX40156 installed, the Gigabyte GA-X58A-UD3R would not get far enough into booting to even reach the BIOS. The card boots fine installed in a modern (LGA 1200) motherboard. No idea what was going on there.

Wanted to put that data point out there in case others have issues in the future, as I have learned that finding unresolved issue threads is beyond frustrating.