Random shutdowns.



My server has been randomly shutting down for a while. I've tested basically every possible component and replaced the CPUs, RAM, motherboard, PSU, and HBA. I tested the RAM for over 24 hours with no errors. Sometimes it shuts down within a minute or two of booting up; sometimes it lasts for hours. It even lasted around a month at one point. But eventually it always shuts down. It's a Tyan S7012 motherboard. The IPMI remains accessible, and no event is logged. I have the server syslogging to a second Unraid server, and I see nothing there that identifies a problem.

I'm not good enough with the diagnostics to find a problem, so I'm hoping someone else can take a look and find something. I'm wondering if something in Unraid is causing it, because I've booted into both Windows and Linux, and neither has crashed once. It only seems to happen when booted into Unraid, and I'm not sure what that could mean. Someone suggested I replace the USB drive, but that seems an unlikely cause. I simply don't know what else to look for, test, or check.

I have this server and one other plugged into the same UPS, and only this one shuts down, so I don't think it's a power problem either.

 

Any help is greatly appreciated. My head is raw from all the scratching.

server diags 7-28-22.zip


I've replaced every piece of hardware except the HDDs/SSDs and the USB boot drive. I assume you saw nothing in the diags that pointed to anything? I'm just at a complete loss. The weird part to me is that this only happens while booted into Unraid. Would it be worth replacing the USB drive and reflashing Unraid to a new one?

8 hours ago, JorgeB said:

No, but you only posted the diags after rebooting, check the syslog server for a shutdown event, or post that after the next one.

IIRC, I grabbed those diags right after booting up from the server shutting down on its own. I will gather another set of diags immediately after the next shutdown to be sure, though.

Thanks for the help


Update on this issue. A few nights ago I realized something I should have figured out a long time ago. When I first had this problem months ago (before it magically went away), I had a GPU installed that I was planning to use for Plex. The shutdowns started, and I got so frustrated that I gave up on setting up Plex transcoding and eventually took the card out. I left the server offline for some time after that, still too frustrated to deal with it. When I powered it back up, it was fine and no longer crashed. I then put a different GPU in the server to give Plex transcoding another go, and sometime within the next few days to a week the shutdowns came back.

After I installed the GPU, but before the shutdowns started, I had an issue where Unraid wouldn't display the GUI through the onboard video output but used the GPU instead; I verified this by plugging a monitor into the GPU. To fix this, I went into the BIOS and forced it to use the onboard graphics. After rebooting, the boot sequence would show on the onboard video port (not just the motherboard POST, but Unraid as well, such as the blue boot menu where you choose GUI/non-GUI/safe mode/etc.), but as soon as it reached the point where the login screen should appear, nothing: just a blank black screen. The server was still accessible through the webGUI.

I posted in the Nvidia plugin support thread, and the author said it was because I wasn't on 6.10 yet (I was still on some 6.9 version) and that it should be resolved after I updated. I was hesitant to update Unraid, because by then the shutdowns were happening again and I didn't want it to shut down mid-upgrade. Eventually I did it anyway, and I'm now on 6.10.3. The display output issue still isn't solved, but that's an issue for the Nvidia plugin author, I guess. And the shutdowns continued. Fast forward to a few days ago, and it hit me that the shutdowns only seem to happen when I have a GPU installed.
I had initially ruled out the GPU as the cause because I had switched GPUs, so it isn't one specific GPU; rather, it seems to happen when ANY GPU is installed. Why that could be, I haven't the foggiest. To confirm this, I took the GPU out yesterday, and the server has now been running fine for 28 hours with no shutdowns.

 

I continue to be puzzled by this issue, and I'm hoping that one of you brilliant people has an idea why this would be happening.

Cheers for any help as usual

Edited by 2Piececombo
grammar and clarity
On 8/7/2022 at 2:33 AM, 2Piececombo said:

I upgraded the PSU a while back; it's an 850 W EVGA. The GPU is only a P600, which doesn't even require a separate power connector, just PCIe slot power, so it shouldn't be a power issue.

 

I've had a similar issue for two months. I'm quite new to Unraid; my experience is only a few months, starting from 6.9 up to the current 6.11.0-rc3. During this time I've been constantly configuring Unraid, so it's quite difficult for me to work out what changed in my setup to cause these reboots. My only suspect was Docker: I noticed that with the Docker service up and running (even with the Docker apps stopped), the issue was more frequent. Sometimes it reboots 3 or 4 times an hour; sometimes it stays up 24 hours; sometimes it goes days before rebooting. I found several posts about a possible problem with Docker macvlan/ipvlan, but the various solutions brought no improvement.

I don't know if it means anything, but since the last update to 6.11.0-rc3 I've had no reboots for 5 days now.

I even bought a new motherboard, CPU, and RAM, ready as a replacement, since the answer on the forum is often that it's a hardware problem. I'm just waiting for the next reboot before going ahead and replacing most of the hardware.

I'm glad the GPU has turned up as another possible culprit. Since I don't have integrated graphics, I have a small GPU installed. Could I remove it to see if it's the culprit? 🙂

 

PS: I'll close by saying that I'm quite frustrated with this situation, as I came to Unraid to leave QNAP forever, but my old NAS is still here because of this Unraid instability.

Edited by alexbn71