Random shut downs.



My server has been randomly shutting down for a while. I've tested basically every possible component and replaced the CPUs, RAM, motherboard, PSU, and HBA. I tested the RAM for over 24 hours with no errors. Sometimes it shuts down within a minute or two of booting up, sometimes it lasts for hours. It even lasted around a month at one point. But eventually it always shuts down. It's a Tyan S7012 motherboard. The IPMI remains accessible, and no event is created. I have the server syslogging to a second Unraid server, and I see nothing there that identifies a problem. I'm not good enough with the diagnostics to find a problem, so I'm hoping someone else can take a look and find something. I'm wondering if something in Unraid is causing it, because I've booted into both Windows and Linux, which have not crashed once. It only seems to occur when booted into Unraid. I'm not sure what that could mean. Someone suggested I replace the USB drive, but it seems unlikely that this would be the cause. I simply don't know what else to look for, test, or check.

I have this server and one other plugged into the same UPS, and only this one shuts down, so I don't think it's a power problem either.

 

Any help is greatly appreciated. My head is raw from all the scratching.

server diags 7-28-22.zip

Link to comment

I've replaced every piece of hardware except the HDDs/SSDs and the USB boot drive. I assume you saw nothing in the diags that pointed to anything? I'm just at a complete loss. The weird part to me is that this only happens while booted into Unraid. Would it be worth replacing the USB drive and reflashing Unraid to a new one?

8 hours ago, JorgeB said:

No, but you only posted the diags after rebooting, check the syslog server for a shutdown event, or post that after the next one.

IIRC, I grabbed those diags right after booting up following the server shutting down on its own. I will gather another set of diags immediately after it next shuts down to be sure, though.

Thanks for the help

Link to comment

Update to this issue. I realized something a few nights ago that I should have figured out a long time ago. When I was having this problem months ago (before it magically went away), I had a GPU installed that I was planning to use for Plex. The shutdowns started happening, and I was so frustrated that I gave up on setting up Plex transcoding and eventually took the card out. I left the server offline for some time after that, still too frustrated to continue dealing with it. When I powered it back up, it was fine and no longer crashed. I then put a different GPU in the server to give Plex transcoding another go, and sometime in the next few days or week the shutdowns came back.

After I installed the GPU, but before the shutdowns started, I had an issue where Unraid wouldn't display the GUI through the onboard display output and used the GPU instead; I verified this by plugging a monitor into the GPU. To fix this I went into the BIOS and forced it to use the onboard graphics. After rebooting, the boot sequence would show on the onboard video port (not just the mobo boot process, but Unraid as well, like the blue screen where you can choose which mode to boot: GUI, non-GUI, safe mode, etc.), but as soon as it got to the point where it would show the login screen, nothing, just a blank black screen. The server was still accessible through the web GUI. I posted in the Nvidia plugin support page and the author said it was due to not being on 6.10 (I was still on 6.9 something) and that it should be resolved after I updated. I was hesitant to update Unraid, because by now the shutdowns were happening again and I didn't want it to shut down mid-upgrade. Eventually I did it anyway, and I'm now on 6.10.3. The display output issue still isn't solved, but that's an issue for the Nvidia plugin guy, I guess. And the shutdowns continued.

Fast forward to a few days ago, and it hit me that the shutdowns only seem to happen when I have a GPU installed. I initially ruled out the GPU as the cause since I had ended up switching GPUs, so it's not a specific GPU; rather, it seems to happen when ANY GPU is installed. Why this could be, I haven't the foggiest. But to confirm this I took the GPU out yesterday, and the server has been running fine for 28 hours now with no shutdowns.

 

I continue to be puzzled by this issue, and I'm hoping that one of you brilliant people has an idea why this would be happening.

Cheers for any help as usual

Edited by 2Piececombo
grammar, and clarity
18 hours ago, itimpi said:

Have you considered whether it is a power supply issue?    Installing a GPU could be adding a significant extra load on the PSU.

I upgraded the PSU a while back; it's an 850 W EVGA. The GPU is only a P600, which doesn't even require additional power, just PCIe slot power, so it shouldn't be a power issue.

On 8/7/2022 at 2:33 AM, 2Piececombo said:

I upgraded the PSU a while back; it's an 850 W EVGA. The GPU is only a P600, which doesn't even require additional power, just PCIe slot power, so it shouldn't be a power issue.

 

I have had a similar issue for two months. I am quite new to Unraid; my experience is only a few months, from 6.9 to the current 6.11.0-rc3. During this time I've been changing my Unraid configuration constantly, so it's quite difficult for me to work out what changed in my setup to cause these reboots. My only suspicion was Docker. I noticed that with the Docker service up and running (even with the Docker apps stopped) the issue was more frequent. Sometimes it reboots 3 or 4 times in an hour, sometimes it stays up 24 hours, and sometimes it runs for days before rebooting. I found several posts about a possible problem with Docker macvlan/ipvlan, but the various solutions did not bring any improvement.

I don't know if it's a sign, but since the last update to 6.11.0-rc3 there have been no reboots for 5 days now.

I even bought myself a new motherboard, CPU, and RAM, ready for a replacement, as the answer in the forum is often that it is a hardware problem. I'm just waiting for the next reboot before proceeding with replacing most of the hardware.

I'm glad the GPU is also a possible culprit... since I don't have an integrated video card, I have a small GPU installed. Could I remove it to see if it's the culprit? 🙂

 

PS: I'll conclude by saying that I am quite frustrated with this situation, as I came to Unraid to leave QNAP forever, but my old NAS is still here because of this Unraid instability.

Edited by alexbn71
Link to comment

Out of curiosity, what GPU do you have installed in your server? You said your server reboots, which is slightly different from the shutdown issue that I've been facing. Is your server truly rebooting on its own? Or is it shutting down and then being powered back on, either automatically via BIOS settings or manually?

 

Take the GPU out, run it like that for a while, and see if the issue goes away.

Link to comment

I don't remember... a very, very cheap GPU. Anyway, I'll take it out today since the reboot happened again yesterday 🙁

Quote

 which is slightly different than the shutdown

I don't think so... my server reboots because I've configured the BIOS to restart on power failure (and generic failure?). It stays powered off only if a regular shutdown is invoked.

 

In the next few hours I will try to remove the GPU

 

Edit: removed! It's a GTX 650.

Edited by alexbn71
Link to comment

Has it crashed again since?

5 hours ago, 2Piececombo said:

Nope. Solid as a rock since I took the GPU out..

 

The only thing that bothers me is that my current motherboard has no integrated GPU and so in the end I would have to replace it with the one I bought recently .... or use it without GPU and in case of problems install it temporarily every time.... what the hell is wrong with GPUs and Unraid? 🤔

4 hours ago, trurl said:

 

31 minutes ago, alexbn71 said:

I've already enabled the syslog server, but I have never found anything interesting about my problem. I've also tried mirroring the syslog to flash, suspecting that something was not being written in time by the syslog server. The problem is that when the reboot occurs, nothing is logged in the moments immediately before it.

 

[attached image]

 

Same thing for me as well. Syslogging to a second unraid server and there's nothing of note there. The logs just end when the shutdown occurs. 
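
For anyone who wants to check their own remote syslog for the same pattern, here is a rough Python sketch that flags large gaps between consecutive log lines, so you can see exactly what the last entries before a silent shutdown were. It assumes the classic `Mon DD HH:MM:SS` timestamp at the start of each line (which has no year, so you pass one in); `parse_ts`, `find_gaps`, and the 5 minute threshold are names and values I made up for illustration. A quiet server can also go silent for a while, so a gap is only a hint, not proof of a crash.

```python
from datetime import datetime, timedelta


def parse_ts(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a classic syslog line.

    Returns None for lines that do not start with such a timestamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, year, min_gap=timedelta(minutes=5)):
    """Return (last_line_before, first_line_after, gap) for each silent period.

    `min_gap` is an arbitrary threshold; tune it to how chatty the server is.
    """
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue  # skip continuation lines or other formats
        if prev_ts is not None and ts - prev_ts >= min_gap:
            gaps.append((prev_line.rstrip("\n"), line.rstrip("\n"), ts - prev_ts))
        prev_ts, prev_line = ts, line
    return gaps


# Example use (the path is hypothetical, point it at your own remote log):
# with open("/path/to/remote-syslog.log") as f:
#     for before, after, gap in find_gaps(f, year=2022):
#         print(f"silent for {gap}:\n  {before}\n  {after}")
```

If the line just before each gap is always ordinary chatter and the line after is the start of a fresh boot, that matches what both of us are seeing: the machine goes down without the OS getting a chance to log anything.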

10 minutes ago, 2Piececombo said:

Same thing for me as well. Syslogging to a second unraid server and there's nothing of note there. The logs just end when the shutdown occurs. 

This strongly suggests something at the hardware level. The first culprits I would look for are inadequate cooling causing a thermal shutdown, or problems with the power supply.
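
One way to test the thermal theory when the syslog shows nothing is to record temperatures every few seconds and force each line out to storage, so the last reading before a sudden power-off survives. Below is a minimal Python sketch of that idea. It assumes the kernel exposes sensors through the standard Linux hwmon sysfs tree under `/sys/class/hwmon` (values are in millidegrees Celsius) and that Python 3 is installed, which may not be the case on a stock Unraid install. `read_temps`, `log_forever`, and the log path are hypothetical names for illustration, and a GPU on a proprietary driver may not show up in hwmon at all.

```python
import glob
import os
import time
from datetime import datetime


def read_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN:chip/tempM": degrees_C} for every readable temperature input."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(path)
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = "unknown"
        try:
            with open(path) as f:
                milli = int(f.read().strip())  # hwmon reports millidegrees Celsius
        except (OSError, ValueError):
            continue  # sensor present but not readable right now
        sensor = os.path.basename(path)[: -len("_input")]
        temps[f"{os.path.basename(chip_dir)}:{chip}/{sensor}"] = milli / 1000.0
    return temps


def log_forever(log_path, interval=5.0):
    """Append one timestamped line of readings every `interval` seconds."""
    with open(log_path, "a") as log:
        while True:
            stamp = datetime.now().isoformat(timespec="seconds")
            readings = " ".join(f"{k}={v:.1f}" for k, v in read_temps().items())
            log.write(f"{stamp} {readings}\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before sleeping
            time.sleep(interval)


# Example use (hypothetical path; pick storage that survives a reboot):
# log_forever("/path/to/temps.log", interval=5.0)
```

If the last few lines before a shutdown show temperatures climbing toward the limit, cooling is the likely cause; if they are flat and normal, that points away from a thermal trip. Writing every few seconds to a USB flash drive adds wear, so this is better run for a limited time.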

7 minutes ago, itimpi said:

This strongly suggests something at the hardware level. The first culprits I would look for are inadequate cooling causing a thermal shutdown, or problems with the power supply.

I have extensively tested cooling by booting into another OS and running a stress test for well over an hour. The PSU has also been replaced previously with an 850 W Gold-rated supply from EVGA. It should be noted that this shutdown only occurs while in Unraid. I can leave the GPU in and boot into something else and there are no issues.

Every piece of hardware has been replaced or tested: all RAM, both CPUs, the mobo, HBA, and PSU. There is no hardware issue here. These shutdowns have occurred with 2 different GPUs, neither of which even uses additional power from the PSU.
