July 29, 2022 My server has been randomly shutting down for a while. I've tested basically every possible component and replaced the CPUs, RAM, motherboard, PSU, and HBA. I tested the RAM for over 24 hours with no errors. Sometimes it shuts down within a minute or two of booting up, sometimes it lasts for hours. It even lasted around a month at one point. But eventually it always shuts down. It's a Tyan S7012 motherboard. The IPMI remains accessible, and no event is created. I have the server syslogging to a second Unraid server, and I see nothing there that identifies a problem. I'm not good enough with the diagnostics to find a problem, so I'm hoping someone else can take a look and find something. I'm wondering if something with Unraid is causing it, because I've booted into both Windows and Linux, which have not crashed once. It only seems to occur when booted into Unraid. I'm not sure what that could mean. Someone suggested I replace the USB drive, but it seems unlikely that this would be the cause. I simply don't know what else to look for, test, or check. I have this server and one other plugged into the same UPS, and only this one shuts down, so I don't think it's a power problem either. Any help is greatly appreciated. My head is raw from all the scratching. server diags 7-28-22.zip
July 29, 2022 Community Expert If you don't see a shutdown event initiated in the remote syslog, it's likely a hardware problem, though the hardware you replaced should have fixed that. If a shutdown event is being logged, there could be other reasons.
July 29, 2022 Author I've replaced every piece of hardware except the HDDs/SSDs and the USB boot drive. I assume you saw nothing in the diags that pointed to anything? I'm just at a complete loss. The weird part to me is that this only happens while booted into Unraid. Would it be worth replacing the USB drive and reflashing Unraid to a new one?
July 30, 2022 Community Expert 9 hours ago, 2Piececombo said: I assume you saw nothing in the diags that pointed to anything? No, but you only posted the diags after rebooting. Check the syslog server for a shutdown event, or post that log after the next one.
July 30, 2022 Author 8 hours ago, JorgeB said: No, but you only posted the diags after rebooting, check the syslog server for a shutdown event, or post that after the next one. IIRC, I grabbed those diags right after booting up from the server shutting down on its own. I will gather another set of diags immediately after it shuts down to be sure, though. Thanks for the help.
July 30, 2022 4 minutes ago, 2Piececombo said: I grabbed those diags right after booting up from the server shutting down on its own. Exactly. Did you read the link for the syslog server? It details how to get the logs from before the shutdown event, as the logs are normally lost on restart because they are held in RAM.
July 30, 2022 Author Oh, I misunderstood what you were asking. The syslog I showed you is from a remote syslog server, so it should include everything right up until the very second it died.
August 6, 2022 Author Update to this issue. I realized something a few nights ago that I should have figured out a long time ago. When I was having this problem months ago (before it magically went away) I had a GPU installed that I was planning to use for Plex. The shutdowns started happening, and I was so frustrated that I gave up setting up Plex for transcoding and eventually took the card out. I left the server offline for some time after that, still too frustrated to continue dealing with it. When I powered it back up it was fine and no longer crashed. I then put a different GPU in the server to give Plex transcoding another go, and sometime in the next few days or week the shutdowns came back.
After I installed the GPU, but before it started shutting down, I had an issue where Unraid wouldn't display the GUI through the onboard display output but used the GPU instead, which I verified by plugging a monitor into the GPU. To fix this I went into the BIOS and forced it to use the onboard graphics. After rebooting, the boot sequence would show on the onboard video port (not just the mobo boot process, but Unraid as well, like the blue screen where you can choose which mode to boot: GUI, non-GUI, safe mode, etc.), but as soon as it got to the point where it would show the login screen, there was nothing, just a blank black screen. The server was still accessible through the web GUI. I posted in the Nvidia plugin support page, and the author said it was due to not being on 6.10 (I was still on 6.9 something) and that it should be resolved after I updated. I was hesitant to update Unraid, because by now the shutdowns were happening again and I didn't want it to shut down mid-upgrade. Eventually I did it anyway, and I'm now on 6.10.3. The display output issue still isn't solved, but that's an issue for the Nvidia plugin guy, I guess. And the shutdowns continued.
Fast forward to a few days ago, and it hit me that the shutdowns only seem to happen when I have a GPU installed. I initially ruled out the GPU as the cause, since I ended up switching GPUs, so it's not the specific GPU; rather, it seems to happen when ANY GPU is installed. Why this could be, I haven't the foggiest. But to confirm this I took the GPU out yesterday, and the server has now been running fine for 28 hours with no shutdowns.
I continue to be puzzled by this issue, and I'm hoping that one of you brilliant people has an idea why this would be happening. Cheers for any help, as usual. Edited August 6, 2022 by 2Piececombo: grammar and clarity
August 6, 2022 Community Expert Have you considered whether it is a power supply issue? Installing a GPU could be adding a significant extra load on the PSU.
August 7, 2022 Author 18 hours ago, itimpi said: Have you considered whether it is a power supply issue? Installing a GPU could be adding a significant extra load on the PSU. I upgraded the PSU a while back; it's an 850 W EVGA. The GPU is only a P600, which doesn't even require additional power, just the PCIe slot power, so it shouldn't be a power issue.
August 12, 2022 On 8/7/2022 at 2:33 AM, 2Piececombo said: I upgraded the PSU a while back, it's an 850w EVGA. the GPU is only a p600, doesnt even require additional power, just the pcie power, so it shouldnt be a power issue I have had a similar issue for two months. I am quite new to Unraid; my experience covers only a few months, from 6.9 to the current 6.11.0-rc3. During this time I've been reconfiguring Unraid constantly, so it's quite difficult for me to work out what changed in my setup to cause these reboots. My only suspicion was Docker. I noticed that with the Docker service up and running (even with the Docker apps stopped) the issue was more frequent. Sometimes it reboots 3 or 4 times in an hour, sometimes it stays up 24 hours, and sometimes it goes days before rebooting. I found several posts about a possible problem with Docker macvlan/ipvlan, but the various solutions did not bring improvements. I don't know if it's a sign, but since the last update to 6.11.0-rc3 there have been no reboots for 5 days now. I even bought a new motherboard, CPU, and RAM, ready for a replacement, as the answer in the forum is often that it is a hardware problem. I'm just waiting for the next reboot before replacing most of the hardware. I'm glad the GPU is also a possible culprit. Since I don't have an integrated video card, I have a small GPU installed. Could I remove it to see if it's the culprit? 🙂 PS: I'll conclude by saying that I am quite frustrated with this situation, as I came to Unraid to leave QNAP forever, but my old NAS is still here due to this Unraid instability. Edited August 12, 2022 by alexbn71
August 13, 2022 Author On 8/12/2022 at 8:33 AM, alexbn71 said: since I don't have an integrated video card, I have a small GPU installed... could i remove it to see if it's the culprit? What GPU do you have installed in your server, out of curiosity? You said your server reboots, which is slightly different from the shutdown issue that I've been facing. Is your server truly rebooting on its own? Or is it shutting down and then being powered back on, either automatically via BIOS settings or manually? Take the GPU out, run it like that for a while, and see if the issue goes away.
August 14, 2022 I don't remember... a very, very cheap GPU. Anyway, I'll take it out today, since the reboot happened again yesterday 🙁 Quote: which is slightly different than the shutdown I don't think so. My server reboots because I've configured the BIOS to restart on power failure (and generic failure?). It stays powered off only if a regular shutdown is invoked. In the next few hours I will try to remove the GPU. Edit: removed! It's a GTX650. Edited August 14, 2022 by alexbn71
August 15, 2022 Author 18 hours ago, alexbn71 said: In the next few hours I will try to remove the GPU Edit: removed! it's a GTX650 Has it crashed again since?
August 15, 2022 I'm close to the milestone of 24 hours without a GPU. I'll keep you updated. Update: 3 days have passed. Edited August 17, 2022 by alexbn71
August 17, 2022 On 8/15/2022 at 3:21 AM, 2Piececombo said: Has it crashed again since? And yours? Has it crashed again since 6 August?
August 18, 2022 Author 14 hours ago, alexbn71 said: And yours? crashed again since 6 August? Nope. Solid as a rock since I took the GPU out.
August 18, 2022 5 hours ago, 2Piececombo said: Nope. Solid as a rock since I took the GPU out.. The only thing that bothers me is that my current motherboard has no integrated GPU, so in the end I would either have to replace it with the one I bought recently, or run it without a GPU and install one temporarily whenever there is a problem. What the hell is wrong with GPUs and Unraid? 🤔
August 18, 2022 Community Expert On 8/12/2022 at 10:33 AM, alexbn71 said: similar issue Attach diagnostics to your NEXT post in this thread.
August 19, 2022 16 hours ago, trurl said: attach diagnostics to your NEXT post in this thread Here are my diagnostics: mininas-diagnostics-20220819-0821.zip
August 19, 2022 3 hours ago, trurl said: setup syslog server I've already enabled the syslog server, but I have never found anything interesting about my problem in it. I've also tried mirroring the syslog to flash, suspecting that something was not being written in time by the syslog server. The problem is that when the reboot occurs, nothing is logged in the moments immediately before the event.
August 19, 2022 Author 4 hours ago, trurl said: setup syslog server 31 minutes ago, alexbn71 said: The problem is that when the reboot occurs nothing is logged in the instant immediately before this event.... Same thing for me as well. I'm syslogging to a second Unraid server and there's nothing of note there. The logs just end when the shutdown occurs.
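For anyone checking a remote syslog for the same symptom, here is a minimal sketch of how to pull out the last line logged before each unexpected silence. It assumes the file uses the classic `Mon DD HH:MM:SS host message` syslog layout and that all entries fall in one year; the function names, the 5-minute threshold, and the hostname `tower` in the example are illustrative choices, not anything taken from the diagnostics in this thread.

```python
from datetime import datetime, timedelta


def parse_timestamp(line, year=2022):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a classic syslog line.

    Returns None when the line does not start with such a stamp.
    The year is an assumption, since this format does not record it.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (last_line_before, first_line_after, gap) for each silence
    longer than `threshold` between consecutive timestamped lines."""
    gaps = []
    prev_time, prev_line = None, None
    for raw in lines:
        line = raw.rstrip("\n")
        stamp = parse_timestamp(line)
        if stamp is None:
            continue  # skip lines without a recognisable timestamp
        if prev_time is not None and stamp - prev_time > threshold:
            gaps.append((prev_line, line, stamp - prev_time))
        prev_time, prev_line = stamp, line
    return gaps


# Hypothetical sample: the log stops at 10:01 and resumes at 12:30.
sample = [
    "Jul 29 10:00:00 tower kernel: example message a",
    "Jul 29 10:01:00 tower kernel: example message b",
    "Jul 29 12:30:00 tower kernel: example message after power-on",
]
gaps = find_gaps(sample)
```

To run it against a real file, pass an open file object, for example `find_gaps(open(path))` with `path` pointing at the remote syslog. If the line before each gap is routine activity rather than a shutdown message, that matches what is described above: the log simply ends, with no software-initiated shutdown recorded.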
August 19, 2022 Community Expert 10 minutes ago, 2Piececombo said: Same thing for me as well. Syslogging to a second unraid server and there's nothing of note there. The logs just end when the shutdown occurs. This strongly suggests something at the hardware level. The first culprits I would look for are inadequate cooling causing a thermal shutdown, or problems with the power supply.
August 19, 2022 Author 7 minutes ago, itimpi said: This strongly suggests something at the hardware level. Culprit I would first look for would be inadequate cooling causing a thermal related shutdown or problems with the power supply. I have extensively tested cooling by booting into another OS and running a stress test for well over an hour. The PSU has also been replaced previously with an 850 W Gold-rated supply from EVGA. It should be noted that this shutdown only occurs while in Unraid. I can leave the GPU in and boot into something else and there are no issues. Every piece of hardware has been replaced or tested: all RAM, both CPUs, the mobo, HBA, and PSU. There is no hardware issue here. These shutdowns have occurred with 2 different GPUs, neither of which even uses additional power from the PSU.