2Piececombo Posted July 29, 2022
My server has been randomly shutting down for a while. I've tested basically every possible component and replaced the CPUs, RAM, motherboard, PSU, and HBA. I tested the RAM for over 24 hours with no errors. Sometimes it shuts down within a minute or two of booting up, sometimes it lasts for hours. It even lasted around a month at one point. But eventually it always shuts down. It's a Tyan S7012 motherboard. The IPMI remains accessible, and no event is created. I have the server syslogging to a second Unraid server, and I see nothing there that identifies a problem. I'm not good enough with the diagnostics to find a problem, so I'm hoping someone else can take a look and find something. I'm wondering if something in Unraid is causing it, because I've booted into both Windows and Linux, which have not crashed once. It only seems to occur when booted into Unraid. I'm not sure what that could mean. Someone suggested I replace the USB drive, but it seems unlikely that this would be the cause. I simply don't know what else to look for, test, or check. I have this server and one other plugged into the same UPS, and only this one shuts down, so I don't think it's a power problem either. Any help is greatly appreciated. My head is raw from all the scratching.
Attachment: server diags 7-28-22.zip
JorgeB Posted July 29, 2022
If you don't see a shutdown event initiated in the remote syslog, it's likely a hardware problem, though the hardware you replaced should have fixed that. If a shutdown event is being logged, there could be other reasons.
2Piececombo Posted July 29, 2022 (Author)
I've replaced every piece of hardware except the HDDs/SSDs and the USB boot drive. I assume you saw nothing in the diags that pointed to anything? I'm just at a complete loss. The weird part to me is that this only happens while booted into Unraid. Would it be worth replacing the USB drive and reflashing Unraid to a new one?
JorgeB Posted July 30, 2022
9 hours ago, 2Piececombo said: "I assume you saw nothing in the diags that pointed to anything?"
No, but you only posted the diags after rebooting. Check the syslog server for a shutdown event, or post that after the next one.
2Piececombo Posted July 30, 2022 (Author)
8 hours ago, JorgeB said: "No, but you only posted the diags after rebooting, check the syslog server for a shutdown event, or post that after the next one."
IIRC, I grabbed those diags right after booting up from the server shutting down on its own. I will gather another set of diags immediately after it shuts down to be sure, though. Thanks for the help.
JonathanM Posted July 30, 2022
4 minutes ago, 2Piececombo said: "I grabbed those diags right after booting up from the server shutting down on its own."
Exactly. Did you read the link for the syslog server? It details how to get the logs from before the shutdown event, as the logs are normally lost on restart because they are kept in RAM.
2Piececombo Posted July 30, 2022 (Author)
Oh, I misunderstood what you were asking. The syslog I showed you is from a remote syslog server, so it should include everything right up to the very second it died.
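The check JorgeB describes (is a shutdown logged before each restart, or does the log simply stop?) can be automated against the file kept by the remote syslog server. The following is only a rough sketch under stated assumptions: it treats the kernel banner `Linux version` as the start of a boot, and it guesses that a clean shutdown leaves a line containing a word such as "shutdown" or "reboot". Neither pattern is taken from this server's logs, and the function name `classify_boots` and the file path in the usage comment are made up for illustration.

```python
import re
from typing import Dict, Iterable, List

# Assumption: the kernel banner marks the start of every boot in the log.
BOOT_MARKER = re.compile(r"Linux version")
# Assumption: a clean shutdown leaves at least one line matching these words.
SHUTDOWN_HINT = re.compile(r"shutdown|power[- ]?off|halt|reboot", re.IGNORECASE)


def classify_boots(lines: Iterable[str], lookback: int = 50) -> List[Dict]:
    """For each boot that has earlier log lines before it, report whether
    the last `lookback` lines of the previous session hint at a clean shutdown."""
    history: List[str] = []
    results: List[Dict] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if BOOT_MARKER.search(line):
            if history:  # there was a previous session to inspect
                tail = history[-lookback:]
                hints = [entry for entry in tail if SHUTDOWN_HINT.search(entry)]
                results.append({
                    "boot_line": line,
                    "last_line_before": tail[-1],
                    "clean": bool(hints),
                    "hints": hints,
                })
            history = []  # start collecting the new session
        else:
            history.append(line)
    return results


# Usage (hypothetical path to the file written by the remote syslog server):
# with open("syslog-tower.log", errors="replace") as fh:
#     for boot in classify_boots(fh):
#         print("clean" if boot["clean"] else "ABRUPT", "|", boot["last_line_before"])
```

A boot reported as abrupt means the previous session ended without any of the assumed shutdown wording, which matches the "logs just end" symptom discussed in this thread. The word list is a heuristic and may need adjusting to whatever the server actually logs.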
2Piececombo Posted August 6, 2022 (Author, edited)
Update to this issue. I realized something a few nights ago that I should have figured out a long time ago. When I was having this problem months ago (before it magically went away) I had a GPU installed that I was planning to use for Plex. The shutdowns started happening, and I was so frustrated that I gave up setting up Plex for transcoding and eventually took the card out. I left the server offline for some time after that, still too frustrated to continue dealing with it. When I powered it back up it was fine and no longer crashed. I then put a different GPU in the server to give Plex transcoding another go, and sometime in the next few days or week the shutdowns came back.
After I installed the GPU, but before the shutdowns started, I had an issue where Unraid wouldn't display the GUI through the onboard display output but used the GPU instead. I verified this by plugging a monitor into the GPU. To fix this I went into the BIOS and forced it to use the onboard graphics. After rebooting, the boot sequence would show on the onboard video port (not just the motherboard boot process, but Unraid as well, like the blue screen where you can choose which mode to boot: GUI, non-GUI, safe mode, etc.), but as soon as it got to the point where it would show the login screen there was nothing, just a blank black screen. The server was still accessible through the web GUI. I posted in the Nvidia plugin support page, and the author said it was due to not being on 6.10 (I was still on 6.9-something) and that it should be resolved after I updated. I was hesitant to update Unraid, because by now the shutdowns were happening again and I didn't want it to shut down mid-upgrade. Eventually I did it anyway, and I'm now on 6.10.3. The display output issue still isn't solved, but that's an issue for the Nvidia plugin author, I guess. And the shutdowns continued.
Fast forward to a few days ago, and it hit me that the shutdowns only seem to happen when I have a GPU installed. I had initially ruled out the GPU as the cause because I ended up switching GPUs, so it's not one specific GPU; rather, it seems to happen when ANY GPU is installed. Why this could be, I haven't the foggiest. To confirm this, I took the GPU out yesterday, and the server has been running fine for 28 hours now with no shutdowns. I continue to be puzzled by this issue, and I'm hoping that one of you brilliant people has an idea why this would be happening. Cheers for any help, as usual.
Edited August 6, 2022 by 2Piececombo (grammar and clarity)
itimpi Posted August 6, 2022
Have you considered whether it is a power supply issue? Installing a GPU could be adding a significant extra load on the PSU.
2Piececombo Posted August 7, 2022 (Author)
18 hours ago, itimpi said: "Have you considered whether it is a power supply issue? Installing a GPU could be adding a significant extra load on the PSU."
I upgraded the PSU a while back; it's an 850 W EVGA. The GPU is only a P600, which doesn't even require additional power, just the PCIe slot power, so it shouldn't be a power issue.
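The reasoning in this reply is essentially a power budget, and it can be written out as simple arithmetic. In the sketch below only two numbers are grounded: the 850 W PSU rating from the post, and the 75 W ceiling that the PCIe specification allows a x16 slot to supply to a card with no auxiliary connector. Every other load figure is a hypothetical placeholder, not a measurement of this server, and `psu_headroom` is an illustrative helper name.

```python
from typing import Dict

PCIE_SLOT_LIMIT_W = 75  # upper bound for a slot-powered card (PCIe spec)
PSU_RATED_W = 850       # PSU rating quoted in the post


def psu_headroom(psu_watts: float, loads_w: Dict[str, float]) -> float:
    """Return the watts left over after summing every estimated load."""
    return psu_watts - sum(loads_w.values())


# HYPOTHETICAL worst-case estimates; replace with measured or datasheet values.
example_loads_w = {
    "two CPUs": 2 * 95,
    "motherboard, RAM, fans": 100,
    "HBA and drives": 150,
    "slot-powered GPU (upper bound)": PCIE_SLOT_LIMIT_W,
}

headroom = psu_headroom(PSU_RATED_W, example_loads_w)
print(f"Estimated load: {PSU_RATED_W - headroom:.0f} W, headroom: {headroom:.0f} W")
```

With these placeholder figures the GPU accounts for a small share of the total, which is the point being made in the reply. A comfortable total does not by itself rule out every power-delivery fault, so the result is only a sanity check.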
alexbn71 Posted August 12, 2022 (edited)
On 8/7/2022 at 2:33 AM, 2Piececombo said: "I upgraded the PSU a while back; it's an 850 W EVGA. The GPU is only a P600, which doesn't even require additional power, just the PCIe slot power, so it shouldn't be a power issue."
I have had a similar issue for two months. I am quite new to Unraid; my experience is only a few months, from 6.9 to the current 6.11.0-rc3. During this time I've been changing the Unraid configuration constantly, so it's quite difficult for me to work out which change in my setup is causing these reboots. My only suspicion was Docker. I noticed that with the Docker service up and running (even with the Docker apps stopped) the issue was more frequent. Sometimes it reboots 3 or 4 times in an hour, sometimes it stays up 24 hours, sometimes days before rebooting. I found several posts about a possible problem with Docker macvlan/ipvlan, but the various solutions did not bring improvements. I don't know if it's a sign, but since the last update to 6.11.0-rc3 there have been no reboots for 5 days now. I even bought a new motherboard, CPU, and RAM, ready for a replacement, as the answer in the forum is often that it is a hardware problem. I'm just waiting for the next reboot before replacing most of the hardware. I'm glad there is also a possible culprit in the GPU. Since I don't have an integrated video card, I have a small GPU installed. Could I remove it to see if it's the culprit? 🙂
PS: I conclude by saying that I am quite frustrated with this situation, as I came to Unraid to leave QNAP forever, but my old NAS is still here because of this Unraid instability.
Edited August 12, 2022 by alexbn71
2Piececombo Posted August 13, 2022 (Author)
On 8/12/2022 at 8:33 AM, alexbn71 said: "I have had a similar issue for two months. I am quite new to Unraid; my experience is only a few months, from 6.9 to the current 6.11.0-rc3. During this time I've been changing the Unraid configuration constantly, so it's quite difficult for me to work out which change in my setup is causing these reboots. My only suspicion was Docker. I noticed that with the Docker service up and running (even with the Docker apps stopped) the issue was more frequent. Sometimes it reboots 3 or 4 times in an hour, sometimes it stays up 24 hours, sometimes days before rebooting. I found several posts about a possible problem with Docker macvlan/ipvlan, but the various solutions did not bring improvements. I don't know if it's a sign, but since the last update to 6.11.0-rc3 there have been no reboots for 5 days now. I even bought a new motherboard, CPU, and RAM, ready for a replacement, as the answer in the forum is often that it is a hardware problem. I'm just waiting for the next reboot before replacing most of the hardware. I'm glad there is also a possible culprit in the GPU. Since I don't have an integrated video card, I have a small GPU installed. Could I remove it to see if it's the culprit? 🙂 PS: I conclude by saying that I am quite frustrated with this situation, as I came to Unraid to leave QNAP forever, but my old NAS is still here because of this Unraid instability."
Out of curiosity, what GPU do you have installed in your server? You said your server reboots, which is slightly different from the shutdown issue that I've been facing. Is your server truly rebooting on its own? Or is it shutting down and then being powered back on, either automatically via BIOS settings or manually? Take the GPU out, run it like that for a while, and see if the issue goes away.
alexbn71 Posted August 14, 2022 (edited)
I don't remember... a very, very cheap GPU. Anyway, I'll take it out today, since the reboot happened again yesterday 🙁
Quote: "which is slightly different than the shutdown"
I don't think so. My server reboots because I've configured the BIOS to restart on power failure (and generic failure?). It stays powered off only if a regular shutdown is invoked. In the next few hours I will try removing the GPU.
Edit: removed! It's a GTX 650.
Edited August 14, 2022 by alexbn71
2Piececombo Posted August 15, 2022 (Author)
18 hours ago, alexbn71 said: "I don't remember... a very, very cheap GPU. Anyway, I'll take it out today, since the reboot happened again yesterday 🙁 I don't think so. My server reboots because I've configured the BIOS to restart on power failure (and generic failure?). It stays powered off only if a regular shutdown is invoked. In the next few hours I will try removing the GPU. Edit: removed! It's a GTX 650."
Has it crashed again since?
alexbn71 Posted August 15, 2022 (edited)
I'm close to the milestone of 24 hours without a GPU... I'll keep you updated.
Update: 3 days passed.
Edited August 17, 2022 by alexbn71
alexbn71 Posted August 17, 2022
On 8/15/2022 at 3:21 AM, 2Piececombo said: "Has it crashed again since?"
And yours? Has it crashed again since 6 August?
2Piececombo Posted August 18, 2022 (Author)
14 hours ago, alexbn71 said: "And yours? Has it crashed again since 6 August?"
Nope. Solid as a rock since I took the GPU out.
alexbn71 Posted August 18, 2022
5 hours ago, 2Piececombo said: "Nope. Solid as a rock since I took the GPU out."
The only thing that bothers me is that my current motherboard has no integrated GPU, so in the end I would have to replace it with the one I bought recently, or run it without a GPU and install one temporarily whenever there is a problem. What the hell is wrong with GPUs and Unraid? 🤔
trurl Posted August 18, 2022
On 8/12/2022 at 10:33 AM, alexbn71 said: "similar issue"
Attach diagnostics to your NEXT post in this thread.
alexbn71 Posted August 19, 2022
16 hours ago, trurl said: "attach diagnostics to your NEXT post in this thread"
Here are my diagnostics.
Attachment: mininas-diagnostics-20220819-0821.zip
alexbn71 Posted August 19, 2022
3 hours ago, trurl said: "setup syslog server"
I've already enabled the syslog server, but I have never found anything interesting about my problem. I've also tried mirroring the syslog to flash, suspecting that something was not being written in time by the syslog server. The problem is that when the reboot occurs, nothing is logged in the instant immediately before the event.
2Piececombo Posted August 19, 2022 (Author)
4 hours ago, trurl said: "setup syslog server"
31 minutes ago, alexbn71 said: "I've already enabled the syslog server, but I have never found anything interesting about my problem. I've also tried mirroring the syslog to flash, suspecting that something was not being written in time by the syslog server. The problem is that when the reboot occurs, nothing is logged in the instant immediately before the event."
Same thing for me as well. I'm syslogging to a second Unraid server and there's nothing of note there. The logs just end when the shutdown occurs.
itimpi Posted August 19, 2022
10 minutes ago, 2Piececombo said: "Same thing for me as well. I'm syslogging to a second Unraid server and there's nothing of note there. The logs just end when the shutdown occurs."
This strongly suggests something at the hardware level. The first culprits I would look for are inadequate cooling causing a thermal shutdown, or problems with the power supply.
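Because the syslog stops without warning, one way to test the thermal theory is to record temperatures continuously and force each record to disk, so the last reading before a shutdown survives. The sketch below is one possible approach and not an Unraid feature: it assumes a Python 3 interpreter is available on the server (stock Unraid may not provide one), it assumes the board's sensors are exposed through the standard Linux hwmon files (`/sys/class/hwmon/hwmon*/temp*_input`, in millidegrees Celsius), and the function names and log path are placeholders.

```python
import glob
import os
import time
from typing import Dict


def read_hwmon_temps(root: str = "/sys/class/hwmon") -> Dict[str, float]:
    """Return {sensor file: degrees Celsius} for every temp*_input under root."""
    temps: Dict[str, float] = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        try:
            with open(path) as fh:
                temps[path] = int(fh.read().strip()) / 1000.0  # millidegrees to degrees
        except (OSError, ValueError):
            continue  # unreadable or non-numeric sensor: skip it
    return temps


def append_heartbeat(log_path: str, temps: Dict[str, float]) -> str:
    """Append one timestamped line and fsync it so it survives a sudden power-off."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    hottest = max(temps.values()) if temps else float("nan")
    line = f"{stamp} sensors={len(temps)} max_temp_c={hottest:.1f}\n"
    with open(log_path, "a") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())  # push the line to the device, not just the page cache
    return line


# Usage (placeholder path; point it at storage that persists across reboots):
# while True:
#     append_heartbeat("/path/to/persistent/heartbeat.log", read_hwmon_temps())
#     time.sleep(10)
```

If the final heartbeat lines before a shutdown show ordinary temperatures, a thermal trip becomes less likely. If no sensors are found, the hwmon assumption does not hold on that system and another source, such as the IPMI sensor readings, would be needed.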
2Piececombo Posted August 19, 2022 (Author)
7 minutes ago, itimpi said: "This strongly suggests something at the hardware level. The first culprits I would look for are inadequate cooling causing a thermal shutdown, or problems with the power supply."
I have extensively tested cooling by booting into another OS and running a stress test for well over an hour. The PSU has also been replaced previously, with an 850 W Gold-rated supply from EVGA. It should be noted that the shutdowns only occur while in Unraid. I can leave the GPU in and boot into something else, and there are no issues. Every piece of hardware has been replaced or tested: all RAM, both CPUs, the motherboard, HBA, and PSU. There is no hardware issue here. These shutdowns have occurred with 2 different GPUs, neither of which even uses additional power from the PSU.