March 29, 2022

My server had been running great for a couple of years until recently. I made the mistake of making several changes at once, so I can't pinpoint what the issue is.

Things I recently did:
- replaced a GPU
- removed the Nvidia drivers to use the new GPU for passthrough
- upgraded the CPUs
- upgraded Unraid
- added a new Windows 11 VM

I noticed this happening after the CPU and GPU upgrade (which is when I also removed the Nvidia drivers). The weird thing is that I first noticed an issue when rebooting. The server would go down for the reboot but wouldn't turn back on automatically; I had to power it on manually. A few hours later I noticed I couldn't connect to the server. I checked and found it turned off. I booted it back up and it reported an unclean shutdown. This happened a few more times, so I swapped the power supply, thinking that was the issue. It still happens at random. I ended up swapping the CPUs back to my old ones to make sure they weren't the issue. It seemed to run more stably for a bit, but then there was another shutdown. Today alone it has already crashed three times while I was trying to save the diagnostics and syslog.

Any help is greatly appreciated. I've safely powered the server off for now while I wait to troubleshoot more.

krieger-diagnostics-20220329-1627.zip
March 29, 2022 Community Expert

Set up the Syslog server as instructed in this link: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-781601

I would use the "Mirror syslog to flash:" option, as your problem seems to occur within a few hours. After the problem occurs, post that syslog in a new post.

=============

7 minutes ago, m1a8x2 said:
Things I recently did:
- replaced a GPU
- removed Nvidia drivers to use new GPU for passthrough
- upgraded CPUs
- upgraded Unraid
- added new windows 11 VM

You have made a large number of changes. Did you do them all at once, or one at a time? Hopefully there was enough time between them to let you determine which one triggered the problem.

Since you added a new GPU and new CPUs to the system, did you evaluate the increased current consumption to make sure that you did not exceed any current rating on the power supply? (Modern power supplies shut down as soon as any current or power rating is exceeded. Be careful, as there is a lot of specmanship in power supply ratings!)
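The power-supply check suggested above can be roughed out as a quick calculation. This is only a minimal sketch in Python: every wattage figure is a hypothetical placeholder (not a measurement from this server), the 80% margin is an assumed rule of thumb, and `psu_headroom` is an illustrative helper name. It only compares total draw with total capacity; the per-rail current limits printed on the power supply label are not modeled and still need to be checked by hand.

```python
def psu_headroom(psu_watts, component_watts, margin=0.8):
    """Compare estimated total draw against a derated PSU capacity.

    Returns (total_draw, usable_capacity, fits).
    `margin` is an assumed safety factor, not a manufacturer figure.
    """
    total = sum(component_watts.values())
    usable = psu_watts * margin
    return total, usable, total <= usable


# Hypothetical peak-draw estimates in watts; substitute the real
# figures from the CPU, GPU, and drive data sheets.
parts = {
    "cpu_0": 115,
    "cpu_1": 115,
    "gpu": 180,
    "drives": 60,
    "board_ram_fans": 80,
}

for psu in (500, 800):
    total, usable, fits = psu_headroom(psu, parts)
    verdict = "fits" if fits else "exceeds budget"
    print(f"{psu} W PSU: draw {total} W, usable {usable:.0f} W -> {verdict}")
```

With placeholder numbers like these, a 500 W supply would fall short while an 800 W supply would have room to spare, but the result is only as good as the per-component estimates that go in.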
March 29, 2022 Author

27 minutes ago, Frank1940 said:
Set up the Syslog server as instructed in this link: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-781601 I would use the "Mirror syslog to flash:" option, as your problem seems to occur within a few hours. After the problem occurs, post that syslog in a new post.
=============
You have made a large number of changes. Did you do them all at once, or one at a time? Hopefully there was enough time between them to let you determine which one triggered the problem. Since you added a new GPU and new CPUs to the system, did you evaluate the increased current consumption to make sure that you did not exceed any current rating on the power supply? (Modern power supplies shut down as soon as any current or power rating is exceeded. Be careful, as there is a lot of specmanship in power supply ratings!)

Sorry, I forgot to attach the syslog. I had already set up mirroring. Attaching it now.

I forgot to mention that I also added my USB PCI-e card for passthrough (it was previously used in this server for a VM passthrough).

I did the CPU swap and removed the old GPU in one go. I then removed the Nvidia driver and next added the new GPU. That was probably not enough time in between, but I did undo each change one at a time:
- I removed the GPU and USB card first (but didn't put back the old GPU or the Nvidia driver).
- Then I replaced the power supply (500 W to 800 W, which I calculated should be enough).
- After that, I went back to the old CPUs.
- Things seemed okay at that point, so I added the GPU back, then the USB card.

Crashes became more frequent during that time.

syslog.txt
March 30, 2022 Community Expert

I looked over the syslog and didn't see anything just prior to most of the restarts, BUT I am not a Linux syslog guru! You do have a lot of rsyslog errors, and I don't know what is going on there.

Most of the time, problems like this are hardware related. Make sure that there is no way the server reset button is being pushed (or is defective). (Those LED lights are attractive to some pets and small children. Some folks have even eliminated the button by unplugging it from the MB.)

Do you have ECC memory? If not, have you run a 24-hour memtest?
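Checking what was logged just prior to each restart can be semi-automated. A minimal sketch in Python, assuming each boot in the mirrored syslog starts with the kernel's "Linux version" banner; the helper name `lines_before_boots` and the sample log lines are made up for illustration and are not taken from the attached syslog.

```python
from collections import deque


def lines_before_boots(lines, marker="Linux version", context=5):
    """Yield (line_number, preceding_lines) for every boot marker.

    `marker` is an assumed boot indicator (the kernel version banner).
    `preceding_lines` holds up to `context` lines logged right before
    that boot, i.e. the last messages written before the crash.
    """
    recent = deque(maxlen=context)
    for number, line in enumerate(lines, start=1):
        if marker in line:
            yield number, list(recent)
            recent.clear()
        recent.append(line.rstrip("\n"))


# Made-up sample lines standing in for a mirrored syslog.
sample = [
    "Mar 29 10:00:01 server kernel: Linux version (first boot)",
    "Mar 29 10:05:00 server emhttpd: array started",
    "Mar 29 13:12:44 server rsyslogd: some error",
    "Mar 29 13:40:10 server kernel: Linux version (boot after crash)",
    "Mar 29 13:45:00 server emhttpd: array started",
]

for number, before in lines_before_boots(sample, context=2):
    print(f"boot at line {number}; last lines before it:")
    for entry in before:
        print("   ", entry)
```

For a real log, the same function can be fed an open file object instead of the sample list. An empty result before a boot, or nothing unusual in those lines, fits a sudden power loss or reset more than a software fault, since an orderly shutdown normally leaves messages behind.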
March 30, 2022 Author

Yeah, I have no idea what those rsyslog entries are about. I've already removed the leads going to the reset switch header, as I have two kids and a dog (but they're never in my office without me being in here too). And yes, all of my memory is ECC.
April 1, 2022 Author

I removed the GPU and USB card from the server and went down to four sticks of 8 GB ECC memory. That is back to how I had the server before I made all the upgrades, apart from the new PSU. It ran fine for about 5 hours, at which point I decided to try a reboot. Again, it never rebooted; it only shut down.
April 3, 2022 Author Solution

Update: I removed the GPU and USB cards and have now been running for over 2 days without a crash. Still not positive which card was the issue.
April 3, 2022 Community Expert

At this point, since you have identified that the issue lies with these two cards, you should be thinking about starting a new thread. Begin by asking yourself a few questions:
- What are you intending to use these cards for?
- How are you using them?
- What are the hardware details of both cards?
- What MB are you using?
- Are any other cards used in the system?

I suspect that you are using them in some sort of VM setting. There is a major section of the forum devoted to VMs, with a lot of sub-sections. Pick the one that best corresponds to what you want to do. When actually writing up the initial post, remember to answer all of those questions about hardware.