
Unresponsive server after monitor suspend


Go to solution Solved by JorgeB,


Hello all,

I've run into a weird problem with an Unraid server of mine. This server has an i5-11400 CPU on an Asus B560M Plus Wifi motherboard, with 16 or 32 GB of DDR4 RAM (more on that in a bit, because it seems to be involved in the problem). The server started off on an older Unraid version but was reflashed with the most recent version today, and the problem continued.

 

Let me start off by saying this server and hardware were running problem-free for over a year. I was repurposing the server a little and adding more memory to it when this problem started occurring.

 

The problem seems to occur when some type of suspend event happens (it's exactly 15 minutes after the server is turned on, with no input on a connected keyboard, and the monitor goes into sleep mode at the time of the event). As soon as this happens, lots of different processes start going to 100% load on whatever thread they are on. There end up being 10+ of them on different threads, resulting in a completely maxed-out CPU. This is very odd behavior; I've never seen anything quite like it in my 25 years working with computers. The system basically becomes unresponsive, which makes troubleshooting and trying fixes very difficult. The oddest thing to me, though, is that it only does this with 32 GB of memory installed and not with 16. If I put in just one of the 16 GB sticks from the 32 GB kit, it's fine. If I run 2 x 8 GB sticks, it's fine. But with 2 x 16 GB, the problem occurs.


At first I just assumed I had gotten some bad sticks of memory. So I ran memtest86 and, sure enough, I got hundreds of errors with the 2 x 16 GB sticks. Thinking I had identified the problem, I exchanged the memory and tried a new set of 2 x 16 GB sticks. This time there were 0 errors during memtest. But as soon as I boot into Unraid, after exactly 15 minutes, I get the same exact problem.

 

I've tried enabling / disabling XMP and Asus performance mode, and played around with a few other BIOS settings. Nothing seems to make a difference. I don't know if it's actually related to the VGA output suspending, or if that is just a coincidence of timing, with it suspending at the same time as something else.

 

I'm frankly a little baffled and hoping someone else might have some ideas. I don't really think this is an Unraid problem; it really seems like some kind of obscure hardware conflict (oh, and I did update my BIOS to be sure that wasn't the problem). But I'm still posting here just in case someone has seen something like this.

 

Thanks


No, that plugin is not installed. I did reflash the flash drive today with a fresh install of Unraid, and the problem occurred without any additional plugins installed. I didn't even configure the array, just let it sit there for 15 minutes, and it started.

 

I just observed some additional behavior since I made the post a few minutes ago that makes me lean more toward some kind of hardware problem. I went through a series of reboots where the system didn't even POST, and the VGA light remained lit on the motherboard, indicating no graphics card.

 

This makes me think something must've happened to cause the GPU inside the 11400 to start having problems. I dunno, I suppose it could be something on the motherboard in relation to the memory that is causing a false symptom... like I said in my first post, I'm baffled and grasping at straws lol. I just reseated the CPU in the socket to see if that makes a difference; I'll be able to tell in 15 minutes I suppose =p

 

Otherwise, unless someone else has ideas, I'm probably going to have to try ordering a new 11400 and see if swapping the CPU fixes it. Or maybe just jump up to a 13400 and replace the mobo too in order to be safe. =/ I dunno, I hate just replacing parts when the system was working just fine a couple of days ago, before I shut it down for the memory upgrade. It had been running for over 150 days when I shut it down... who knows, maybe a cap popped somewhere and it was fine until it was powered down =/

6 hours ago, JorgeB said:

Stock Unraid should never suspend.

 

See here about this, but this only affects v6.12.x:

https://docs.unraid.net/unraid-os/release-notes/6.12.0#known-issues

Interesting, I didn't think I was running 6.12.x prior to the reflash yesterday, but maybe I had upgraded it at some point and forgot. I didn't really pay attention to the version before I reflashed it because I wasn't worried about the data or anything on this system as I was reformatting everything anyway.

 

I applied the fix in this post and so far, on the initial test, I've made it to 20 minutes. The monitor turned off like it usually does at the 15 minute mark, but the system is still operating perfectly normally!

 

I'll still have to do some more testing to be sure this one isn't a fluke, and to see whether there are hardware issues or not, because of some of the other flakiness I've observed while troubleshooting this. For example, the VGA light stayed on until I reseated the processor yesterday, and this morning, when starting the system, a few times it POSTed fine but the Unraid menu never loaded. It never came up and said that it couldn't find a boot device or anything like that; it just sat at a black screen after the POST went away, for at least 5 to 10 minutes, before I power cycled it. I've got to make sure I can get a clean power cycle on this machine, because it's being repurposed primarily as an offsite backup with some server stuff running on it (which is why I wanted the additional memory that started all of this), since the offsite location has better internet than my home system hehe. So I won't have easy physical access to it to make sure it comes back up any time I need to reboot it, or when there is an extended power outage or something.

 

I'm also just really glad to know about this potential issue with 6.12.x now, because I was about to pull the trigger on upgrading my primary NAS from 6.11.x to 6.12.3, and it is also running an 11400 =p I guess I need to get better about reading release notes hehe

Edited by Menaan

I think I can call this solved now. I've gotten through multiple reboots and gone past the point where the monitor goes to sleep quite a few times. I was able to make reboots reliable by disabling fast boot in the BIOS. I'll still have to monitor the hardware for a bit because of the flakiness I was seeing, but maybe that was just caused by some BIOS corruption from all the memory swapping and such.

 

Thanks for the help, if I do have further issues I'll report back.

