System becomes unresponsive unless I boot into safe mode



Hi, I have been messing with my server recently, installing some new drives and trying to get a VM up.

I don't know what I did to make the system become unresponsive, but I suspect it was to do with CPU isolation (I selected CPU cores 3-6 to be isolated so I could dedicate them to a VM). After changing this setting I was prompted to reboot, and now I have issues.

When the system boots I can ping it but not access the GUI. I can't even Telnet into the machine (I enter my login and password, then the PuTTY terminal closes).

The only way I can get the machine up so I can interact with it is in safe mode. In safe mode I get the GUI, Telnet, the works, but as soon as I start the array it all goes unresponsive again.

So I know I should provide my diagnostics, but I'm unsure how to gather them, because as soon as the issues start I can't get anything from the machine.

Can I get them from safe mode before the issues kick in, or would that not be helpful?

 

Thanks

43 minutes ago, m0t0k0 said:

Can I get them from safe mode before the issues kick in, or would that not be helpful?

 

Yup. Tools > Diagnostics.

 

Even better would be to boot normally and, if you have access to the local command prompt (or, if booting in GUI mode, the "lime" start menu, then Terminal), enter

diagnostics

 

Then shut down the server, pull the flash drive, and upload the resulting file (in the logs folder on the flash drive) here.
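If several diagnostics runs end up in the logs folder, the newest zip is the one to upload. Below is a minimal sketch for picking it out, assuming the hostname-diagnostics-YYYYMMDD-HHMM.zip naming of the file attached later in this thread. The function names are hypothetical, and the folder you pass in depends on where the flash drive is mounted (/boot/logs on the server itself, or a drive letter on a Windows PC).

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Assumed naming pattern, e.g. "universe-diagnostics-20210430-1702.zip".
DIAG_RE = re.compile(r"^.+-diagnostics-(\d{8}-\d{4})\.zip$")


def pick_newest(names: Iterable[str]) -> Optional[str]:
    """Return the file name with the latest YYYYMMDD-HHMM stamp, or None if none match."""
    best = None
    for name in names:
        match = DIAG_RE.match(name)
        # Zero-padded stamps sort chronologically when compared as plain strings.
        if match and (best is None or match.group(1) > best[0]):
            best = (match.group(1), name)
    return best[1] if best else None


def newest_diagnostics(logs_dir: Path) -> Optional[Path]:
    """Scan a logs folder and return the path of the newest diagnostics zip, if any."""
    names = [p.name for p in logs_dir.iterdir() if p.is_file()]
    newest = pick_newest(names)
    return logs_dir / newest if newest else None
```

For example, newest_diagnostics(Path("/boot/logs")) returns the path of the latest zip, or None if the folder holds no matching file.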


Unfortunately I can't even get to the command prompt unless I start in safe mode. I have attached a copy of my diagnostics, taken after booting into safe mode.

I can even start the array and Docker containers in safe mode, so I'm not sure what's going on. I'm convinced it's to do with CPU core allocation/isolation, though. Is there a way to edit that setting via a config file on the USB?

universe-diagnostics-20210430-1702.zip


I took a glance, and the only thing I'm noticing is that in your GUI diagnostics, the output of lscpu.txt indicates that VMX is disabled on your system due to an iTLB multihit vulnerability.

I legitimately don't know if that would cause this symptom, but my understanding (?) is that it will prevent virtualization from working.
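For anyone checking this on their own machine: lscpu takes that line from /sys/devices/system/cpu/vulnerabilities/itlb_multihit. As far as I know, the kernel's hardware-vulnerability documentation describes "KVM: Mitigation: VMX disabled" as KVM not being vulnerable because VMX is disabled, so it is worth reading the exact string before drawing conclusions. The sketch below maps the documented status strings to a paraphrased meaning; the table wording is paraphrased and the function names are hypothetical, so check the docs for your kernel version.

```python
from pathlib import Path

# Status strings from the kernel's itlb_multihit documentation (meanings paraphrased).
ITLB_MULTIHIT_STATES = {
    "Not affected": "The processor is not vulnerable.",
    "KVM: Mitigation: Split huge pages": "Software changes mitigate the issue.",
    "KVM: Mitigation: VMX unsupported": "KVM is not vulnerable because VMX is not supported.",
    "KVM: Mitigation: VMX disabled": "KVM is not vulnerable because VMX is disabled.",
    "KVM: Vulnerable": "The processor is vulnerable and no mitigation is enabled.",
}

SYSFS_PATH = Path("/sys/devices/system/cpu/vulnerabilities/itlb_multihit")


def explain_itlb_multihit(status: str) -> str:
    """Map a raw status line to a short explanation."""
    status = status.strip()
    return ITLB_MULTIHIT_STATES.get(status, f"Unrecognised status: {status!r}")


def read_status(path: Path = SYSFS_PATH) -> str:
    """Read the kernel's status line; return an empty string if the file is absent."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return ""


if __name__ == "__main__":
    raw = read_status()
    print(raw or "itlb_multihit status not available on this kernel")
    if raw:
        print(explain_itlb_multihit(raw))
```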

Thanks, codefaux. Could this be because I set the VM Manager to disabled when I was able to get into the GUI?

I did manage to get Unraid back up and running. I think the problem was that my BIOS set the RAM to its XMP profile (which should be supported by my motherboard and CPU).

I manually set it to a lower speed from the RAM's JEDEC spec, and it's working!

Well, mostly: now I can't get the SWAG container to start, but other than that it's good.

Also, I found the CPU isolation setting in the syslinux.cfg file on the USB and removed it, so that may have helped too.
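As far as I know, on Unraid 6.9-era releases the CPU isolation picked in the GUI is written as an isolcpus= parameter on the append line of syslinux.cfg (normally in the syslinux folder on the flash drive), which is why removing it there undoes the setting. A text editor is all it takes; the sketch below just shows the same change programmatically. The sample label block and the function name are illustrative assumptions, and it is worth backing up the file first.

```python
import re

# One isolcpus=<cpu list> token plus the whitespace in front of it.
ISOLCPUS_RE = re.compile(r"\s*\bisolcpus=\S+")


def strip_isolcpus(cfg_text: str) -> str:
    """Return cfg_text with isolcpus=... removed from every 'append' line."""
    out = []
    for line in cfg_text.splitlines(keepends=True):
        # Only the kernel command lines (append ...) carry boot parameters.
        if line.lstrip().startswith("append"):
            line = ISOLCPUS_RE.sub("", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical label block with cores 3-6 isolated.
    sample = (
        "label Unraid OS\n"
        "  menu default\n"
        "  kernel /bzimage\n"
        "  append isolcpus=3-6 initrd=/bzroot\n"
    )
    print(strip_isolcpus(sample), end="")
```

To apply it, read the file (for example /boot/syslinux/syslinux.cfg on the running server), pass the text through strip_isolcpus, write the result back, and reboot.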


Overclocking in Unraid, or in any mission-critical situation ever, is a very, very bad idea unless you're okay with silent data corruption. XMP is an overclocking profile applied blindly by the BIOS because it "should be working", but XMP is not a guaranteed stable configuration under literally any condition. I realize you may not have done this intentionally; I'm mostly leaving this here for future forum-goers who search, because I constantly see it cause significant problems.

The webUI/GUI settings would not control this. VMX is being disabled specifically by the Linux kernel itself due to errata (known security and/or stability issues) with your processor, i.e. how it is physically constructed and the microcode it runs. This is normally also OS-independent, but not always.

 

On 5/1/2021 at 1:54 AM, m0t0k0 said:

CPU isolation setting in the syslinux.cfg file

I actually was unaware of where that lived, but I'll keep a note. Thank you.

 

 

On 5/1/2021 at 1:54 AM, m0t0k0 said:

cant get swag container to start

Have you looked into logs? Could be nothing, could be signs of a larger issue persisting.


Thanks for the help diagnosing this.

 

The memory issue was due to me being unfamiliar with how ASRock sets the timings. Normally, if you select the XMP profile, it changes both the RAM speed and the timings, but it seems that at least my board left the timings as they were before. I have now manually selected the 2666 MHz speed and the matching timings from the XMP profile, and all is good.

It's not really an overclock at 2666 MHz, as that's within spec for the CPU, motherboard, and RAM. But it definitely tripped me up, and I understand your point about going for stability with a server build.

I also worked out the Docker issues using the posts from this thread. I deleted and rebuilt the Docker network, then recreated the custom Docker network proxynet that SWAG was running on.
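For anyone landing here with the same SWAG problem, the "recreate the custom network" step can be scripted. This is a minimal sketch assuming the Docker CLI is on the PATH; it uses only docker network ls --format and docker network create. The name proxynet comes from the post above, the helper names are hypothetical, and creating the network with Docker's default bridge driver is an assumption, so match whatever driver or options your original network used.

```python
import subprocess
from typing import Set


def network_names(ls_output: str) -> Set[str]:
    """Parse the one-name-per-line output of: docker network ls --format '{{.Name}}'."""
    return {line.strip() for line in ls_output.splitlines() if line.strip()}


def ensure_network(name: str) -> bool:
    """Create the named Docker network if it is missing; return True if it was created."""
    listing = subprocess.run(
        ["docker", "network", "ls", "--format", "{{.Name}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    if name in network_names(listing.stdout):
        return False
    # No --driver given, so Docker falls back to its default bridge driver.
    subprocess.run(["docker", "network", "create", name], check=True)
    return True
```

Calling ensure_network("proxynet") before starting SWAG recreates the network only when it is missing; the container itself still has to be set to use that network in its own settings.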

 

On 5/2/2021 at 1:05 PM, m0t0k0 said:

XMP profile

Also, to clear it up - XMP is overclocking. I mentioned that, but maybe I wasn't quite clear enough.

 

https://www.intel.com/content/www/us/en/gaming/extreme-memory-profile-xmp.html

 

XMP is overclocking. There's no conditional. Full stop. If you want more colorful words, XMP is the Paint-By-Numbers version of Overclocking, which makes it less safe than actual overclocking, because with actual overclocking users increment by tiny margins until they start to destabilize and then back off in a careful, slow, calibrated approach. Actual overclocking takes time, effort, and produces a stable result.

 

Do not overclock your Unraid server or anything else mission-critical.

 

RAM "rated for" an XMP profile of XYZ just means "it overclocks to this and is stable in our tests(TM)"; it does not guarantee real-world stability in any way. I said that, but I don't really think you heard me. It's like an herbal cancer cure: at best, if they're being honest, it worked for them, and there is absolutely no guarantee that it will work for you. If it ran without overclocking on a set of timings and clock frequencies, that would be its base clock frequency, i.e. what's stored in the SPD, i.e. what you get when you set your RAM to Auto. When you select Auto on your motherboard, it tends to select either A) the fastest rated configuration, or B) the fastest configuration your system can achieve given hardware or compatibility limitations.

 

I'll say it again to future lurkers: do not overclock RAM, and do not manually increase the speed "because it's only a few hundred MHz" or because "it seems slow". Buy faster RAM if you want faster RAM. Overclocking RAM (XMP or manual) risks the safety of your data. Don't do it.

4 hours ago, codefaux said:

because "it seems slow"

Truth. I have yet to see a server-type system "feel" faster with that sort of overclocking. Synthetic benchmarks may show small improvements, but nothing that significantly affects real-world loads.

 

I HAVE seen timing issues with XMP cause micro stutters and brief freezing even if it didn't outright crash.

