Unraid server freezes | WEBUI no longer accessible | keyboard input also not possible #Unraid_Version_6.12.8


Posted (edited)

I ran a "SMART extended self-test" on both M.2 SSDs: 0 errors. Everything seems to work.

I have 2x SK hynix Gold P31 2TB PCIe NVMe Gen3 M.2 2280 internal SSDs (up to 3500 MB/s, 128-layer NAND flash).

 

 

I also added "pcie_aspm=off" to my config.

 

But I don't think that will fix my freezes?
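
To confirm that the parameter actually reached the running kernel, the active command line can be inspected. This is only a minimal sketch: `parse_cmdline` and `aspm_disabled` are hypothetical helper names, and it assumes the standard Linux `/proc/cmdline` file is readable on the server.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def aspm_disabled(cmdline: str) -> bool:
    """True only if pcie_aspm=off is present on the given command line."""
    return parse_cmdline(cmdline).get("pcie_aspm") == "off"


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")  # standard Linux location of the boot parameters
    if proc_cmdline.exists():
        print("pcie_aspm=off active:", aspm_disabled(proc_cmdline.read_text()))
```

If this reports `False` after a reboot, the parameter was probably added to a boot entry other than the one that was actually started.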

 

 

 

Edited by UNRA1DUser
Link to comment
Posted (edited)
1 hour ago, JorgeB said:

Probably not, post a new persistent syslog if it happens again.

I just booted into the BIOS to check some settings (I didn't change anything). After 1-2 reboots the USB drive was no longer recognized. Maybe that's something I should investigate. Could it be that after some days the USB device is no longer recognized, and that's why Unraid freezes?

 

My USB device (https://www.amazon.de/gp/product/B07D1KCL2Z/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1) is connected directly to the mainboard via this adapter: https://www.amazon.de/gp/product/B08N4LQJJN/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&th=1. (The USB device is new and was also checked for errors.)

 

BIOS USB Settings:

 

Legacy USB Support: Disabled

XHCI Hand-off: Disabled

USB Mass Storage Driver Support: Enabled

Port 60/64 Emulation: Disabled

Fast boot: Disabled Link

CSM Support: Disabled

Windows 10 Features: Other OS

 

Secure Boot: Enabled (Because of EFI?)

 

Power Loading: Auto (Set it now to Disabled to test it)

 

I am booting Unraid with EFI.

 

Are any other BIOS settings important? If I move my USB device with the adapter to another onboard port, it works again with the same settings.

Edited by UNRA1DUser
Link to comment

You want Secure Boot to be disabled.  You do not need it just because you are using UEFI boot and Unraid does not support it anyway.

 

I would think you could also enable Legacy USB support and CSM support.   This would give you the option of alternatively booting in legacy mode (but does not stop UEFI boot).  Not sure if it helps in any other way.

Link to comment
Posted (edited)
25 minutes ago, itimpi said:

You want Secure Boot to be disabled.  You do not need it just because you are using UEFI boot and Unraid does not support it anyway.

 

I would think you could also enable Legacy USB support and CSM support.   This would give you the option of alternatively booting in legacy mode (but does not stop UEFI boot).  Not sure if it helps in any other way.

Thanks for your answer.

I disabled Secure Boot and enabled Legacy USB Support.

If I enable CSM Support, I get the following settings:

LAN PXE Boot Option ROM: Disabled (I think I don't need it)

Storage Boot Option Control: UEFI (I can choose UEFI, Legacy or Do not launch)

Other PCI devices: UEFI (I can choose UEFI, Legacy or Do not launch)

Should both be set to Legacy?

I also found settings for:

"Intel Platform Trust Technology (PTT)": should that also be disabled? It is currently Enabled.

"Software Guard Extensions (SGX)" is set to Software Controlled. Should I disable it? Is it used?

 

Edited by UNRA1DUser
Link to comment
27 minutes ago, JorgeB said:

Flash drive dropped offline, try recreating it or replacing it.

Can I recreate it with my old files? Or should I create it completely new and just move all my Docker XMLs and VMs over?

I also bought a new one today, in case recreating it does not work.

Link to comment
19 hours ago, JorgeB said:

Create a new one and restore the config folder.

I recreated the USB Stick. I am now able to reboot normally again.

 

But I still see the same boot sequence as in the video.

Is that the normal boot behavior, with "Not automatically fixing this" and this big block of numbers?

Link to comment

I created the USB stick completely new and also didn't copy any configs. The parity sync has also been running since yesterday. But at 00:40 AM the server got a message to terminate? I didn't shut down the server. What happened? I hadn't been on the PC/server since 09:00 PM :D And this morning I just had to log in again and enter my encryption key for the HDDs. What's going on here?

 

Can somebody check the log and help me, please?

 

 

 

tower-syslog-20240420-0730.zip

Link to comment

Unfortunately the syslog you posted (and the version automatically included when getting diagnostics) is the RAM version that starts afresh every time the system is booted, so we do not know what happened prior to the reboot.  You should enable the syslog server (probably with the option to Mirror to Flash set) to get a syslog that survives a reboot so we can see what preceded the reboot.  The mirror to flash option is the easiest to set up (and if used the file is then automatically included in any diagnostics), but if you are worried about excessive wear on the flash drive you can put your server's address into the remote server field.  

 

When you say the system got a 'message' to terminate what do you mean?  If you mean it started a tidy shutdown is there any chance someone/something (e.g. a cat) could have pressed on the power button to trigger a shutdown?    If you simply mean it rebooted itself then this is normally a hardware issue of some kind.
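
Once a persistent syslog exists, the useful part is whatever was logged just before each boot. The following is only a rough sketch of how to pull that out: `tails_before_boot` is a hypothetical helper, and it assumes every boot shows up in the mirrored file as a kernel line containing `Linux version`, so check that marker against your own log before relying on it.

```python
from collections import deque
from typing import Iterable, List

# Assumption: the kernel banner logged at every boot contains this text.
BOOT_MARKER = "kernel: Linux version"


def tails_before_boot(lines: Iterable[str], keep: int = 20) -> List[List[str]]:
    """For each boot marker that has earlier lines, return the last `keep` lines before it."""
    tails: List[List[str]] = []
    recent: deque = deque(maxlen=keep)  # rolling window of the most recent lines
    for line in lines:
        if BOOT_MARKER in line and recent:
            tails.append(list(recent))  # what was logged right before this boot
            recent.clear()
        recent.append(line.rstrip("\n"))
    return tails
```

It can be fed an open file, for example `tails_before_boot(open(path, errors="replace"))`, where `path` points to wherever the mirrored syslog ends up on your flash drive. If a tail simply stops in the middle of normal activity with no shutdown messages, that fits an abrupt reset rather than a tidy shutdown.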

Link to comment
2 minutes ago, itimpi said:

Unfortunately the syslog you posted (and the version automatically included when getting diagnostics) is the RAM version that starts afresh every time the system is booted, so we do not know what happened prior to the reboot.  You should enable the syslog server (probably with the option to Mirror to Flash set) to get a syslog that survives a reboot so we can see what preceded the reboot.  The mirror to flash option is the easiest to set up (and if used the file is then automatically included in any diagnostics), but if you are worried about excessive wear on the flash drive you can put your server's address into the remote server field.  

 

When you say the system got a 'message' to terminate what do you mean?  If you mean it started a tidy shutdown is there any chance someone/something (e.g. a cat) could have pressed on the power button to trigger a shutdown?    If you simply mean it rebooted itself then this is normally a hardware issue of some kind.

 

I just activated the syslog server to mirror to the flash drive. I don't have a cat or any other animal, and nobody entered that room after 09:00 PM.

But which hardware issue is it? I checked almost everything: tested all drives with a SMART test, ran a CPU benchmark, replaced the PSU with a new one, and ran Memtest for several hours. I also replaced the USB stick and created it from scratch.

The server decided to reboot in the middle of the parity sync. It has also rebooted at other times, not only during the parity sync.

 

 

I mean these messages in the syslog:

"Received signal 15; terminating."

"BERT: [Hardware Error]: Skipped 1 error records"

"kernel reports TIME_ERROR: 0x41: Clock Unsynchronized" (the time I set in the BIOS often flips back to something else)

ACPI Warning: [screenshot]
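
To see how often these messages appear, a saved syslog can be tallied by substring. This is just a sketch: `count_messages` is a hypothetical helper, and the patterns are taken from the messages quoted above rather than from any complete list of relevant kernel messages.

```python
from typing import Dict, Iterable, Sequence

# Substrings taken from the messages quoted in this post.
PATTERNS = (
    "Received signal 15",
    "BERT: [Hardware Error]",
    "TIME_ERROR",
    "ACPI Warning",
)


def count_messages(lines: Iterable[str], patterns: Sequence[str] = PATTERNS) -> Dict[str, int]:
    """Count how many lines contain each pattern (plain substring match)."""
    counts = {pattern: 0 for pattern in patterns}
    for line in lines:
        for pattern in patterns:
            if pattern in line:
                counts[pattern] += 1
    return counts
```

A count alone does not say which message matters. It only helps to tell a one-off entry at boot from something that repeats shortly before every restart.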

Link to comment

If the server is simply rebooting rather than shutting itself down then this almost invariably indicates a hardware issue, the most common being power or thermal issues.   If you get either of these then nothing will show up in the logs.

Link to comment
3 hours ago, itimpi said:

If the server is simply rebooting rather than shutting itself down then this almost invariably indicates a hardware issue, the most common being power or thermal issues.   If you get either of these then nothing will show up in the logs.

But the interesting thing is, I switched to a new PSU and had the same issue in Unraid normal mode.

I also ran a CPU benchmark in Unraid safe mode, without plugins or anything else. (I also tested with only the plugins enabled.) Everything ran without issues for a long time.

I also found a BIOS setting called Power Loading: "Enables or disables dummy load. When the power supply is at low load, a self-protection will activate causing it to shutdown or fail. If this occurs, please set to Enabled. Auto lets the BIOS automatically configure this setting."

Maybe I should try that?

Link to comment
  • 2 weeks later...
27 minutes ago, UNRA1DUser said:

The interesting thing is what Hardware?

It can be caused by different components. In my experience the most likely culprits, roughly in this order, are RAM, PSU, board or CPU.

 

28 minutes ago, UNRA1DUser said:

Can it be an HDD or M.2 SSD? Or a lost connection to the USB stick?

Won't say it's not possible but it's unlikely.

Link to comment