[SOLVED] Unraid server (6.9.x 6.10.x) crash and is unreachable after WOL


terox


Hello!

I have a stable Unraid 6.9.2 server with a minimal configuration. I installed the Dynamix S3 Sleep plugin to put the system to sleep during inactivity. The server goes to sleep successfully and also responds successfully to the WOL signal.

 

The problem comes when the system wakes up. It is unreachable because it has crashed, and I don't know why. I don't know where to collect logs to diagnose the issue. When it happens, I can't access the server via the web UI or SSH.

 

On the other hand, if I boot in safe mode (without plugins), it runs and wakes up successfully. I don't understand why, because the only plugin I have installed is S3 Sleep. I also tried normal mode without Docker, VMs, and other configurations. I don't understand why it works in safe mode, where I can reach the server via SSH and the browser.

 

It is really annoying.

 

I don't know how to read the syslog after the "wake up" to see the error and why everything is failing. I need help to provide more information. Any solution or link to a post is welcome (I searched a lot, but found no real solutions). I did read that a lot of people have similar issues (not responding, crashing...).

 

I hope that together we can find a solution.


Thank you so much

Link to comment
1 minute ago, terox said:

the previous string with white and black text logs don't appear.

What does that mean?

 

Those diagnostics appear to be after a powerup, and not after a WOL.  Are you able to use the local keyboard / monitor at all after waking up?

 

Top of my head:  You've got a couple of VMs running and a couple of devices (sound etc) presumably passed through to the VM.  Are you passing through a video card to them?  Are you using the iGPU for passthrough?  I don't see any video being "isolated" from the system to prevent the OS from using it.

Link to comment

When the computer is "sleeping" after some time of inactivity and the WOL signal arrives, I can't access the computer. It doesn't respond.

 

Quote

the previous string with white and black text logs don't appear.

Sorry for the poor explanation. I mean that I can see the "syslog" on the monitor when the server boots, but not when it is woken up via LAN. I can't type anything before or after. I don't have a terminal, only logs (maybe I am wrong here). The key point: after the WOL action, I can't see anything on the monitor.

 

45 minutes ago, Squid said:

Top of my head:  You've got a couple of VMs running and a couple of devices (sound etc) presumably passed through to the VM.  Are you passing through a video card to them?  Are you using the iGPU for passthrough?  I don't see any video being "isolated" from the system to prevent the OS from using it.

I have two VMs with an NVIDIA graphics card attached to both. I normally run only one VM at a time, because it is impossible to use both at once. When I force the sleep state, both VMs are powered off. Only Docker containers are running.
 

45 minutes ago, Squid said:

Are you using the iGPU for passthrough?  I don't see any video being "isolated" from the system to prevent the OS from using it.

The iGPU is not used in any VM, but I don't know what you mean by passthrough or isolated. Could the issue come from there? What would you like me to try?
I tried a few hours ago with the VM service disabled, with the same results. Maybe it is the iGPU, but why only when it is restored from an S* state?

Thanks

systemdevices.png

Edited by terox
More details
Link to comment
2 hours ago, terox said:

I don't understand why in safe mode it works and I can reach the server via SSH and with the browser.

 

It is really annoying.

 

To be honest, I would simply say Unraid is not S3-aware; officially, S3 support has never been claimed.

 

A long time ago I tested S3 with Unraid and it would sometimes crash, but the same hardware had no S3 problems under Windows, and other platforms worked well too.

 

I don't think S3 is transparent to software. During hardware wakeup, components take different amounts of time to return to a working state; if the OS or software is not aware of this, a crash or unexpected outcome is to be expected.

Edited by Vr2Io
Link to comment
3 minutes ago, Vr2Io said:

 

To be honest, I would simply say Unraid is not S3-aware; officially, S3 support has never been claimed.

 

A long time ago I tested S3 with Unraid and it would sometimes crash, but the same hardware had no S3 problems under Windows.

 

I don't think S3 is transparent to software. During hardware wakeup, components take different amounts of time to return to a working state; if the OS or software is not aware of this, a crash or unexpected outcome is to be expected.

 

Maybe, but I think it is important. I used to run my server 24/7, but for some months now I haven't needed processes running all the time. Also, electricity bills in Spain are sky-high. We are in serious trouble. I am searching for solutions. I am sure there is some solution or explanation.

Edited by terox
Link to comment
2 hours ago, terox said:

The problem comes when the system wakes up. It is unreachable because it has crashed

UnRAID does not "officially" support S3 sleep.  That is why the only solution is a plugin.  On some hardware running unRAID, it works great.  On other hardware it does not work at all and some have mixed results. 

 

My first unRAID server would go to S3 sleep and wake up with no problems at all.  Ran like that for years.  When I changed motherboards (no software changes at all and the same version of unRAID), it would sleep and never wake up.  Not only would it not wake up, it would produce no video at all when I hard reset the server.  Everything appeared to be running fine, but there was no video output.  I was able to RMA the board and got a new one that I never put to sleep.  I just run it 24/7 and let the drives spin down after activity.  It consumed less than 40 watts when idle.

 

I have been through another motherboard change since then and I just no longer trust S3 sleep because I never know how it will behave on the hardware.

 

I am not saying S3 sleep/wake cannot work on your system.  What I am saying is there are no guarantees it will work properly and Limetech can't officially support it due to many different hardware combinations they would have to test and certify for S3 sleep under unRAID with a specific Linux kernel.
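
Whether a given board and kernel expose S3 at all can be checked from the shell before relying on a plugin. What follows is a minimal sketch, assuming a Linux kernel that provides `/sys/power/mem_sleep`; the helper name `active_mem_sleep` is made up for illustration. In that file, `deep` corresponds to S3 suspend-to-RAM and the bracketed entry is the mode currently selected.

```shell
# Hypothetical helper: read the contents of /sys/power/mem_sleep on stdin
# (for example "s2idle [deep]") and print the bracketed, currently selected mode.
active_mem_sleep() {
  sed -n 's/.*\[\([a-z0-9]*\)\].*/\1/p'
}

# Example on a live system (read-only, changes nothing):
#   active_mem_sleep < /sys/power/mem_sleep
```

If `deep` is not listed in that file at all, the firmware or kernel most likely does not offer S3 on that hardware, which would match the mixed results described above.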

Link to comment

Hi,

I just set up my first home-server and have a similar issue, cause I have an E-350 AMD Brazos CPU with integrated HD 6310 and cannot wake up properly from S3, cause there's no video signal, no WebGUI, no shares etc.

 

This guy a few years ago solved the same problem that I have, but on Unraid 5,

and so it seems that I need to build a custom kernel and add the radeon driver in there. Then it worked at least on Unraid 5 for him.

 

Does anyone know how to do that for Unraid Release 6.9.2 ?

Cause here it says,

https://wiki.unraid.net/index.php/Building_a_custom_kernel

"This page was created for unRAID v5 (32 bit), has NOT been updated for v6!!! If you wish to adapt it, you would need a current kernel to match your chosen unRAID release, 64 bit packages from Slackware 14.1 (I think!), and who knows what else!"

 

Thank you very much

Link to comment
20 minutes ago, Squid said:

WOW, amazing!!

That worked like a charm. The system recovers from S3 perfectly fine now. I already thought I might need to buy a new system but you saved me.

I can't thank you enough. 👏

 

Haven't checked WOL, but will do that later.

Link to comment
16 hours ago, media-fort said:

WOW, amazing!!

That worked like a charm. The system recovers from S3 perfectly fine now. I already thought I might need to buy a new system but you saved me.

I can't thank you enough. 👏

 

Haven't checked WOL, but will do that later.

 

Very promising! I am away from home for work for two days, but I will try it and post the results.

 

In my case, I had to set the "g" setting manually each time because the Dynamix S3 plugin didn't do it. Not a big deal if I can patch it with User Scripts or with a PR to the GitHub repository :).

 

@media-fort do you have a VM with a PCI video card attached via passthrough? If so, where is your video output: on the video card or on the integrated iGPU output?

 

@Squid should I install video drivers for the iGPU even though I use the PCI card (NVIDIA in my case) and pass it through to a VM?

 

Thanks!

Link to comment

WOL works perfect, too. Even from a remote device from somewhere on the planet via WireGuard VPN.

 

Yes, WOL ONLY works after a reboot of my server after adding this line to the "go" file in the "config" folder of the usb stick.

ethtool -s eth0 wol g

Cause otherwise, after each reboot, this value is reset back to "d" on my server. Strange..

I use a MSI E350IA-E45 Motherboard.
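
For anyone who wants the "go" file to set the flag only when needed, here is a minimal sketch. It assumes `ethtool` is available and that the interface is `eth0` as in the line above; the function names `wol_mode` and `ensure_wol` are made up for illustration and are not part of Unraid or the S3 plugin.

```shell
# Hypothetical helper: read `ethtool <iface>` output on stdin and print the
# current Wake-on mode (for example "d" for disabled or "g" for magic packet).
# The "Supports Wake-on:" line is deliberately not matched.
wol_mode() {
  awk '/^[[:space:]]*Wake-on:/ { print $2 }'
}

# Hypothetical helper: enable magic-packet wake only if it is not already set.
ensure_wol() {
  iface="${1:-eth0}"
  current="$(ethtool "$iface" | wol_mode)"
  if [ "$current" != "g" ]; then
    ethtool -s "$iface" wol g
  fi
}
```

In the "go" file this would be followed by a single call such as `ensure_wol eth0`; running `ethtool eth0 | wol_mode` afterwards should print `g` if the setting was accepted.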

 

@terox Sry, cannot help you with that, in my case the PCI-E Slot is blocked by a passive heatsink. So it can't be used at all. I can only use the iGPU (HD 6310), hoping that thing might be enough for basic video transcoding. So I'm always using the HDMI output of the motherboard and that works well.

Link to comment
23 minutes ago, media-fort said:

WOL works perfect, too. Even from a remote device from somewhere on the planet via WireGuard VPN.

 

Yes, WOL ONLY works after a reboot of my server after adding this line to the "go" file in the "config" folder of the usb stick.

ethtool -s eth0 wol g

Cause otherwise, after each reboot, this value is reset back to "d" on my server. Strange..

I use a MSI E350IA-E45 Motherboard.

 

@terox Sry, cannot help you with that, in my case the PCI-E Slot is blocked by a passive heatsink. So it can't be used at all. I can only use the iGPU (HD 6310), hoping that thing might be enough for basic video transcoding. So I'm always using the HDMI output of the motherboard and that works well.


That sounds promising. I will be able to try it in two days. You had the same experience as other people with the "g" parameter. One question: are you using a router with WoW, sending magic packets to the public IP on the WireGuard port? I read that the router must be compatible.

 

29 minutes ago, Squid said:

I suppose.  I don't use sleep and it's not officially supported by the OS. so just pointed to a possibility


I will report back with the results in a few days.

 

Thank you

 

Link to comment
22 hours ago, terox said:

are you using a router with WoW, sending magic packets to the public IP on the WireGuard port? I read that router must be compatible.

I'm using a FritzBox 6490 Cable. I think this one is able to wake up devices via the FritzBox Webinterface with Wake-on-LAN.

So maybe you're right and you might need a compatible router.
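
For context on what the router or VPN endpoint has to deliver: a Wake-on-LAN magic packet is six `0xFF` bytes followed by the target MAC address repeated 16 times (102 bytes in total). The sketch below only builds that payload as a hex string so the layout is visible; the function name `magic_packet_hex` and the example MAC are made up for illustration.

```shell
# Hypothetical helper: print the magic-packet payload for a MAC address
# as lowercase hex (6 x ff, then the MAC repeated 16 times).
magic_packet_hex() {
  mac="$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')"
  packet="ffffffffffff"
  i=0
  while [ "$i" -lt 16 ]; do
    packet="${packet}${mac}"
    i=$((i + 1))
  done
  printf '%s\n' "$packet"
}

# Example with a made-up MAC address:
#   magic_packet_hex 00:11:22:AA:BB:CC
```

Actually sending it is normally left to a tool such as `etherwake` or `wakeonlan`, usually as a broadcast on the local network. Routers generally do not forward broadcasts arriving from the internet, which is probably why waking through a VPN or through the router's own WOL function tends to be the route that works.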

 

But I figured out a pretty bad drawback.

The write speeds to the server are okay (max. 70-100 MB/s), but the read speeds are a joke (max. 35-42 MB/s). And it seems to be a network related issue, cause copying files from one array drive to another works with fullspeed.

The poor read speeds appear everywhere exactly the same, with SMB, FTP and NFS and it also doesn't matter which device reads from the server (tested with Windows 10, Android devices...). 

 

So it seems to be related to the RTL8111E LAN-Chip and/or the E-350 CPU.

 

During writing the CPU load is almost 100% all the time. But during reading the load fluctuates quite a lot and averages about 50-80%.

Any ideas how to enhance reading speeds?

 

-----------------------------------------------------------------------------------

Edit: 31.01.2022

 

The reason for the slow read speeds was definitely the "bridging" mode. After choosing "no" for bridging under Network settings, the read speeds increased from about 330 Mbit/s to 880 Mbit/s, and that's almost as fast as the write speeds.

(All of these results were logged with "iperf3")
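
Since this post mixes MB/s (file copies) and Mbit/s (iperf3), a small conversion helps compare them. This is a sketch using decimal units (1 byte = 8 bits); the function name `mbit_to_mbyte` is made up for illustration. It shows that the 330 Mbit/s measured with bridging enabled is about 41 MB/s, in line with the 35-42 MB/s read speeds reported earlier.

```shell
# Hypothetical helper: convert Mbit/s to MB/s (decimal units, divide by 8).
mbit_to_mbyte() {
  awk -v mbit="$1" 'BEGIN { printf "%.2f\n", mbit / 8 }'
}

# The measurements themselves would come from iperf3, for example:
#   on the server:  iperf3 -s
#   on the client:  iperf3 -c SERVER_IP        (client sends, server receives)
#                   iperf3 -c SERVER_IP -R     (reverse: server sends, i.e. a "read")
# SERVER_IP is a placeholder for the server's address.
```

With these numbers, `mbit_to_mbyte 880` works out to 110 MB/s, which is close to what a gigabit link can carry.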

 

Sadly the integrated HD 6310 graphics chip is way too slow for video transcoding via Jellyfin / Plex etc.

Only 360p with 700Kbps seems to work without stuttering...

Edited by media-fort
Solved my question by myself
Link to comment

Bad news from my side.

I enabled the i915 driver for my Intel integrated GPU and then rebooted the server.
After forcing S3 sleep with the plugin, nothing changed: it crashed.

 

I also tried installing the NVIDIA drivers, with the same result.

 

I'll keep investigating. I need to read the logs after the crash, but I don't know how, because they are cleared or lost.

Link to comment
Just now, ChatNoir said:

The logs are in RAM (with all of the OS) so they are lost on reboot.

If you want persistent logs you need to set up a syslog server.

Thanks for the note.

I'll keep investigating, because in safe mode, with the array mounted, all Docker containers running as normal, and other services like SMB, it works like a charm. What changes in safe mode? What changes apart from plugins?
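
On the point about logs living in RAM: short of a full syslog server, a copy of the log can at least be saved to persistent storage just before forcing sleep. The following is a minimal sketch; the function name `snapshot_log` is made up, and the paths in the usage comment (`/var/log/syslog` as the source, `/boot/logs` on the USB flash drive as the destination) are assumptions about a typical Unraid layout.

```shell
# Hypothetical helper: copy a log file into a directory under a timestamped
# name and print the path of the copy.
snapshot_log() {
  src="$1"
  destdir="$2"
  stamp="$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$destdir" || return 1
  cp "$src" "$destdir/syslog-$stamp.txt" || return 1
  sync   # flush to disk before the machine sleeps or crashes
  printf '%s\n' "$destdir/syslog-$stamp.txt"
}

# Assumed usage on Unraid, right before triggering sleep:
#   snapshot_log /var/log/syslog /boot/logs
```

This only preserves what was logged up to the snapshot. Messages written during or after the wake-up would still need a remote syslog server, as suggested above.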

Link to comment
10 minutes ago, terox said:

Something else must be changing… is there any official documentation?

No documentation that I know of.   It has been frequently stated that the only difference is loading plugins, but I guess it does not mean that there is no other difference (although I have never noticed one).  
 

Having said that I believe there is a folder on the cache drive (I cannot remember its name, probably something like ‘extras’) that can contain packages that are auto-loaded in normal mode and may not be in Safe Mode.   The whole idea of Safe Mode is to avoid loading any software components that are not part of the standard Unraid release.

 

Link to comment
46 minutes ago, itimpi said:

No documentation that I know of.   It has been frequently stated that the only difference is loading plugins, but I guess it does not mean that there is no other difference (although I have never noticed one).  
 

Having said that I believe there is a folder on the cache drive (I cannot remember its name, probably something like ‘extras’) that can contain packages that are auto-loaded in normal mode and may not be in Safe Mode.   The whole idea of Safe Mode is to avoid loading any software components that are not part of the standard Unraid release.

 

 

It would be interesting to know what "unraidsafemode" actually changes.

 

I checked the loaded kernel modules and they are the same in safe and normal mode. Is something loaded in normal mode that is not present in safe mode?

 

Could it be isolcpus? It is the only boot parameter present in the normal boot entry but not in the safe mode one. But I don't understand what its effect is.
image.thumb.png.79f927a89f342135060ff4640bc0ddbb.png
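
The comparison of boot parameters can also be done mechanically instead of by eye. This is a minimal sketch, assuming `/proc/cmdline` was saved to a file once in each boot mode (for example with `cat /proc/cmdline > /boot/cmdline-normal.txt`); the function name `cmdline_only_in` and the file names are made up for illustration.

```shell
# Hypothetical helper: print the kernel parameters that appear in the first
# saved cmdline file but not in the second.
cmdline_only_in() {
  a="$(mktemp)"
  b="$(mktemp)"
  tr ' ' '\n' < "$1" | sed '/^$/d' | sort -u > "$a"
  tr ' ' '\n' < "$2" | sed '/^$/d' | sort -u > "$b"
  comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# Assumed usage, with files saved from each boot mode:
#   cmdline_only_in /boot/cmdline-normal.txt /boot/cmdline-safe.txt
```

Swapping the two arguments shows the reverse difference, that is, what only the safe mode entry passes.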

Link to comment

I created a new boot entry without isolcpus... and it worked as expected. Over the next few days I will try to understand why:

  • Is it a question of how the pinned CPUs are configured?
  • What are the implications of isolcpus? Why is it related to sleep mode? Could it be a bug?

If anyone has a clue, it would be very helpful.
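
For anyone hitting the same problem, a quick way to see whether the running system was booted with isolcpus is sketched below. The function name `isolcpus_value` is made up for illustration; `/proc/cmdline` and `/sys/devices/system/cpu/isolated` are standard Linux kernel interfaces.

```shell
# Hypothetical helper: read a kernel command line on stdin and print the
# value of isolcpus= (prints nothing if the parameter is absent).
isolcpus_value() {
  tr ' ' '\n' | sed -n 's/^isolcpus=//p'
}

# Example on a live system (read-only):
#   isolcpus_value < /proc/cmdline
#   cat /sys/devices/system/cpu/isolated   # the kernel's own view of isolated CPUs
```

If the first command prints a CPU list, a boot entry with that parameter removed from the `append` line is the configuration that resolved the crash in this thread.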

 

Thank you all for your help

Link to comment
  • terox changed the title to [SOLVED] Unraid server (6.9.x 6.10.x) crash and is unreachable after WOL
