Losing network connection / server crashing



After several hours of uptime, I lose the network connection to my Unraid server, and possibly the whole Unraid system is crashing.

This is the error message I can see in the syslog on the open telnet session before the connection is lost:

Tower kernel: traps: emhttpd[21569] trap divide error ip:416c36 sp:2b384fe45e20 error:0 in emhttpd[400000+21000]

I tried to create diagnostics, but nothing happens after running the command (over the IPMI remote console).
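For anyone trying to make sense of the trap line above, here's a small Python sketch that pulls its fields apart (process, PID, trap type, instruction pointer, and the offset of that pointer inside the emhttpd mapping). The regex is inferred from that single emhttpd line only, so other trap messages may not match; `parse_trap` and `trap_lines` are hypothetical helper names, not part of Unraid's diagnostics.

```python
import re

# Field layout inferred from the single emhttpd trap line above; other
# kernels or trap types may format the message differently (assumption).
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] trap (?P<trap>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
    r" in (?P<binary>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_trap(line):
    """Split a kernel 'traps:' syslog line into its fields, or return None."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    # Offset of the faulting instruction pointer from the start of the
    # binary's mapping (both values are hex in the log line).
    fields["offset"] = hex(int(fields["ip"], 16) - int(fields["base"], 16))
    return fields


def trap_lines(syslog_text):
    """Yield a parsed record for every trap line in a syslog dump."""
    for line in syslog_text.splitlines():
        rec = parse_trap(line)
        if rec is not None:
            yield rec
```

Running `list(trap_lines(open("syslog190617.txt").read()))` would collect every trap in an attached syslog; for the line above, the trap type is `divide error` at offset `0x16c36` into emhttpd.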

 

  • 2 weeks later...

6.4 is still unstable for me. I lose the connection after a few hours (sometimes only a few minutes after boot).

I've attached a syslog. I cannot run diagnostics when this happens: it hangs on "Starting diagnostics collection..." and nothing happens even after waiting for several minutes.

I disabled NTP because ntpd spams the log with hangs and restarts.

 

6.3.5 runs stably without any problems.

syslog190617.txt


Disabling pstate is still relevant for Haswell CPUs; they don't scale down particularly well (which is annoying).

 

I still use that flag in my config.

 

To clarify: without pstate disabled, the CPU still scales up and down in frequency, just incredibly sporadically, mostly sticking at quite high clock speeds. With pstate disabled, my Xeon throttles down to 800 MHz at idle and jumps up to 1000, 1200, etc. MHz as and when it needs to.
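If you want to put numbers on "sporadic" vs. "idles at 800 MHz", a rough Python sketch like this samples the standard cpufreq sysfs files (`scaling_cur_freq`, in kHz) and reports how often the cores sit near a given frequency. The 1 s interval and 50 MHz tolerance are arbitrary choices, and the helper names are made up for illustration.

```python
import glob
import time

# Standard cpufreq sysfs location; values are in kHz.
CUR_FREQ_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"


def read_cur_freqs_khz(pattern=CUR_FREQ_GLOB):
    """Return the current frequency (kHz) of each CPU that exposes cpufreq."""
    freqs = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            freqs.append(int(f.read().strip()))
    return freqs


def sample(seconds=60.0, interval=1.0, pattern=CUR_FREQ_GLOB):
    """Collect readings from every CPU once per interval for `seconds`."""
    samples = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        samples.extend(read_cur_freqs_khz(pattern))
        time.sleep(interval)
    return samples


def fraction_near(samples_khz, target_khz, tolerance_khz=50_000):
    """Share of readings within tolerance_khz of target_khz (0.0 if none)."""
    if not samples_khz:
        return 0.0
    hits = sum(1 for s in samples_khz if abs(s - target_khz) <= tolerance_khz)
    return hits / len(samples_khz)
```

For example, run `fraction_near(sample(), 800_000)` once with pstate enabled and once with `intel_pstate=disable` on an idle box. Depending on the driver, `scaling_cur_freq` may report an averaged rather than instantaneous value, so treat the result as indicative.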

54 minutes ago, nexusmaniac said:

pstate disabling is still relevant for Haswell CPUs, they don't scale down particularly well (which is annoying)

 

I still use that flag in my config.

 

To clarify, without pstate disabled; the CPU will still scale up and down in frequency... Just incredibly sporadically, mostly sticking at quite high clock speeds. With pstate disabled my Xeon throttles down to 800MHz at idle and jumps up to 1000, 1200, etc. MHz as and when it needs it.

 

Have you tried this in 6.3.5 or 6.4.0-rc?  This might have been fixed.

5 minutes ago, nexusmaniac said:

The same behavior is still present

 

Wondering if you've researched this?  That is, do other distros exhibit this problem (and suggest the pstate disable in the kernel command line)?  We might be missing something simple such as some kernel config option turned off.

12 minutes ago, limetech said:

 

Wondering if you've researched this?  That is, do other distros exhibit this problem (and suggest the pstate disable in the kernel command line)?  We might be missing something simple such as some kernel config option turned off.

 

I've done limited research, enough to know that disabling pstate mostly resolves the issue (at the cost of disabling Intel Turbo Boost).

 

It scales completely fine on Ubuntu. I can test it again now for you: I'll put 17.04 on a USB stick and get back to you in a sec :)

After I confirm it's working as expected with the pstate driver enabled on the stock kernel, I'll manually bump the kernel to 4.11.6.

 

Edit: Does that sound helpful enough? Or were you thinking of a different distro entirely?

Edited by nexusmaniac
17 minutes ago, nexusmaniac said:

Edit: Does that sound helpful enough? Or were you thinking of a different distro entirely?

 

Yes, that's great, thanks! If you confirm everything works with 4.11.6, it would be helpful to post the kernel .config file as well as the output from 'lsmod'.
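The module comparison asked for here can be done mechanically. A minimal Python sketch, assuming plain `lsmod` output saved to a file on each machine: it takes the first column of each listing (skipping the `Module` header) and reports modules loaded on one side only. The function names are hypothetical.

```python
def parse_lsmod(text):
    """Return the set of module names from `lsmod` output (first column)."""
    modules = set()
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "Module":  # skip blank lines and header
            continue
        modules.add(parts[0])
    return modules


def diff_modules(reference_text, candidate_text):
    """(only in reference, only in candidate), each as a sorted list."""
    ref = parse_lsmod(reference_text)
    cand = parse_lsmod(candidate_text)
    return sorted(ref - cand), sorted(cand - ref)
```

One caveat: anything built into the kernel (`=y` in .config) never appears in `lsmod`, so this only catches loadable modules; the .config is the more complete comparison.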

54 minutes ago, limetech said:

 

Yes that's great, thanks!  If you confirm all working with 4.11.6 it would be helpful to post the kernel .config file, as well as output from 'lsmod'.

 

Hi,

 

That took longer than expected!! But here we go:

 

Kernel config and lsmod output are attached. The CPU is a Pentium G3450 (vs. a Xeon E3-1231 v3 in my unRAID machine), but both are Haswell-based CPUs.

 

I'm installing Ubuntu 17.04 on a 1231v3-based machine now. I've seen the expected results before, so I'm sure the same will be true this time, but I'll test it to be 100% sure :)

 

I hope the attached files are helpful.

lsmod.txt

config-4.11.6-041106-generic


Here's the lsmod output and kernel config from the Xeon build. The CPU frequency is still quite... 'bouncy'; it didn't idle like it has in the past. I've double-checked the scaling driver, though: both of these systems are using the "intel_pstate" driver. Could this be an 'issue' with the Haswell Refresh line, as opposed to plain Haswell? Both CPUs are part of the "Haswell Refresh".

 

Ubuntu Desktop 17.04, 4.11.6-generic

(Xeon) config-4.11.6-041106-generic

(xeon)lsmod

Edited by nexusmaniac
12 hours ago, nexusmaniac said:

Ubuntu Desktop 17.04, 4.11.6-generic

 

Please try out -rc6. We mainly did a module comparison and found one that we leave out: INTEL_POWERCLAMP:


"Enable this to enable Intel PowerClamp idle injection driver. This enforce idle time which results in more package C-state residency. The user interface is exposed via generic thermal framework."

 

If this doesn't work, we'll do a deeper analysis of the .config differences.  Thanks for your help!
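For the deeper .config comparison, here's a Python sketch that treats `CONFIG_FOO=...` lines as set and `# CONFIG_FOO is not set` lines (or absence) as unset, then lists every symbol whose value differs between two configs. The function names are hypothetical.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")            # CONFIG_FOO=y / =m / =value
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")  # explicitly disabled


def parse_config(text):
    """Map each CONFIG_ symbol to its value; explicitly unset symbols map to None."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            opts[m.group(1)] = None
    return opts


def config_diff(a_text, b_text):
    """{symbol: (value_in_a, value_in_b)} for every symbol that differs.

    A symbol missing from one file counts as None, the same as "is not set".
    """
    a, b = parse_config(a_text), parse_config(b_text)
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}
```

Usage would be something like `config_diff(open("config-4.11.6-041106-generic").read(), open("unraid.config").read())`, where `unraid.config` is a hypothetical file name for Unraid's kernel config.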

26 minutes ago, limetech said:

 

Please try out -rc6.  We mainly did a module comparison and found one we leave out: INTEL_POWERCLAMP:


"Enable this to enable Intel PowerClamp idle injection driver. This enforce idle time which results in more package C-state residency. The user interface is exposed via generic thermal framework."

 

If this doesn't work, we'll do a deeper analysis of the .config differences.  Thanks for your help!

Hi :)

 

I've updated to -rc6!

 

The frequency still seems to be bouncing all over the place :P (Netdata image attached)

 

Happy to do anything else to assist. I know it's not the biggest 'issue' in the world; there's so much debate on the internet about acpi-cpufreq vs. intel_pstate, haha!

 

Not sure if it's an option or not; I haven't gotten around to looking through the pstate docs... But I see in the 'Tips and Tweaks' plugin that, with pstate left enabled, I can switch between the powersave and performance governors, and both work. Powersave jumps very quickly between 800 MHz and max turbo speed (~3.4-3.7 GHz), even when the server is idle. Performance works as expected and pins every core at 3.4 GHz, plus boost when needed.

 

With intel_pstate=disable, ACPI (acpi-cpufreq) kicks in and the CPU drops all the way to 800 MHz about 80% of the time, with small peaks when containers are doing their thing, the array is accessed, etc.

 

So, two things I don't know:

 

- Does the ondemand governor exist for the intel_pstate driver?

- Which governor does Ubuntu run with its intel_pstate driver, presumably powersave? And does Intel have exclusive control over it, or can Ubuntu / the kernel developers tweak it (or have they already)? :)

cpu freq.png
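On the first question: the governors a driver offers are listed in each CPU's `scaling_available_governors` file, so checking for ondemand is just a lookup. In active mode intel_pstate only offers performance and powersave. A minimal sketch that refuses to write a governor the driver doesn't offer and otherwise sets it on every CPU (writing requires root). Names are hypothetical, and this is not how the Tips and Tweaks plugin does it.

```python
import glob
import os

CPU_ROOT = "/sys/devices/system/cpu"


def is_offered(governor, available_text):
    """True if `governor` appears in a scaling_available_governors listing."""
    return governor in available_text.split()


def set_governor(governor, root=CPU_ROOT):
    """Set `governor` on every CPU (needs root); return how many CPUs changed.

    Raises ValueError as soon as a CPU's scaling driver does not offer it.
    """
    changed = 0
    for d in sorted(glob.glob(os.path.join(root, "cpu[0-9]*", "cpufreq"))):
        with open(os.path.join(d, "scaling_available_governors")) as f:
            available = f.read()
        if not is_offered(governor, available):
            raise ValueError(f"{governor!r} not offered; available: {available.split()}")
        with open(os.path.join(d, "scaling_governor"), "w") as f:
            f.write(governor)
        changed += 1
    return changed
```

`set_governor("ondemand")` on an intel_pstate (active mode) system would raise `ValueError` listing `['performance', 'powersave']`.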

  • 2 months later...

I just wanted to say that I'm seeing the same behavior with my Haswell (E3-1225 v3) machine.

 

intel_pstate seems to be too aggressive with clocking up compared to acpi-cpufreq.

 

For example I always have a Windows VM idling.

 

acpi-cpufreq with the "ondemand" governor and the default up_threshold of 95 (i.e. it raises the clock speed when a core is above 95% load) throttles the CPU to 800 MHz, usually to ~1700 MHz when a light load is applied, or to 3200 MHz under full load. This gives me very nice power consumption.

 

On the other hand, with intel_pstate and "powersave", my cores never drop below 3200 MHz while the Windows VM is idling, and idle power consumption rises by 15 W. I think it is too aggressive about clocking up, but I can't find an equivalent of up_threshold for the Intel driver.
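For reference, the ondemand behaviour described above can be modelled roughly like this. It's a simplified sketch based on my reading of the kernel's ondemand governor, not its actual code: above up_threshold it jumps straight to the maximum frequency; otherwise it picks a frequency proportional to load, which is then rounded to a frequency the driver supports (not modelled here). On systems using ondemand the tunable usually lives at `/sys/devices/system/cpu/cpufreq/ondemand/up_threshold`. The function name is hypothetical.

```python
def ondemand_next_freq(load_pct, min_khz, max_khz, up_threshold=95):
    """Simplified model of ondemand's target frequency for one sample.

    load_pct above up_threshold -> jump straight to max_khz; otherwise pick
    a frequency proportional to load between min_khz and max_khz.
    """
    if not 0 <= load_pct <= 100:
        raise ValueError("load_pct must be between 0 and 100")
    if load_pct > up_threshold:
        return max_khz
    # Linear interpolation between the minimum and maximum frequency.
    return min_khz + load_pct * (max_khz - min_khz) // 100
```

With an 800-3200 MHz range, 50% load maps to 2000 MHz and anything above 95% maps straight to 3200 MHz.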

 

This thread seems to describe the problem:

 

Edited by lionceau

The Linux kernel documentation mentions that intel_pstate parameters can indeed be tweaked.

 

https://www.kernel.org/doc/html/latest/admin-guide/pm/intel_pstate.html

 

Maybe it would be possible to compare the settings and import those from Ubuntu to Unraid?
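One way to compare Ubuntu and Unraid here is to dump intel_pstate's global sysfs attributes on both and diff them. The attribute names below come from the kernel's intel_pstate admin guide, but which ones exist depends on kernel version and mode, so missing files are simply skipped. Helper names are hypothetical.

```python
import os

PSTATE_ROOT = "/sys/devices/system/cpu/intel_pstate"
# Global attributes described in the kernel's intel_pstate admin guide;
# availability depends on kernel version and operation mode.
ATTRS = ("status", "max_perf_pct", "min_perf_pct", "no_turbo",
         "num_pstates", "turbo_pct")


def read_intel_pstate(root=PSTATE_ROOT):
    """Return {attribute: value} for whichever ATTRS exist under root."""
    settings = {}
    for name in ATTRS:
        try:
            with open(os.path.join(root, name)) as f:
                settings[name] = f.read().strip()
        except OSError:
            pass  # not present on this kernel / in this mode
    return settings


def diff_settings(a, b):
    """{attribute: (value_a, value_b)} for attributes that differ or exist on one side only."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}
```

Save `read_intel_pstate()` from each system (e.g. with `json.dump`) and pass both dicts to `diff_settings`.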

 

Quote

 

Tuning Interface in debugfs

The powersave algorithm provided by intel_pstate for the Core line of processors in the active mode is based on a PID controller whose parameters were chosen to address a number of different use cases at the same time. However, it still is possible to fine-tune it to a specific workload and the debugfs interface under /sys/kernel/debug/pstate_snb/ is provided for this purpose. [Note that the pstate_snb directory will be present only if the specific P-state selection algorithm matching the interface in it actually is in use.]

The following files present in that directory can be used to modify the PID controller parameters at run time:

deadband
d_gain_pct
i_gain_pct
p_gain_pct
sample_rate_ms
setpoint

Note, however, that achieving desirable results this way generally requires expert-level understanding of the power vs performance tradeoff, so extra care is recommended when attempting to do that.

 

 

Supposedly it should be possible to set intel_pstate=passive and use all the generic governors while retaining Intel functionality such as turbo P-states, but when I tried it in Unraid, it only displayed the two intel_pstate driver governors (powersave and performance).
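When trying intel_pstate=passive, it helps to check which scaling driver actually registered. Per the kernel docs, intel_pstate shows up as "intel_pstate" in active mode and as "intel_cpufreq" in passive mode, while "acpi-cpufreq" means intel_pstate isn't in use at all. A tiny sketch of that check; the function names are hypothetical.

```python
DRIVER_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver"

# What each scaling_driver name implies, per the kernel's intel_pstate docs.
MODES = {
    "intel_pstate": "intel_pstate active mode (governors: performance, powersave)",
    "intel_cpufreq": "intel_pstate passive mode (generic cpufreq governors)",
    "acpi-cpufreq": "intel_pstate not in use (acpi-cpufreq instead)",
}


def read_driver(path=DRIVER_PATH):
    """Return the registered scaling driver name, or None if unavailable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def pstate_mode(driver):
    """Describe what a scaling_driver name means for intel_pstate."""
    if driver is None:
        return "no cpufreq driver found"
    return MODES.get(driver, f"other driver: {driver}")
```

If `pstate_mode(read_driver())` still reports active mode after booting with intel_pstate=passive, the parameter most likely didn't take effect.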

 

 

Edited by lionceau
