Hoopster

June 17, 2023

2 hours ago, ich777 said:

Can you please test and try to disable VT-d in the BIOS, remove the i915.conf file (or move it to the root from your USB Boot device), reboot and see if that works?

Yes, the server boots into 6.12.0 if I disable Intel Virtualization Technologies in the BIOS and there is no i915.conf file in modprobe.d.

The iGPU is still enabled: /dev/dri exists with the appropriate files, and I am able to hardware transcode in Plex.

This is also an acceptable short-term solution on this server as, for now, I do not intend to run any VMs on the server.

@ich777 UPDATE: After a couple more reboots, disabling VT-d no longer solved the problem. The server was back to the prior no-boot behavior. After restoring the i915.conf file with the options line, it again boots successfully. That appears to be the only "permanent" solution.
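For anyone wanting to confirm the /dev/dri check mentioned above after a BIOS or i915.conf change, here is a minimal sketch. The function name `has_render_node` is made up for illustration; it only assumes the usual kernel convention that render nodes are named renderD128, renderD129, and so on.

```shell
# Hypothetical helper: succeed if the given directory (default /dev/dri)
# contains at least one render node such as renderD128.
has_render_node() {
    local dri_dir="${1:-/dev/dri}" node
    for node in "$dri_dir"/renderD*; do
        # If the glob matched nothing, $node is the literal pattern
        # and the -e test fails.
        [ -e "$node" ] && return 0
    done
    return 1
}

# Example:
# has_render_node && echo "render node present" || echo "no render node found"
```

A render node being present only shows that the driver loaded; it does not prove that Plex is actually transcoding in hardware.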

June 17, 2023

12 hours ago, JorgeB said:

Can this be closed?

As far as I am concerned, yes, it can be closed as the server in question boots consistently and without errors since adding the indicated line to i915.conf.

June 17, 2023

5 hours ago, SimonF said:

Yes, you can plug it into another machine, create or edit the file on the flash, and add the line to the file.

@VeeTECH That's what I did to add "options i915 enable_dc=0" to the i915.conf file on the flash drive.

June 16, 2023

36 minutes ago, SimonF said:

echo "options i915 enable_dc=0" >> /boot/config/modprobe.d/i915.conf

That seems to have worked. Thank you. I guess I just missed that as it appears to have been mentioned previously.

It makes sense as, even in 6.11.5, I had to disable low power mode for the HandBrake presets using QSV (lowpower=0 in More Settings in the Video tab) or all encodes attempted with QSV crashed.
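One caveat with the quoted one-liner: `>>` appends the line again every time it is run. A minimal sketch of a variant that only adds the line when it is missing follows. The function name `add_modprobe_line` is hypothetical, and the path is an argument so the same helper could be pointed at a flash drive mounted on another machine.

```shell
# Hypothetical helper (not part of Unraid): append a line to a modprobe.d
# file only if that exact line is not already present.
add_modprobe_line() {
    local conf="$1" line="$2"
    mkdir -p "$(dirname "$conf")"
    touch "$conf"
    # Nothing to do if the exact line is already there
    # (-x whole line, -F fixed string, -q quiet).
    if grep -qxF -- "$line" "$conf"; then
        return 0
    fi
    # If the file is non-empty and lacks a trailing newline, add one first so
    # the new line is not glued onto the previous one.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    printf '%s\n' "$line" >> "$conf"
}

# On the server itself, using the path from this thread:
# add_modprobe_line /boot/config/modprobe.d/i915.conf "options i915 enable_dc=0"
```

Running it twice leaves a single copy of the line, so it is safe to keep in a script.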

June 16, 2023

1 hour ago, JorgeB said:

Try blacklisting the i915 driver:

https://docs.unraid.net/unraid-os/release-notes/6.10.0#linux-kernel

I was afraid that might be necessary as the video seemed to indicate the boot process stopped with loading the i915 driver.

Blacklisting i915 allowed the server to boot properly but this server has only the iGPU for graphics and is primarily a Plex server.

Apparently, the Linux kernel in 6.11.5 has no problems with the i5-11600 CPU/iGPU but the kernel in 6.12.0 does not like it. 🙁
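Since blacklisting is done with a standard `blacklist <module>` line in a modprobe.d file, it can be handy to check whether one is still active before expecting /dev/dri to show up. A rough sketch, assuming the config lives in /boot/config/modprobe.d as in this thread; the function name `is_blacklisted` is made up for illustration.

```shell
# Hypothetical helper: succeed if any *.conf file in the given modprobe.d
# directory contains an active "blacklist <module>" line.
is_blacklisted() {
    local module="$1" dir="${2:-/boot/config/modprobe.d}"
    # Anchored match: optional leading whitespace, the keyword, the module
    # name, optional trailing whitespace. Commented-out lines do not match.
    # -s keeps grep quiet when the directory has no *.conf files.
    grep -qsE "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*$" "$dir"/*.conf
}

# Example:
# is_blacklisted i915 && echo "i915 is blacklisted" || echo "i915 is not blacklisted"
```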

July 28, 2022

I updated to 6.11.0-rc1 on my backup server. Update went well.

The only thing I have noticed is that motherboard and CPU temps are not showing in the GUI dashboard. They were present in 6.10.3. I am using the IPMI plugin and that continues to report CPU and MB temps in the GUI footer.

There was also a motherboard BIOS update available (security mitigations) and I did that update before the unRAID update. That could be the cause I suppose but, again, IPMI plugin still detects sensors and displays the temps.

backupnas-diagnostics-20220728-0912.zip

April 7, 2022

1 hour ago, flyize said:

Are you using Alder Lake?

Coffee Lake (Xeon E-2288G)

April 7, 2022

1 hour ago, NightOps said:

Won't uninstalling Intel GPU Top plugin roll back the blacklisting of the i915 driver and keep HW transcoding from working?

I am still running 6.9.2 on that server and the iGPU is enabled via the touch method, which creates an empty i915.conf file in modprobe.d.

If that file did not exist, i915 would remain blacklisted (the default in 6.9.2) and no HW transcoding would take place.
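The three cases described here (no file, empty file from the touch method, file with content) can be told apart with a small check. This is only a sketch; the function name `i915_conf_state` is hypothetical, and the comments on each case describe 6.9.2 behavior as stated in this post, not later releases.

```shell
# Hypothetical helper: report whether i915.conf is missing, empty, or has content.
i915_conf_state() {
    local conf="${1:-/boot/config/modprobe.d/i915.conf}"
    if [ ! -e "$conf" ]; then
        echo "missing"      # 6.9.2 default: i915 stays blacklisted
    elif [ ! -s "$conf" ]; then
        echo "empty"        # the "touch" method: i915 is allowed to load
    else
        echo "non-empty"    # holds options or other directives; inspect it
    fi
}

# Example:
# i915_conf_state
```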

April 7, 2022

1 hour ago, Lee Kim Tatt said:

CoreFreq? I didn't install that, but using Intel GPU Top? Any relation between them?

I uninstalled Intel GPU Top and GPU Statistics and my server hangs stopped. I also uninstalled CoreFreq as a precaution, but, that was later.

March 14, 2022

@bonienl Rebooted a second time in Safe Mode. This time I got the Nchan error as seen below just a few seconds after starting the array.

March 14, 2022

7 hours ago, bonienl said:

Your diagnostics show several plugins installed. Can you retest in safe mode to rule out the plugins?

Booted in safe mode. Let the GUI sit for about 10 minutes, no Nchan errors although I continued to see the problem with slow populating of data in the Main tab and terminal sessions either closing or taking a while to display the command prompt.

Plugins installed are the following:

image.png.36b724f92b9b89da58fa21d37e1cc0ba.png

I will see if I can determine which one may be causing the Nchan error.

March 13, 2022

4 minutes ago, Hoopster said:

The error is still occurring but less frequently. It seems to settle down a bit if I am not actively doing things in the GUI for a few minutes. As soon as I start moving around in the GUI, back it comes with increasing frequency.

I am also seeing that when I switch away from the Main tab in the GUI and then return to that tab, it takes a few seconds for data to load on the Main tab. Data in other tabs loads instantly.

@bonienl Not seeing the Nchan error or Main tab issue in Chrome; just Firefox (version 98.0).

Also, in Firefox only, opening a terminal session takes a few seconds to display the command prompt (or the window just closes); instantaneous through Chrome.

March 13, 2022

1 minute ago, bonienl said:

Is your server heavily loaded at startup?

Can you post your diagnostics when this happens.

The error is still occurring but less frequently. It seems to settle down a bit if I am not actively doing things in the GUI for a few minutes. As soon as I start moving around in the GUI, back it comes with increasing frequency.

I am also seeing that when I switch away from the Main tab in the GUI and then return to that tab, it takes a few seconds for data to load on the Main tab. Data in other tabs loads instantly.

This server should not be very busy at startup. It is my backup server with fewer Docker containers and plugins than the main server and no VMs.

Diagnostics attached.

backupnas-diagnostics-20220313-1152.zip

March 13, 2022

20 hours ago, bonienl said:

I have not encountered this in my testing with any of the main browsers.

In my case Nchan communication only gets broken when a device goes in powersaving or sleep mode, and I need to "wake-up" the communication.

Just powered on the server again and brought up the GUI. The Nchan error comes up after about 10 seconds. If I click OK it goes away then reappears a few seconds later. I have dismissed it 7 times now and it continues to pop up in the GUI.

image.png.3990c3cdf9c60541efb5371423ffb626.png

March 12, 2022

14 minutes ago, bonienl said:

Do you get this while actively working with the browser or after some time of inactivity and the system going in powersaving mode?

I get this frequently right after powering on the server and starting the GUI. It would appear after the GUI sat idle for several seconds or while attempting to navigate between tabs.

Interestingly, the GUI sat idle for 15 minutes or more and the error is not appearing again. I have also tried navigating between tabs without issue.

Right after upgrading yesterday and restarting the server and GUI today, I would see the error pop up every 10 seconds or so. Now, it seems to have mysteriously disappeared. I'll report if I see it again.

March 12, 2022

After updating my backup server to 6.10.0-RC3, I am getting the error below frequently. I have closed and reopened the browser (Firefox) and restarted the server a couple of times. I have also checked that there are no tabs open to the GUI for that server on other computers.

image.png.c412e7f22758114732cb6f90a830406e.png

backupnas-diagnostics-20220312-1234.zip

February 18, 2022

3 minutes ago, snailtrails said:

That Xeon is coffee lake (9th gen intel) which Unraid is certified for

If you look at the hardware specs for the user that started this bug report as well as others in the thread he linked, you will see that there are many reports of i915 lockups with 9th generation Intel CPUs/iGPUs. It may be more prevalent in the 11th and 12th generations that don't have official iGPU support in the i915 drivers but it seems to not be limited to those generations.

February 18, 2022

20 hours ago, NightOps said:

What CPU are you running? Are you using iGPU passthrough to Plex for transcoding?

Xeon E-2288G and yes I am using the iGPU (/dev/dri) for Plex and HandBrake transcoding. Server never locked up when transcoding was happening. It always seemed to happen during idle time.

I am not saying Intel-GPU-Top was the problem, but, I have not had a crash since removing it and GPU Statistics. Running unRAID 6.9.2 currently. Lockups previously happened on that version plus the two 6.10.0 RCs.

February 17, 2022

40 minutes ago, bearcat2004 said:

@Hoopster has your machine crashed at any point?

No, I am at 50+ days of uptime since removing the Intel-GPU-top and GPU Statistics plugins.

I also recently removed the CoreFreq plugin as there have been several reports of it locking up servers. This was not in response to a crash, just an extra precaution.

January 11, 2022

Just now, bearcat2004 said:

Where would I create this file if I were to create this file the same way?

From the terminal type 'touch /boot/config/modprobe.d/i915.conf'

Intel GPU Top tries to load i915 as well and I thought it might be interfering so I removed it.

January 11, 2022

UPDATE: I rolled back to 6.9.2 from 6.10.0 rc2 and still had random system hangs. 16 days ago I did the following and have not had an issue since then:

recreated i915.conf with the 'touch' method
uninstalled the Intel GPU Top plugin
uninstalled the GPU Statistics plugin
enabled Turbo Boost in the Tips and Tweaks plugin (it had been disabled because heavy sustained workloads had been causing crashes with Turbo Boost)

UPDATE: A few days ago, I also uninstalled the CoreFreq plugin. I had not had problems with it installed but I see that it has been recommended to others with server lockups to uninstall that plugin.

I am not declaring victory yet nor do I think the above is necessarily a solution. This just happens to be the longest I have ever gone without a system hang since I started having the problems last July.

- Over five months of running 6.9.2 and 6.10.x with no hangs and many QSV transcodes. The only thing that appears to cause a hang is when Turbo Boost is enabled and there is high CPU load. This only happened again after upgrading to 6.10.0. I have Turbo Boost off for now.
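To double-check from the terminal whether Turbo Boost is actually off, the intel_pstate driver exposes a `no_turbo` file in sysfs (1 means turbo is disabled, 0 means it is allowed). A minimal sketch, assuming the server uses intel_pstate; whether the Tips and Tweaks plugin toggles this same file is an assumption, and the function name `turbo_state` is made up.

```shell
# Hypothetical helper: report Turbo Boost state from the intel_pstate
# "no_turbo" sysfs file (1 = turbo disabled, 0 = turbo allowed).
turbo_state() {
    local f="${1:-/sys/devices/system/cpu/intel_pstate/no_turbo}"
    if [ ! -r "$f" ]; then
        # File absent: intel_pstate is probably not the active driver.
        echo "unknown"
        return 1
    fi
    case "$(cat "$f")" in
        0) echo "enabled" ;;
        1) echo "disabled" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example:
# turbo_state
```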

December 6, 2021

12 hours ago, bonienl said:

There is little to go on here

Agreed. There is virtually nothing to go on in my post. Perhaps I will upgrade again and get some diagnostics.

I did see another bug report about WSDD2 crashing and thought I must have a similar issue. My inability to access the server/shares via Windows only started with the 6.10.0-rc2 release. The server would just disappear from the Windows network (randomly?) on all my Windows clients, or often was never there after I rebooted the server.

Same with the Terminal issues (6.10.0-rc2 only). Most times it would never connect and just displayed "Press [Enter key symbol] to connect." I have no such problems in 6.9.2. If it did connect, it took an unusually long time to do so. Perhaps this is somehow related to the slowness I saw in moving between tabs and reading array info in Dashboard and Main.

I have changed nothing on my network recently other than updating firmware on my Ubiquiti Unifi router, switches and APs. That was done while running 6.10.0-rc1 with no adverse effects.

December 3, 2021

When these system hangs happen for me (started on 6.9.2 and have continued with 6.10.0 RC), there is nothing meaningful in the syslog around the time of the hang but this is sometimes reported in the IPMI log.

An OS Stop/Shutdown sounds like a kernel/driver issue to me. It's like the rest of the system is working but the OS just decided to shut down.

FWIW, I have never seen these shutdowns as a result of hardware transcoding.
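For pulling those entries out of the IPMI event log from the command line, something like the filter below may help. It is only a sketch: it assumes `ipmitool` is available on the server and that the BMC labels the event with the text "OS Stop", which may differ by board. The function name `os_stop_events` is made up.

```shell
# Hypothetical helper: read event-log lines on stdin and keep only those
# that mention an OS stop event (case-insensitive). Returns non-zero when
# no line matches, like grep.
os_stop_events() {
    grep -i 'OS Stop'
}

# Example (assumes ipmitool is installed and can reach the BMC):
# ipmitool sel elist | os_stop_events
```

Comparing the timestamps of any matches against the syslog gaps is a quick way to line up a hang with what the BMC recorded.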

February 2, 2021

6 minutes ago, thecode said:

`-i br0` solve this or just reduce the rate.

Right, this is not a solution as it does not completely prevent the issue, but it does certainly reduce by a lot how often I see the problem on my servers.

February 2, 2021

1 hour ago, mivadebe said:

Also an issue here. Hope changing these settings will help. I'm currently 3 days in trial period and these kind of issues are not what I'm looking for:)

The WSD Daemon (wsdd) is not developed by Limetech. They have merely chosen to implement it in unRAID for those who wish to have an alternative to SMB/netBIOS for network discovery. There is probably not much they can do to fix this issue unless it is proven to be some interaction with wsdd and core unRAID services that causes the problem.

I have completely disabled SMB v1 on all my Windows clients and am using WSD exclusively. The -i br0 parameter has worked for me to mostly prevent the single-core 100% CPU usage issue. Every 3-4 months it may pop up, but a reboot fixes it (or the stop array, disable WSD, start array, enable WSD procedure).
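To spot the runaway daemon without watching the dashboard, a small filter over `ps` output can flag it. A rough sketch: the function name `high_cpu_pids` and the threshold of 90 are arbitrary choices for illustration. Note that `ps` reports CPU time averaged over the life of the process, so a spike that started recently can read lower than what the dashboard shows.

```shell
# Hypothetical helper: read "PID %CPU" pairs on stdin and print the PIDs
# whose CPU figure is at or above the threshold (default 90).
high_cpu_pids() {
    local threshold="${1:-90}"
    awk -v t="$threshold" '$2 + 0 >= t + 0 { print $1 }'
}

# Example (procps ps; the empty "=" headers suppress the header line):
# ps -C wsdd -o pid= -o pcpu= | high_cpu_pids 90
```

If a PID is printed, the reboot or the stop array / disable WSD / start array / enable WSD procedure described above applies.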


Report Comments posted by Hoopster

Can't update to 6.12.0 on one server

Can't update to 6.12.0 on one server

Can't update to 6.12.0 on one server

Can't update to 6.12.0 on one server

Can't update to 6.12.0 on one server

Unraid OS version 6.11.0-rc1 available

[6.9.x - 6.11.x] intel i915 module causing system hangs with no report in syslog (not alder lake)

[6.9.x - 6.11.x] intel i915 module causing system hangs with no report in syslog (not alder lake)

[6.9.x - 6.11.x] intel i915 module causing system hangs with no report in syslog (not alder lake)

Unraid OS version 6.10.0-rc3 available

Unraid OS version 6.10.0-rc3 available

Unraid OS version 6.10.0-rc3 available

Unraid OS version 6.10.0-rc3 available

Unraid OS version 6.10.0-rc3 available

Unraid OS version 6.10.0-rc3 available

Unraid OS version 6.10.0-rc3 available

[6.9.x - 6.11.x] intel i915 module causing system hangs with no report in syslog (not alder lake)

[6.9.x - 6.11.x] intel i915 module causing system hangs with no report in syslog (not alder lake)

[6.9.x - 6.11.x] intel i915 module causing system hangs with no report in syslog (not alder lake)

[6.9.x - 6.11.x] intel i915 module causing system hangs with no report in syslog (not alder lake)

[6.9.x - 6.11.x] intel i915 module causing system hangs with no report in syslog (not alder lake)

Several issues with 6.10.0-rc2 (downgraded to 6.9.2)

[6.9.x - 6.11.x] intel i915 module causing system hangs with no report in syslog (not alder lake)

[6.8.0] <wsdd high cpu>

[6.8.0] <wsdd high cpu>