Report Comments posted by Hoopster
-
12 hours ago, JorgeB said:
Can this be closed?
As far as I am concerned, yes, it can be closed as the server in question boots consistently and without errors since adding the indicated line to i915.conf.
-
5 hours ago, SimonF said:
yes you can plug into another machine create or edit the file in the flash and add to file.
@VeeTECH That's what I did to add "options i915 enable_dc=0" to the i915.conf file on the flash drive.
-
36 minutes ago, SimonF said:
echo "options i915 enable_dc=0" >> /boot/config/modprobe.d/i915.conf
That seems to have worked. Thank you. I guess I just missed that as it appears to have been mentioned previously.
It makes sense as, even in 6.11.5, I had to disable low power mode for the HandBrake presets using QSV (lowpower=0 in More Settings in the Video tab) or all encodes attempted with QSV crashed.
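The echo command quoted above appends unconditionally, so running it a second time leaves a duplicate line in the file. A minimal sketch of a repeatable version follows; the function name `add_i915_option` and its optional directory argument are illustrative and not part of Unraid, and the default path is the one given in the post.

```shell
# Hypothetical helper: add the option line only if it is not already there.
# The optional first argument overrides the directory (handy for a dry run).
add_i915_option() {
    conf_dir="${1:-/boot/config/modprobe.d}"
    conf_file="$conf_dir/i915.conf"
    option_line="options i915 enable_dc=0"

    mkdir -p "$conf_dir"
    touch "$conf_file"

    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF "$option_line" "$conf_file"; then
        echo "$option_line" >> "$conf_file"
    fi
}
```

Calling `add_i915_option` with no argument targets the flash path from the post; passing a scratch directory lets you check the result before touching the flash drive.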
-
1 hour ago, JorgeB said:
Try blacklisting the i915 driver:
https://docs.unraid.net/unraid-os/release-notes/6.10.0#linux-kernel
I was afraid that might be necessary as the video seemed to indicate the boot process stopped with loading the i915 driver.
Blacklisting i915 allowed the server to boot properly but this server has only the iGPU for graphics and is primarily a Plex server.
Apparently, the Linux kernel in 6.11.5 has no problems with the i5-11600 CPU/iGPU but the kernel in 6.12.0 does not like it. 🙁
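For reference, blacklisting comes down to putting a `blacklist i915` line (the standard modprobe.d directive) into the same i915.conf file. The sketch below assumes the flash path used elsewhere in these comments; the function name `blacklist_i915` is illustrative, and the linked release notes remain the authoritative procedure.

```shell
# Hypothetical helper: replace i915.conf with a single blacklist directive.
# Note that this OVERWRITES any existing i915.conf, including an options line.
blacklist_i915() {
    conf_dir="${1:-/boot/config/modprobe.d}"
    mkdir -p "$conf_dir"
    echo "blacklist i915" > "$conf_dir/i915.conf"
}
```

The change is read at boot, so it takes effect after a restart. As the post notes, it also removes the iGPU from use, which rules out hardware transcoding on a server with no other GPU.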
-
I updated to 6.11.0-rc1 on my backup server. Update went well.
The only thing I have noticed is that motherboard and CPU temps are not showing in the GUI dashboard. They were present in 6.10.3. I am using the IPMI plugin and that continues to report CPU and MB temps in the GUI footer.
There was also a motherboard BIOS update available (security mitigations) and I did that update before the unRAID update. That could be the cause I suppose but, again, IPMI plugin still detects sensors and displays the temps.
-
1 hour ago, flyize said:
Are you using Alder Lake?
Coffee Lake (Xeon E-2288G)
-
1 hour ago, NightOps said:
Won't uninstalling Intel GPU Top plugin roll back the blacklisting of the i915 driver and keep HW transcoding from working?
I am still running 6.9.2 on that server and the iGPU is enabled via the touch method, which creates an empty i915.conf file in modprobe.d.
If that file did not exist, i915 would remain blacklisted (the default in 6.9.2) and no HW transcoding would take place.
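The behavior described here depends on which of three states the file is in: absent, present but empty, or present with content. A small sketch that reports the state is shown below; the function name `i915_conf_state` and the labels it prints are made up for illustration, and the default path assumes the flash location used in these comments.

```shell
# Hypothetical helper: report the state of i915.conf.
#   missing     -> no file (per the post, i915 stays blacklisted on 6.9.2)
#   empty       -> the result of the "touch" method
#   has-content -> the file carries directives such as an options line
i915_conf_state() {
    conf_file="${1:-/boot/config/modprobe.d/i915.conf}"
    if [ ! -e "$conf_file" ]; then
        echo "missing"
    elif [ ! -s "$conf_file" ]; then
        echo "empty"
    else
        echo "has-content"
    fi
}
```

Running it before and after uninstalling a plugin is one way to see whether the uninstall touched the file.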
-
1 hour ago, Lee Kim Tatt said:
CoreFreq? I didn't install that, but using Intel GPU Top? Any relation between them?
I uninstalled Intel GPU Top and GPU Statistics and my server hangs stopped. I also uninstalled CoreFreq as a precaution, but, that was later.
-
@bonienl Rebooted a second time in Safe Mode. This time I got the Nchan error as seen below just a few seconds after starting the array.
-
7 hours ago, bonienl said:
Your diagnostics show several plugins installed. Can you retest in safe mode to rule out the plugins?
Booted in safe mode. Let the GUI sit for about 10 minutes, no Nchan errors although I continued to see the problem with slow populating of data in the Main tab and terminal sessions either closing or taking a while to display the command prompt.
Plugins installed are the following:
I will see if I can determine which one may be causing the Nchan error.
-
4 minutes ago, Hoopster said:
The error is still occurring but less frequently. It seems to settle down a bit if I am not actively doing things in the GUI for a few minutes. As soon as I start moving around in the GUI, back it comes with increasing frequency.
I am also seeing that when I switch away from the Main tab in the GUI and then return to that tab, it takes a few seconds for data to load on the Main tab. Data in other tabs loads instantly.
@bonienl Not seeing the Nchan error or Main tab issue in Chrome; just Firefox (version 98.0).
Also, in Firefox only, opening a terminal session takes a few seconds to display the command prompt (or the window just closes); instantaneous through Chrome.
-
1 minute ago, bonienl said:
The error is still occurring but less frequently. It seems to settle down a bit if I am not actively doing things in the GUI for a few minutes. As soon as I start moving around in the GUI, back it comes with increasing frequency.
I am also seeing that when I switch away from the Main tab in the GUI and then return to that tab, it takes a few seconds for data to load on the Main tab. Data in other tabs loads instantly.
This server should not be very busy at startup. It is my backup server with fewer Docker containers and plugins than the main server and no VMs.
Diagnostics attached.
-
20 hours ago, bonienl said:
I have not encountered this in my testing with any of the main browsers.
In my case Nchan communication only gets broken when a device goes in powersaving or sleep mode, and I need to "wake-up" the communication.
Just powered on the server again and brought up the GUI. The Nchan error comes up after about 10 seconds. If I click OK it goes away then reappears a few seconds later. I have dismissed it 7 times now and it continues to pop up in the GUI.
-
14 minutes ago, bonienl said:
Do you get this while actively working with the browser or after some time of inactivity and the system going in powersaving mode?
I get this frequently right after powering on the server and starting the GUI. It would appear after the GUI sat idle for several seconds or while attempting to navigate between tabs.
Interestingly, the GUI sat idle for 15 minutes or more and the error is not appearing again. I have also tried navigating between tabs without issue.
Right after upgrading yesterday and restarting the server and GUI today, I would see the error pop up every 10 seconds or so. Now, it seems to have mysteriously disappeared. I'll report if I see it again.
-
After updating my backup server to 6.10.0-RC3, I am getting the error below frequently. I have closed and reopened the browser (Firefox) and restarted the server a couple of times. I have also checked that there are no tabs open to the GUI for that server on other computers.
-
3 minutes ago, snailtrails said:
That Xeon is coffee lake (9th gen intel) which Unraid is certified for
If you look at the hardware specs for the user that started this bug report as well as others in the thread he linked, you will see that there are many reports of i915 lockups with 9th generation Intel CPUs/iGPUs. It may be more prevalent in the 11th and 12th generations that don't have official iGPU support in the i915 drivers but it seems to not be limited to those generations.
-
20 hours ago, NightOps said:
What CPU are you running? Are you using iGPU passthrough to Plex for transcoding?
Xeon E-2288G and yes I am using the iGPU (/dev/dri) for Plex and HandBrake transcoding. Server never locked up when transcoding was happening. It always seemed to happen during idle time.
I am not saying Intel-GPU-Top was the problem, but, I have not had a crash since removing it and GPU Statistics. Running unRAID 6.9.2 currently. Lockups previously happened on that version plus the two 6.10.0 RCs.
-
40 minutes ago, bearcat2004 said:
@Hoopster has your machine crashed at any point?
No, I am at 50+ days of uptime since removing the Intel-GPU-top and GPU Statistics plugins.
I also recently removed the CoreFreq plugin as there have been several reports of it locking up servers. This was not in response to a crash, just an extra precaution.
-
Just now, bearcat2004 said:
Where would I create this file if I were to create this file the same way?
From the terminal type 'touch /boot/config/modprobe.d/i915.conf'
Intel GPU Top tries to load i915 as well and I thought it might be interfering so I removed it.
-
UPDATE: I rolled back to 6.9.2 from 6.10.0 rc2 and still had random system hangs. 16 days ago I did the following and have not had an issue since then:
- recreated i915.conf with the 'touch' method
- uninstalled the Intel GPU Top plugin
- uninstalled the GPU Statistics plugin
- enabled Turbo Boost in the Tips and Tweaks plugin (it had been disabled because heavy sustained workloads had been causing crashes with Turbo Boost)
UPDATE: A few days ago, I also uninstalled the CoreFreq plugin. I had not had problems with it installed but I see that it has been recommended to others with server lockups to uninstall that plugin.
I am not declaring victory yet nor do I think the above is necessarily a solution. This just happens to be the longest I have ever gone without a system hang since I started having the problems last July.
- Over five months of running 6.9.2 and 6.10.x with no hangs and many QSV transcodes. The only thing that appears to cause a hang is when Turbo Boost is enabled and there is high CPU load. This only happened again after upgrading to 6.10.0. I have Turbo Boost off for now.
-
12 hours ago, bonienl said:
There is little to go on here
Agreed. There is virtually nothing to go on in my post. Perhaps I will upgrade again and get some diagnostics.
I did see another bug report about WSDD2 crashing and thought I must have a similar issue. My inability to access the server/shares via Windows only started with the 6.10.0-rc2 release. The server just disappears from the Windows network (randomly?) on all my Windows clients or was often never there when I rebooted the server.
Same with the Terminal issues (6.10.0-rc2 only). Most times it would never connect and just displayed "Press [Enter key symbol] to connect." I have no such problems in 6.9.2. If it did connect, it took an unusually long time to do so. Perhaps this is somehow related to the slowness I saw in moving between tabs and reading array info in Dashboard and Main.
I have changed nothing on my network recently other than updating firmware on my Ubiquiti Unifi router, switches and APs. That was done while running 6.10.0-rc1 with no adverse effects.
-
When these system hangs happen for me (started on 6.9.2 and have continued with 6.10.0 RC), there is nothing meaningful in the syslog around the time of the hang but this is sometimes reported in the IPMI log.
An OS Stop/Shutdown sounds like a kernel/driver issue to me. It's like the rest of the system is working but the OS just decided to shut down.
FWIW, I have never seen these shutdowns as a result of hardware transcoding.
-
6 minutes ago, thecode said:
`-i br0` solve this or just reduce the rate.
Right, this is not a solution as it does not completely prevent the issue, but it does certainly reduce by a lot how often I see the problem on my servers.
-
1 hour ago, mivadebe said:
Also an issue here. Hope changing these settings will help. I'm currently 3 days in trial period and these kind of issues are not what I'm looking for:)
The WSD Daemon (wsdd) is not developed by Limetech. They have merely chosen to implement it in unRAID for those who wish to have an alternative to SMB/netBIOS for network discovery. There is probably not much they can do to fix this issue unless it is proven to be some interaction with wsdd and core unRAID services that causes the problem.
I have completely disabled SMB v1 on all my Windows clients and am using WSD exclusively. The -i br0 parameter has worked for me to mostly prevent the single-core 100% CPU usage issue. Every 3-4 months it may pop up, but a reboot fixes it (as does the stop array, disable WSD, start array, enable WSD procedure).
-
Can't update to 6.12.0 on one server (in Stable Releases)
Yes, the server boots into 6.12.0 if I disable Intel Virtualization Technologies in the BIOS and with no i915.conf file in modprobe.d.
The iGPU is still enabled, as /dev/dri exists with the appropriate files, and I am able to hardware transcode in Plex.
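A quick way to confirm that the device nodes are present is to list what is under /dev/dri. The sketch below assumes the usual Linux DRM naming (`card*` and `renderD*` entries); the function name `list_dri_nodes` is illustrative.

```shell
# Hypothetical helper: print the DRM device nodes found in a directory.
# Prints nothing if the directory is missing or holds no matching entries.
list_dri_nodes() {
    dri_dir="${1:-/dev/dri}"
    for node in "$dri_dir"/card* "$dri_dir"/renderD*; do
        # An unmatched glob stays literal, so skip anything that does not exist
        [ -e "$node" ] && basename "$node"
    done
    return 0
}
```

Empty output after a reboot would suggest the i915 driver did not load, which is worth checking before troubleshooting the Plex container itself.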
This is also an acceptable short-term solution on this server as, for now, I do not intend to run any VMs on the server.
@ich777 UPDATE: After a couple more reboots, disabling VT-d no longer solved the problem. The server was back to the prior no-boot behavior. After restoring the i915.conf file with the options line, it again boots successfully. That appears to be the only "permanent" solution.