Hoopster


Report Comments posted by Hoopster

  1. 2 hours ago, ich777 said:

    Can you please test and try to disable VT-d in the BIOS, remove the i915.conf file (or move it to the root from your USB Boot device), reboot and see if that works?

    Yes, the server boots into 6.12.0 if I disable Intel Virtualization Technologies in the BIOS and with no i915.conf file in modprobe.d

     

    The iGPU is still enabled: /dev/dri exists with the appropriate files, and I am able to hardware transcode in Plex.

     

    This is also an acceptable short-term solution on this server as, for now, I do not intend to run any VMs on the server.

     

    @ich777 UPDATE: After a couple more reboots, disabling VT-d no longer solved the problem.  The server was back to the prior no-boot behavior.  After restoring the i915.conf file with the options line, it again boots successfully.  That appears to be the only "permanent" solution.
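
For anyone wanting to confirm the iGPU state from a terminal rather than by trying a transcode, here is a minimal sketch. It assumes the iGPU shows up as a renderD* node under /dev/dri, which is the usual layout for Linux DRM devices; `has_render_node` is a hypothetical helper name, not an Unraid command.

```shell
# Return success if the given directory (default /dev/dri) contains at
# least one renderD* node. The directory argument exists only so the
# check can be tried against a scratch directory.
has_render_node() {
    local dri_dir="${1:-/dev/dri}"
    local node
    for node in "$dri_dir"/renderD*; do
        # If the glob matched nothing, "$node" is the literal pattern
        # and the -e test fails.
        [ -e "$node" ] && return 0
    done
    return 1
}

# Example use on the server:
# has_render_node && echo "render node present" || echo "no render node"
```

A render node being present only shows that the driver loaded and exposed the device; it does not by itself prove that Plex can transcode with it.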

  2. 36 minutes ago, SimonF said:

    echo "options i915 enable_dc=0" >> /boot/config/modprobe.d/i915.conf

    That seems to have worked.  Thank you.  I guess I just missed that as it appears to have been mentioned previously.

     

    It makes sense as, even in 6.11.5, I had to disable low power mode for the HandBrake presets using QSV (lowpower=0 in More Settings in the Video tab) or all encodes attempted with QSV crashed.
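
Because the quoted command uses `>>`, running it twice leaves two identical lines in the file. A minimal sketch of a re-runnable variant follows; `add_modprobe_option` is a hypothetical helper name, and the path and option line are the ones from the quote above.

```shell
# Append a line to a modprobe conf file only if that exact line is not
# already present, creating the directory and file if needed.
add_modprobe_option() {
    local conf="$1" line="$2"
    mkdir -p "$(dirname "$conf")"
    touch "$conf"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF -- "$line" "$conf"; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Example use on the server:
# add_modprobe_option /boot/config/modprobe.d/i915.conf "options i915 enable_dc=0"
```

As far as I know, the option is only read when the i915 module loads, so a reboot would normally be needed before it has any effect.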

  3. 1 hour ago, JorgeB said:

    I was afraid that might be necessary as the video seemed to indicate the boot process stopped while loading the i915 driver.

     

    Blacklisting i915 allowed the server to boot properly but this server has only the iGPU for graphics and is primarily a Plex server.

     

    Apparently, the Linux kernel in 6.11.5 has no problems with the i5-11600 CPU/iGPU but the kernel in 6.12.0 does not like it.  🙁

  4. I updated to 6.11.0-rc1 on my backup server.  Update went well. 

     

    The only thing I have noticed is that motherboard and CPU temps are not showing in the GUI dashboard.  They were present in 6.10.3.  I am using the IPMI plugin and that continues to report CPU and MB temps in the GUI footer.

     

    There was also a motherboard BIOS update available (security mitigations) and I did that update before the unRAID update.  That could be the cause I suppose but, again, IPMI plugin still detects sensors and displays the temps.

     

     

    backupnas-diagnostics-20220728-0912.zip

  5. 1 hour ago, NightOps said:

    Won't uninstalling Intel GPU Top plugin roll back the blacklisting of the i915 driver and keep HW transcoding from working?

    I am still running 6.9.2 on that server and the iGPU is enabled via the touch method, which creates an empty i915.conf file in modprobe.d.

     

    If that file did not exist, i915 would remain blacklisted (the default in 6.9.2) and no HW transcoding would take place.
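
To make the states described above concrete, here is a small sketch that classifies an i915.conf file. The meaning attached to each state (missing file means the 6.9.2 default blacklist applies, empty file means the driver loads) is taken from this post and not verified against other releases; `i915_conf_state` is a hypothetical helper name.

```shell
# Print one word describing the conf file passed as $1:
#   missing   - no file (per the post, i915 stays blacklisted on 6.9.2)
#   empty     - the "touch method" (driver loads with default options)
#   blacklist - file explicitly blacklists i915
#   custom    - file has other content, e.g. an "options i915 ..." line
i915_conf_state() {
    local conf="$1"
    if [ ! -e "$conf" ]; then
        echo "missing"
    elif [ ! -s "$conf" ]; then
        echo "empty"
    elif grep -qE '^[[:space:]]*blacklist[[:space:]]+i915' "$conf"; then
        echo "blacklist"
    else
        echo "custom"
    fi
}

# The touch method itself, as run on the server:
# touch /boot/config/modprobe.d/i915.conf
# i915_conf_state /boot/config/modprobe.d/i915.conf
```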

  6. 7 hours ago, bonienl said:

    Your diagnostics show several plugins installed. Can you retest in safe mode to rule out the plugins?

    Booted in safe mode. Let the GUI sit for about 10 minutes, no Nchan errors although I continued to see the problem with slow populating of data in the Main tab and terminal sessions either closing or taking a while to display the command prompt.

     

    Plugins installed are the following:

    [Screenshot: list of installed plugins]

     

    I will see if I can determine which one may be causing the Nchan error.

  7. 4 minutes ago, Hoopster said:

    The error is still occurring but less frequently.  It seems to settle down a bit if I am not actively doing things in the GUI for a few minutes.  As soon as I start moving around in the GUI, back it comes with increasing frequency.

     

    I am also seeing that when I switch away from the Main tab in the GUI and then return to that tab, it takes a few seconds for data to load on the Main tab.  Data in other tabs loads instantly.

    @bonienl Not seeing the Nchan error or Main tab issue in Chrome; just Firefox (version 98.0). 

     

    Also, in Firefox only, opening  a terminal session takes a few seconds to display the command prompt (or the window just closes); instantaneous through Chrome.

  8. 1 minute ago, bonienl said:

    Is your server heavily loaded at startup?

    Can you post your diagnostics when this happens.

     

    The error is still occurring but less frequently.  It seems to settle down a bit if I am not actively doing things in the GUI for a few minutes.  As soon as I start moving around in the GUI, back it comes with increasing frequency.

     

    I am also seeing that when I switch away from the Main tab in the GUI and then return to that tab, it takes a few seconds for data to load on the Main tab.  Data in other tabs loads instantly.

     

    This server should not be very busy at startup.  It is my backup server with fewer Docker containers and plugins than the main server and no VMs.

     

    Diagnostics attached.

     

    backupnas-diagnostics-20220313-1152.zip

  9. 20 hours ago, bonienl said:

    I have not encountered this in my testing with any of the main browsers.

    In my case Nchan communication only gets broken when a device goes in powersaving or sleep mode, and I need to "wake-up" the communication.

    Just powered on the server again and brought up the GUI.  The Nchan error comes up after about 10 seconds.  If I click OK it goes away then reappears a few seconds later.  I have dismissed it 7 times now and it continues to pop up in the GUI.

     

    [Screenshot: Nchan error pop-up in the GUI]

  10. 14 minutes ago, bonienl said:

    Do you get this while actively working with the browser or after some time of inactivity and the system going in powersaving mode?

    I get this frequently right after powering on the server and starting the GUI.  It would appear after the GUI sat idle for several seconds or while attempting to navigate between tabs.

     

    Interestingly, the GUI sat idle for 15 minutes or more and the error is not appearing again.  I have also tried navigating between tabs without issue.

     

    Right after upgrading yesterday and restarting the server and GUI today, I would see the error pop up every 10 seconds or so.  Now, it seems to have mysteriously disappeared.  I'll report if I see it again.

  11. 3 minutes ago, snailtrails said:

    That Xeon is coffee lake (9th gen intel) which Unraid is certified for

    If you look at the hardware specs for the user who started this bug report, as well as others in the thread he linked, you will see that there are many reports of i915 lockups with 9th generation Intel CPUs/iGPUs.  It may be more prevalent in the 11th and 12th generations that don't have official iGPU support in the i915 drivers, but it seems not to be limited to those generations.

  12. 20 hours ago, NightOps said:

    What CPU are you running?  Are you using iGPU passthrough to Plex for transcoding?

    Xeon E-2288G and yes I am using the iGPU (/dev/dri) for Plex and HandBrake transcoding.  Server never locked up when transcoding was happening.  It always seemed to happen during idle time. 

     

    I am not saying Intel-GPU-Top was the problem, but, I have not had a crash since removing it and GPU Statistics.  Running unRAID 6.9.2 currently.  Lockups previously happened on that version plus the two 6.10.0 RCs.

  13. UPDATE:  I rolled back to 6.9.2 from 6.10.0 rc2 and still had random system hangs.  16 days ago I did the following and have not had an issue since then:

    • recreated i915.conf with the 'touch' method
    • uninstalled the Intel GPU Top plugin
    • uninstalled the GPU Statistics plugin
    • enabled Turbo Boost in the Tips and Tweaks plugin (it had been disabled because heavy sustained workloads had been causing crashes with Turbo Boost)

     

    UPDATE: A few days ago, I also uninstalled the CoreFreq  plugin.  I had not had problems with it installed but I see that it has been recommended to others with server lockups to uninstall that plugin.

     

    I am not declaring victory yet nor do I think the above is necessarily a solution.  This just happens to be the longest I have ever gone without a system hang since I started having the problems last July.

     

    - Over five months of running 6.9.2 and 6.10.x with no hangs and many QSV transcodes.  The only thing that appears to cause a hang is when Turbo Boost is enabled and there is high CPU load.  This only happened again after upgrading to 6.10.0.  I have Turbo Boost off for now.
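
I toggle Turbo Boost through the Tips and Tweaks plugin, but the current state can also be read from a terminal. The sketch below assumes the CPU is using the intel_pstate driver, which exposes a `no_turbo` file in sysfs (0 means turbo is allowed, 1 means it is off); `turbo_state` is a hypothetical helper name, and I have not checked how the plugin itself applies the setting.

```shell
# Print "enabled", "disabled" or "unknown" based on an intel_pstate
# no_turbo file. The path argument defaults to the usual sysfs location.
turbo_state() {
    local no_turbo="${1:-/sys/devices/system/cpu/intel_pstate/no_turbo}"
    if [ ! -r "$no_turbo" ]; then
        # File absent, e.g. a different CPU frequency driver is in use.
        echo "unknown"
        return 0
    fi
    case "$(cat "$no_turbo")" in
        0) echo "enabled" ;;
        1) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

# Example use on the server:
# turbo_state
```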

  14. 12 hours ago, bonienl said:

    There is little to go on here

    Agreed.  There is virtually nothing to go on in my post.  Perhaps I will upgrade again and get some diagnostics.

     

    I did see another bug report about WSDD2 crashing and thought I must have a similar issue.  My inability to access the server/shares via Windows only started with the 6.10.0-rc2 release.  The server just disappears from the Windows network (randomly?) on all my Windows clients or was often never there when I rebooted the server.

     

    Same with the Terminal issues (6.10.0-rc2 only).  Most times it would never connect and just displayed "Press [Enter key symbol] to connect."  I have no such problems in 6.9.2.  If it did connect, it took an unusually long time to do so.  Perhaps this is somehow related to the slowness I saw in moving between tabs and reading array info in Dashboard and Main.

     

    I have changed nothing on my network recently other than updating firmware on my Ubiquiti Unifi router, switches and APs.  That was done while running 6.10.0-rc1 with no adverse effects.

  15. When these system hangs happen for me (started on 6.9.2 and have continued with 6.10.0 RC), there is nothing meaningful in the syslog around the time of the hang but this is sometimes reported in the IPMI log.

     

    [Screenshot: IPMI event log entry showing an OS Stop/Shutdown]

     

    An OS Stop/Shutdown sounds like a kernel/driver issue to me.  It's like the rest of the system is working but the OS just decided to shut down.

     

    FWIW, I have never seen these shutdowns as a result of hardware transcoding.

  16. 1 hour ago, mivadebe said:

    Also an issue here. Hope changing these settings will help. I'm currently 3 days in trial period and these kind of issues are not what I'm looking for:)

    The WSD Daemon (wsdd) is not developed by Limetech.  They have merely chosen to implement it in unRAID for those who wish to have an alternative to SMB/netBIOS for network discovery.  There is probably not much they can do to fix this issue unless it is proven to be some interaction with wsdd and core unRAID services that causes the problem.

     

    I have completely disabled SMB v1 on all my Windows clients and am using WSD exclusively.  The -i br0 parameter has worked for me to mostly prevent the single-core 100% CPU usage issue.  Every 3-4 months it may pop up, but a reboot fixes it (as does the stop array, disable WSD, start array, enable WSD procedure).
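
To spot the runaway daemon without watching the dashboard, something like the following could be used. It is only a sketch: it assumes the process is named `wsdd` or `wsdd2`, `busy_wsdd` is a hypothetical helper name, and the `pcpu` figure from `ps` is an average over the life of the process, so a daemon that has only just started spinning may not show up yet.

```shell
# Read "pcpu comm" pairs from stdin (the format printed by
# `ps -eo pcpu= -o comm=`) and print any wsdd/wsdd2 process whose CPU
# percentage is at or above the threshold (default 90).
busy_wsdd() {
    local threshold="${1:-90}"
    awk -v t="$threshold" \
        '($2 == "wsdd" || $2 == "wsdd2") && $1 + 0 >= t { print $2, $1 }'
}

# Example use on the server:
# ps -eo pcpu= -o comm= | busy_wsdd 90
```

No output means nothing matched; a printed line is a hint to try the reboot or the WSD off/on procedure described above.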