  • (6.11.3) iGPU kick off


    Rockikone
    • Solved Urgent

    Hello,
    I would like to report a bug.
    My system is based on the following hardware:

     

    Motherboard GIGABYTE MW34-SP0-00 (W680 chipset), Intel Alder Lake CPU 12700K, 64 GB DDR4 ECC RAM, Nvidia 1070 GPU

     

    In Unraid 6.11 I get the following error. I only use the iGPU for Docker. The following Docker containers are configured with /dev/dri/renderD128: Plex and Frigate.
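    For reference, passing the render node into a container usually looks like the sketch below. Only the device path /dev/dri/renderD128 comes from the post; the container name and image are illustrative.

    ```shell
    # Sketch: giving a container access to the iGPU's render node.
    # Image name is illustrative, not taken from the post.
    docker run -d --name plex \
      --device /dev/dri/renderD128:/dev/dri/renderD128 \
      lscr.io/linuxserver/plex
    ```
    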

     

    The iGPU always disappears from the system after 1-2 days of Unraid uptime. It then no longer appears in System Devices either.

    To get the iGPU back, I have to restart the server. After that it is visible again in System Devices under item 3.

    I always notice the error because Plex shows an error message that a movie cannot be transcoded. Then I know the iGPU has been kicked out again.
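    A quick way to confirm whether the iGPU is still present, without waiting for Plex to fail, is to check from the Unraid shell. This is a sketch; the PCI address 00:02.0 is the one shown in the syslog excerpts in this thread.

    ```shell
    # Sketch: check whether the iGPU is still present (run on the server).
    ls /dev/dri          # renderD128 should be listed while the iGPU is up
    lspci -s 00:02.0     # the Alder Lake iGPU sits at PCI address 00:02.0
    ```
    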

    Attached are two screenshots from System Devices: "With iGPU", taken after a reboot, and "Without iGPU", which shows the loss of the iGPU. The iGPU is listed in PCI Devices and IOMMU Groups under item 3.

     

    I have also attached the diagnostic files with and without the iGPU.

     

    I only use the Nvidia GPU for VMs. Currently, however, it also serves Plex temporarily, as long as the iGPU problem exists.

    For questions just contact me.

    Greetings

     

    With iGPU.png

    Without iGPU.png

    With - homeserver-diagnostics-20221112-1117.zip

    Without - homeserver-diagnostics-20221112-1105.zip




    User Feedback

    Recommended Comments

    Not sure of the reason why, but the kernel is removing the PCI device. There is a segfault in guacd; does a Docker container crash or stop at this point as well?

     

    guacd is the Guacamole daemon

    Nov 11 09:56:48 Homeserver  avahi-daemon[8560]: Joining mDNS multicast group on interface vnet2.IPv6 with address fe80::fc54:ff:fe48:290e.
    Nov 11 09:56:48 Homeserver  avahi-daemon[8560]: New relevant interface vnet2.IPv6 for mDNS.
    Nov 11 09:56:48 Homeserver  avahi-daemon[8560]: Registering new address record for fe80::fc54:ff:fe48:290e on vnet2.*.
    Nov 11 09:56:48 Homeserver kernel: x86/split lock detection: #AC: CPU 1/KVM/11395 took a split_lock trap at address: 0x7fe6108c
    Nov 11 09:56:48 Homeserver kernel: x86/split lock detection: #AC: CPU 2/KVM/11396 took a split_lock trap at address: 0x7fe6108c
    Nov 11 09:56:48 Homeserver kernel: x86/split lock detection: #AC: CPU 5/KVM/11399 took a split_lock trap at address: 0x7fe6108c
    Nov 11 10:09:22 Homeserver kernel: guacd[20327]: segfault at 10 ip 0000152c8b4ce802 sp 0000152c8a5fcc80 error 4 in libguac-client-rdp.so.0.0.0[152c8b4bc000+1a000]
    Nov 11 10:09:22 Homeserver kernel: Code: 00 be 03 00 00 00 48 89 df e8 9a e7 fe ff b8 01 00 00 00 e9 7a ff ff ff 53 48 8b 07 48 89 fb 48 89 de 48 8b 40 10 48 8b 40 20 <48> 8b 78 10 e8 35 e7 fe ff 8b 43 18 85 c0 74 0e 31 c0 5b c3 66 2e
    Nov 11 10:10:29 Homeserver  avahi-daemon[8560]: Interface vnet2.IPv6 no longer relevant for mDNS.
    Nov 11 10:10:29 Homeserver  avahi-daemon[8560]: Leaving mDNS multicast group on interface vnet2.IPv6 with address fe80::fc54:ff:fe48:290e.
    Nov 11 10:10:29 Homeserver kernel: br0.25: port 4(vnet2) entered disabled state
    Nov 11 10:10:29 Homeserver kernel: device vnet2 left promiscuous mode
    Nov 11 10:10:29 Homeserver kernel: br0.25: port 4(vnet2) entered disabled state
    Nov 11 10:10:29 Homeserver  avahi-daemon[8560]: Withdrawing address record for fe80::fc54:ff:fe48:290e on vnet2.
    Nov 11 10:10:31 Homeserver  acpid: input device has been disconnected, fd 8
    Nov 11 10:10:32 Homeserver kernel: pci 0000:00:02.0: Removing from iommu group 3

     

    Edited by SimonF

    Look for a BIOS update, you could also try adding:

    split_lock_detect=off

    to syslinux boot options and see if it helps.
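    On Unraid, the parameter is added to the append line of the boot entry (Main -> Flash -> Syslinux Configuration). A sketch of the resulting syslinux.cfg entry, assuming an otherwise default configuration:

    ```
    label Unraid OS
      menu default
      kernel /bzimage
      append split_lock_detect=off initrd=/bzroot
    ```

    Note this only disables the split lock detection traps seen in the log above; it is a test to rule them out, not a confirmed fix for the iGPU removal.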


    @JorgeB

     

    There is no update available. I had already checked before creating this bug report. I'll try the syslinux boot parameter now and report back.
    Thanks for helping


    @SimonF

    You wrote that a Docker container with the guacd identifier is possibly responsible for ejecting the iGPU.
    I have now found out that guacd belongs to the Guacamole container. On my system it was set to autostart at boot. I have now deactivated it and restarted.
    Let's see if that was the problem.
    Greetings


    @SimonF

    Okay, the reason the iGPU was kicked out was the Guacamole Docker container. It was running without any special privileges.

    I have been using this container for several years. I don't know whether it's the new hardware environment or Unraid 6.11.

    I have now uninstalled Guacamole and the iGPU remains in the system.

    Greetings


    @JorgeB

    @SimonF

     

    Unfortunately, I have to reopen this bug report.
    In the meantime, the iGPU has disappeared twice more, most recently yesterday, after the server had already been running for 6 days without problems.
    I can narrow down the error pretty well now.
    Yesterday I had to work on a Windows VM that runs on the Unraid server.
    At 15:37 I was done with the work and shut the VM down (from within the VM). I was connected via RDP.

    As you can see in the log, the iGPU dropped out of the system at 15:37, so the issue is related to the VM environment.
    The iGPU is only used for Docker and Unraid; it is not passed through to any VM. There is no dGPU installed in the system anymore, only the iGPU.
    I don't know whether the problem lies in Unraid or the BIOS, but the error is unfortunately still there and it is related to the VM environment!

     

    Greetings

     

    Nov 29 15:37:46 Homeserver  avahi-daemon[7715]: Interface vnet2.IPv6 no longer relevant for mDNS.
    Nov 29 15:37:46 Homeserver  avahi-daemon[7715]: Leaving mDNS multicast group on interface vnet2.IPv6 with address fe80::fc54:ff:fe0e:bc60.
    Nov 29 15:37:46 Homeserver kernel: br0.25: port 4(vnet2) entered disabled state
    Nov 29 15:37:46 Homeserver kernel: device vnet2 left promiscuous mode
    Nov 29 15:37:46 Homeserver kernel: br0.25: port 4(vnet2) entered disabled state
    Nov 29 15:37:46 Homeserver kernel: sdc: sdc1 sdc2 sdc3 sdc4
    Nov 29 15:37:46 Homeserver  avahi-daemon[7715]: Withdrawing address record for fe80::fc54:ff:fe0e:bc60 on vnet2.
    Nov 29 15:37:46 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V-part2' is set as passed through.
    Nov 29 15:37:46 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V-part1' is set as passed through.
    Nov 29 15:37:46 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'S2R6NB0J531111V-part4' is set as passed through.
    Nov 29 15:37:46 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'S2R6NB0J531111V-part3' is set as passed through.
    Nov 29 15:37:49 Homeserver kernel: sdc: sdc1 sdc2 sdc3 sdc4
    Nov 29 15:37:49 Homeserver kernel: sdc: sdc1 sdc2 sdc3 sdc4
    Nov 29 15:37:49 Homeserver  acpid: input device has been disconnected, fd 8
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V-part2' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'S2R6NB0J531111V-part3' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'S2R6NB0J531111V-part4' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V-part1' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'S2R6NB0J531111V-part4' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V-part1' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'S2R6NB0J531111V-part3' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V-part2' is set as passed through.
    Nov 29 15:37:49 Homeserver kernel: pci 0000:00:02.0: Removing from iommu group 3
    Nov 29 15:51:29 Homeserver kernel: hrtimer: interrupt took 12313 ns

     

    homeserver-diagnostics-20221130-0727.zip


    @JorgeB

    @SimonF

     

    The error has now been found and fixed in the German subforum. The disks were previously installed in a Coffee Lake Xeon system, where I used the GVT-g plugin. When moving the server to Alder Lake, the plugin was uninstalled beforehand, but apparently there were still config remnants in the file /etc/libvirt/hooks/qemu.
    I have now deleted the libvirt hook file and recreated the VMs. The error is gone. I would never have thought of that.
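    For anyone hitting the same remnants: a sketch of how stale GVT-g content in the hook can be spotted. The check is simulated below with a temporary file standing in for /etc/libvirt/hooks/qemu (the path from the post); the hook contents are made up for illustration.

    ```shell
    # Simulate the stale hook with a temp file; on a real server you would
    # grep /etc/libvirt/hooks/qemu directly (and back it up before deleting).
    hook=$(mktemp)
    printf '#!/bin/bash\n# leftover gvt-g vGPU setup from the old system\n' > "$hook"

    found=no
    if grep -qi "gvt" "$hook"; then
      found=yes
      echo "stale GVT-g remnants found"
    fi

    rm -f "$hook"
    ```
    
    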

    I wish you all a great 2023

     

    Greetings from Bavaria

     

    Thomas

     

    Great news Thomas, have a good 2023. Regards, Simon.


