  • (6.11.3) iGPU kick off


    Rockikone
    • Solved Urgent

    Hello,
    I would like to report a bug.
    My system is based on the following hardware:

     

    Motherboard GIGABYTE MW34-SP0-00 (W680 chipset), Intel Alder Lake CPU 12700K, 64 GB DDR4 ECC RAM, Nvidia 1070 GPU

     

    In Unraid 6.11 I get the following error. I only use the iGPU for Docker. The following Docker containers are configured with /dev/dri/renderD128: Plex and Frigate.
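    For reference, passing the render node into a container usually looks like the sketch below. Only the device path /dev/dri/renderD128 comes from the post; the container name and image are illustrative.

    ```shell
    # Sketch: giving a container access to the iGPU's render node.
    # Image name is illustrative, not taken from the post.
    docker run -d --name plex \
      --device /dev/dri/renderD128:/dev/dri/renderD128 \
      lscr.io/linuxserver/plex
    ```
    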

     

    The iGPU always disappears from the system after 1-2 days of Unraid uptime. It then no longer appears in System Devices either.

    To get the iGPU back, I have to restart the server. After that it is visible again in System Devices under item 3.

    I always notice the error because Plex shows an error message that a movie cannot be transcoded. Then I know the iGPU has been kicked out again.
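    A quick way to confirm whether the iGPU is still present, without waiting for Plex to fail, is to check from the Unraid shell. This is a sketch; the PCI address 00:02.0 is the one shown in the syslog excerpts in this thread.

    ```shell
    # Sketch: check whether the iGPU is still present (run on the server).
    ls /dev/dri          # renderD128 should be listed while the iGPU is up
    lspci -s 00:02.0     # the Alder Lake iGPU sits at PCI address 00:02.0
    ```
    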

    Attached are two screenshots from System Devices: "With iGPU", taken after a reboot, and "Without iGPU", which shows the loss of the iGPU. The iGPU is listed in PCI Devices and IOMMU Groups under item 3.

     

    I have also attached the diagnostic files with and without the iGPU.

     

    I only use the Nvidia GPU for VMs. Currently, however, it also serves Plex temporarily, as long as the iGPU problem exists.

    For questions just contact me.

    Greetings

     

    With iGPU.png

    Without iGPU.png

    With - homeserver-diagnostics-20221112-1117.zip

    Without - homeserver-diagnostics-20221112-1105.zip




    User Feedback

    Recommended Comments

    Not sure of the reason why, but the kernel is removing the PCI device. There is a segfault in guacd; does a Docker container crash or stop at this point as well?

     

    guacd is the Guacamole daemon

    Nov 11 09:56:48 Homeserver  avahi-daemon[8560]: Joining mDNS multicast group on interface vnet2.IPv6 with address fe80::fc54:ff:fe48:290e.
    Nov 11 09:56:48 Homeserver  avahi-daemon[8560]: New relevant interface vnet2.IPv6 for mDNS.
    Nov 11 09:56:48 Homeserver  avahi-daemon[8560]: Registering new address record for fe80::fc54:ff:fe48:290e on vnet2.*.
    Nov 11 09:56:48 Homeserver kernel: x86/split lock detection: #AC: CPU 1/KVM/11395 took a split_lock trap at address: 0x7fe6108c
    Nov 11 09:56:48 Homeserver kernel: x86/split lock detection: #AC: CPU 2/KVM/11396 took a split_lock trap at address: 0x7fe6108c
    Nov 11 09:56:48 Homeserver kernel: x86/split lock detection: #AC: CPU 5/KVM/11399 took a split_lock trap at address: 0x7fe6108c
    Nov 11 10:09:22 Homeserver kernel: guacd[20327]: segfault at 10 ip 0000152c8b4ce802 sp 0000152c8a5fcc80 error 4 in libguac-client-rdp.so.0.0.0[152c8b4bc000+1a000]
    Nov 11 10:09:22 Homeserver kernel: Code: 00 be 03 00 00 00 48 89 df e8 9a e7 fe ff b8 01 00 00 00 e9 7a ff ff ff 53 48 8b 07 48 89 fb 48 89 de 48 8b 40 10 48 8b 40 20 <48> 8b 78 10 e8 35 e7 fe ff 8b 43 18 85 c0 74 0e 31 c0 5b c3 66 2e
    Nov 11 10:10:29 Homeserver  avahi-daemon[8560]: Interface vnet2.IPv6 no longer relevant for mDNS.
    Nov 11 10:10:29 Homeserver  avahi-daemon[8560]: Leaving mDNS multicast group on interface vnet2.IPv6 with address fe80::fc54:ff:fe48:290e.
    Nov 11 10:10:29 Homeserver kernel: br0.25: port 4(vnet2) entered disabled state
    Nov 11 10:10:29 Homeserver kernel: device vnet2 left promiscuous mode
    Nov 11 10:10:29 Homeserver kernel: br0.25: port 4(vnet2) entered disabled state
    Nov 11 10:10:29 Homeserver  avahi-daemon[8560]: Withdrawing address record for fe80::fc54:ff:fe48:290e on vnet2.
    Nov 11 10:10:31 Homeserver  acpid: input device has been disconnected, fd 8
    Nov 11 10:10:32 Homeserver kernel: pci 0000:00:02.0: Removing from iommu group 3

     

    Edited by SimonF

    Look for a BIOS update, you could also try adding:

    split_lock_detect=off

    to syslinux boot options and see if it helps.
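    On Unraid, the parameter is added to the append line of the boot entry (Main -> Flash -> Syslinux Configuration). A sketch of the resulting syslinux.cfg entry, assuming an otherwise default configuration:

    ```
    label Unraid OS
      menu default
      kernel /bzimage
      append split_lock_detect=off initrd=/bzroot
    ```

    Note this only disables the split lock detection traps seen in the log above; it is a test to rule them out, not a confirmed fix for the iGPU removal.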


    @JorgeB

     

    There is no update available. I had already checked before creating this bug report. I'll try the syslinux boot parameter now and report back.
    Thanks for helping


    @SimonF

    You wrote that a Docker container with the guacd identifier is possibly responsible for ejecting the iGPU.
    I have now found out that guacd belongs to the Guacamole container. On my system it was set to autostart at boot. I have now deactivated it and restarted.
    Let's see if that was the problem.
    Greetings


    @SimonF

    Okay, the reason the iGPU was kicked out was the Guacamole Docker container. It was running without any special privileges.

    I have been using this container for several years. I don't know whether it's the new hardware environment or Unraid 6.11.

    I have now uninstalled Guacamole and the iGPU remains in the system.

    Greetings


    @JorgeB

    @SimonF

     

    Unfortunately, I have to reopen this bug report.
    In the meantime, the iGPU has disappeared twice more, most recently yesterday, after the server had already been running for 6 days without problems.
    I can narrow down the error pretty well now.
    Yesterday I had to work on a Windows VM that runs on the Unraid server.
    At 15:37 I was done with the work and shut the VM down (from within the VM). I was connected via RDP.

    As you can see in the log, the iGPU dropped out of the system at 15:37, so the issue is related to the VM environment.
    The iGPU is only used for Docker and Unraid; it is not passed through to any VM. There is no dGPU installed in the system anymore, only the iGPU.
    I don't know whether the problem lies in Unraid or the BIOS, but the error is unfortunately still there and it is related to the VM environment!

     

    Greetings

     

    Nov 29 15:37:46 Homeserver  avahi-daemon[7715]: Interface vnet2.IPv6 no longer relevant for mDNS.
    Nov 29 15:37:46 Homeserver  avahi-daemon[7715]: Leaving mDNS multicast group on interface vnet2.IPv6 with address fe80::fc54:ff:fe0e:bc60.
    Nov 29 15:37:46 Homeserver kernel: br0.25: port 4(vnet2) entered disabled state
    Nov 29 15:37:46 Homeserver kernel: device vnet2 left promiscuous mode
    Nov 29 15:37:46 Homeserver kernel: br0.25: port 4(vnet2) entered disabled state
    Nov 29 15:37:46 Homeserver kernel: sdc: sdc1 sdc2 sdc3 sdc4
    Nov 29 15:37:46 Homeserver  avahi-daemon[7715]: Withdrawing address record for fe80::fc54:ff:fe0e:bc60 on vnet2.
    Nov 29 15:37:46 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V-part2' is set as passed through.
    Nov 29 15:37:46 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V-part1' is set as passed through.
    Nov 29 15:37:46 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'S2R6NB0J531111V-part4' is set as passed through.
    Nov 29 15:37:46 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'S2R6NB0J531111V-part3' is set as passed through.
    Nov 29 15:37:49 Homeserver kernel: sdc: sdc1 sdc2 sdc3 sdc4
    Nov 29 15:37:49 Homeserver kernel: sdc: sdc1 sdc2 sdc3 sdc4
    Nov 29 15:37:49 Homeserver  acpid: input device has been disconnected, fd 8
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V-part2' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'S2R6NB0J531111V-part3' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'S2R6NB0J531111V-part4' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V-part1' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'S2R6NB0J531111V-part4' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V-part1' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'S2R6NB0J531111V-part3' is set as passed through.
    Nov 29 15:37:49 Homeserver unassigned.devices: Disk with serial 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V', mountpoint 'Samsung_SSD_850_EVO_250GB_S2R6NB0J531111V-part2' is set as passed through.
    Nov 29 15:37:49 Homeserver kernel: pci 0000:00:02.0: Removing from iommu group 3
    Nov 29 15:51:29 Homeserver kernel: hrtimer: interrupt took 12313 ns

     

    homeserver-diagnostics-20221130-0727.zip


    @JorgeB

    @SimonF

     

    The error has now been found and fixed in the German subforum. The disks were previously installed in a Coffee Lake Xeon system, where I used the GVT-g plugin. When moving the server to Alder Lake, the plugin was uninstalled beforehand, but apparently there were still config remnants in the file /etc/libvirt/hooks/qemu.
    I have now deleted the libvirt hook file and recreated the VMs. The error is gone. I would never have thought of that.
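    For anyone hitting the same remnants: a sketch of how stale GVT-g content in the hook can be spotted. The check is simulated below with a temporary file standing in for /etc/libvirt/hooks/qemu (the path from the post); the hook contents are made up for illustration.

    ```shell
    # Simulate the stale hook with a temp file; on a real server you would
    # grep /etc/libvirt/hooks/qemu directly (and back it up before deleting).
    hook=$(mktemp)
    printf '#!/bin/bash\n# leftover gvt-g vGPU setup from the old system\n' > "$hook"

    found=no
    if grep -qi "gvt" "$hook"; then
      found=yes
      echo "stale GVT-g remnants found"
    fi

    rm -f "$hook"
    ```
    
    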

    I wish you all a great 2023

     

    Greetings from Bavaria

     

    Thomas

     

    Great news Thomas, have a good 2023. Regards, Simon.


