GPU not visible/not found in system devices


21cmal

Hi guys, I have a device-not-found issue: my GPU is not detected by Unraid.

Before this happened, I had passed the GPU through to a Linux VM. The VM froze, so I force-stopped it and then ran this user script:

#!/bin/bash
#
#replace xx\:xx.x with the number of your gpu and sound counterpart
#
#
echo "disconnecting amd graphics"
echo "1" | tee -a /sys/bus/pci/devices/0000\:06\:00.0/remove
echo "disconnecting amd sound counterpart"
echo "1" | tee -a /sys/bus/pci/devices/0000\:06\:00.1/remove
echo "entered suspended state press power button to continue"
echo -n mem > /sys/power/state
echo "reconnecting amd gpu and sound counterpart"
echo "1" | tee -a /sys/bus/pci/rescan
echo "AMD graphics card successfully reset"

The script never completed. I left it running for a few minutes and then rebooted the server. I think this is why Unraid can no longer find my GPU.
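
For comparison, here is a minimal sketch of a more defensive remove/rescan step that checks a device is actually present before writing to its `remove` node. It relies on the same sysfs paths the script above uses; the `SYSFS_PCI` variable and both function names are hypothetical and only there for illustration. The suspend step is left out on purpose, and nothing here is guaranteed to reset a stuck card.

```shell
#!/bin/bash
# Sketch only: SYSFS_PCI and the function names are illustrative, not part of Unraid.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

# Remove one PCI function from the bus, but only if it is currently present.
remove_pci_device() {
    local addr="$1"
    local node="$SYSFS_PCI/devices/$addr/remove"
    if [ ! -e "$node" ]; then
        echo "skip: $addr not present"
        return 1
    fi
    echo 1 > "$node" && echo "removed: $addr"
}

# Ask the kernel to re-enumerate the PCI bus.
rescan_pci_bus() {
    echo 1 > "$SYSFS_PCI/rescan" && echo "rescan requested"
}
```

Run as root, this could be called as `remove_pci_device 0000:06:00.0`, then `remove_pci_device 0000:06:00.1`, then `rescan_pci_bus`. A "skip" message means the kernel does not currently see that address at all.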

 

Here is the vfio-pci log:

Loading config from /boot/config/vfio-pci.cfg
BIND=0000:03:00.0|1106:3483 0000:06:00.0|1002:67df 0000:06:00.1|1002:aaf0
---
Processing 0000:03:00.0 1106:3483
Vendor:Device 1106:3483 found at 0000:03:00.0

IOMMU group members (sans bridges):
/sys/bus/pci/devices/0000:03:00.0/iommu_group/devices/0000:03:00.0

Binding...
Unbound 0000:03:00.0 from xhci_hcd
Successfully bound the device 1106:3483 at 0000:03:00.0 to vfio-pci
---
Processing 0000:06:00.0 1002:67df
Error: Vendor:Device 1002:67df not found at 0000:06:00.0, unable to bind device
---
Processing 0000:06:00.1 1002:aaf0
Error: Device 0000:06:00.1 does not exist, unable to bind device
---
vfio-pci binding complete

Devices listed in /sys/bus/pci/drivers/vfio-pci:
lrwxrwxrwx 1 root root    0 Apr  1 21:39 0000:03:00.0 -> ../../../../devices/pci0000:00/0000:00:01.3/0000:01:00.2/0000:02:00.0/0000:03:00.0
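
The two errors above can be reproduced by hand by comparing each `BIND=` entry with what sysfs reports. This is a rough sketch, assuming the space-separated `address|vendor:device` format shown in the log; it is not the actual Unraid vfio-pci script, and `SYSFS_PCI`, `check_bind_entry` and `check_bind_line` are made-up names.

```shell
#!/bin/bash
# Sketch only: SYSFS_PCI and the function names are illustrative.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

# Check one "address|vendor:device" entry against sysfs.
check_bind_entry() {
    local entry="$1"
    local addr="${entry%%|*}"   # e.g. 0000:06:00.0
    local ids="${entry#*|}"     # e.g. 1002:67df
    local dev="$SYSFS_PCI/devices/$addr"
    local vendor device
    if [ ! -d "$dev" ]; then
        echo "$addr: missing"
        return 1
    fi
    vendor="$(cat "$dev/vendor")"   # sysfs reports IDs with a 0x prefix
    device="$(cat "$dev/device")"
    if [ "${vendor#0x}:${device#0x}" = "$ids" ]; then
        echo "$addr: ok"
    else
        echo "$addr: id mismatch (${vendor#0x}:${device#0x})"
        return 1
    fi
}

# Check every entry of a BIND= value (entries are separated by spaces).
check_bind_line() {
    local entry
    for entry in $1; do
        check_bind_entry "$entry"
    done
}
```

For example, `check_bind_line "0000:03:00.0|1106:3483 0000:06:00.0|1002:67df 0000:06:00.1|1002:aaf0"` prints one line per entry. A "missing" result means the address is not enumerated on the PCI bus at all, which is a different situation from a driver or binding problem.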

The log reports the GPU as not found and unable to bind. I've tried reseating the GPU, rebooting, and upgrading and downgrading the server, but nothing has worked so far. Could this be a BIOS or hardware problem? Has anyone experienced this before and found a solution?

diagnostics-20230402-0548.zip

Link to comment

The GPU is not showing up on the system at all. Are you able to test it in another system, or check the power connection to it?
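
A quick way to confirm this from the console is to look for any PCI function with the AMD vendor ID (`1002`, as in the vfio-pci config above). This is only a sketch that reads the standard sysfs `vendor` files; the `SYSFS_PCI` variable and the function name are hypothetical.

```shell
#!/bin/bash
# Sketch only: SYSFS_PCI and the function name are illustrative.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

# Print the address of every PCI function whose vendor ID matches.
# The argument is the hex vendor ID without the 0x prefix, e.g. 1002.
# Returns 0 if at least one device matched, 1 otherwise.
list_pci_by_vendor() {
    local want="$1" dev found=1
    for dev in "$SYSFS_PCI"/devices/*; do
        [ -r "$dev/vendor" ] || continue
        if [ "$(cat "$dev/vendor")" = "0x$want" ]; then
            basename "$dev"
            found=0
        fi
    done
    return "$found"
}
```

If `list_pci_by_vendor 1002` prints nothing, the card is not being enumerated by the kernel. Where `lspci` is installed, `lspci -nn -d 1002:` should give the same information.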
