Andrea Nizzola

Members

Joined
December 18, 2020 (5 yr)
Last visited
February 4, 2025 (1 yr)


[Plugin] Nvidia-Driver

Andrea Nizzola replied to ich777's topic in Plugin Support

Thank you for the reply. Here are the diagnostics: tower-diagnostics-20220828-1815.zip
- August 28, 2022 (3 yr)
- 5918 replies
[Plugin] Nvidia-Driver

Andrea Nizzola replied to ich777's topic in Plugin Support

Hi all,

A few months ago my transcoding GPU stopped working properly: about 6 months after I set up transcoding, the Nvidia driver stopped recognizing it. I'm now getting the error message: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."

I have tried everything: turning Docker on and off, changing BIOS settings, reinstalling the plugin, testing a bunch of different driver versions and even downgrading the Unraid build, but nothing seems to fix it. Unraid is able to see the device connected as the Quadro P1000 (which is correct), and booting directly into Windows or passing it through works fine; I can benchmark it without any issues. Just to make sure that the GPU wasn't the issue, I also tried a different one and it behaved in exactly the same way.

I'm attaching part of the logs from while the machine was booting. It's clear that the driver is trying to communicate with the GPU, however the connection doesn't work. I'm not sure what else to try at this point, please give me a hand. Thanks.

Aug 28 03:13:25 Tower kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 241
Aug 28 03:13:25 Tower kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Aug 28 03:13:25 Tower kernel: NVRM: BAR0 is 0M @ 0x0 (PCI:0000:02:00.0)
Aug 28 03:13:25 Tower kernel: nvidia: probe of 0000:02:00.0 failed with error -1
Aug 28 03:13:25 Tower kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
Aug 28 03:13:25 Tower kernel: NVRM: This can occur when a driver such as:
Aug 28 03:13:25 Tower kernel: NVRM: nouveau, rivafb, nvidiafb or rivatv
Aug 28 03:13:25 Tower kernel: NVRM: was loaded and obtained ownership of the NVIDIA device(s).
Aug 28 03:13:25 Tower kernel: NVRM: Try unloading the conflicting kernel module (and/or
Aug 28 03:13:25 Tower kernel: NVRM: reconfigure your kernel without the conflicting
Aug 28 03:13:25 Tower kernel: NVRM: driver(s)), then try loading the NVIDIA kernel module
Aug 28 03:13:25 Tower kernel: NVRM: again.
Aug 28 03:13:25 Tower kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
Aug 28 03:13:25 Tower kernel: NVRM: None of the NVIDIA devices were initialized.
Aug 28 03:13:25 Tower kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 241
Aug 28 03:13:25 Tower kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 241
Aug 28 03:13:25 Tower kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Aug 28 03:13:25 Tower kernel: NVRM: BAR0 is 0M @ 0x0 (PCI:0000:02:00.0)
Aug 28 03:13:25 Tower kernel: nvidia: probe of 0000:02:00.0 failed with error -1
Aug 28 03:13:25 Tower kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
Aug 28 03:13:25 Tower kernel: NVRM: This can occur when a driver such as:
Aug 28 03:13:25 Tower kernel: NVRM: nouveau, rivafb, nvidiafb or rivatv
Aug 28 03:13:25 Tower kernel: NVRM: was loaded and obtained ownership of the NVIDIA device(s).
Aug 28 03:13:25 Tower kernel: NVRM: Try unloading the conflicting kernel module (and/or
Aug 28 03:13:25 Tower kernel: NVRM: reconfigure your kernel without the conflicting
Aug 28 03:13:25 Tower kernel: NVRM: driver(s)), then try loading the NVIDIA kernel module
Aug 28 03:13:25 Tower kernel: NVRM: again.
Aug 28 03:13:25 Tower kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
Aug 28 03:13:25 Tower kernel: NVRM: None of the NVIDIA devices were initialized.
Aug 28 03:13:25 Tower kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 241
Aug 28 03:14:11 Tower webGUI: Successful login user root from 192.168.1.101
Aug 28 03:14:17 Tower kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 241
Aug 28 03:14:17 Tower kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Aug 28 03:14:17 Tower kernel: NVRM: BAR0 is 0M @ 0x0 (PCI:0000:02:00.0)
Aug 28 03:14:17 Tower kernel: nvidia: probe of 0000:02:00.0 failed with error -1
Aug 28 03:14:17 Tower kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
Aug 28 03:14:17 Tower kernel: NVRM: This can occur when a driver such as:
Aug 28 03:14:17 Tower kernel: NVRM: nouveau, rivafb, nvidiafb or rivatv
Aug 28 03:14:17 Tower kernel: NVRM: was loaded and obtained ownership of the NVIDIA device(s).
Aug 28 03:14:17 Tower kernel: NVRM: Try unloading the conflicting kernel module (and/or
Aug 28 03:14:17 Tower kernel: NVRM: reconfigure your kernel without the conflicting
Aug 28 03:14:17 Tower kernel: NVRM: driver(s)), then try loading the NVIDIA kernel module
Aug 28 03:14:17 Tower kernel: NVRM: again.
Aug 28 03:14:17 Tower kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
Aug 28 03:14:17 Tower kernel: NVRM: None of the NVIDIA devices were initialized.
Aug 28 03:14:17 Tower kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 241
Aug 28 03:14:17 Tower kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 241
Aug 28 03:14:17 Tower kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Aug 28 03:14:17 Tower kernel: NVRM: BAR0 is 0M @ 0x0 (PCI:0000:02:00.0)
Aug 28 03:14:17 Tower kernel: nvidia: probe of 0000:02:00.0 failed with error -1
Aug 28 03:14:17 Tower kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
Aug 28 03:14:17 Tower kernel: NVRM: This can occur when a driver such as:
Aug 28 03:14:17 Tower kernel: NVRM: nouveau, rivafb, nvidiafb or rivatv
Aug 28 03:14:17 Tower kernel: NVRM: was loaded and obtained ownership of the NVIDIA device(s).
Aug 28 03:14:17 Tower kernel: NVRM: Try unloading the conflicting kernel module (and/or
Aug 28 03:14:17 Tower kernel: NVRM: reconfigure your kernel without the conflicting
Aug 28 03:14:17 Tower kernel: NVRM: driver(s)), then try loading the NVIDIA kernel module
Aug 28 03:14:17 Tower kernel: NVRM: again.
Aug 28 03:14:17 Tower kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
Aug 28 03:14:17 Tower kernel: NVRM: None of the NVIDIA devices were initialized.
Aug 28 03:14:17 Tower kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 241
- August 28, 2022 (3 yr)
- 5918 replies
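The kernel log quoted in the post above repeats the same NVRM block several times, so the one line that names the failing region ("BAR0 is 0M @ 0x0") is easy to miss. As a minimal sketch, not part of the original post, the Python below scans syslog text for NVRM BAR lines and lists the devices whose reported region has size 0 at address 0x0. The function name `find_unassigned_bars` and the regular expression are assumptions written against the log lines quoted in the post; other driver versions may word the message differently.

```python
import re

# Assumed pattern, modelled on the line quoted in the post:
#   "NVRM: BAR0 is 0M @ 0x0 (PCI:0000:02:00.0)"
BAR_LINE = re.compile(
    r"NVRM: BAR(?P<bar>\d+) is (?P<size>\d+)M @ (?P<addr>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<pci>[0-9a-fA-F:.]+)\)"
)


def find_unassigned_bars(log_text):
    """Return sorted (pci_address, bar_index) pairs reported with size 0 at address 0x0."""
    hits = set()
    for match in BAR_LINE.finditer(log_text):
        size_mb = int(match.group("size"))
        address = int(match.group("addr"), 16)
        if size_mb == 0 and address == 0:
            # A set collapses the repeated blocks into one entry per device/BAR.
            hits.add((match.group("pci"), int(match.group("bar"))))
    return sorted(hits)


# Three lines copied from the log in the post, used as sample input.
sample = (
    "Aug 28 03:13:25 Tower kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:\n"
    "Aug 28 03:13:25 Tower kernel: NVRM: BAR0 is 0M @ 0x0 (PCI:0000:02:00.0)\n"
    "Aug 28 03:13:25 Tower kernel: nvidia: probe of 0000:02:00.0 failed with error -1\n"
)

if __name__ == "__main__":
    for pci, bar in find_unassigned_bars(sample):
        print(f"{pci}: BAR{bar} is reported with size 0 at address 0x0")
```

To run it against a real log, read the syslog file into a string and pass that string to `find_unassigned_bars` instead of `sample`. The script only extracts what the driver already reported; it does not diagnose why the region is empty.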
GPU acting weirdly after hardware changes, please help

Andrea Nizzola posted a topic in VM Engine (KVM)

I made this post already; however, I have some updates, so I deleted the old one and added all the new info here.

So, here's the problem. A couple of days ago I upgraded my motherboard and CPU (got an i9 10850K on an ASRock motherboard). After setting everything up, Unraid started properly, the VM started properly and everything was working as it should, except that the VM kept freezing for up to 2 minutes at a time: all the VM cores went to 100% and stayed there until it unfroze. I had this VM running fine for about a year, so I'm sure that my 2080 doesn't have any issues being passed through to a VM.

I did a quick reboot of the server hoping that it would fix the issue, but it created a bigger one: now every time I try to start the VM, it pauses after the TianoCore logo comes up. If I try to resume it, it comes up saying: "internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required". I checked the logs and got the "qemu-system-x86_64: vfio_err_notifier_handler" warning with the GPU ID and its other components.

I tried removing the GPU and using VNC, which worked perfectly, so I'm sure that the problem is related to the GPU passthrough. I made a different VM but that didn't solve the issue. I really don't understand how it happened, as it was working kind of fine and stopped working just after the Unraid reboot.

It's been a couple of days since then. I have been tinkering and found out that the GPU works if I stop the VM, remove any of the GPU components, start the VM, stop the VM, and add the component back. Well, it works until I reboot the server, which is a problem as I turn it off every night.

I'm guessing that this means that the GPU and drivers work fine, all the hardware is OK, and the GPU is not being used by any other service. The problem is just that the server doesn't load the VM properly when it boots. If someone has any idea on how to fix this, please let me know. Thank you
- January 12, 2022 (4 yr)
GPU not passing through on new hardware

Andrea Nizzola replied to Andrea Nizzola's topic in VM Engine (KVM)

So, I got some more news. I was tinkering with it for a while and figured out that if I remove the GPU's USB component and put it back in, it starts working properly until I reboot the server. Once the server is rebooted it doesn't work; however, if I unplug it and replug it, it starts behaving properly again. Hopefully someone can help with this.
- January 12, 2022 (4 yr)
- 2 replies
GPU not passing through on new hardware

Andrea Nizzola replied to Andrea Nizzola's topic in VM Engine (KVM)

So, I tried redumping the vBIOS but nothing changed. I tried resetting the motherboard BIOS but that also didn't fix it. Can someone at least tell me what this is? I understand that it's an issue with the GPU but I don't know exactly what it means.
- January 12, 2022 (4 yr)
- 2 replies
GPU not passing through on new hardware

Andrea Nizzola posted a topic in VM Engine (KVM)

So, here's the problem. A couple of days ago I upgraded my motherboard and CPU (got an i9 10850K on an ASRock motherboard). After setting everything up, Unraid started properly, the VM started properly and everything was working as it should, except that the VM kept freezing for up to 2 minutes at a time: all the VM cores went to 100% and stayed there until it unfroze. I had this VM running fine for about a year, so I'm sure that my 2080 doesn't have any issues being passed through to a VM.

I did a quick reboot of the server hoping that it would fix the issue, but it created a bigger one: now every time I try to start the VM, it pauses after the TianoCore logo comes up. If I try to resume it, it comes up saying: "internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required". I checked the logs and got the "qemu-system-x86_64: vfio_err_notifier_handler" warning with the GPU ID and its other components.

I tried removing the GPU and using VNC, which worked perfectly, so I'm sure that the problem is related to the GPU passthrough. I made a different VM but that didn't solve the issue. I really don't understand how it happened, as it was working kind of fine and stopped working just after the Unraid reboot. If someone has any idea on how to fix this, please let me know. Thank you
- January 12, 2022 (4 yr)
- 2 replies
