Andrea Nizzola

Members
  • Posts: 8
Everything posted by Andrea Nizzola

  1. Thank you for the reply. Here are the diagnostics: tower-diagnostics-20220828-1815.zip
  2. Hi all, a few months ago my transcoding GPU stopped working properly: about six months after I set up transcoding, the Nvidia driver stopped recognizing it. I'm now getting the error message "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." I have tried everything: turning Docker off and on, changing BIOS settings, reinstalling the plugin, testing a bunch of different driver versions, and even downgrading the Unraid build, but nothing seems to fix it. Unraid sees the connected device as a Quadro P1000 (which is correct), and both booting directly into Windows and passing the card through to a VM work fine; I can benchmark it without any issues. To make sure the GPU itself wasn't the problem, I also tried a different one, and it behaved in exactly the same way. Below is part of the log from while the machine was booting. The driver is clearly trying to communicate with the GPU, but the connection fails. I'm not sure what else to try at this point, so please give me a hand. Thanks.

     Aug 28 03:13:25 Tower kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 241
     Aug 28 03:13:25 Tower kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
     Aug 28 03:13:25 Tower kernel: NVRM: BAR0 is 0M @ 0x0 (PCI:0000:02:00.0)
     Aug 28 03:13:25 Tower kernel: nvidia: probe of 0000:02:00.0 failed with error -1
     Aug 28 03:13:25 Tower kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
     Aug 28 03:13:25 Tower kernel: NVRM: This can occur when a driver such as:
     Aug 28 03:13:25 Tower kernel: NVRM: nouveau, rivafb, nvidiafb or rivatv
     Aug 28 03:13:25 Tower kernel: NVRM: was loaded and obtained ownership of the NVIDIA device(s).
     Aug 28 03:13:25 Tower kernel: NVRM: Try unloading the conflicting kernel module (and/or
     Aug 28 03:13:25 Tower kernel: NVRM: reconfigure your kernel without the conflicting
     Aug 28 03:13:25 Tower kernel: NVRM: driver(s)), then try loading the NVIDIA kernel module
     Aug 28 03:13:25 Tower kernel: NVRM: again.
     Aug 28 03:13:25 Tower kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
     Aug 28 03:13:25 Tower kernel: NVRM: None of the NVIDIA devices were initialized.
     Aug 28 03:13:25 Tower kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 241
     Aug 28 03:13:25 Tower kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 241
     Aug 28 03:13:25 Tower kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
     Aug 28 03:13:25 Tower kernel: NVRM: BAR0 is 0M @ 0x0 (PCI:0000:02:00.0)
     Aug 28 03:13:25 Tower kernel: nvidia: probe of 0000:02:00.0 failed with error -1
     Aug 28 03:13:25 Tower kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
     Aug 28 03:13:25 Tower kernel: NVRM: This can occur when a driver such as:
     Aug 28 03:13:25 Tower kernel: NVRM: nouveau, rivafb, nvidiafb or rivatv
     Aug 28 03:13:25 Tower kernel: NVRM: was loaded and obtained ownership of the NVIDIA device(s).
     Aug 28 03:13:25 Tower kernel: NVRM: Try unloading the conflicting kernel module (and/or
     Aug 28 03:13:25 Tower kernel: NVRM: reconfigure your kernel without the conflicting
     Aug 28 03:13:25 Tower kernel: NVRM: driver(s)), then try loading the NVIDIA kernel module
     Aug 28 03:13:25 Tower kernel: NVRM: again.
     Aug 28 03:13:25 Tower kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
     Aug 28 03:13:25 Tower kernel: NVRM: None of the NVIDIA devices were initialized.
     Aug 28 03:13:25 Tower kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 241
     Aug 28 03:14:11 Tower webGUI: Successful login user root from 192.168.1.101
     Aug 28 03:14:17 Tower kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 241
     Aug 28 03:14:17 Tower kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
     Aug 28 03:14:17 Tower kernel: NVRM: BAR0 is 0M @ 0x0 (PCI:0000:02:00.0)
     Aug 28 03:14:17 Tower kernel: nvidia: probe of 0000:02:00.0 failed with error -1
     Aug 28 03:14:17 Tower kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
     Aug 28 03:14:17 Tower kernel: NVRM: This can occur when a driver such as:
     Aug 28 03:14:17 Tower kernel: NVRM: nouveau, rivafb, nvidiafb or rivatv
     Aug 28 03:14:17 Tower kernel: NVRM: was loaded and obtained ownership of the NVIDIA device(s).
     Aug 28 03:14:17 Tower kernel: NVRM: Try unloading the conflicting kernel module (and/or
     Aug 28 03:14:17 Tower kernel: NVRM: reconfigure your kernel without the conflicting
     Aug 28 03:14:17 Tower kernel: NVRM: driver(s)), then try loading the NVIDIA kernel module
     Aug 28 03:14:17 Tower kernel: NVRM: again.
     Aug 28 03:14:17 Tower kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
     Aug 28 03:14:17 Tower kernel: NVRM: None of the NVIDIA devices were initialized.
     Aug 28 03:14:17 Tower kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 241
     Aug 28 03:14:17 Tower kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 241
     Aug 28 03:14:17 Tower kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
     Aug 28 03:14:17 Tower kernel: NVRM: BAR0 is 0M @ 0x0 (PCI:0000:02:00.0)
     Aug 28 03:14:17 Tower kernel: nvidia: probe of 0000:02:00.0 failed with error -1
     Aug 28 03:14:17 Tower kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
     Aug 28 03:14:17 Tower kernel: NVRM: This can occur when a driver such as:
     Aug 28 03:14:17 Tower kernel: NVRM: nouveau, rivafb, nvidiafb or rivatv
     Aug 28 03:14:17 Tower kernel: NVRM: was loaded and obtained ownership of the NVIDIA device(s).
     Aug 28 03:14:17 Tower kernel: NVRM: Try unloading the conflicting kernel module (and/or
     Aug 28 03:14:17 Tower kernel: NVRM: reconfigure your kernel without the conflicting
     Aug 28 03:14:17 Tower kernel: NVRM: driver(s)), then try loading the NVIDIA kernel module
     Aug 28 03:14:17 Tower kernel: NVRM: again.
     Aug 28 03:14:17 Tower kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
     Aug 28 03:14:17 Tower kernel: NVRM: None of the NVIDIA devices were initialized.
     Aug 28 03:14:17 Tower kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 241
  3. I made this post already, but I have some updates, so I deleted the old one and added all the new info here. So, here's the problem. A couple of days ago I upgraded my motherboard and CPU (I got an i9-10850K on an ASRock motherboard). After setting everything up, Unraid started properly, the VM started properly, and everything was working as it should, except that the VM kept freezing for up to 2 minutes at a time: all the VM cores went to 100% and stayed there until it unfroze. I had this VM running fine for about a year, so I'm sure my 2080 doesn't have any issues being passed through to a VM. I did a quick reboot of the server hoping it would fix the issue, but it created a bigger one: now every time I try to start the VM, it pauses after the TianoCore logo comes up. If I try to resume it, I get: "internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required". I checked the logs and found the "qemu-system-x86_64: vfio_err_notifier_handler" warning with the ID of the GPU and of its other components. I tried removing the GPU and using VNC, which worked perfectly, so I'm sure the problem is related to the GPU passthrough. I made a different VM, but that didn't solve the issue. I really don't understand how it happened, since it was working more or less fine and stopped working right after the Unraid reboot. It's been a couple of days since then. I have been tinkering and found out that the GPU works if I stop the VM, remove any of the GPU components, start the VM, stop the VM, and add the component back. Well, it works until I reboot the server, which is a problem because I turn it off every night. I'm guessing this means the GPU and drivers work fine, all the hardware is OK, and the GPU is not being used by any other service. The problem is just that the server doesn't load the VM properly when it boots. If anyone has any idea how to fix this, please let me know. Thank you.
  4. So, I have some more news. I was tinkering with it for a while and figured out that if I remove the GPU's USB component and put it back, it starts working properly until I reboot the server. Once the server is rebooted it stops working, but if I unplug it and plug it back in, it behaves properly again. Hopefully someone can help with this.
  5. So, I tried redumping the vBIOS, but nothing changed, and I tried resetting the motherboard BIOS, but that didn't fix it either. Can someone at least tell me what this is? I understand that it's an issue with the GPU, but I don't know exactly what it means.
  6. So, here's the problem. A couple of days ago I upgraded my motherboard and CPU (I got an i9-10850K on an ASRock motherboard). After setting everything up, Unraid started properly, the VM started properly, and everything was working as it should, except that the VM kept freezing for up to 2 minutes at a time: all the VM cores went to 100% and stayed there until it unfroze. I had this VM running fine for about a year, so I'm sure my 2080 doesn't have any issues being passed through to a VM. I did a quick reboot of the server hoping it would fix the issue, but it created a bigger one: now every time I try to start the VM, it pauses after the TianoCore logo comes up. If I try to resume it, I get: "internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required". I checked the logs and found the "qemu-system-x86_64: vfio_err_notifier_handler" warning with the ID of the GPU and of its other components. I tried removing the GPU and using VNC, which worked perfectly, so I'm sure the problem is related to the GPU passthrough. I made a different VM, but that didn't solve the issue. I really don't understand how it happened, since it was working more or less fine and stopped working right after the Unraid reboot. If anyone has any idea how to fix this, please let me know. Thank you.
  7. I'm currently using Unraid 6.8.2. The system was just installed, and I'm having some issues with videos: not video output, but video playback. I use the system as my daily driver (through a Windows 10 VM) and as a Plex server. My CPU is an i5-9400F and I have 32 GB of RAM. The problem is that videos lag when I play them. For example, in the VM on YouTube, everything at 1080p or above keeps lagging. It doesn't buffer; instead, the audio stops for a few seconds while the video keeps playing, and then the video stops to let the audio catch up. (I'm sure it isn't my internet connection: all the other devices work fine and download speeds are fine too.) All the other applications run perfectly, even intensive ones, so the passthrough is set up properly. On the Plex server, nothing at 4K even loads; it says the computer isn't powerful enough, even though it worked last month before I reinstalled, and it also worked right after I installed it. Apparently it still isn't powerful enough even with the VM off. The few 4K videos that do play are super laggy even when they aren't being transcoded. The funny thing is that the CPU and RAM never get maxed out: the highest CPU usage I've seen while streaming at 4K with the VM off was 15%. I just don't understand it. There is a serious bottleneck, but it's a software problem, not hardware. I've asked on Reddit a couple of times but nobody was able to help, so I really hope someone here can help me solve this issue.
  8. I had the same problem with the same motherboard. The solution is to open the files on the USB drive; in there you should find a batch file called "make_bootable". Run it as administrator. It will ask you to press a key (Enter works fine). After that, if everything worked, just put the drive back into the Unraid machine and it should start right up.
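Regarding post 2 above (the Quadro P1000 log), the key line is `BAR0 is 0M @ 0x0`: the driver found no memory region assigned to the card's first Base Address Register. On Linux, the regions the kernel assigned to a PCI device can be read from the device's sysfs `resource` file (for this card, presumably `/sys/bus/pci/devices/0000:02:00.0/resource`), where each line holds a start address, an end address, and flags in hex. The sketch below is a hypothetical read-only helper, not part of Unraid or the Nvidia plugin; it assumes that file format and a Python 3 interpreter on the host, and it reports BAR0 in the same shape as the NVRM message.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical helper: reads the kernel's view of a PCI device's regions.
# Assumes the sysfs "resource" format: one line per region, "start end flags"
# in hex, with BAR0 on the first line.


def parse_pci_resources(text: str) -> List[Tuple[int, int, int]]:
    """Parse the contents of a sysfs PCI 'resource' file into tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        start, end, flags = (int(field, 16) for field in fields)
        regions.append((start, end, flags))
    return regions


def region_size(start: int, end: int) -> int:
    """Size in bytes; an end address of 0 is treated as an unassigned region."""
    return 0 if end == 0 else end - start + 1


def describe_bar0(device: str, sysfs_root: str = "/sys/bus/pci/devices") -> str:
    """Summarize BAR0 in the same shape as the NVRM log line."""
    text = (Path(sysfs_root) / device / "resource").read_text()
    regions = parse_pci_resources(text)
    if not regions:
        return f"{device}: no resource lines found"
    start, end, _flags = regions[0]
    size_mib = region_size(start, end) // (1024 * 1024)
    return f"{device}: BAR0 is {size_mib}M @ {start:#x}"
```

On the host this would be called as `describe_bar0("0000:02:00.0")`. A working card would be expected to show a non-zero BAR0 size there, while a size of 0 would agree with what the driver logged. The helper only reads files and changes nothing.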
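In posts 3, 4, and 6 above, the "GPU components" are the separate PCI functions of the graphics card. Besides the video function, a card like the RTX 2080 usually exposes an audio function and, as post 4 mentions, a USB function, and in the VM's libvirt XML each passed-through function is its own `<hostdev>` entry. The hypothetical helper below builds one such entry from a PCI address, following libvirt's documented layout for PCI host devices (`mode='subsystem' type='pci'` with a `<source><address .../></source>` child). It only illustrates the structure; the example address is invented, and this is not a reproduction of what Unraid's VM manager writes.

```python
import re
import xml.etree.ElementTree as ET

# Matches a full PCI address such as "0000:01:00.2" (domain:bus:slot.function).
PCI_ADDR = re.compile(
    r"^(?P<domain>[0-9a-fA-F]{4}):(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<slot>[0-9a-fA-F]{2})\.(?P<function>[0-7])$"
)


def hostdev_xml(pci_address: str) -> str:
    """Build a libvirt <hostdev> element for one PCI function (illustrative)."""
    match = PCI_ADDR.match(pci_address)
    if match is None:
        raise ValueError(f"not a PCI address: {pci_address!r}")
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{match['domain']}",
        bus=f"0x{match['bus']}",
        slot=f"0x{match['slot']}",
        function=f"0x{match['function']}",
    )
    return ET.tostring(hostdev, encoding="unicode")
```

For example, `hostdev_xml("0000:01:00.2")` describes function 2 of a card on bus 01; that address is made up for illustration. Removing and re-adding a "component" in the VM settings corresponds to removing and re-adding one of these entries.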