Everything posted by mikeg_321

  1. That's right. My steps to install the GPU driver were:

     1. Boot into Windows with just the VNC video driver enabled and set the boot mode to Safe Mode (like the screenshot in my earlier post).
     2. Shut down, reconfigure the VM with the Nvidia GPU passed through, and remove the VNC config.
     3. Boot up. You should be in Safe Mode with the GPU displaying things, but with a basic display adapter driver.
     4. Note the device instance of the GPU and of its HDMI sound device, and enable MSI for both.
     5. Uncheck Safe Mode booting.
     6. Reboot and cross your fingers. As long as the device instance didn't change you should be up and running.

     Yeah, with a fresh install and MSI enabled on 6.8.3, with any luck it will stay enabled when you boot up on 6.9+. If not, though, try the above, as it seems the device instance updates and/or MSI gets disabled just from unRAID changing version. Probably the new hypervisor triggers something in Windows that looks like new hardware.
  2. I know, hard to believe, right! But correct. Fully functional, and the Unigine benchmark numbers are really good. Same or better than my 6.8.3 VMs.

     • unRAID 6.10RC2 (I suspect 6.9.2 will work).
     • Looks like I settled on Q35 (5.1) and OVMF-TPM. I suspect newer versions will work too.
     • It was a recent fresh install, a few days old with a lot of trial-and-error miles on it. So not virgin. (I was able to enable MSI on it, taking it from dead to working in obvious fashion.)
     • The Nvidia driver is the Studio series (not Game Ready). I don't believe the Studio thing matters; it was just something I had tried before that didn't work out, i.e. it was still BSODing on that version before enabling MSIs. Version: 511.09. Release Date: 2022.1.4. Operating System: Windows 10 64-bit, Windows 11.

     Attaching my VM config so you can see the other things I have in there. It does have a couple of the KVM hiding options in there (the stuff from one of your earlier posts): <kvm> <hidden state='on'/> </kvm>. I'm also passing in my GPU BIOS, but that may or may not make a difference. None of my VMs on 6.8.3 need the BIOS to run, but I pass it in anyhow.

     NOTE: the VM config file is from when I had VNC and GPU enabled, 2 video cards essentially. After it was working I just deleted the video lines pertaining to VNC and saved the config. It booted up fine and I ran the benchmarks like that.

     Agreed. It was needed just for audio in the past.

     This is the part that confuses and slightly worries me. I thought mine were MSI enabled too when I went from 6.8 to 6.9. So either they were not, or the upgrade itself triggered the Nvidia driver to disable it, or there's more here than I thought. (Hope not.) The acid test will be when I migrate my main server past 6.8. I think I'll go to 6.10RC2 in the next little bit here.

     A tool to handle this would be great. I'm just not sure how it'll work when you need to install the driver first and then run the tool... but the install of the driver crashes the machine. I could only work around that by going to Safe Mode, but it sounds like you have some ideas here, which would be fantastic! Maybe just run the script from Safe Mode or early in the boot process before the GPU inits fully. Good luck and let me know if you need any other info. I'll keep an eye on this too and post back once I get my main system past 6.8.

     Workig_VM_Config w MSI ON.txt
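To double-check that a saved VM config really carries the KVM hiding option mentioned above, something like the following Python sketch could be used. It assumes the config is ordinary libvirt domain XML with the `<kvm>` element sitting under `<features>`; the function name and the trimmed-down sample XML are made up for illustration and are not taken from the attached config.

```python
import xml.etree.ElementTree as ET


def kvm_hidden_enabled(domain_xml: str) -> bool:
    """Return True if the XML contains <features><kvm><hidden state='on'/>."""
    root = ET.fromstring(domain_xml)
    hidden = root.find("./features/kvm/hidden")
    return hidden is not None and hidden.get("state") == "on"


# Hypothetical, heavily trimmed domain XML, for illustration only.
SAMPLE = """
<domain type='kvm'>
  <features>
    <acpi/>
    <kvm>
      <hidden state='on'/>
    </kvm>
  </features>
</domain>
"""

if __name__ == "__main__":
    print("KVM hidden:", kvm_hidden_enabled(SAMPLE))
```

To check a real VM, paste the XML from the unRAID VM editor's XML view into a file and pass its contents to `kvm_hidden_enabled`.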
  3. To enable MSIs we'll need to:

     1. Boot into the VM with only VNC graphics enabled.
     2. Enable Safe Mode via MSCONFIG: use Search, type in MSCONFIG, and click or run the program "System Configuration". Go to the Boot tab and select Safe boot. I always enable Network as well, but it is not necessarily needed for this.
     3. Press OK and power down the VM.
     4. Now, within the unRAID VM interface, change the graphics adapter and assign your Nvidia GPU (passed through to the VM as typically done for 6.8 and prior).
     5. Boot back into the VM. The GPU should display Windows content on your display and be usable, but it is not using the NVIDIA driver, so it should not crash. This is important because we need the GPU ID.
     6. Go to Device Manager --> Display Adapters --> GPU Properties. Click on Details and change the drop-down to Device Instance Path.
     7. Copy or note down the entire address, as you'll need to locate it in the Registry.
     8. Now open up Regedit (Run: regedit).
     9. Find the device instance under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\PCI\<your device instance info> (screenshot, green boxes).
     10. Add 2 new keys and 1 DWORD as per the screenshot (red boxes):
         • new key: Interrupt Management
         • new key: MessageSignaledInterruptProperties
         • new DWORD: MSISupported
     11. Set the DWORD value MSISupported to 1 for enabled.
     12. Close Regedit.
     13. Now go back into MSCONFIG and disable Safe Mode (the reverse of enabling).
     14. Reboot, and if all went well the GPU will function as expected.

     Reference: this post on enabling MSIs (it has more details and is where I heard of MSIs a while back). Note there is also a utility to enable MSIs, but it doesn't seem to work in Safe Mode, so the manual implementation is needed in this case.
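The registry part of the steps above could in principle be scripted. Below is a rough Python sketch using the standard-library `winreg` module. Two things here are assumptions rather than something stated in the post: that the new keys go under the device's `Device Parameters` subkey (where they are commonly documented to live; the post only refers to a screenshot), and the helper names `msi_key_path` and `enable_msi`, which are invented for this example. The device instance path shown is a placeholder, not a real device. Untested in Safe Mode; treat it as a starting point only.

```python
ENUM_ROOT = r"SYSTEM\CurrentControlSet\Enum"


def msi_key_path(device_instance_path: str) -> str:
    """Build the subkey (relative to HKEY_LOCAL_MACHINE) that holds MSISupported."""
    instance = device_instance_path.strip().strip("\\")
    return "\\".join([
        ENUM_ROOT,
        instance,
        "Device Parameters",  # assumption: usual parent of the MSI keys
        "Interrupt Management",
        "MessageSignaledInterruptProperties",
    ])


def enable_msi(device_instance_path: str) -> None:
    """Create the keys if missing and set MSISupported=1 (Windows, elevated prompt)."""
    import winreg  # Windows-only module, imported lazily on purpose

    subkey = msi_key_path(device_instance_path)
    with winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE, subkey, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, "MSISupported", 0, winreg.REG_DWORD, 1)


if __name__ == "__main__":
    # Placeholder path: copy the real one from Device Manager (Device Instance Path).
    example = r"PCI\VEN_10DE&DEV_0000&SUBSYS_00000000&REV_A1\0&00000000&0&0000"
    print(msi_key_path(example))  # only prints the key; nothing is written
```

Running the file only prints the key it would touch. To actually apply the change you would call `enable_msi(...)` with your own device instance path from an elevated Python prompt, once for the GPU and once for its HDMI audio device.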
  4. At last, some success and, I think, a viable workaround for this issue: enable MSIs for the GPU.

     I was helping someone in another thread with an audio issue and it dawned on me that the audio portion of the GPU won't work well without MSIs enabled, so maybe the GPU needs them now too. It's a bit of a chicken-and-egg scenario though... The "fix" needs to be implemented in the Windows registry with the GPU passed through and working, but the Nvidia driver install or any update to the GPU will undo the fix and cause a BSOD.

     We'll now always need to force the GPU to use Message Signalled Interrupts (MSIs). Something in the newer kernel or hypervisor, or both, has made this a requirement now for some setups. I still think this is likely not biting everyone, though. It must also depend on your motherboard/CPU, the interrupts in use, etc., I guess.

     To close out this painful experience, here is a screenshot of the actual error that Windows throws when the GPU is initialised with line-based interrupts: That is also a clue, as it indicates that something related to nvlddmkm.sys (the Nvidia driver) is timing out. The way I read it, Windows waits on this process, times out, and throws an error (Video TDR Failure).

     Why Nvidia doesn't enable MSIs by default I don't know. If they did, this would not be a problem for us and audio pass-through would also work better. (Edited: or maybe this is the part that makes this hardware dependent; perhaps MSI is enabled by default on newer motherboards.)

     This is a bit tricky, like I said. I'll post how I did it below this shortly, but it involves using Safe Mode. I have had my VM running now for 2 hours doing benchmarks and surviving multiple reboots, so I think this is the solution we need, but I have yet to implement it on my main unRAID server, so hopefully I'm not jumping the gun here.

     It won't survive driver updates, so the process will likely need to be redone after each one. I suspect anything that slightly changes the address the GPU is referenced by in Windows will revert things back to a boot loop, as I think the driver undoes the MSI changes when it installs or updates. It does for the audio part for sure, based on my experience.
  5. I tried a bunch more stuff last night, including a change to UEFI, and still nothing (although that was on 6.10RC2). I believe this is kernel/Nvidia driver interaction-level stuff, but it must also have something to do with specific hardware. If it were not somewhat hardware dependent, I would expect more people to be saying "me too" on this thread. It feels like the majority of folks on 6.9.x and later must have working Nvidia VMs in Windows 10. Maybe we need to start a poll or something... By the way, I spun up an Ubuntu VM and had no issues at all. I have one more thing to try and then I am also going to have to give up for a bit.
  6. Hi, it sounds like you might need to enable MSIs for the Nvidia card (MSI = Message Signalled Interrupts). I think it's a fairly common issue. This happens to me if I don't enable MSI on my VMs with my 1060/2070 cards. https://forums.unraid.net/topic/76035-help-struggling-with-nvidia-audio-passthrough/?tab=comments#comment-1076667. I'm not positive that is your issue, but it's something to at least check on. I'm afraid I am not sure how you would enable that on LibreELEC easily. It can be enabled in Linux, and I believe the how-to for that is here. It sounds like you would need to tweak the image with a config file addition.
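On a Linux guest, one way to see whether a device is currently using MSI is to look at the capability lines printed by `lspci -vv`, where an enabled MSI capability normally shows up as `MSI: Enable+`. The Python sketch below just parses such text. The function name and sample lines are illustrative, and the exact output format may differ between lspci versions, so treat it as a rough check only.

```python
import re


def msi_enabled(lspci_text: str) -> bool:
    """Return True if an MSI capability line in the text shows Enable+."""
    # Matches "MSI: Enable+" but not "MSI-X: Enable+" (different capability).
    return re.search(r"\bMSI: Enable\+", lspci_text) is not None


# Illustrative capability lines in the style printed by `lspci -vv`.
SAMPLE_ON = "Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+"
SAMPLE_OFF = "Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+"

if __name__ == "__main__":
    print(msi_enabled(SAMPLE_ON), msi_enabled(SAMPLE_OFF))
```

In practice you would feed it the output of `lspci -vv -s <bus address of the GPU>` run as root inside the guest.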
  7. @Hetimop. Sorry to hear you are having trouble too. What motherboard are you using? I have this theory that the issue is linked to certain motherboards, but I would like to disprove that. I'm also thinking it's something in newer kernels that disagrees with KVM/Nvidia plus another factor (maybe motherboard or BIOS or the like...). I wonder if one of the unRAID 6.9 RC versions had a kernel older than 5.4.34. I would like to try that just for fun if I could find an RC release somewhere. Anyone know where I could get an old 6.9 release candidate to test with?

     @Brydezen. What did you want me to try out? The first part, hiding the KVM stuff, or the second part, patching the driver? I have done the KVM hiding stuff already and it didn't work 100%, like you are seeing. The guide you posted for that last part (patching the driver) is very outdated and I'm afraid it would not even work any longer. Someone on the last pages basically stated that. What Nvidia driver version did you use where you are seeing the Code 43? When I have a chance, hopefully this week, I'll run through the first guide (hiding the VM with commands) again and see if I have any luck. I'm also going to try the Nvidia Studio driver instead. I used that in a Win11 VM recently and it fixed some other unrelated issues.
  8. I hope you are on to something! I was headed down that road at one point too but stopped. I was under the impression that with newer Nvidia drivers VMs should no longer hit error 43. Perhaps it's still an issue, though, despite them allowing VMs. Worth a try for sure. Let me know if you want me to test or check anything on my system. NVIDIA enables GeForce GPU passthrough for Windows virtual machines
  9. I feel your pain. I get the exact same scenarios too... I think sometimes if you make a change big enough to the VM config the GPU hardware info changes slightly and Windows notices that and the Nvidia driver doesn't load up fully/right away. It then sorts itself out a bit after Windows boots up and re-inits the GPU fully and it dies. That's why sometimes it appears to work. That's what I suspect anyhow. I popped the cork thinking it was fixed a couple times and then came back to a black screen and boot loop... I tried some older Nvidia drivers (About 2 years old) too and no luck.
  10. Well, I spun up a 2nd server (a very similar ASRock motherboard to my main server: EP2C602 based, with 4 LAN ports). I tried fresh unRAID 6.10RC2 and 6.9.0 installs. No tweaks, no Dockers or apps or special settings. Same results in VMs, where they crash and reboot endlessly as OP described. The error is Stop Code: Video TDR failure, nvlddmkm.sys or dxgkrnl.sys. It seems to alternate, or it could be 2 error screens one after the other. (You can only see this if the Nvidia card is a 2nd graphics card and the primary is VNC.)

      I've tried everything I can find on the forums to resolve it, with no luck at all: Multifunction=on, with/without Hyper-V, and what feels like about 100 other little tweaks. Nothing works. I even tried a multitude of different BIOS settings and the video card in different PCIe slots. The only thing I haven't done is get a SeaBIOS system running. I've tried, but it just won't work (black screens; the video card gets disabled by Windows).

      So I'm pretty much done, as there's nothing more I can think of to try. It would be helpful if there was some error in a log to see, but I'm not really sure where to look for a smoking-gun type error that might shed more light. Diags from my test server, taken after a Windows VM crash, are attached for reference, and I'm hopeful for some help to narrow in on things if anyone has the skills to direct me. 🙂

      I'm stuck on 6.8.3 for now, which sucks, as I also need the Radeon reset patch, which is super simple to install on 6.10... I'd be happy for any suggestions on digging more into what might be causing the VM/Nvidia driver crash issue. I'm just not sure where to turn next. My gut tells me this is some mismatch between this motherboard and the hypervisor or the 5.x kernel, as that is new as of unRAID 6.9+, I believe (it was kernel 4.x before).

      tower-test 6_9_0-diagnostics-20211229-1754.zip
  11. I definitely did some customisation outside the UI way back on 6.3-ish versions, but I think I've taken most of that out (as long as I didn't forget any). I mostly was stubbing devices and had to put in something else, I think, to split out IOMMU grouping. Newer unRAID versions didn't need those options, so I think I removed them. I'll have to go and double check, as maybe there is something in there, but I kind of don't think so... worth checking though. It's another reason for trying safe mode, if that will work. It removes the bootloader/GRUB kernel options, I believe. Edit: just remembered I rolled back to 6.8, so I guess I can't really check what I had as a 6.10 config.

      I'm pretty much out of options here too... I spent at least 2 solid days trying various things and am stumped. I was thinking it must be our BIOS, or something deep down that we can't fix, that is not agreeing with the latest hypervisor code that was updated in unRAID 6.9+. I was going to try a different BIOS but noticed you are on a newer BIOS than me, so I stopped on that path. I'm not sure how to get further help, but I suspect this is more a hypervisor issue than an unRAID problem... It would be nice if there was a way to get more help here, though, as I'm sure a few others must be impacted. I searched about 10-15 pages manually in the KVM topic 2-3 weeks back and found a few posts that are like ours.

      I think it was another thread on here where a guy used SeaBIOS and had some luck, but he was very vague, so it's tough to tell if it was the same root cause/"fix". Even if that works it's not really a long-term solution as, I agree, OVMF seems to be the way to go from what I have seen... but it would be useful to know and may narrow things down.

      @Dythnire2022. What motherboard and CPU are you using? If they are the same as or similar to mine/Brydezen's, that would help us narrow this down one way or another.
  12. Just FYI, I'm on Legacy boot and have always been that way. (Click on Flash under Main: I have Server Boot Mode: Legacy, and Permit UEFI boot mode unchecked.) Have either of you tried a SeaBIOS VM? That was going to be my next move, and maybe booting in safe mode to eliminate Docker conflicts (although that feels unlikely). I think you can boot a VM in safe mode, but I'm not sure. I like your idea of a clean install of unRAID. Attached my diags for reference (current working setup on 6.3). I will let you know if I get anywhere on this too. I'm just not sure when I'll have time to go back to 6.10 and test. tower-diagnostics-20211229-1020.zip
  13. I don't believe it's easy to disable Hyper-V once the template is created with it enabled first. You could try making just a new template with Hyper-V off and then pointing to the old V-Disk. I think that works, but you may want to search on that a bit more to be sure. FWIW my understanding is that very recently Nvidia stopped killing VM's with their driver when detected which I think is what the Code 43 was driven by. Your issue may be something else if you're using a very recent Nvidia driver in the VM - link to Nvidia announcement
  14. I believe there is an UnRaid app that can update the UnRaid kernel to avoid the reset issues as well. 6.9 and 6.10 have it available albeit in different flavors from what I recall. Unfortunately due to another issue I had to roll back so can't search the apps to find the exact name for you. If the above works then great maybe you don't need this. You might also try searching for Radeon or AMD in the UnRaid apps tab and you should see something come up showing an AMD log that talks about fixing the reset issue.
  15. OK, thanks for the update. Appreciated. I am back on 6.8 now... I've wound up with a very stable Win 10 environment after days of tweaking and reading. Happy to compare notes later if you like. I think I'll just keep things stable, enjoy Xmas, and keep trying to figure this out on 6.10 when I have time. Yeah, I was hoping 6.10 would have resolved this. I'll post any new news too. I was doing the same as you in the past: 2 gaming rigs by day and miners by night. I stopped mining a while ago but might have to check that out again.
  16. Hi, Did you manage to get this resolved? Believe I am having the same issue and also on an EP2C602. I am searching through the forums and have noticed a couple others with similar problems when they went to 6.9 and are on the same MB. I thought I tried a Seabios install but maybe not. Perhaps I'll try that next as it seems like maybe that worked for you? Also posted here and here. Do those sound like what you had or have as an issue?
  17. Hi all, I am late to the party, but I recently tried to go to both 6.9 and 6.10 from 6.8. As soon as any Windows VM with an Nvidia GPU passed in initialises, it crashes (nvlddmkm.sys video TDR failure). I think I have the same issue as OP and am looking for a solution. It sounds like this thread diverged to a different issue, with KVM crashing entirely vs. just the VMs. Or am I reading it wrong and there is a solution for the VM/Nvidia issue? This same issue is posted here too, where I also posted. Looking at those diagnostics, I noticed I have a very similar CPU and the same motherboard as OP, so I can't help but think that has something to do with this, i.e. a hardware-related issue. I have searched a lot and only found a few mentions of this problem. If it was a widespread 6.9/6.10 issue I think it would have more attention, so again it seems more specific to hardware, or maybe some software conflict with an add-on or something. Anyhow, if this is solved, would you guys elaborate further on the "fix", and if not, what is the best course of action to get more help? I was going to start a new thread and post all my info, but if this is fixed I want to skip that. Thanks
  18. Brydezen, did you ever get this figured out? I just tried to go from 6.8.2 to 6.9.x and I'm having what appears to be the same thing you described. I've tried a million things, like a fresh Windows VM install, passing through the BIOS, and various settings, and nothing fixes it for me. I am on legacy boot too. The same thing happens on 6.10RC2, so I've gone back to 6.8.3 now. When I have more time I am going to go back to 6.9, try posting a problem report with diagnostics etc., and tackle it further. My AMD cards work fine. The 2 Nvidia ones, an RTX 2070 and a 1060, don't. The only thing I have not done that I think might have an impact is to remove all add-ons/plug-ins in case something conflicts, but I kind of doubt it. I even ran with VNC video and a 2nd video card (Nvidia, passed through). You can actually see Windows halt with an error once the Nvidia drivers kick in, and reboot after. It kind of looks like we have some hardware in common too, based on your diagnostics. I'm running dual Xeons on an ASRock EP2C602. Are you by chance on that board or a similar one too?
  19. I just want to say thank you for posting this information. Setting my Gigabyte RX480 video and audio to the same bus resolved the issue where the sound device was missing in my Mac VM's.
  20. Regarding:

      # sudo -u www-data /var/lib/zmeventnotification/bin/zm_event_start.sh 1 1
      Traceback (most recent call last):
        File "/var/lib/zmeventnotification/bin/zm_detect.py", line 27, in <module>
          import zmes_hook_helpers.utils as utils
      ModuleNotFoundError: No module named 'zmes_hook_helpers'

      Does anyone happen to know or have the info that was posted at the GitHub link above? The repo is archived and the link to the issue is gone. I'm just looking at getting this going after installing quite some time ago, with a very brief success before an update broke things. It looks like I have the above issue, and I am hoping to get the version of the unRAID docker I have installed working for now until I decide on a new path forward. I'm concerned that upgrading now will just cause more issues for me, even though I'll be a bit outdated. That said, if I upgrade to the latest deprecated full docker with hook support, does it actually work with people/object detection or is it broken? I don't necessarily need GPU support for now. It's a bit tough to tell from the recent comments what state it's in.

      And thanks, Dlandon, for the efforts you have put in; it sounds like a bear to manage. I wish I had worked on this sooner... I'd like to think I would have been a supporter if I could have got this working, and I still will be if you decide to go to some pay model. Easy to say after the fact, I know... Heck, even if I can get the deprecated one working I'd be happy to contribute. I can't tell you how much time I have put in previously on getting basic ZoneMinder working on a VM, let alone the object ML stuff, which I never touched. I really think you have something here with the docker/hook processing when it works, and I was so looking forward to that ability/option.
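A `ModuleNotFoundError` like the one above means the Python interpreter that ran `zm_detect.py` could not find the `zmes_hook_helpers` package on its search path. A small Python sketch such as the following can confirm that from inside the container before digging further. The helper name `module_available` is made up for this example, and the check only tells you whether the package is visible to the interpreter you run it with, not why it went missing.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Show which interpreter is in use, since several may be installed.
    print("interpreter:", sys.executable)
    print("zmes_hook_helpers importable:", module_available("zmes_hook_helpers"))
```

Run it with the same interpreter and user as the failing script (for example via `sudo -u www-data`), since a package installed for a different Python version or user would not be found.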
  21. OK, yeah, this seems to be somehow tied to RC1 and RC2 for me. I switched back to 6.8.3 and my Windows VMs are showing minimal CPU usage now and are much more responsive. I had only switched to RC1 as I needed a driver in that kernel version for a new PCIe card I was trying out. Reviewing the forum, I saw at least two other posts similar to this (for later betas), so I don't think we're the only ones with the issue. I'm going to have to stay on 6.8.3 for a while now as I don't have time to troubleshoot further. I'm happy to try and provide specific info on request if I can help. I would say that whatever the cause of this is, it's more than an annoyance and pretty much makes Win 10 VMs (maybe others) unusable on my system. I guess the next step when I have time is to try RC2, or RC3 when it's ready, in safe mode and see if it persists, to rule out add-ons/Dockers. Although I did test with 0 Dockers and 1 VM spun up, and the issues persisted.
  22. I am seeing what you have reported as well, but I am unsure if it is something really new or if I just didn't notice it in previous versions. I read a similar report on the forums for past unRAID versions. I am going to attempt to roll back to the last stable release and check. Link to forum post. The fix linked in there did not solve it for me. In addition, I am seeing my Windows VMs maxing out all passed-in CPUs and becoming unresponsive for minutes at a time. Further troubleshooting is needed on my end, which I hope to accomplish via the roll-back. I'll try and remember to post back if I find anything interesting.
  23. +1 I have a couple use cases like this. For now I pass through an entire USB card to VM's to circumvent but seems a bit of a physical resource waste and I'm running out of ports due to this.
  24. I have now managed to get GPU support mostly working by following the first post's instructions as well. That said, I noticed by running watch nvidia-smi on my unRAID box that you can see the GPU working when using YOLO, but not when running face recognition. Again, I'm curious as to whether anyone else sees this or if I have done something incorrectly. Some googling led me to this link: Get CUDA working... I believe the DLIB section is pertinent, but I'm not 100% sure. When I run ">>> dlib.DLIB_USE_CUDA" in Python inside the Docker container it comes back False, and maybe it should be True. Any comments on getting face recognition to work via GPU?
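For reference, `dlib.DLIB_USE_CUDA` reflects how the installed dlib was compiled, so a value of False would suggest the dlib build in the container has no CUDA support and dlib-based face recognition would run on the CPU. A small Python sketch to report this is below. The function name `dlib_cuda_status` and the returned messages are invented for this example, and it says nothing about whether the GPU itself is visible inside the container.

```python
import importlib
from types import SimpleNamespace


def dlib_cuda_status(dlib_module=None) -> str:
    """Describe whether the given (or installed) dlib build reports CUDA support."""
    if dlib_module is None:
        try:
            dlib_module = importlib.import_module("dlib")
        except ImportError:
            return "dlib is not installed"
    if getattr(dlib_module, "DLIB_USE_CUDA", False):
        return "dlib was built with CUDA"
    return "dlib was built without CUDA"


if __name__ == "__main__":
    # Real check against whatever dlib this interpreter can import.
    print(dlib_cuda_status())
    # Stand-in object, only to show the other outcome without needing dlib.
    print(dlib_cuda_status(SimpleNamespace(DLIB_USE_CUDA=True)))
```

If it reports a build without CUDA, rebuilding or reinstalling dlib with CUDA enabled inside the container would be the thing to look into.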
  25. I just installed this docker a couple days ago and am learning a bit as I go here. Long time zoneminder user, but noob with dockers. By following the first post and related links, to my surprise, I already have Object detection functioning and am working on face detection, but so far no luck. CUDA stuff will be next and I have set all the Docker ENV variables to yes to install so YOLO, CUDA, face detection software etc, in theory, should all be there. To help in my troubleshooting I just want to ask if anyone can confirm they have face detection actually working on the latest docker version. Does anyone have it working right now? That will help me narrow down to just config or if possibly some file or libraries etc could be missing or something in the actual docker container too. So far I see docker logs of the object detection running on an event but when I would expect the face detection to kick in there's just nothing/blank in the logs. Not even an error to go off so far. -EDIT- Did a force update/rebuild of the docker and face detection is now functional.