Team_Dango

Members

Joined
September 14, 20205 yr
Last visited
May 16, 20251 yr

View Profile Find content

Noob

Current rank (1/14)

Posts

Find content

14
Reputation
Neutral

3

The recent visitors block is disabled and is not being shown to other users.

[Support] Joshndroid's Docker Repo Support Thread
[Support] Joshndroid's Docker Repo Support Thread

Team_Dango replied to Joshndroid's topic in Docker Containers

Just installed the Baikal docker and I am hoping for advice on getting it correctly routed through NPM. The official baikal installation docs have an example Nginx config and mention needing to keep the "Specific" directory from being accessible. Is this handled by the docker container (it appears to have nginx internally), or do I need to add additional configuration in NPM? Thank you!
- December 5, 20232 yr
- 193 replies
[Support] jj9987 - PostgreSQL
[Support] jj9987 - PostgreSQL

Team_Dango replied to jj9987's topic in Docker Containers

I have an instance of Postgres11 set up and working with one application. Now I am looking to install a second application (Joplin) that also needs its own PostgreSQL database. What is the best way to do this? Having two apps share one database feels like a bad idea. Can I create a second database within one instance of Postgres11, or do I need to install a second instance? Thanks!
- August 20, 20214 yr
- 117 replies
Windows VM GPU Losing Signal, Requires Server Reboot
Windows VM GPU Losing Signal, Requires Server Reboot

Team_Dango replied to Team_Dango's topic in VM Engine (KVM)

Thank you for the suggestion. I gave that a shot and it seemed to help. The VM did not crash for several hours. I even for a moment thought it may have been fixed. But eventually it crashed again same as before, much to my disappointment. After that initial success I was not able to achieve the same level of stability on subsequent reboots. I also tried adding both "video=vesafb:off" and "video=efifb:off" to the syslinux config, which is something I saw suggested a few places. This did not help at all. If anything it was less stable. I should perhaps mention that I already have one extra parameter in my syslinux config: "pcie_aspm=off" which solved an error I started getting after adding the LSI card. I do not know if it could somehow have anything to do with the other issue. I have by now also tried downgrading my motherboard BIOS. The latest is 4001 (what I was using) so I tried the two previous releases, 3901 and 3803. Neither caused any change in behavior that I could see. After those tests I reset back to 4001. I tried a fresh install of a new Windows 10 VM with the same GPU. This proved difficult as I was getting more errors. Apr 30 15:02:53 Tower kernel: vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258 Apr 30 15:02:53 Tower kernel: vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x19@0x900 Apr 30 15:02:53 Tower kernel: vfio-pci 0000:01:00.0: No more image in the PCI ROM Apr 30 15:02:53 Tower kernel: vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x19@0x168 Apr 30 15:02:53 Tower kernel: vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x1e@0x190 Device 01:00.0 is the GPU and 04:00.0 is the USB controller passed through to the VM to which I connect the keyboard and mouse. The errors made it so the keyboard and mouse were not recognized inside the VM, which made installing Windows impossible. I was able to work around this by passing though the keyboard and mouse directly as USB devices. This allowed me to get through the Windows installation, and after a few reboots I was able to pass through the USB controller without error. However after installing the graphics drivers and heaven benchmark the VM again crashed as soon as the benchmark started. I am again very much out of ideas. As always, any help would be very much appreciated. Thank you.
- April 30, 20215 yr
- 14 replies
Windows VM GPU Losing Signal, Requires Server Reboot
Windows VM GPU Losing Signal, Requires Server Reboot

Team_Dango replied to Team_Dango's topic in VM Engine (KVM)

Update: I found the relevant messages in the logs when a crash happens: Apr 27 21:59:49 Tower kernel: vfio-pci 0000:02:00.0: vfio_bar_restore: reset recovery - restoring BARs ... Apr 27 21:59:52 Tower kernel: vfio-pci 0000:02:00.0: vfio_bar_restore: reset recovery - restoring BARs Apr 27 21:59:53 Tower kernel: vfio-pci 0000:02:00.0: timed out waiting for pending transaction; performing function level reset anyway Apr 27 21:59:54 Tower kernel: vfio-pci 0000:02:00.0: not ready 1023ms after FLR; waiting Apr 27 21:59:55 Tower kernel: vfio-pci 0000:02:00.0: not ready 2047ms after FLR; waiting Apr 27 21:59:57 Tower kernel: vfio-pci 0000:02:00.0: not ready 4095ms after FLR; waiting Apr 27 22:00:01 Tower kernel: vfio-pci 0000:02:00.0: not ready 8191ms after FLR; waiting Apr 27 22:00:10 Tower kernel: vfio-pci 0000:02:00.0: not ready 16383ms after FLR; waiting Apr 27 22:00:27 Tower kernel: vfio-pci 0000:02:00.0: not ready 32767ms after FLR; waiting Apr 27 22:01:01 Tower kernel: vfio-pci 0000:02:00.0: not ready 65535ms after FLR; giving up Apr 27 22:01:01 Tower kernel: vfio-pci 0000:02:00.0: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:01:01 Tower kernel: vfio-pci 0000:02:00.0: can't change power state from D0 to D3hot (config space inaccessible) Apr 27 22:01:01 Tower kernel: vfio-pci 0000:02:00.0: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:01:09 Tower kernel: vfio-pci 0000:02:00.0: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:01:09 Tower kernel: vfio-pci 0000:02:00.0: can't change power state from D0 to D3hot (config space inaccessible) Apr 27 22:01:09 Tower kernel: vfio-pci 0000:02:00.0: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:01:22 Tower kernel: vfio-pci 0000:02:00.3: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:01:22 Tower kernel: vfio-pci 0000:02:00.0: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:01:22 Tower kernel: vfio-pci 0000:02:00.1: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:01:22 Tower kernel: vfio-pci 0000:02:00.2: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:01:24 Tower kernel: vfio-pci 0000:02:00.3: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:01:24 Tower kernel: vfio-pci 0000:02:00.2: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:01:24 Tower kernel: vfio-pci 0000:02:00.1: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:01:24 Tower kernel: vfio-pci 0000:02:00.0: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:01:25 Tower kernel: vfio-pci 0000:02:00.0: timed out waiting for pending transaction; performing function level reset anyway Apr 27 22:01:26 Tower kernel: vfio-pci 0000:02:00.0: not ready 1023ms after FLR; waiting Apr 27 22:01:27 Tower kernel: vfio-pci 0000:02:00.0: not ready 2047ms after FLR; waiting Apr 27 22:01:29 Tower kernel: vfio-pci 0000:02:00.0: not ready 4095ms after FLR; waiting Apr 27 22:01:34 Tower kernel: vfio-pci 0000:02:00.0: not ready 8191ms after FLR; waiting Apr 27 22:01:42 Tower kernel: vfio-pci 0000:02:00.0: not ready 16383ms after FLR; waiting Apr 27 22:01:59 Tower kernel: vfio-pci 0000:02:00.0: not ready 32767ms after FLR; waiting Apr 27 22:02:35 Tower kernel: vfio-pci 0000:02:00.0: not ready 65535ms after FLR; giving up Apr 27 22:02:40 Tower kernel: vfio-pci 0000:02:00.2: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:02:40 Tower kernel: vfio-pci 0000:02:00.3: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:02:40 Tower kernel: vfio-pci 0000:02:00.0: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:02:40 Tower kernel: vfio-pci 0000:02:00.1: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:02:40 Tower kernel: vfio-pci 0000:02:00.2: vfio_bar_restore: reset recovery - restoring BARs Apr 27 22:02:40 Tower kernel: vfio-pci 0000:02:00.3: vfio_bar_restore: reset recovery - restoring BARs ... Apr 27 22:02:41 Tower kernel: vfio-pci 0000:02:00.0: vfio_bar_restore: reset recovery - restoring BARs The most useful message seems to be the "vfio_bar_restore: reset recovery - restoring BARs" (note: this message is repeated many times, I clipped most for readability). Googling this returns a decent number of results, though it seems to mostly be with AMD cards from what I can tell. Most of the solutions appear to involve the motherboard. I tried changing which slot the 2070S was in. Originally I had it in the primary (#2) slot with my M2000 (for Plex transcoding) in the #4 slot. I also have an LSI card in the #5 slot. I tried having the M2000 in the primary slot and the 2070S in the #3 slot (this is the way the MB manual recommends having three pcie devices, I originally had the M2000 a slot lower for better airflow to the 2070S). Unfortunately this did not fix the problem. I thought it had fixed it because the system was stable for a few hours but it eventually did crash. (The error log above is from that crash, which is why the GPU is now device 02.00.0). I have not yet tried downgrading to an older MB BIOS. That will be my next step, unless there are any other suggestions. If anyone has had experience with any of the above errors with Nvidia cards, please let me know. Thank you
- April 28, 20215 yr
- 14 replies
Windows VM GPU Losing Signal, Requires Server Reboot
Windows VM GPU Losing Signal, Requires Server Reboot

Team_Dango replied to Team_Dango's topic in VM Engine (KVM)

I don't have a way to monitor the passed-through GPU's temps from within Unraid (I didn't think that was possible, please correct me if I'm wrong). Within Windows, I haven't noticed any abnormal temperatures leading up to a crash. As I mentioned, it seems to happen most reliably when the GPU is under load, but it has happened other times as well. I really appreciate your help. I agree with your diagnostics. The root problem is whatever is causing the crash. From what I can tell the "internal error: ..." seems like a reasonable error to get after an unclean force-stop of the VM. I included that info because I have been having a hard time finding any other error messages or other indicators in the logs of either Unraid or the VM to help diagnose the issue. I will try to recreate the crash tonight and check the logs again to see if I missed anything. Assuming no other suggestions, I'm going to try fully reinstalling Windows on the VM. I thought for sure it was a hardware issue with the 2070S until I got the same error on the 3070, so now I am guessing it is a software issue. I suppose it could still be a hardware error at the motherboard level. For the record I am running an Asus X99 WS/IPMI LGA2011 with an Intel Xeon E5-2680 v3. The GPU is in the second slot, which appears to be the primary slot according to the MB manual. I am on the latest BIOS AFAIK. If a Windows reinstall doesn't change anything I'll try an older BIOS, though I would be surprised if that was the issue since the problem only started happening recently.
- April 26, 20215 yr
- 14 replies
Windows VM GPU Losing Signal, Requires Server Reboot
Windows VM GPU Losing Signal, Requires Server Reboot

Team_Dango replied to Team_Dango's topic in VM Engine (KVM)

That is the GPU.
- April 25, 20215 yr
- 14 replies
Windows VM GPU Losing Signal, Requires Server Reboot
Windows VM GPU Losing Signal, Requires Server Reboot

Team_Dango posted a topic in VM Engine (KVM)

I have a Windows 10 HTPC/gaming VM set up on my Unraid server. It has a dedicated Nvidia RTX 2070 Super. It worked fine for months, but lately it has been having issues where it suddenly stops outputting a signal to the TV. It seems to mostly happen when the GPU is under load or when an application is starting up, though I have had it happen as soon as Windows starts. Sometimes after the VM has crashed the GPU fans ramp to 100% and stay there until the server is rebooted. Also, after a VM crash, the Unraid GUI reports that all CPU threads allocated to the VM are at 100% for a couple of seconds, then just the first thread sits at 100% until the VM is stopped. The VM will not stop cleanly, it has to be force stopped. After this it cannot be started again until the entire server is rebooted. Trying to start the VM before rebooting gives this error (the device ID is the GPU): internal error: Unknown PCI header type '127' for device '0000:01:00.0' The VM issue does not appear to effect any other component running on the server. All docker applications and my other VM's continue running as normal after the Windows VM has crashed. Some searching indicated this might be a VBIOS issue. I was originally using a VBIOS from techpowerup that was modified as explained in SpaceInvaderOne's 2017 GPU passthrough video. I tried the userscript technique from SpaceInvaderOne's newer video to dump my own VBIOS. Using this VBIOS file did not fix the issue. Finally, I tried swapping out the 2070 for the 3070 I have in my desktop machine. I used SpaceInvaderOne's script to dump the VBIOS and did a clean GPU driver reinstall on the VM. At first it seemed like the issue had been resolved, but after a few minutes running heaven benchmark the VM crashed exactly as it had with the 2070. I am now out of ideas. Any advice would be much appreciated. Thank you.
- April 24, 20215 yr
- 14 replies
VNC Failed to Connect After Adding New VM
VNC Failed to Connect After Adding New VM

Team_Dango replied to Team_Dango's topic in VM Engine (KVM)

It turns out the problem was not with the server but rather with Chrome. I was able to get noVNC to connect using Edge, which prompted me to restart Chrome and then it worked. I had tried using Edge to connect previously with no luck, but I don't think I had tried again since restoring the libvirt image. Glad it is working now, but I still do not know why adding the new VM broke things in the first place.
- March 15, 20215 yr
- 2 replies
Team_Dango started following VNC Failed to Connect After Adding New VM
- March 15, 20215 yr
VNC Failed to Connect After Adding New VM
VNC Failed to Connect After Adding New VM

Team_Dango posted a topic in VM Engine (KVM)

I've had an Ubuntu VM running on my server for several months now. I connect to it using the noVNC option built into Unraid. Yesterday I tired adding a second Ubuntu VM for a new task. After starting the new VM I was unable to connect to either it or the old VM. NoVNC simply reports "Failed to connect to server". I deleted the new VM but still could not connect to the old VM. I tried restoring from a saved libvirt image, which involved resetting the server altogether, but still could not connect. My only other VM is a Windows machine hooked up to a GPU and monitor, this works fine. I get the following error in the Unraid logs each time I attempt to connect to the VM via noVNC: Tower nginx: 2021/03/15 09:30:55 [error] 10278#10278: *1827 recv() failed (104: Connection reset by peer) while reading upstream, client: 10.10.20.164, server: , request: "GET //wsproxy/5700/ HTTP/1.1", upstream: "http://127.0.0.1:5700/", host: "10.10.20.2" Any advice on how to fix this would be much appreciated. I am running Unraid 6.9 Thank you
- March 15, 20215 yr
- 2 replies
[SOLVED] AER PCIe Bus Errors
[SOLVED] AER PCIe Bus Errors

Team_Dango replied to Team_Dango's topic in General Support

Thank you for the suggestion. I'll check for that next time I reboot the server.
- March 11, 20215 yr
- 23 replies
[SOLVED] AER PCIe Bus Errors
[SOLVED] AER PCIe Bus Errors

Team_Dango replied to Team_Dango's topic in General Support

After doing some digging I believe I have solved my issue. It seems like it is somewhat a known bug on Asus X99 motherboards. Mine is an Asus X99-WS/IPMI. I am on the latest BIOS so updating was not an option. The solution was to add "pcie_aspm=off" to my syslinux configuration. After a reboot I appear to no longer be getting errors. Fingers crossed it stays fixed. If anyone has anything to add feel free to chime in. If I don't have any errors tomorrow morning I'll mark this solved.
- March 11, 20215 yr
- 23 replies
- - 3
[SOLVED] AER PCIe Bus Errors
[SOLVED] AER PCIe Bus Errors

Team_Dango posted a topic in General Support

I came home to an error saying my log file was full. Turns out I have been receiving a stream of PCIe errors since I made some hardware changes over the weekend. The first device that is throwing errors is one of two GPUs in the system. The errors look like: Tower kernel: pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:01:00.0 Tower kernel: vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID) Tower kernel: vfio-pci 0000:01:00.0: device [10de:1e84] error status/mask=00100000/00000000 Tower kernel: vfio-pci 0000:01:00.0: [20] UnsupReq (First) Tower kernel: vfio-pci 0000:01:00.0: AER: TLP Header: 40000001 00000003 000be7c0 f7f7f7f7 Tower kernel: pcieport 0000:00:03.0: AER: device recovery successful The second device that is throwing errors is my LSI card. This is new. It is an LSI 9207-8i purchased from The Art of the Server on ebay. It is in a PCIe slot that was previously occupied by an NVME SSD in a PCIe adapter. Those errors look like: Tower kernel: mpt3sas 0000:04:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID) Tower kernel: pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:01:00.0 Tower kernel: mpt3sas 0000:04:00.0: device [1000:0087] error status/mask=00000001/00002000 Tower kernel: mpt3sas 0000:04:00.0: [ 0] RxErr Despite these errors, both devices are acting normally. The GPU is passed through to a VM and behaves as expected even under full load. The LSI card also appears fully functional. I went through an entire parity check which passed with zero errors. I am currently running through a drive rebuild (not because of drive failure, just swapping it out) and would rather not have to abort, but I also do not know how severe these errors are and if I need to take immediate action. I am attaching my full diagnostics dump. Any advice would be much appreciated. Thank you. tower-diagnostics-20210310-1757.zip
- March 10, 20215 yr
- 23 replies

Team_Dango

Joined

Last visited

Noob

Posts

Reputation

[Support] Joshndroid's Docker Repo Support Thread

[Support] jj9987 - PostgreSQL

Windows VM GPU Losing Signal, Requires Server Reboot

Windows VM GPU Losing Signal, Requires Server Reboot

Windows VM GPU Losing Signal, Requires Server Reboot

Windows VM GPU Losing Signal, Requires Server Reboot

Windows VM GPU Losing Signal, Requires Server Reboot

VNC Failed to Connect After Adding New VM

VNC Failed to Connect After Adding New VM

[SOLVED] AER PCIe Bus Errors

[SOLVED] AER PCIe Bus Errors

[SOLVED] AER PCIe Bus Errors

Account

Navigation

Search

Configure browser push notifications

Chrome (Android)

Chrome (Desktop)

Safari (iOS 16.4+)

Safari (macOS)

Edge (Android)

Edge (Desktop)

Firefox (Android)

Firefox (Desktop)