
SimpleDino


Posts posted by SimpleDino

  1. I have had the same issue since the upgrade; all of my VMs with GPU passthrough no longer work.

    I can access them through RDP, AnyDesk, etc., but not with Sunshine/Moonlight, which needs a working encode/decode function.

     

    I can see the GPUs in all the device managers, and they seem to be (kind of?!) working, but none of them outputs any display to the monitors.

     

    Have any of you found any solution to this?

    Br,

  2. 1 hour ago, ghost82 said:

    There are a lot of AER errors; I think the crash happens because the log fills up.

    The source of the issue could be hardware or software.

    If I were you, first of all, I would clean the slots and check the cables.

    If it's software related it could be fixed with a new kernel update.

    In the meantime to not show these, you can try pcie_aspm=off in your syslinux config.

    Otherwise you can try pci=noaer

     

    Thanks for the quick response.

     

    I'm curious whether the pcie_aspm=off parameter will affect all PCIe slots at a global level. Specifically, I'm wondering if it will have an impact on the 4x M.2 PCIe adapter.

    Given the circumstances, pci=noaer might be the preferable solution for the time being (see the syslinux sketch at the end of this post). The peculiar thing is, I hadn't experienced any issues until recently.

     

    The problem likely originates from PCIe slot 3, which houses the RTX 3060. As mentioned in my post, Unraid and the VM only freeze when the GPU is passed through to the Windows VM.

    The GPU in PCIe slot 3 operates fine outside of the VM when used for Stable Diffusion or other tasks. However, the AER error persists, and it first appeared after the initial system freeze.
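
    For reference, a minimal sketch of where either parameter would go in /boot/syslinux/syslinux.cfg, assuming the stock Unraid boot label (only one of the two would normally be added):

    label Unraid OS
      kernel /bzimage
      append pcie_aspm=off initrd=/bzroot
      # or, alternatively:
      # append pci=noaer initrd=/bzroot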

  3. Yesterday, I remotely accessed one of my VMs with Nvidia GPU passthrough, and suddenly the whole VM froze and disconnected. Subsequently, the entire UNRAID server became unresponsive, and unfortunately, the only solution was a hard reset. This has occurred twice, and I'm now concerned that if I tempt fate and start the VM a third time, it could cause irreparable damage to the server, potentially leading to data loss.

     

    memtest is OK!

     

    Below is a snippet of what the logs show. Note that the Docker log is also 100% full:

    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0:   device [10b5:8714] error status/mask=00000080/0000a000
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0:    [ 7] BadDLLP               
    May 12 14:54:03 Tower kernel: pcieport 0000:40:01.3: AER: Multiple Corrected error received: 0000:46:02.0
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0:   device [10b5:8714] error status/mask=00000080/0000a000
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0:    [ 7] BadDLLP               
    May 12 14:54:03 Tower kernel: pcieport 0000:40:01.3: AER: Corrected error received: 0000:46:02.0
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0:   device [10b5:8714] error status/mask=00000080/0000a000
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0:    [ 7] BadDLLP               
    May 12 14:54:03 Tower kernel: pcieport 0000:40:01.3: AER: Multiple Corrected error received: 0000:46:02.0
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0:   device [10b5:8714] error status/mask=00000080/0000a000
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0:    [ 7] BadDLLP               
    May 12 14:54:03 Tower kernel: pcieport 0000:40:01.3: AER: Corrected error received: 0000:46:02.0
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0:   device [10b5:8714] error status/mask=00000080/0000a000
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0:    [ 7] BadDLLP               
    May 12 14:54:03 Tower kernel: pcieport 0000:40:01.3: AER: Corrected error received: 0000:46:02.0
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0:   device [10b5:8714] error status/mask=00000080/0000a000
    May 12 14:54:03 Tower kernel: pcieport 0000:46:02.0:    [ 7] BadDLLP               
              

     

    The log entries show recurring PCIe link issues; "BadDLLP" presumably means a corrupted Data Link Layer Packet was received on that link.

    The device [10b5:8714] (see the IOMMU groups below) is probably named because it is the PLX PCIe bridge that the errors are reported against.

    The repeating "severity=Corrected" errors suggest an underlying issue that, while currently correctable, persists. The 'AER: Multiple Corrected error received' messages from the PCIe port indicate multiple corrected errors from the device at 0000:46:02.0.
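
    To see what actually sits behind the reporting bridge, a couple of standard lspci invocations from the Unraid console can help (a quick sketch; the address is the one from the log):

    lspci -tv                # topology tree: shows which endpoints hang off the 46:02.0 bridge
    lspci -vvv -s 46:02.0    # detailed capabilities and AER status for the bridge itself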

     

     

    Quote

     

    IOMMU group 53:[10b5:8714] 45:00.0 PCI bridge: PLX Technology, Inc. Device 8714 (rev ab)

    IOMMU group 54:[10b5:8714] 46:01.0 PCI bridge: PLX Technology, Inc. Device 8714 (rev ab)

    IOMMU group 55:[10b5:8714] 46:02.0 PCI bridge: PLX Technology, Inc. Device 8714 (rev ab)

     

     

    PCIe ACS override is set to "both".

     

    Attached you will find the diagnostics zip file.

     

    OS: unRAID 6.12.0-rc5

     

    Hardware:
    Threadripper 1950x

    ASUS Zenith Extreme

    PCIE 1: PNY NVIDIA Quadro P2000 5GB

    PCIE 2: M.2 X16 Adapter GEN 3

    PCIE 3: Nvidia RTX 3060 12GB

    PCIE 4: HBA Card LSI SAS 9207-8i

     

     

    tower-diagnostics-20230512-2120.zip

  4. Hello Unraid community,

    I've been working on integrating the Stable Diffusion Discord Bot (ausbitbank/stable-diffusion-discord-bot) with the superboki or Sygils container on Unraid, and I've encountered some issues along the way. Despite several attempts to resolve these errors/issues, I've been unsuccessful so far. I've also posted about the issue on the bot's GitHub repo (ausbitbank/stable-diffusion-discord-bot/issues/41).

     

    The problem I'm experiencing is that while the Discord bot successfully generates an image upon receiving a prompt, the image doesn't appear in the Discord channel. I can find the generated image in the Invoke Docker container output folder, but the path logic seems to be causing issues. I receive the following error from the Discord bot container:

    logs:

     

    Quote

    Error: ENOENT: no such file or directory, open '/mnt/user/SD-Outputs/InvokeAI/03-InvokeAI/output:/app/output000011.ac064d6c.2916674852.png'] { errno: -2, code: 'ENOENT', syscall: 'open', path: '/mnt/user/SD-Outputs/InvokeAI/03-InvokeAI/output:/app/output000011.ac064d6c.2916674852.png' }

     

    I've made adjustments to the .env file and docker-compose.yaml file, as shown below:

    .env file:
     

    Quote

     

    basePath="/mnt/user/SD-Outputs/InvokeAI/03-InvokeAI"

    dbPath="${basePath}/db"

    outputPath="${basePath}"
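
    Looking at the ENOENT path in the error above, the host directory and the container mapping appear fused into one string ('.../output:/app/output' plus the filename), which suggests the bot ends up using the whole volume mapping as its output path. If the bot resolves paths inside its own container, a container-relative .env is one thing to try; this is only a sketch, and the exact values are assumptions based on the mounts in the compose file below:

    basePath="/app"
    dbPath="${basePath}/db"
    outputPath="${basePath}/output"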

     

    docker-compose.yaml:

    Quote

    version: '3'
    services:
      discord-bot:
        build: .
        container_name: stable-diffusion-discord-bot
        volumes:
          - /mnt/user/SD-Outputs/InvokeAI/03-InvokeAI/db:/app/db
          - /mnt/user/SD-Outputs/InvokeAI/03-InvokeAI/output:/app/output
        environment:
          - channelID="xxxxx"
          - adminID="xxxxx"
          - apiUrl="http://xxxxxxx:9000/"
          - discordBotKey="xxxxx"
          # ... rest of the environment variables

     

    Dockerfile:

    Quote

    FROM node:14
    WORKDIR /app
    COPY package*.json ./
    RUN npm install
    COPY . .
    CMD ["npm", "start"]

     

    I would greatly appreciate it if someone could help me create a working Docker container for the Stable Diffusion Discord Bot that can be integrated with the superboki or Sygils container on the Unraid CA App Store.

     

    Any assistance or guidance would be immensely helpful.

    Thank you in advance!

  5. On 12/8/2022 at 3:39 PM, ich777 said:

    This is caused by the GPU Statistics plugin.

     

    Please turn off the server completely, pull the power cord from the wall, press the power and reset buttons a few times to empty the caps, wait 30 seconds, and then power up the server and see if this makes any difference.

     

    If not, the only thing that I would recommend next is pulling the GPU from the server and trying it in a desktop PC; also install the drivers, put a load on it, and see if everything works.

    nvidia-persistenced reports that it can't open the device.

     

    [SOLVED]

    Thanks for all the input; unfortunately it did not help!
    I figured out the issue... Apparently, if you populate the last PCIe GPU slot on the X570 Taichi motherboard, the first one somehow becomes inactive. The second slot is also populated and used in VMs, etc.

     

    As mentioned, I populated the last GPU slot with a PCIe M.2 adapter holding four 970 Plus 1TB NVMe drives, but I did not use or activate it in Unraid until the other day. That is when the Nvidia plugin gave a warning and the first GPU slot (P2000) became inactive, although it was still recognized in devices.

     

    I've removed the M.2 PCIe card until I get hold of a Threadripper system.

    • Like 2
  6. Hi,

     

    Today I got a notification saying the Nvidia plugin had crashed (I don't remember the details... and yes, I am ashamed), and when I tried to go to the Nvidia plugin under Settings, the whole screen/Unraid froze! After rebooting, I deleted the plugin and reinstalled it to the letter according to your instructions. No more freezing issues, and the plugin is OK, but now I cannot find the Nvidia P2000 GPU under the info section in the plugin (see picture). Has anyone encountered this issue? Or have I just missed a step during the uninstall & install part... Any ideas?! The card is not connected to any VMs or bound to VFIO.

     

    Also, in htop I am getting high CPU usage for this: nvidia-smi -q -x -g GPU-3181........... This is the GPU ID, I guess. Could it be the p-state script for power consumption, or me trying the nvidia-smi command in the console and it not finding anything?
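
    A quick way to check which card that ID belongs to, sketched with standard nvidia-smi calls from the Unraid console:

    nvidia-smi -L         # lists each GPU with its UUID (GPU-...)
    nvidia-smi -q -i 0    # full query for the first GPU; should return the P2000 if the driver sees it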

     

    Steps taken:

    1. Uninstall plugin

    2. Reboot

    3. Install according to instructions

    4. Reboot

    5. Downgrade to previous driver...no luck

    5.1 Reboot

    6.0 Upgrade to current version

    6.1 Reboot.

     

     

    tower-diagnostics-20221208-1504.zip

    Nvidia-driver.jpg

    nvidia-smi.jpg

    htop - nvidia.jpg

  7.   <devices>
        <emulator>/usr/local/sbin/qemu</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='qcow2' cache='writeback'/>
          <source file='/mnt/user/domains/Ventura/Monterey GPU/Monterey GPU/BigSur-opencore.img'/>
          <target dev='hdc' bus='sata'/>
          <boot order='1'/>
          <address type='drive' controller='0' bus='0' target='0' unit='2'/>
        </disk>

    Here is the format of the vdisk (from the VM XML above).
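
    If it helps, the on-disk format can be cross-checked against the type='qcow2' driver line with qemu-img (a sketch; qemu-img should be available from the Unraid console while the VM service is enabled):

    qemu-img info "/mnt/user/domains/Ventura/Monterey GPU/Monterey GPU/BigSur-opencore.img"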

  8. Hi!

     

    Just want to say that I've done this many times before and created several copies/clones of the original VMs without issues.

     

    Steps done to Clone/copy VM:

    1. Copy the full folder containing the VM from domains.

    2. Paste it as a new folder with a new name in domains (see the shell sketch after this list).

    3. Start a new custom VM and copy & paste the XML from the original VM.

    3.1 Give it a new name and delete the uuid (a new one will be generated).

    3.2 Delete the previous source file entry and point it to the new one (the new folder with the copied VM in it).

    4. Edit the helper script, replace the VM name with the new one, and run the script.

    5. Voilà! As I said, this method has worked until now.
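
    A minimal shell sketch of steps 1-2, assuming hypothetical folder names OriginalVM and ClonedVM under the default domains share:

    cp -r "/mnt/user/domains/OriginalVM" "/mnt/user/domains/ClonedVM"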

     

    But now, after updating to the latest Unraid OS, I am getting this error when trying to create & fire up a NEW VM. Any clues on how to fix the image file, or is it some other related issue?

     

    image.png.1d9b664fc95a046af3248238963a19ea.png

     

     

    Br,

     

     

  9. On 6/17/2022 at 1:17 PM, Squid said:

    Probably a coincidence, but you want to recreate the image file https://forums.unraid.net/topic/57181-docker-faq/#comment-564309

     

     

    Also, I'd hazard a guess that you've been having problems with the image filling up (this may have caused the problem), as your currently set value for the size is 100G, but it used to be 200G (and it is still mounting at that size).

     

    If you haven't managed to figure out why the image is filling up, I'd switch from using an image to using a folder for Docker. It's more space efficient and you never need to worry about the sizing again.

    Thanks, recreating the image helped.

    I have not figured it out, unfortunately; the only big change I've made is updating the OS. Is there any good guide for the image --> folder switch?

     

    Thanks in advance!

     

    On 6/17/2022 at 1:51 PM, JorgeB said:

    Btrfs is detecting data corruption in all your pools, you should run memtest.

    I ran memtest and got zero errors this time... but I did this after I solved the issue above with @Squid's help and a restart.

  10. 19 hours ago, Jorgen said:

    Remote access to lan works, maybe some of the other options too.

    The trick is to allow access from the wireguard network to your delugevpn docker (I’m assuming you’re using binhex’s delugevpn).
    Add the wireguard network range to the LAN_Network variable of the vpn docker, comma separated from your normal lan range. If you are using the default wireguard settings the range to add is: 10.253.0.0/24


    Sent from my iPhone using Tapatalk

    Ohh Thanks, it worked perfectly!
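
    For anyone else hitting this, a sketch of the resulting value for that variable, assuming a typical 192.168.1.0/24 home LAN (substitute your own range) plus the default WireGuard range from the quote above (the LAN_NETWORK spelling is an assumption; use whatever your container template calls it):

    LAN_NETWORK=192.168.1.0/24,10.253.0.0/24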

     

    • Like 1
  11. @SpaceInvaderOne Thanks a lot for the container update and your guides per usual! Appreciated!

    @ghost82 Thanks for all of your good and explanatory inputs! You have helped a lot of rookies like me!


    Just want to report a success for once and not an issue/error.

    Steps I took when updating to the new Macinabox and using the new scripts on current VMs, with complete success:

    1. Remove Macinabox plus the scripts and also rm -r /mnt/user/appdata/macinabox.
    2. Download Macinabox with the desired new settings, wait for scripts to load.
    3. Copy the name of whatever macOS VM you want to update with and input it into the new helper script.
    4. Run script twice, now XML of that VM is updated.
    5. Now I can make whatever changes (remove/attach PCI & USB devices, HDDs/SSDs) I want in the VM and then just run the script once or twice without further manual XML editing.

    All inputs are welcome if I have forgotten or misunderstood something in the update.

    Goodnights!

    I need help with containers that are routed through the NordVPN & DelugeVPN container. On the local network I can access all of the routed containers' WebUIs, but remotely (outside of the LAN), when accessing through the WireGuard VPN, I cannot access the same containers' WebUIs. Do I make any sense?

     

    I've tried these peers without any luck:

    • Remote access to server
    • Remote access to lan
    • Remote tunneled access

    I have not tried LAN-to-LAN access, because honestly I do not know how to set it up if that is the solution.

    • Like 1
  13. 3 hours ago, delgatto said:

     

    root@a9dc4d0a983b:/# sudo apt update
    Get:1 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
    Get:2 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
    Get:3 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
    Get:4 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
    Get:5 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
    Get:6 https://repo.nordvpn.com/deb/nordvpn/debian stable InRelease [6174 B]                           
    Get:7 https://repo.nordvpn.com/deb/nordvpn/debian stable/main amd64 Packages [6280 B]                 
    Get:8 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]                                                
    Get:9 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [839 kB]                                  
    Get:10 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [1470 kB]                                    
    Get:11 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [889 kB]                               
    Get:12 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [30.1 kB]                              
    Get:5 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]                                           
    Get:13 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]                                         
    Get:14 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]                                              
    Get:15 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1121 kB]                                  
    Get:16 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [33.7 kB]                                
    Get:17 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [1940 kB]                                      
    Get:18 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1003 kB]                                
    Get:19 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [23.8 kB]                                
    Get:20 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [50.8 kB]                                    
    Fetched 9879 kB in 9min 42s (17.0 kB/s)                                                                                  
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done
    6 packages can be upgraded. Run 'apt list --upgradable' to see them.
    root@a9dc4d0a983b:/# sudo apt speedtest-cli
    E: Invalid operation speedtest-cli

    Yeah, sorry, I see others have already commented. I forgot the install part in sudo apt install speedtest-cli.

    I have edited the post now also.
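
    For reference, the corrected sequence (plain apt, matching the quoted output above):

    sudo apt update
    sudo apt install speedtest-cli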

  14. 9 minutes ago, delgatto said:

     

    Hi! Thanx for posting this!

    But for me it doesn't work. After sudo apt speedtest-cli it's reporting

    E: Invalid operation speedtest-cli.

     

    I don't know why it doesn't work for you. Could you post your input commands and the resulting output?

     

     
