geekazoid

  1. Having this issue with a VM as well. It's an anomaly. Win10, 6 cores. The CPU usage reported is host only, in VM the CPUs are idle. So this is a hypervisor/VM configuration issue. I'm chasing it down as a libvirt/qemu issue since I'm not seeing a solution in the unraid searches. Check this thread out for example: link.
  2. Attached. It's been a while so I will quickly recap:

     GPU C - onboard VGA to 1080@60 (for UNRAID console only)
     GPU 1 - Quadro M4000 DisplayPort 1.2 to 4K@60 (for PCI Passthrough only - workstation 1)
     GPU 2 - GT 1030 HDMI 2.0 to 4K@60 (for PCI Passthrough only - workstation 2)

     In BIOS, the primary graphics is set to the onboard (GPU C). The other option is to set it to PCIe, and then it will likely go to the first PCI ID, which is GPU 1. I only use the first option: onboard.

     As desired, when console is on GPU C, I have remote access to it via the integrated IPMI board. If the GPU 1 or GPU 2 cables are connected to a display, console will always go to GPU 1. If I disconnect those cables, it will go to GPU C as desired. After boot I can connect the cables and passthrough will work as normal. If I let the console go to GPU 1 and then attempt PCI passthrough, it will work but I will lose console. And of course there is no access to console from the IPMI management agent.

     I'm running 6.8.2 at this point, very stable. This issue emerged 2yrs ago after a BIOS update, and subsequent BIOS updates have not changed it back. We could blame ASUS if we want, although my feeling is that if you tell the kernel to boot and use a device, and ignore another device, and it does not do that, there is a bug.

     Right now we are using this UNRAID machine as a dual-headed (soon triple) workstation and it is very stable and reliable. The GPU cable issue on reboot is inconvenient at worst. I feel that I wasted my money buying a server board for UNRAID when really it can't support the server features. But UNRAID brings so many other conveniences that I can't complain too loudly.

     fluffy-diagnostics-20200224-1232.zip
  3. Cool. Well what it changes is this:

     - the iommu group numbers change for my passthrough devices, possibly more. Total iommu groups is diminished. All my passthrough devices are listed though.
     - my PCIe USB cards are still not showing up as shareable in the VM manager
     - the devices connected to my passthrough USB cards are still connected to my host despite the blacklist
     - this appears in dmesg:

     root@fluffy:~# dmesg | grep vfio
     [ 0.000000] Command line: BOOT_IMAGE=/bzimage vfio-pci.ids=10de:13f1,10de:1d01,1912:0014,1b73:111i intel_iommu=on initrd=/bzroot
     [ 0.000000] Kernel command line: BOOT_IMAGE=/bzimage vfio-pci.ids=10de:13f1,10de:1d01,1912:0014,1b73:111i intel_iommu=on initrd=/bzroot
     [ 12.411973] vfio-pci 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
     [ 12.424129] vfio_pci: add [10de:13f1[ffffffff:ffffffff]] class 0x000000/00000000
     [ 12.424473] vfio-pci 0000:81:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
     [ 12.436168] vfio_pci: add [10de:1d01[ffffffff:ffffffff]] class 0x000000/00000000
     [ 12.436405] vfio_pci: add [1912:0014[ffffffff:ffffffff]] class 0x000000/00000000
     [ 12.436621] vfio_pci: add [1b73:0111[ffffffff:ffffffff]] class 0x000000/00000000

     Also this does nothing either way:

     root@fluffy:~# cat /boot/config/vfio-pci.cfg
     BIND=0000:0100 0000:8100 0000:0200 0000:0300

     All of my test cycles feature a full cold start. I'm going to start regression testing on 6.6 soon because this is messing with my workaday life.
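A minimal sketch for cross-checking the dmesg output above: it compares the tokens passed in `vfio-pci.ids=` against the IDs that `vfio_pci` reports adding. The function names and the regular expressions are illustrative assumptions based only on the log lines quoted in this post, not on any Unraid tooling.

```python
import re

# Matches lines such as:
#   [ 12.424129] vfio_pci: add [10de:13f1[ffffffff:ffffffff]] class 0x000000/00000000
ADD_RE = re.compile(r"vfio_pci: add \[([0-9a-f]{4}):([0-9a-f]{4})\[")


def registered_ids(dmesg_text):
    """Return the vendor:device IDs that vfio_pci reported adding."""
    return [f"{vendor}:{device}" for vendor, device in ADD_RE.findall(dmesg_text)]


def requested_ids(dmesg_text):
    """Return the raw tokens of the first vfio-pci.ids= argument found."""
    match = re.search(r"vfio-pci\.ids=(\S+)", dmesg_text)
    return match.group(1).split(",") if match else []


# Sample taken from the dmesg output quoted in this post.
sample = """\
[ 0.000000] Kernel command line: BOOT_IMAGE=/bzimage vfio-pci.ids=10de:13f1,10de:1d01,1912:0014,1b73:111i intel_iommu=on initrd=/bzroot
[ 12.424129] vfio_pci: add [10de:13f1[ffffffff:ffffffff]] class 0x000000/00000000
[ 12.436168] vfio_pci: add [10de:1d01[ffffffff:ffffffff]] class 0x000000/00000000
[ 12.436405] vfio_pci: add [1912:0014[ffffffff:ffffffff]] class 0x000000/00000000
[ 12.436621] vfio_pci: add [1b73:0111[ffffffff:ffffffff]] class 0x000000/00000000
"""

# Tokens that were requested but do not appear verbatim among the registered IDs.
mismatches = [tok for tok in requested_ids(sample) if tok not in registered_ids(sample)]
print(mismatches)
```

On this sample the only token flagged is `1b73:111i`, which the log shows registered as `1b73:0111`.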
  4. Oh in fact I believe that you meant this: Thus the correct form would be:

     append vfio-pci.ids=10de:13f1,10de:1d01,1912:0014,1b73:111 intel_iommu=on initrd=/bzroot

     Is this right?
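A small sketch of a syntax check for the `vfio-pci.ids=` value, assuming only the plain `vendor:device` form is used (the kernel parameter also accepts optional subvendor, subdevice and class fields, which this check deliberately ignores). The helper name `invalid_ids` is hypothetical.

```python
import re

# Assumption: each token is a plain hex vendor:device pair, 1 to 4 hex digits each.
ID_RE = re.compile(r"^[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}$")


def invalid_ids(ids_value):
    """Return the comma-separated tokens that are not hex vendor:device pairs."""
    return [tok for tok in ids_value.split(",") if not ID_RE.match(tok)]


print(invalid_ids("10de:13f1,10de:1d01,1912:0014,1b73:111i"))
print(invalid_ids("10de:13f1,10de:1d01,1912:0014,1b73:111"))
```

The check is purely syntactic: a bus address such as `01:00` also looks like two hex fields and would pass, so it cannot tell an ID from an address.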
  5. Sorry I think I forgot to enable notify on reply. Thanks for this important comment on my syntax. By provider I assume that you mean domain? So basically use the notation from dmesg. So if I follow you, this:

     append vfio-pci.ids=01:00 81:00 02:00 03:00 intel_iommu=on initrd=/bzroot

     should be this:

     append vfio-pci.ids=0000:0100 0000:8100 0000:0200 0000:0300 intel_iommu=on initrd=/bzroot
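For reference, dmesg prints PCI addresses as `domain:bus:device.function` (for example `0000:01:00.0` in the logs in this thread). The sketch below expands a short `bus:device` string into that form. It assumes a missing domain means `0000` and a missing function means `0`. Both defaults and the helper name `full_address` are illustrative assumptions, and the sketch says nothing about which notation a given Unraid setting expects.

```python
import re

# Optional 4-digit domain, 2-digit bus, 2-digit device, optional .function (0-7).
ADDR_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):(?P<dev>[0-9a-fA-F]{2})"
    r"(?:\.(?P<fn>[0-7]))?$"
)


def full_address(short):
    """Expand e.g. '01:00' to '0000:01:00.0'.

    Assumption: a missing domain means 0000 and a missing function means 0.
    """
    match = ADDR_RE.match(short)
    if match is None:
        raise ValueError(f"not a PCI address: {short!r}")
    domain = match.group("domain") or "0000"
    fn = match.group("fn") or "0"
    return f"{domain}:{match.group('bus')}:{match.group('dev')}.{fn}".lower()


for addr in ["01:00", "81:00", "02:00", "03:00"]:
    print(full_address(addr))
```

A string such as `0000:0100` is rejected by this parser because it lacks the colon between bus and device.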
  6. Did some clean cycles to confirm my condition. Here's what happens when I employ vfio settings "new method" as above after a clean boot without it (no errors) and a reconfigure with it after a power cycle:

     [ 98.766638] vfio-pci 0000:01:00.0: BAR 1: can't reserve [mem 0xb0000000-0xbfffffff 64bit pref]
     (above repeats ~2000 times)
     [ 98.932910] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
     [ 98.932914] pcieport 0000:00:01.0: device [8086:2f02] error status/mask=00004000/00000000
     [ 98.932918] pcieport 0000:00:01.0: [14] CmpltTO (First)
     [ 98.932922] pcieport 0000:00:01.0: broadcast error_detected message
     [ 98.933000] pcieport 0000:00:01.0: broadcast mmio_enabled message
     [ 98.933003] pcieport 0000:00:01.0: broadcast resume message
     [ 98.933010] pcieport 0000:00:01.0: AER: Device recovery successful
     [ 98.950407] pcieport 0000:00:01.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:01.0
     [ 98.950414] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
     [ 98.950417] pcieport 0000:00:01.0: device [8086:2f02] error status/mask=00004000/00000000
     [ 98.950421] pcieport 0000:00:01.0: [14] CmpltTO (First)

     I can now remove the vfio settings, *power cycle* then it will go back to normal. That's the excruciating detail, lemme know if there is a better thread for this.
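When a message repeats thousands of times, it can help to collapse the log before posting it. The following is a rough sketch that strips the leading `[timestamp]` and counts identical messages. The `summarize` helper is hypothetical, and the sample is a subset of the lines quoted above.

```python
import re
from collections import Counter

# Leading "[ 98.766638] " style timestamp as printed by dmesg.
TS_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def summarize(dmesg_text):
    """Count identical messages after stripping the leading [timestamp]."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        message = TS_RE.sub("", line).strip()
        if message:
            counts[message] += 1
    return counts


sample = """\
[ 98.766638] vfio-pci 0000:01:00.0: BAR 1: can't reserve [mem 0xb0000000-0xbfffffff 64bit pref]
[ 98.932918] pcieport 0000:00:01.0: [14] CmpltTO (First)
[ 98.950421] pcieport 0000:00:01.0: [14] CmpltTO (First)
"""

for message, count in summarize(sample).most_common():
    print(f"{count:5d}  {message}")
```

In practice the text would come from a saved dmesg dump rather than an inline string.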
  7. I should add that you need to power cycle the motherboard between changes to these settings. The PCI root device was throwing errors in dmesg after I added the vfio options, even after rolling them back and rebooting. It must put the device in a state that doesn't change until you power cycle the board. YMMV.
  8. What is the best way to blacklist devices so that the unraid host's kernel doesn't try to use them? For example I have two GPUs and two USB cards dedicated to two VMs, not to mention all of the peripherals connected to those USB cards. I don't want unraid to even bother trying to use these devices when the VMs are not started - connecting those USB devices to the host is problematic. Right now I can see when I stop a VM all the passed through devices start appearing in dmesg.

     I used append vfio-pci.ids=01:00,81:00,02:00,03:00 but it didn't change this behavior at all. When I added the vfio-pci.cfg it totally broke passthrough.

     So, to recap:

     - Passthrough works well with no configuration added, but devices aren't properly blacklisted, which requires manual intervention by me for every start/stop of the host or guest machines.
     - The "new method" breaks passthrough.

     I need a more direct way of blacklisting devices at startup, ideally to tell the kernel to just leave 4 PCI IDs alone completely and use the hardware it has been provided.
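One way to see whether the host has claimed a device is to look at the `driver` symlink that Linux exposes under `/sys/bus/pci/devices/<address>/`. The sketch below reads that link. The helper name `bound_driver` is hypothetical, and the addresses in the usage section assume function `.0` for the four devices mentioned in this post.

```python
import os


def bound_driver(address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, address, "driver")
    if not os.path.islink(link):
        # No driver symlink: the device is absent or nothing is bound to it.
        return None
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Assumption: function .0 of each device; adjust to match lspci output.
    for addr in ["0000:01:00.0", "0000:81:00.0", "0000:02:00.0", "0000:03:00.0"]:
        print(addr, bound_driver(addr))
```

A device reserved for passthrough would be expected to show `vfio-pci` here rather than a host driver. Running it before and after stopping a VM shows whether the host picks the device back up.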
  9. Sorry for the late reply. This is an old thread. I can recreate this problem and compare if you still have the issue.
  10. Months ago I reported an issue where this plugin stopped working. It would stall while scanning and never refresh the page. If there were no errors, you could never see the plugin page because it would auto-scan (undesirable) and then hang with no feedback. This behavior is gone now. I don't know if you made changes but assuming you didn't there are some environmental changes on my end that could have contributed to this. I recently relocated my machine back to a network I manage (HOME) from the network I was a guest (INLAWS) in. HOME is the same network my server was in for years before the problem with the FCP plugin emerged. It's the same router and same setup. Two differences between these environments: First, at HOME I control the router and I have internal horizon DNS with reverse records for everything. This cleans up a lot of networking. Second, at HOME the unraid server is on a physical connection to the LAN whereas at the INLAWS it was on a wireless bridge due to lack of proper wiring. I hope this contributes something to others' troubleshooting.
  11. I'm aware of /boot/previous and that it can be restored from https://<unraid_server>/Tools/Update; however, it really should be implemented so that you can boot from it in grub. I would work on this myself but I really have a full plate this summer and just can't take on any other projects. I am not convinced that file corruption is the issue. Both times this has happened I repaired it with make_bootable.bat. EFI is what was broken.
  12. This happened last time I upgraded, to 6.7.1. And it happened exactly the same way this time. I stopped the array, made a backup of /boot, ran update assistant, upgraded the OS, and it kernel panics on boot. From what I can tell, the update is breaking EFI. But I'm sure that comments here will tell me that it's normal and not fix it. Going back to 6.6.7.
  13. You know what would be smart? If the update script made a backup of the critical files it's manipulating, and verified its work before reporting SUCCESS! It's not that hard to at least make sure you don't corrupt files. Why you guys keep telling users to do a job a script can do for them is beyond me. I was surprised when my 6.7.1 update blew up and the comments were "maybe the update failed". Like, you have a script doing this that didn't check file integrity!? Why suddenly is my storage system unable to write files reliably?? Nothing in 2yrs of running Unraid has made me feel less trusting of it than these recent threads I've seen about corrupted files during upgrades.
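To illustrate the kind of check being asked for, here is a minimal sketch of "back up, then verify before reporting success" using a SHA-256 comparison. It is not how the Unraid updater works. The function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(src, backup_dir):
    """Copy src into backup_dir and confirm the copy hashes identically."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    dst = backup_dir / Path(src).name
    shutil.copy2(src, dst)
    if sha256(src) != sha256(dst):
        raise IOError(f"backup of {src} does not match the original")
    return dst
```

The same comparison could be run on each newly written file against a published checksum before an update reports success.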