
unRAID plugin for iGPU SR-IOV support



30 minutes ago, Revan335 said:

When will it be added to CA?

Maybe never, but I'm really not sure about that. I don't have the hardware on hand, so I can't help people with it directly, but as mentioned in the post that you've linked, I would like to give people the chance to still be able to use the plugin; that's why I maintain it now.

 

That's why the support link points to this thread.

Link to comment
On 11/21/2023 at 5:32 PM, da_stingo said:

 

I am running a Proxmox hypervisor (Intel 12600) and passing a virtual iGPU to a virtualized Unraid (6.12.4) installation. With your recently posted plugin I was able to enable HW transcoding with Plex inside a Docker container running on the virtualized Unraid installation with the virtualized iGPU.

 

Thank you! :)

 

I want to report back with some longer-term impressions of the plugin.

I installed it successfully as stated in my post on 2023-11-21 and it worked really well for several hours.

Unfortunately, I got multiple freezes of the Unraid VM combined with some system freezes of my hypervisor.

After removing the SR-IOV stuff from my hypervisor and Unraid, everything is stable again.

Because this is the Unraid forum and there are clearly more components involved in my setup than when running Unraid on bare metal, I am not chasing the error. I will simply wait a while and maybe try again later (if anyone is interested, just give my post a ❤️ and I will report back when the time comes...).

Just wanted to tell you about the problems I found and add to my first statement from a few days ago.

 

  • Like 3
Link to comment

I just tried to install it. It's showing up in Device Manager with Code 43.

Things are a bit hard for me to understand, because I am relatively new to this.

 

What is the right way to fix this? I searched this thread but nothing helped me :)

 

Could someone give me step-by-step instructions for dummies? :D

 

Thanks a lot. 

 

Edit: What I read here is that it has to be 00:02.1 because 00:02.0 is for Unraid?

How can I change this?

image.png.e0005b75274d3626a7f4c2021e205511.png

 

There is also no option to pass through 00:02.1.

 

image.thumb.png.4e95a19aa1364f1173267f0c285a7a9c.png

Edited by eLpresidente
Link to comment
3 hours ago, alturismo said:

maybe take a closer look at the SR-IOV settings page, there you set the number of vGPUs ... etc ...

Thanks, I got an error message:

 

image.png.6b62e6397aabbf990a2ae77da2d7f7eb.png

 

and this in the syslog:

Quote

Nov 30 17:35:18 stevenas kernel: i915 0000:00:02.0: not enough MMIO resources for SR-IOV
Nov 30 17:35:18 stevenas kernel: i915 0000:00:02.0: [drm] *ERROR* Failed to enable 2 VFs (-ENOMEM)
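
For anyone checking their own syslog for the same failure, here is a minimal Python sketch that pulls the failed VF requests out of a log dump. The function name `find_vf_failures` and the regular expression are illustrative assumptions built only from the two lines quoted above; other kernel versions may word the message differently.

```python
import re

# Matches kernel lines such as:
#   i915 0000:00:02.0: [drm] *ERROR* Failed to enable 2 VFs (-ENOMEM)
# The pattern is an assumption derived from the syslog excerpt in this post.
VF_FAIL = re.compile(
    r"i915 (?P<dev>[0-9a-f:.]+): \[drm\] \*ERROR\* "
    r"Failed to enable (?P<count>\d+) VFs \((?P<err>-?\w+)\)"
)


def find_vf_failures(log_text):
    """Return (device, requested_vfs, error) for every failed VF enable."""
    return [
        (m.group("dev"), int(m.group("count")), m.group("err"))
        for m in VF_FAIL.finditer(log_text)
    ]


sample = """\
Nov 30 17:35:18 stevenas kernel: i915 0000:00:02.0: not enough MMIO resources for SR-IOV
Nov 30 17:35:18 stevenas kernel: i915 0000:00:02.0: [drm] *ERROR* Failed to enable 2 VFs (-ENOMEM)
"""
print(find_vf_failures(sample))
```

The `-ENOMEM` here goes with the "not enough MMIO resources" line right before it, so the request for 2 VFs was rejected before any VF was created.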

 

image.thumb.png.404040cf2cb88bc3c005efacdc9252ee.png

 

I tried to reboot as mentioned in the plugin, but I'm still not able to change it.

Edited by eLpresidente
Link to comment

I am sorry to bother you, but the solution at https://access.redhat.com/solutions/37376 is hard for me to understand.

I am absolutely new to this. Sorry for asking what may be stupid questions, but I have never worked with or on Linux-based systems.

 

Quote

As a workaround solution, one can pass "pci=realloc" to kernel 2.6.32-228.el6 during booting.

1. Do I have to write "pci=realloc" in the go file?

 

Quote

Append the following parameters "intel_iommu=on pci_pt_e820_access=on pci=assign-busses" to the kernel line in grub.conf,
3. If it is a xen kernel, append iommu=1 msi=1 to xen kernel line (also include the above parameters to the kernel line),
4. options igb max_vfs=XX in /etc/modprobe.conf,

2. Same for this?

 

3. Or do I have to write all of this in here?

 

image.thumb.png.8f1ade0ba26cd17ee3287cdc5f2110f5.png

 

Before:

 

kernel /bzimage
append vfio_iommu_type1.allow_unsafe_interrupts=1 pcie_acs_override=downstream,multifunction initrd=/bzroot
acpi_enforce_resources=lax

 

After editing this:

 

kernel /bzimage
append vfio_iommu_type1.allow_unsafe_interrupts=1 pcie_acs_override=downstream,multifunction initrd=/bzroot
acpi_enforce_resources=lax

pci=realloc

intel_iommu=on pci_pt_e820_access=on pci=assign-busses

______________________________________________________________________________________________________

4. And create a modprobe.conf with nano and put in "igb max_vfs=XX", with XX set to 2 if I need 2 VFs?
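
On the syslinux question: boot parameters are read from the `append` line itself, so extra options go on that same line, separated by spaces, not on new lines underneath it. Below is a minimal sketch of that edit, assuming the wrapped `acpi_enforce_resources=lax` in the paste above actually belongs to the `append` line. `add_params` is a hypothetical helper, and whether `pci=realloc` helps with this MMIO error is not confirmed anywhere in this thread.

```python
def add_params(append_line, params):
    """Return the append line with any missing params added on the same line."""
    tokens = append_line.split()
    if not tokens or tokens[0] != "append":
        raise ValueError("not an append line")
    # Skip parameters that are already present so the edit can be repeated safely.
    missing = [p for p in params if p not in tokens]
    return " ".join(tokens + missing)


# The "Before" entry from this post, joined into one line (assumption: the
# line break before acpi_enforce_resources=lax is only display wrapping).
before = (
    "append vfio_iommu_type1.allow_unsafe_interrupts=1 "
    "pcie_acs_override=downstream,multifunction initrd=/bzroot "
    "acpi_enforce_resources=lax"
)
print(add_params(before, ["pci=realloc"]))
```

The result is still a single `append` line, which is the part that matters; the order of the parameters on that line is not significant in this sketch.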

 

 

 

 

Edited by eLpresidente
Link to comment
8 minutes ago, eLpresidente said:

I am sorry to bother you, but the solution at https://access.redhat.com/solutions/37376 is hard for me to understand.

in short, drop it ...

 

image.thumb.png.586c84e3b8eb19d4a1778d2a89b71cb6.png

 

you can try the mentioned workarounds ... but don't expect any support for them ... most likely it's a hardware limitation ...

 

9 minutes ago, eLpresidente said:

vfio_iommu_type1.allow_unsafe_interrupts=1 pcie_acs_override=downstream,multifunction

may I ask why you enabled all this ... ;) actually this is off-topic here now, but these options are most often not helpful either ...

it's a "hard" workaround to split IOMMU groups ...

Link to comment
6 minutes ago, alturismo said:

in short, drop it ...

 

image.thumb.png.586c84e3b8eb19d4a1778d2a89b71cb6.png

 

you can try the mentioned workarounds ... but don't expect any support for them ... most likely it's a hardware limitation ...

Oh okay, I see. I thought you could work around this with pci=realloc.

 

Quote

may I ask why you enabled all this ... ;) actually this is off-topic here now, but these options are most often not helpful either ...

it's a "hard" workaround to split IOMMU groups ...

You may.. I guess it's from the start of my Unraid fight :D watching YouTube videos and not understanding shit about what I was doing :D.. just clicking the same stupid shit they show us.. :D I learned that this is not the best way, but I guess this is still left over from that..

My VM IOMMU groups are split in both.. I guess this came from that, right?

 

image.thumb.png.0b15b72767268781dcad7b9f45cce747.png

 

I'll disable it now and try again..

 

But was the approach I wrote up there totally wrong?

 

 

Edited by eLpresidente
Link to comment
23 minutes ago, ich777 said:

If you are using my plugin version (recommended post on top of this thread) it will work just fine.

 

I'm trying it now, but installing the plugin is causing a null pointer dereference which breaks the entire driver. Once this happens, all interactions with the driver hang indefinitely, and the plugin installation never completes - it hangs when reading `sriov_numvfs`:

  ├─plugin -q /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin install https://raw.githubusercontent.co
  │   └─bash /tmp/inline.sh
  │       └─cat /sys/devices/pci0000:00/0000:00:02.0/sriov_numvfs
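
The file the installer is stuck on is a standard PCI sysfs attribute, and it has a sibling, `sriov_totalvfs`, that reports how many VFs the device advertises. Here is a minimal sketch for reading both, assuming the device path from the process tree above. `read_sriov_state` is a hypothetical helper, and on a system where the driver has already crashed this read would presumably block just like the `cat` does.

```python
from pathlib import Path


def read_sriov_state(device_dir):
    """Return (numvfs, totalvfs) for a PCI device directory, or None if absent."""
    base = Path(device_dir)
    try:
        # sriov_numvfs: VFs currently enabled; sriov_totalvfs: maximum supported.
        numvfs = int((base / "sriov_numvfs").read_text().strip())
        totalvfs = int((base / "sriov_totalvfs").read_text().strip())
    except FileNotFoundError:
        # The device does not exist or does not expose SR-IOV attributes.
        return None
    return numvfs, totalvfs


# Example call (path taken from the process tree in this post):
# read_sriov_state("/sys/devices/pci0000:00/0000:00:02.0")
```

A result of `None` would mean the attributes are missing entirely, which is a different situation from the hang described here.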

 

I went directly from 6.12.3 to 6.12.5... I'll try 6.12.4 and see if that's any better.

Log:

[  224.133372] Setting dangerous option enable_guc - tainting kernel
[  224.133729] i915 0000:00:02.0: Running in SR-IOV PF mode
[  224.134491] i915 0000:00:02.0: [drm] VT-d active for gfx access
[  224.134494] BUG: kernel NULL pointer dereference, address: 00000000000000d0
[  224.135129] #PF: supervisor read access in kernel mode
[  224.135759] #PF: error_code(0x0000) - not-present page
[  224.136386] PGD 6a580f067 P4D 6a580f067 PUD 6d2a94067 PMD 0
[  224.137020] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  224.137651] CPU: 10 PID: 10228 Comm: modprobe Tainted: P     U     O       6.1.63-Unraid #1
[  224.138309] Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680M-ACE SE, BIOS 2606 07/24/2023
[  224.138988] RIP: 0010:kill_device+0xb/0x27
[  224.139673] Code: 31 c0 48 39 16 0f 94 c0 c3 cc cc cc cc 0f 1f 44 00 00 48 8b 47 40 48 8b 40 50 c3 cc cc cc cc 0f 1f 44 00 00 48 8b 4f 48 31 d2 <8a> 81 d0 00 00 00 a8 01 75 0b 83 c8 01 b2 01 88 81 d0 00 00 00 89
[  224.141119] RSP: 0018:ffffc90046ed7ae0 EFLAGS: 00010246
[  224.141836] RAX: ffff88810808cc90 RBX: ffff88810808cc10 RCX: 0000000000000000
[  224.142567] RDX: 0000000000000000 RSI: ffffffffa13d08e1 RDI: ffff88810808cc10
[  224.143297] RBP: ffff88810808cc90 R08: 0000000000000000 R09: ffffffff829513f0
[  224.144028] R10: 00003fffffffffff R11: fefefefefefefeff R12: 00000e6000000000
[  224.144763] R13: ffff8881011ed000 R14: ffff8881011ed000 R15: ffff888262ff2638
[  224.145480] FS:  000014c1f1174740(0000) GS:ffff88903f680000(0000) knlGS:0000000000000000
[  224.146188] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  224.146890] CR2: 00000000000000d0 CR3: 00000006ac8f4000 CR4: 0000000000750ee0
[  224.147597] PKRU: 55555554
[  224.148290] Call Trace:
[  224.148965]  <TASK>
[  224.149621]  ? __die_body+0x1a/0x5c
[  224.150277]  ? page_fault_oops+0x329/0x376
[  224.150930]  ? do_user_addr_fault+0x12e/0x48d
[  224.151581]  ? exc_page_fault+0xfb/0x11d
[  224.152228]  ? asm_exc_page_fault+0x22/0x30
[  224.152871]  ? kill_device+0xb/0x27
[  224.153499]  device_del+0x3f/0x31d
[  224.154116]  ? i915_ggtt_probe_hw+0x6c4/0x6ef [i915]
[  224.154802]  platform_device_del+0x21/0x70
[  224.155420]  platform_device_unregister+0xf/0x19
[  224.156031]  sysfb_disable+0x2b/0x54
[  224.156632]  aperture_remove_conflicting_pci_devices+0x1e/0x82
[  224.157248]  i915_driver_probe+0x6b3/0xb35 [i915]
[  224.157917]  ? slab_free_freelist_hook.constprop.0+0x3b/0xaf
[  224.158542]  local_pci_probe+0x3d/0x81
[  224.159170]  pci_device_probe+0x190/0x1e4
[  224.159792]  ? sysfs_do_create_link_sd+0x71/0xb7
[  224.160408]  really_probe+0x115/0x282
[  224.161016]  __driver_probe_device+0xc0/0xf2
[  224.161619]  driver_probe_device+0x1f/0x77
[  224.162210]  ? __device_attach_driver+0x97/0x97
[  224.162794]  __driver_attach+0xd7/0xee
[  224.163378]  ? __device_attach_driver+0x97/0x97
[  224.163964]  bus_for_each_dev+0x6e/0xa7
[  224.164548]  bus_add_driver+0xd8/0x1d0
[  224.165125]  driver_register+0x99/0xd7
[  224.165689]  i915_init+0x1d/0x80 [i915]
[  224.166286]  ? 0xffffffffa14f3000
[  224.166833]  do_one_initcall+0x82/0x19f
[  224.167378]  ? kmalloc_trace+0x43/0x52
[  224.167917]  do_init_module+0x4b/0x1d4
[  224.168446]  __do_sys_init_module+0xb6/0xf9
[  224.168960]  do_syscall_64+0x68/0x81
[  224.169456]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[  224.169941] RIP: 0033:0x14c1f1298dfa
[  224.170407] Code: 48 8b 0d 21 20 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ee 1f 0d 00 f7 d8 64 89 01 48
[  224.171405] RSP: 002b:00007ffe6cf96f88 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[  224.171907] RAX: ffffffffffffffda RBX: 0000000000429440 RCX: 000014c1f1298dfa
[  224.172412] RDX: 000000000042f710 RSI: 0000000000523d48 RDI: 000014c1f008f010
[  224.172910] RBP: 000000000042f710 R08: 0000000000000007 R09: 000000000042fb00
[  224.173401] R10: 0000000000000005 R11: 0000000000000246 R12: 000014c1f008f010
[  224.173881] R13: 0000000000000016 R14: 0000000000429570 R15: 000000000042f710
[  224.174350]  </TASK>
[  224.174805] Modules linked in: i915(O+) drm_buddy drm_display_helper intel_gtt xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle iptable_mangle vhost_net vhost vhost_iotlb tap ipvlan wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_nat xt_tcpudp veth xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_mark iptable_nat xt_MASQUERADE xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc cmac algif_hash algif_skcipher af_alg md_mod ip6table_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun bnep tcp_diag inet_diag ipmi_devintf nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs af_packet 8021q garp mrp bridge stp llc zfs(PO) intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp zunicode(PO) zzstd(O) coretemp kvm_intel zlua(O) zavl(PO) kvm icp(PO) crct10dif_pclmul ast
[  224.174839]  crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 drm_vram_helper i2c_algo_bit drm_ttm_helper ttm aesni_intel drm_kms_helper crypto_simd cryptd zcommon(PO) rapl drm znvpair(PO) btusb ipmi_ssif btrtl intel_cstate spl(O) mei_hdcp mei_pxp wmi_bmof btbcm i2c_i801 agpgart i2c_smbus intel_uncore btintel syscopyarea sysfillrect cp210x tpm_crb sysimgblt acpi_ipmi nvme ahci mei_me tpm_tis bluetooth video cdc_ether libahci input_leds mei sr_mod tpm_tis_core i2c_core fb_sys_fops igc usbserial usbnet ecdh_generic ecc led_class atlantic nvme_core cdrom mii vmd fan thermal wmi ipmi_si tpm backlight joydev acpi_tad intel_pmc_core acpi_pad button unix
[  224.184229] CR2: 00000000000000d0
[  224.185027] ---[ end trace 0000000000000000 ]---
[  226.593327] RIP: 0010:kill_device+0xb/0x27
[  226.594162] Code: 31 c0 48 39 16 0f 94 c0 c3 cc cc cc cc 0f 1f 44 00 00 48 8b 47 40 48 8b 40 50 c3 cc cc cc cc 0f 1f 44 00 00 48 8b 4f 48 31 d2 <8a> 81 d0 00 00 00 a8 01 75 0b 83 c8 01 b2 01 88 81 d0 00 00 00 89
[  226.595851] RSP: 0018:ffffc90046ed7ae0 EFLAGS: 00010246
[  226.596697] RAX: ffff88810808cc90 RBX: ffff88810808cc10 RCX: 0000000000000000
[  226.597547] RDX: 0000000000000000 RSI: ffffffffa13d08e1 RDI: ffff88810808cc10
[  226.598404] RBP: ffff88810808cc90 R08: 0000000000000000 R09: ffffffff829513f0
[  226.599265] R10: 00003fffffffffff R11: fefefefefefefeff R12: 00000e6000000000
[  226.600161] R13: ffff8881011ed000 R14: ffff8881011ed000 R15: ffff888262ff2638
[  226.601044] FS:  000014c1f1174740(0000) GS:ffff88903f680000(0000) knlGS:0000000000000000
[  226.601977] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  226.602877] CR2: 00000000000000d0 CR3: 00000006ac8f4000 CR4: 0000000000750ee0
[  226.603796] PKRU: 55555554
[  226.604718] note: modprobe[10228] exited with irqs disabled
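
When comparing traces like this between Unraid versions, the two lines that matter most are the `BUG:` line and the first kernel-mode `RIP:` line. Here is a rough sketch that extracts them, assuming the log format shown above. `summarize_oops` is a hypothetical name, and the `0010:` prefix is used to skip the later user-space `RIP: 0033:` line.

```python
import re

# Patterns are assumptions based on the log format in this post.
BUG_LINE = re.compile(r"BUG: (?P<bug>.+)")
KERNEL_RIP = re.compile(r"RIP: 0010:(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(log_text):
    """Return the BUG description and the faulting kernel symbol, if present."""
    bug = BUG_LINE.search(log_text)
    rip = KERNEL_RIP.search(log_text)
    return {
        "bug": bug.group("bug").strip() if bug else None,
        "symbol": rip.group("symbol") if rip else None,
    }


sample = """\
[  224.134494] BUG: kernel NULL pointer dereference, address: 00000000000000d0
[  224.138988] RIP: 0010:kill_device+0xb/0x27
[  224.169941] RIP: 0033:0x14c1f1298dfa
"""
print(summarize_oops(sample))
```

For the trace in this post, that points at `kill_device`, reached through `sysfb_disable` during `i915_driver_probe` according to the call trace.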

 

Edited by Daniel15
Link to comment
9 hours ago, Daniel15 said:

This article is for network cards not GPUs, and it's for a very old kernel (2.6.32). Checking if there's a BIOS update is a good idea though. Your BIOS may have limitations that prevent SR-IOV from working.

you are correct, but that's just the first hit from Google that I grabbed, and in the end the resource issue is the same ;)

 

and of course, updating the BIOS could help, reverting his IOMMU / PCIe ACS override settings could help, praying could help ;)

whenever I read about this error ... it's the same story, comparable to the GVT-g VRAM issue and BIOS implementations ...
nothing you can do about it from the outside (in the standard way).

 

if you have any suggestion for how he can resolve it, go ahead ;) I couldn't find anything reliable.

Link to comment
12 hours ago, ich777 said:

6.12.6 was released just now. ;)

That's strange because for others it is working. @domrockt, for you it is working, correct?

Hmm... I'll have to try and figure out why mine is crashing. I tested five times and could consistently repro the crash on 6.12.5, whereas it never happens with 6.12.4. Unfortunately I don't have a test system with an Intel CPU, just my production system, so I'll probably just have to stick to 6.12.4 for now. 

 

@domrockt and @neunghaha28 - which CPU are you using? 

Link to comment
This topic is now closed to further replies.