Posts by jwiener3 (Member, 12 posts)

  1. I am trying to get Unraid (6.9 RC2) running as a VM on VMware ESXi 6.7 U3 with a P400 GPU passed through. When I add hypervisor.cpuid.v0 = "FALSE" to the VM's config file, it will not boot with more than one CPU assigned. It appears to try loading the Intel microcode and then halts. I have tried disabling the microcode with the Disable Security Mitigations plugin, but I still see in the hypervisor logs that it tries to load it. With one CPU it also tries, but it doesn't seem to issue the CPU reset shown in the logs.

    Below is the log from the VM with two or more CPUs:

    2021-01-11T02:51:41.855Z| vcpu-0| I125: APIC THERMLVT write: 0x10000
    2021-01-11T02:51:43.182Z| vcpu-0| W115: CPU microcode update available.
    2021-01-11T02:51:43.182Z| vcpu-0| W115+ The guest OS tried to update the microcode from patch level 67 (43h) to patch level 68 (44h), but VMware ESX does not allow microcode patches to be applied from within a virtual machine.
    2021-01-11T02:51:43.182Z| vcpu-0| W115+ Microcode patches are used to correct CPU errata. You may be able to obtain a BIOS/firmware update which includes this microcode patch from your system vendor, or your host OS may provide a facility for loading microcode patches.
    CPU reset: soft (mode 0)

    Here is the log with one CPU:

    2021-01-11T03:35:11.496Z| vcpu-0| I125: UHCI: HCReset
    2021-01-11T03:35:12.754Z| vcpu-0| W115: CPU microcode update available.
    2021-01-11T03:35:12.754Z| vcpu-0| W115+ The guest OS tried to update the microcode from patch level 67 (43h) to patch level 68 (44h), but VMware ESX does not allow microcode patches to be applied from within a virtual machine.
    2021-01-11T03:35:12.754Z| vcpu-0| W115+ Microcode patches are used to correct CPU errata. You may be able to obtain a BIOS/firmware update which includes this microcode patch from your system vendor, or your host OS may provide a facility for loading microcode patches.
    SVGA: Unregistering IOSpace at 0x1070
    2021-01-11T03:35:13.182Z| vcpu-0| I125: SVGA: Unregistering MemSpace at 0xe8000000(0xe8000000) and 0xfe000000(0xfe000000)

    Below is a screenshot from Unraid where it halts: [screenshot: 2021-01-10_20-52-14]

    I was able to pass the GPU through to an Ubuntu 18.04 VM with the same virtual hardware and get it working. When I first tried, I ran into the same problem and found a post recommending purging the Intel microcode; I followed those steps to get it working.

    Any idea what I can do on Unraid, and how I can actually delete those microcode files?

    Reading the later posts in the security mitigations plugin thread, it is unclear whether that boot option still works with the newer kernels (which might be my problem).

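    For anyone comparing configurations: the setting above lives in the VM's .vmx file as a key = "value" line. Below is a minimal Python sketch, assuming that simple quoted layout (comments and unusual quoting are not handled), that reads hypervisor.cpuid.v0 and numvcpus and flags the combination that fails to boot here. The function names are hypothetical, and treating a missing numvcpus as one vCPU is an assumption.

```python
import re

# One .vmx line of the form: key = "value"
# (assumption: the plain quoted layout VMware writes; comments are skipped).
VMX_LINE = re.compile(r'^\s*([A-Za-z0-9_.:]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text: str) -> dict:
    """Return a dict mapping lower-cased .vmx keys to their string values."""
    settings = {}
    for line in text.splitlines():
        match = VMX_LINE.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def passthrough_summary(settings: dict) -> dict:
    """Summarise the two settings discussed in the post (hypothetical helper)."""
    # hypervisor.cpuid.v0 = "FALSE" is commonly used to hide the hypervisor from the guest.
    hidden = settings.get("hypervisor.cpuid.v0", "").upper() == "FALSE"
    # Assumption: a VM without a numvcpus entry has a single vCPU.
    vcpus = int(settings.get("numvcpus", "1"))
    return {
        "hypervisor_hidden": hidden,
        "vcpus": vcpus,
        # The combination reported above as failing to boot.
        "matches_failing_case": hidden and vcpus > 1,
    }


if __name__ == "__main__":
    sample = '''
numvcpus = "2"
hypervisor.cpuid.v0 = "FALSE"
pciPassthru0.present = "TRUE"
'''
    print(passthrough_summary(parse_vmx(sample)))
```

    To check a real VM, pass the text of its .vmx file to parse_vmx instead of the sample string.
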
  2. I know this is an old thread, but for anyone who stumbles across it looking for advice: I did get this working on 6.8.3 with the PAN-OS 9.1.2 qcow image (I did not try any other version). My settings are in the screenshots below; I added 8 NICs, the first being the management NIC. It takes a while to start up and I do get some errors (see screenshot), but it works well and I get the performance I need (500 Mbps down and 20 Mbps up).

    [screenshot: 2020-07-20_9-09-14]

    [screenshot: 2020-07-20_9-13-47]

  3. On 7/6/2020 at 11:38 AM, jwiener3 said:

    I tried this with a new blank config and saw the same issues, and I also tried ich777's Nvidia/DVB Kernel Helper/Builder Docker with the same results. So there must be something about my hardware and the NVIDIA driver; I just do not know where to begin troubleshooting it. Any advice would be appreciated.

    Just to follow up here: thanks to @ich777 I was able to figure out my issue. It was PCIe power. I put the card in a different machine and it worked fine, so I knew it wasn't the card. I then pulled the PSU out of another box and powered the card with that PSU while it sat in my main server, and it works! I just ordered what I need to make it a permanent fix. Thank you to those who helped me.

  4. On 7/2/2020 at 8:58 AM, Solverz said:

    Just out of curiosity, try a fresh install of Unraid on a spare USB drive: unplug your original Unraid USB and put in the spare one.

    (Don't start the array or assign any drives; if you assign them incorrectly you could lose data.)

    Then install Unraid Nvidia.

    Now see if that boots okay. If it does, you know for certain that something on your original Unraid install is causing the issue.

    This is just to rule out anything hardware-related.

    I tried this with a new blank config and saw the same issues, and I also tried ich777's Nvidia/DVB Kernel Helper/Builder Docker with the same results. So there must be something about my hardware and the NVIDIA driver; I just do not know where to begin troubleshooting it. Any advice would be appreciated.

  5. 9 hours ago, ich777 said:

    Then there might be something wrong with the bzroot...

    Try going back to the stock files and try again.

    EDIT: Sorry, I should be a bit more specific. Put your USB thumb drive into your computer, download Unraid version 6.8.3 from the download page, and replace bzroot/bzmodules/bzfirmware/bzimage; after that, put it back in your server and reboot. Another method is to log in to your server via SFTP (this should still be possible if you can connect through SSH); a tool for that is WinSCP if you are on Windows. Then go to your boot directory and replace the files mentioned above.

    Thank you, I did recover by going back to the standard files. I tried again with the NVIDIA drivers and had the same results. I put it back to the standard Unraid version again and then tried a third time with the latest NVIDIA 6.9.0 (22) build, with the same results. I have reverted to standard 6.8.3, but does anyone have suggestions for getting this to work so I can use my NVIDIA card? I am hoping to use it for transcoding in my Plex Docker container.
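
    The file-replacement recovery from the quoted reply can be sketched in Python as below. It assumes the 6.8.3 release zip has already been downloaded and extracted to a folder, and that the flash drive's boot directory is reachable as a local path (for example the USB stick mounted on a PC); both paths are up to you. The helper name and the ".bak" backup step are additions for illustration, not part of the original advice.

```python
import shutil
import tempfile
from pathlib import Path

# Boot images named in the quoted reply ("bzimage" is the kernel image).
STOCK_FILES = ("bzroot", "bzimage", "bzmodules", "bzfirmware")


def restore_stock_files(release_dir: Path, boot_dir: Path) -> list:
    """Copy stock boot images from an extracted release into boot_dir.

    Hypothetical helper. Each existing file is first copied to "<name>.bak"
    so the change can be undone. If any stock file is missing from
    release_dir, FileNotFoundError is raised before boot_dir is touched.
    """
    missing = [n for n in STOCK_FILES if not (release_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from release folder: {missing}")
    copied = []
    for name in STOCK_FILES:
        target = boot_dir / name
        if target.exists():
            shutil.copy2(target, target.with_name(name + ".bak"))
        shutil.copy2(release_dir / name, target)
        copied.append(name)
    return copied


if __name__ == "__main__":
    # Demo against throwaway directories instead of a real flash drive.
    with tempfile.TemporaryDirectory() as rel, tempfile.TemporaryDirectory() as boot:
        for name in STOCK_FILES:
            Path(rel, name).write_bytes(b"stock")
        print(restore_stock_files(Path(rel), Path(boot)))
```

    Checking for every file before copying anything avoids leaving the drive with a mix of stock and modified images.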

  6. I tried installing this yesterday, and after the reboot my OS does not start all the way. I am able to log in with SSH, but not through the GUI. I see these entries in the syslog, and if I reboot as the message says, the result is the same. It does start all the way in safe mode.

    Jun 30 20:13:20 Tower root: plugin: installing: /boot/config/plugins/Unraid-Nvidia.plg
    Jun 30 20:13:20 Tower root: plugin: running: anonymous
    Jun 30 20:13:20 Tower root:
    Jun 30 20:13:20 Tower root:
    Jun 30 20:13:20 Tower root:
    Jun 30 20:13:20 Tower root:
    Jun 30 20:13:20 Tower root: plugin: running: anonymous
    Jun 30 20:13:20 Tower root: plugin: skipping: /boot/config/plugins/Unraid-Nvidia/Unraid-Nvidia-2019.06.23.txz already exists
    Jun 30 20:13:20 Tower root: plugin: running: /boot/config/plugins/Unraid-Nvidia/Unraid-Nvidia-2019.06.23.txz
    Jun 30 20:13:20 Tower root:
    Jun 30 20:13:20 Tower root: +==============================================================================
    Jun 30 20:13:20 Tower root: | Installing new package /boot/config/plugins/Unraid-Nvidia/Unraid-Nvidia-2019.06.23.txz
    Jun 30 20:13:20 Tower root: +==============================================================================
    Jun 30 20:13:20 Tower root:
    Jun 30 20:13:20 Tower root: Verifying package Unraid-Nvidia-2019.06.23.txz.
    Jun 30 20:13:20 Tower root: Installing package Unraid-Nvidia-2019.06.23.txz:
    Jun 30 20:13:20 Tower root: PACKAGE DESCRIPTION:
    Jun 30 20:13:20 Tower root: Package Unraid-Nvidia-2019.06.23.txz installed.
    Jun 30 20:13:20 Tower root: plugin: running: anonymous
    Jun 30 20:13:21 Tower kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
    Jun 30 20:13:21 Tower kernel: #PF: supervisor write access in kernel mode
    Jun 30 20:13:21 Tower kernel: #PF: error_code(0x0002) - not-present page
    Jun 30 20:13:21 Tower kernel: PGD 80000013f5291067 P4D 80000013f5291067 PUD 1426ff2067 PMD 0
    Jun 30 20:13:21 Tower kernel: Oops: 0002 [#1] SMP PTI
    Jun 30 20:13:21 Tower kernel: CPU: 20 PID: 5398 Comm: nvidia-smi Tainted: P           O      5.7.2-Unraid #1
    Jun 30 20:13:21 Tower kernel: Hardware name: Cisco Systems Inc UCSC-C220-M4S/UCSC-C220-M4S, BIOS C220M4.3.0.4g.0.1113190807 11/13/2019
    Jun 30 20:13:21 Tower kernel: RIP: 0010:_nv025250rm+0x8/0x40 [nvidia]
    Jun 30 20:13:21 Tower kernel: Code: 1f 00 41 8b 4d 08 41 39 0a 4c 89 d6 0f 82 5b fe ff ff e9 69 fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 08 48 8b 42 48 <83> 00 01 c6 42 10 01 e8 5c f4 ff ff 85 c0 74 13 89 c2 be 00 10 57
    Jun 30 20:13:21 Tower kernel: RSP: 0018:ffffc90020e97a28 EFLAGS: 00010296
    Jun 30 20:13:21 Tower kernel: RAX: 0000000000000000 RBX: 000000000000001c RCX: ffff8893eed22e48
    Jun 30 20:13:21 Tower kernel: RDX: ffff8893f6963408 RSI: ffff8893e809e008 RDI: ffff8893eed64008
    Jun 30 20:13:21 Tower kernel: RBP: ffff8893eed22e40 R08: ffffffffa0a60930 R09: ffff8893eed229ec
    Jun 30 20:13:21 Tower kernel: R10: 0000000000001516 R11: 0000000000000000 R12: ffff8893eed64008
    Jun 30 20:13:21 Tower kernel: R13: ffff8893f68d4008 R14: ffff8893eed64008 R15: ffff8893f33e0008
    Jun 30 20:13:21 Tower kernel: FS:  000014bad9eb3b80(0000) GS:ffff88942fd00000(0000) knlGS:0000000000000000
    Jun 30 20:13:21 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000 CR3: 00000013f536c001 CR4: 00000000003606e0
    Jun 30 20:13:21 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Jun 30 20:13:21 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Jun 30 20:13:21 Tower kernel: Call Trace:
    Jun 30 20:13:21 Tower kernel: ? _nv025251rm+0x260/0x260 [nvidia]
    Jun 30 20:13:21 Tower kernel: ? _nv031459rm+0x7a/0xb0 [nvidia]
    Jun 30 20:13:21 Tower kernel: ? _nv031799rm+0x6ec/0x2440 [nvidia]
    Jun 30 20:13:21 Tower kernel: ? _nv021294rm+0xbb/0x1a0 [nvidia]
    Jun 30 20:13:21 Tower kernel: ? _nv021542rm+0x27/0x50 [nvidia]
    Jun 30 20:13:21 Tower kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
    Jun 30 20:13:21 Tower kernel: ? _nv000901rm+0x1200/0x1cc0 [nvidia]
    Jun 30 20:13:21 Tower kernel: ? rm_init_adapter+0xd5/0xe0 [nvidia]
    Jun 30 20:13:21 Tower kernel: ? nv_open_device+0x434/0x648 [nvidia]
    Jun 30 20:13:21 Tower kernel: ? nvidia_open+0x2a1/0x41a [nvidia]
    Jun 30 20:13:21 Tower kernel: ? nvidia_frontend_open+0x62/0x8d [nvidia]
    Jun 30 20:13:21 Tower kernel: ? chrdev_open+0x150/0x187
    Jun 30 20:13:21 Tower kernel: ? cdev_put+0x19/0x19
    Jun 30 20:13:21 Tower kernel: ? do_dentry_open+0x181/0x296
    Jun 30 20:13:21 Tower kernel: ? path_openat+0x85a/0x933
    Jun 30 20:13:21 Tower kernel: ? do_filp_open+0x4c/0xa9
    Jun 30 20:13:21 Tower kernel: ? up_write+0x17/0x24
    Jun 30 20:13:21 Tower kernel: ? chown_common.isra.0+0xec/0x14d
    Jun 30 20:13:21 Tower kernel: ? _cond_resched+0x1b/0x1e
    Jun 30 20:13:21 Tower kernel: ? slab_pre_alloc_hook+0x2c/0x53
    Jun 30 20:13:21 Tower kernel: ? do_sys_openat2+0x6d/0xd9
    Jun 30 20:13:21 Tower kernel: ? do_sys_open+0x35/0x4f
    Jun 30 20:13:21 Tower kernel: ? do_syscall_64+0x7a/0x87
    Jun 30 20:13:21 Tower kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Jun 30 20:13:21 Tower kernel: Modules linked in: iptable_nat xt_MASQUERADE nf_nat ip_tables wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic bonding ixgbe mdio igb i2c_algo_bit nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) crc32_pclmul intel_rapl_perf intel_uncore aesni_intel glue_helper crypto_simd ghash_clmulni_intel cryptd kvm_intel kvm drm_kms_helper intel_cstate coretemp mxm_wmi drm crct10dif_pclmul intel_powerclamp crc32c_intel sb_edac backlight syscopyarea sysfillrect sysimgblt fb_sys_fops x86_pkg_temp_thermal agpgart ipmi_si ahci input_leds megaraid_sas libahci ipmi_ssif i2c_core led_class wmi acpi_power_meter button acpi_pad [last unloaded: mdio]
    Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000
    Jun 30 20:13:21 Tower kernel: ---[ end trace 2ead729f5369cb81 ]---
    Jun 30 20:13:21 Tower kernel: RIP: 0010:_nv025250rm+0x8/0x40 [nvidia]
    Jun 30 20:13:21 Tower kernel: Code: 1f 00 41 8b 4d 08 41 39 0a 4c 89 d6 0f 82 5b fe ff ff e9 69 fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 08 48 8b 42 48 <83> 00 01 c6 42 10 01 e8 5c f4 ff ff 85 c0 74 13 89 c2 be 00 10 57
    Jun 30 20:13:21 Tower kernel: RSP: 0018:ffffc90020e97a28 EFLAGS: 00010296
    Jun 30 20:13:21 Tower kernel: RAX: 0000000000000000 RBX: 000000000000001c RCX: ffff8893eed22e48
    Jun 30 20:13:21 Tower kernel: RDX: ffff8893f6963408 RSI: ffff8893e809e008 RDI: ffff8893eed64008
    Jun 30 20:13:21 Tower kernel: RBP: ffff8893eed22e40 R08: ffffffffa0a60930 R09: ffff8893eed229ec
    Jun 30 20:13:21 Tower kernel: R10: 0000000000001516 R11: 0000000000000000 R12: ffff8893eed64008
    Jun 30 20:13:21 Tower kernel: R13: ffff8893f68d4008 R14: ffff8893eed64008 R15: ffff8893f33e0008
    Jun 30 20:13:21 Tower kernel: FS:  000014bad9eb3b80(0000) GS:ffff88942fd00000(0000) knlGS:0000000000000000
    Jun 30 20:13:21 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000 CR3: 00000013f536c001 CR4: 00000000003606e0
    Jun 30 20:13:21 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Jun 30 20:13:21 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Jun 30 20:13:21 Tower kernel: general protection fault, probably for non-canonical address 0x61ad8614c5ac1b00: 0000 [#2] SMP PTI
    Jun 30 20:13:21 Tower kernel: CPU: 20 PID: 5398 Comm: nvidia-smi Tainted: P      D    O      5.7.2-Unraid #1
    Jun 30 20:13:21 Tower kernel: Hardware name: Cisco Systems Inc UCSC-C220-M4S/UCSC-C220-M4S, BIOS C220M4.3.0.4g.0.1113190807 11/13/2019
    Jun 30 20:13:21 Tower kernel: RIP: 0010:_nv007414rm+0x2c/0x330 [nvidia]
    Jun 30 20:13:21 Tower kernel: Code: 48 85 d2 74 07 48 63 47 08 48 01 d0 48 8b 17 48 85 d2 75 16 e9 9d 02 00 00 0f 1f 44 00 00 48 8b 4a 10 48 85 c9 74 17 48 89 ca <48> 39 32 77 ef 0f 83 29 02 00 00 48 8b 4a 18 48 85 c9 75 e9 48 89
    Jun 30 20:13:21 Tower kernel: RSP: 0018:ffffc90020e97d40 EFLAGS: 00010006
    Jun 30 20:13:21 Tower kernel: RAX: ffffc90020e97dc8 RBX: ffffc90020e97d70 RCX: 61ad8614c5ac1b00
    Jun 30 20:13:21 Tower kernel: RDX: 61ad8614c5ac1b00 RSI: 0000000000001516 RDI: ffffffffa177a3d8
    Jun 30 20:13:21 Tower kernel: RBP: ffff8893f2c22ff0 R08: 0000000000000001 R09: ffffffffa0588903
    Jun 30 20:13:21 Tower kernel: R10: ffff889428430a00 R11: ffff889428430a00 R12: 675f65736e6f7073
    Jun 30 20:13:21 Tower kernel: R13: ffff889428433000 R14: ffffffffa1778c20 R15: ffff889428433000
    Jun 30 20:13:21 Tower kernel: FS:  0000000000000000(0000) GS:ffff88942fd00000(0000) knlGS:0000000000000000
    Jun 30 20:13:21 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000 CR3: 000000000200a002 CR4: 00000000003606e0
    Jun 30 20:13:21 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Jun 30 20:13:21 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Jun 30 20:13:21 Tower kernel: Call Trace:
    Jun 30 20:13:21 Tower kernel: ? _nv036791rm+0xf1/0x1d0 [nvidia]
    Jun 30 20:13:21 Tower kernel: ? rm_free_unused_clients+0x41/0xe0 [nvidia]
    Jun 30 20:13:21 Tower kernel: ? _raw_spin_lock_irqsave+0x3a/0x66
    Jun 30 20:13:21 Tower kernel: ? nvidia_close+0xf3/0x25b [nvidia]
    Jun 30 20:13:21 Tower kernel: ? nvidia_frontend_close+0x2c/0x3e [nvidia]
    Jun 30 20:13:21 Tower kernel: ? __fput+0x107/0x1d0
    Jun 30 20:13:21 Tower kernel: ? task_work_run+0x70/0x81
    Jun 30 20:13:21 Tower kernel: ? do_exit+0x3f8/0x8f3
    Jun 30 20:13:21 Tower kernel: ? rewind_stack_do_exit+0x17/0x20
    Jun 30 20:13:21 Tower kernel: Modules linked in: iptable_nat xt_MASQUERADE nf_nat ip_tables wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic bonding ixgbe mdio igb i2c_algo_bit nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) crc32_pclmul intel_rapl_perf intel_uncore aesni_intel glue_helper crypto_simd ghash_clmulni_intel cryptd kvm_intel kvm drm_kms_helper intel_cstate coretemp mxm_wmi drm crct10dif_pclmul intel_powerclamp crc32c_intel sb_edac backlight syscopyarea sysfillrect sysimgblt fb_sys_fops x86_pkg_temp_thermal agpgart ipmi_si ahci input_leds megaraid_sas libahci ipmi_ssif i2c_core led_class wmi acpi_power_meter button acpi_pad [last unloaded: mdio]
    Jun 30 20:13:21 Tower kernel: ---[ end trace 2ead729f5369cb82 ]---
    Jun 30 20:13:21 Tower kernel: RIP: 0010:_nv025250rm+0x8/0x40 [nvidia]
    Jun 30 20:13:21 Tower kernel: Code: 1f 00 41 8b 4d 08 41 39 0a 4c 89 d6 0f 82 5b fe ff ff e9 69 fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 08 48 8b 42 48 <83> 00 01 c6 42 10 01 e8 5c f4 ff ff 85 c0 74 13 89 c2 be 00 10 57
    Jun 30 20:13:21 Tower kernel: RSP: 0018:ffffc90020e97a28 EFLAGS: 00010296
    Jun 30 20:13:21 Tower kernel: RAX: 0000000000000000 RBX: 000000000000001c RCX: ffff8893eed22e48
    Jun 30 20:13:21 Tower kernel: RDX: ffff8893f6963408 RSI: ffff8893e809e008 RDI: ffff8893eed64008
    Jun 30 20:13:21 Tower kernel: RBP: ffff8893eed22e40 R08: ffffffffa0a60930 R09: ffff8893eed229ec
    Jun 30 20:13:21 Tower kernel: R10: 0000000000001516 R11: 0000000000000000 R12: ffff8893eed64008
    Jun 30 20:13:21 Tower kernel: R13: ffff8893f68d4008 R14: ffff8893eed64008 R15: ffff8893f33e0008
    Jun 30 20:13:21 Tower kernel: FS:  0000000000000000(0000) GS:ffff88942fd00000(0000) knlGS:0000000000000000
    Jun 30 20:13:21 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000 CR3: 000000000200a002 CR4: 00000000003606e0
    Jun 30 20:13:21 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Jun 30 20:13:21 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Jun 30 20:13:21 Tower kernel: Fixing recursive fault but reboot is needed!
    Jun 30 20:13:42 Tower rsyslogd: action 'action-3-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2002.0 try https://www.rsyslog.com/e/2359 ]

    root@Tower:~# lspci -v | grep VGA
    07:00.0 VGA compatible controller: NVIDIA Corporation GK106GL [Quadro K4000] (rev a1) (prog-if 00 [VGA controller])

    Does anyone have suggestions on where to start looking for an answer? I tried my best Google-fu and came up empty before posting here.
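
    One place to start is reading the oops itself. The most useful lines are the first fault ("BUG: kernel NULL pointer dereference"), the process that triggered it (nvidia-smi, PID 5398), the faulting instruction (RIP inside the nvidia module), and the call trace (nvidia_open calling rm_init_adapter, which by its name appears to be the driver's adapter initialisation). Below is a minimal Python sketch that pulls those fields out of a saved syslog. The regexes are assumptions tuned to the lines posted here, not a general oops parser, and the function name is hypothetical.

```python
import re
import sys

# Patterns for the kernel-log lines that matter most in an oops like the one above.
FAULT_RE = re.compile(r"kernel: (BUG: .*|general protection fault.*)")
PROC_RE = re.compile(r"kernel: CPU: \d+ PID: (\d+) Comm: (\S+)")
RIP_RE = re.compile(r"kernel: RIP: [0-9a-f]+:(\S+) \[(\w+)\]")
TRACE_MOD_RE = re.compile(r"kernel: \? \S+ \[(\w+)\]")


def triage_oops(log: str) -> dict:
    """Summarise a kernel oops: every fault line, the first process and RIP,
    and the modules (in order of first appearance) named in call-trace lines."""
    summary = {"faults": [], "pid": None, "comm": None,
               "rip": None, "rip_module": None, "trace_modules": []}
    for line in log.splitlines():
        m = FAULT_RE.search(line)
        if m:
            summary["faults"].append(m.group(1))
            continue
        m = PROC_RE.search(line)
        if m:
            if summary["comm"] is None:
                summary["pid"], summary["comm"] = int(m.group(1)), m.group(2)
            continue
        m = RIP_RE.search(line)
        if m:
            if summary["rip"] is None:
                summary["rip"], summary["rip_module"] = m.group(1), m.group(2)
            continue
        m = TRACE_MOD_RE.search(line)
        if m and m.group(1) not in summary["trace_modules"]:
            summary["trace_modules"].append(m.group(1))
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python triage_oops.py saved-syslog.txt  (file name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for key, value in triage_oops(fh.read()).items():
            print(f"{key}: {value}")
```

    On the log above, this should point at the nvidia module while nvidia-smi was opening the card, which narrows the search to the driver and the card itself rather than the rest of the Unraid install.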