jwiener3

Members
  • Posts

    12

Everything posted by jwiener3

  1. I ran into a similar issue. I have since moved to a physical server so I am not sure if those files will still work/fix it.
  2. That worked for me: I upgraded to the latest beta, then replaced the bzroot files and booted it up. I was then able to pass through my NVIDIA card! Thank you!
  3. Thank you for doing this, I was getting in over my head trying to figure it out. I will give it a try over the weekend. For the record I am running this on a Cisco UCS C-240-M4 with 2x Xeon CPU E5-2667 v3
  4. Thanks for that info and the links! I am really getting in deep fast here and it is frustrating and fun. It looks like I will have to see if I can learn how to extract and repack microcode. I did find a thread talking about that, so I will look at that in the next few days.
  5. Good info, how did you load that version? It is likely the Intel microcode included in version RC1 ("intel-microcode: version 20201118").
  6. I am trying to get Unraid (6.9 rc2) running as a VM on ESXi 6.7 U3 with a P400 GPU passed through. When I add "hypervisor.cpuid.v0=FALSE" to the config file, the VM will not boot with more than 1 CPU assigned. It appears to try to load the Intel microcode and then halts. I have tried disabling the microcode with the Disable Security Mitigations plugin, but I still see in the hypervisor logs that the guest tries to load it. With 1 CPU it tries as well, but it doesn't seem to issue the CPU reset listed in the logs.

     Below is the log from the VM with 2 or more CPUs:

         2021-01-11T02:51:41.855Z| vcpu-0| I125: APIC THERMLVT write: 0x10000
         2021-01-11T02:51:43.182Z| vcpu-0| W115: CPU microcode update available.
         2021-01-11T02:51:43.182Z| vcpu-0| W115+ The guest OS tried to update the microcode from patch level 67 (43h) to patch level 68 (44h), but VMware ESX does not allow microcode patches to be applied from within a virtual machine.
         2021-01-11T02:51:43.182Z| vcpu-0| W115+ Microcode patches are used to correct CPU errata. You may be able to obtain a BIOS/firmware update which includes this microcode patch from your system vendor, or your host OS may provide a facility for loading microcode patches.
         CPU reset: soft (mode 0)

     Here is the log from 1 CPU:

         2021-01-11T03:35:11.496Z| vcpu-0| I125: UHCI: HCReset
         2021-01-11T03:35:12.754Z| vcpu-0| W115: CPU microcode update available.
         2021-01-11T03:35:12.754Z| vcpu-0| W115+ The guest OS tried to update the microcode from patch level 67 (43h) to patch level 68 (44h), but VMware ESX does not allow microcode patches to be applied from within a virtual machine.
         2021-01-11T03:35:12.754Z| vcpu-0| W115+ Microcode patches are used to correct CPU errata. You may be able to obtain a BIOS/firmware update which includes this microcode patch from your system vendor, or your host OS may provide a facility for loading microcode patches.
         SVGA: Unregistering IOSpace at 0x1070
         2021-01-11T03:35:13.182Z| vcpu-0| I125: SVGA: Unregistering MemSpace at 0xe8000000(0xe8000000) and 0xfe000000(0xfe000000)

     Below is a screenshot from Unraid where it halts.

     I was able to pass the GPU through to an Ubuntu 18.04 machine with the same virtual hardware and have it working. When I first tried it I ran into the same problem and found a post recommending purging the Intel microcode; I followed those steps to get it working. Any idea what I can do on Unraid, and how I can actually delete those files? Reading the later posts on the security mitigation plugin, it is unclear whether that boot option still works in the newer kernels (which might be my problem).
  7. I know this is an old thread, but for anyone who stumbles across it looking for advice: I did get this working on 6.8.3 with the PAN-OS 9.1.2 qcow image (I did not try any other version). See below for my settings; I added 8 NICs, the first being the management NIC. It takes a while to start up and I do get some errors (see screenshot), but it works well and I get the performance I need (500 Mbps down and 20 Mbps up).
  8. Just to follow up here: thanks to @ich777 I was able to figure out my issue, which was the PCIe power. I put the card in a different machine and it worked fine, so I knew it wasn't the card. I then pulled the PSU out of another box and powered the card with that PSU while it was in my main server, and it works! So I just ordered what I needed to make it a permanent fix. Thank you to those who helped me.
  9. I tried this with a new blank config and saw the same issues and I also tried ich777 - Nvidia/DVB Kernel Helper/Builder Docker with the same results. So there must be something with my hardware and the nvidia driver, I just do not know where to begin on how to troubleshoot it. Any advice would be appreciated.
  10. Thank you, I did recover by going back to the standard files. I tried again with the NVIDIA drivers and had the same results. Put it back to the standard Unraid version again and then tried a 3rd time with the latest NVIDIA 6.9.0(22) with the same results. I have reverted back to 6.8.3 standard, but does anyone have any suggestions on getting this to work so I can use my NVIDIA card? I am hoping to use it for my plex docker for transcoding.
  11. Yes sorry, typo there. I AM able to login with ssh, not the gui.
  12. I tried installing this yesterday and after the reboot, my OS does not start all the way. I am able to log in with SSH, but not through the GUI. I see these logs in syslog, and if I reboot per the message the result is the same. It does start all the way in safe mode.

          Jun 30 20:13:20 Tower root: plugin: installing: /boot/config/plugins/Unraid-Nvidia.plg
          Jun 30 20:13:20 Tower root: plugin: running: anonymous
          Jun 30 20:13:20 Tower root:
          Jun 30 20:13:20 Tower root:
          Jun 30 20:13:20 Tower root:
          Jun 30 20:13:20 Tower root:
          Jun 30 20:13:20 Tower root: plugin: running: anonymous
          Jun 30 20:13:20 Tower root: plugin: skipping: /boot/config/plugins/Unraid-Nvidia/Unraid-Nvidia-2019.06.23.txz already exists
          Jun 30 20:13:20 Tower root: plugin: running: /boot/config/plugins/Unraid-Nvidia/Unraid-Nvidia-2019.06.23.txz
          Jun 30 20:13:20 Tower root:
          Jun 30 20:13:20 Tower root: +==============================================================================
          Jun 30 20:13:20 Tower root: | Installing new package /boot/config/plugins/Unraid-Nvidia/Unraid-Nvidia-2019.06.23.txz
          Jun 30 20:13:20 Tower root: +==============================================================================
          Jun 30 20:13:20 Tower root:
          Jun 30 20:13:20 Tower root: Verifying package Unraid-Nvidia-2019.06.23.txz.
          Jun 30 20:13:20 Tower root: Installing package Unraid-Nvidia-2019.06.23.txz:
          Jun 30 20:13:20 Tower root: PACKAGE DESCRIPTION:
          Jun 30 20:13:20 Tower root: Package Unraid-Nvidia-2019.06.23.txz installed.
          Jun 30 20:13:20 Tower root: plugin: running: anonymous
          Jun 30 20:13:21 Tower kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
          Jun 30 20:13:21 Tower kernel: #PF: supervisor write access in kernel mode
          Jun 30 20:13:21 Tower kernel: #PF: error_code(0x0002) - not-present page
          Jun 30 20:13:21 Tower kernel: PGD 80000013f5291067 P4D 80000013f5291067 PUD 1426ff2067 PMD 0
          Jun 30 20:13:21 Tower kernel: Oops: 0002 [#1] SMP PTI
          Jun 30 20:13:21 Tower kernel: CPU: 20 PID: 5398 Comm: nvidia-smi Tainted: P O 5.7.2-Unraid #1
          Jun 30 20:13:21 Tower kernel: Hardware name: Cisco Systems Inc UCSC-C220-M4S/UCSC-C220-M4S, BIOS C220M4.3.0.4g.0.1113190807 11/13/2019
          Jun 30 20:13:21 Tower kernel: RIP: 0010:_nv025250rm+0x8/0x40 [nvidia]
          Jun 30 20:13:21 Tower kernel: Code: 1f 00 41 8b 4d 08 41 39 0a 4c 89 d6 0f 82 5b fe ff ff e9 69 fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 08 48 8b 42 48 <83> 00 01 c6 42 10 01 e8 5c f4 ff ff 85 c0 74 13 89 c2 be 00 10 57
          Jun 30 20:13:21 Tower kernel: RSP: 0018:ffffc90020e97a28 EFLAGS: 00010296
          Jun 30 20:13:21 Tower kernel: RAX: 0000000000000000 RBX: 000000000000001c RCX: ffff8893eed22e48
          Jun 30 20:13:21 Tower kernel: RDX: ffff8893f6963408 RSI: ffff8893e809e008 RDI: ffff8893eed64008
          Jun 30 20:13:21 Tower kernel: RBP: ffff8893eed22e40 R08: ffffffffa0a60930 R09: ffff8893eed229ec
          Jun 30 20:13:21 Tower kernel: R10: 0000000000001516 R11: 0000000000000000 R12: ffff8893eed64008
          Jun 30 20:13:21 Tower kernel: R13: ffff8893f68d4008 R14: ffff8893eed64008 R15: ffff8893f33e0008
          Jun 30 20:13:21 Tower kernel: FS: 000014bad9eb3b80(0000) GS:ffff88942fd00000(0000) knlGS:0000000000000000
          Jun 30 20:13:21 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000 CR3: 00000013f536c001 CR4: 00000000003606e0
          Jun 30 20:13:21 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          Jun 30 20:13:21 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Jun 30 20:13:21 Tower kernel: Call Trace:
          Jun 30 20:13:21 Tower kernel: ? _nv025251rm+0x260/0x260 [nvidia]
          Jun 30 20:13:21 Tower kernel: ? _nv031459rm+0x7a/0xb0 [nvidia]
          Jun 30 20:13:21 Tower kernel: ? _nv031799rm+0x6ec/0x2440 [nvidia]
          Jun 30 20:13:21 Tower kernel: ? _nv021294rm+0xbb/0x1a0 [nvidia]
          Jun 30 20:13:21 Tower kernel: ? _nv021542rm+0x27/0x50 [nvidia]
          Jun 30 20:13:21 Tower kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
          Jun 30 20:13:21 Tower kernel: ? _nv000901rm+0x1200/0x1cc0 [nvidia]
          Jun 30 20:13:21 Tower kernel: ? rm_init_adapter+0xd5/0xe0 [nvidia]
          Jun 30 20:13:21 Tower kernel: ? nv_open_device+0x434/0x648 [nvidia]
          Jun 30 20:13:21 Tower kernel: ? nvidia_open+0x2a1/0x41a [nvidia]
          Jun 30 20:13:21 Tower kernel: ? nvidia_frontend_open+0x62/0x8d [nvidia]
          Jun 30 20:13:21 Tower kernel: ? chrdev_open+0x150/0x187
          Jun 30 20:13:21 Tower kernel: ? cdev_put+0x19/0x19
          Jun 30 20:13:21 Tower kernel: ? do_dentry_open+0x181/0x296
          Jun 30 20:13:21 Tower kernel: ? path_openat+0x85a/0x933
          Jun 30 20:13:21 Tower kernel: ? do_filp_open+0x4c/0xa9
          Jun 30 20:13:21 Tower kernel: ? up_write+0x17/0x24
          Jun 30 20:13:21 Tower kernel: ? chown_common.isra.0+0xec/0x14d
          Jun 30 20:13:21 Tower kernel: ? _cond_resched+0x1b/0x1e
          Jun 30 20:13:21 Tower kernel: ? slab_pre_alloc_hook+0x2c/0x53
          Jun 30 20:13:21 Tower kernel: ? do_sys_openat2+0x6d/0xd9
          Jun 30 20:13:21 Tower kernel: ? do_sys_open+0x35/0x4f
          Jun 30 20:13:21 Tower kernel: ? do_syscall_64+0x7a/0x87
          Jun 30 20:13:21 Tower kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
          Jun 30 20:13:21 Tower kernel: Modules linked in: iptable_nat xt_MASQUERADE nf_nat ip_tables wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic bonding ixgbe mdio igb i2c_algo_bit nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) crc32_pclmul intel_rapl_perf intel_uncore aesni_intel glue_helper crypto_simd ghash_clmulni_intel cryptd kvm_intel kvm drm_kms_helper intel_cstate coretemp mxm_wmi drm crct10dif_pclmul intel_powerclamp crc32c_intel sb_edac backlight syscopyarea sysfillrect sysimgblt fb_sys_fops x86_pkg_temp_thermal agpgart ipmi_si ahci input_leds megaraid_sas libahci ipmi_ssif i2c_core led_class wmi acpi_power_meter button acpi_pad [last unloaded: mdio]
          Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000
          Jun 30 20:13:21 Tower kernel: ---[ end trace 2ead729f5369cb81 ]---
          Jun 30 20:13:21 Tower kernel: RIP: 0010:_nv025250rm+0x8/0x40 [nvidia]
          Jun 30 20:13:21 Tower kernel: Code: 1f 00 41 8b 4d 08 41 39 0a 4c 89 d6 0f 82 5b fe ff ff e9 69 fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 08 48 8b 42 48 <83> 00 01 c6 42 10 01 e8 5c f4 ff ff 85 c0 74 13 89 c2 be 00 10 57
          Jun 30 20:13:21 Tower kernel: RSP: 0018:ffffc90020e97a28 EFLAGS: 00010296
          Jun 30 20:13:21 Tower kernel: RAX: 0000000000000000 RBX: 000000000000001c RCX: ffff8893eed22e48
          Jun 30 20:13:21 Tower kernel: RDX: ffff8893f6963408 RSI: ffff8893e809e008 RDI: ffff8893eed64008
          Jun 30 20:13:21 Tower kernel: RBP: ffff8893eed22e40 R08: ffffffffa0a60930 R09: ffff8893eed229ec
          Jun 30 20:13:21 Tower kernel: R10: 0000000000001516 R11: 0000000000000000 R12: ffff8893eed64008
          Jun 30 20:13:21 Tower kernel: R13: ffff8893f68d4008 R14: ffff8893eed64008 R15: ffff8893f33e0008
          Jun 30 20:13:21 Tower kernel: FS: 000014bad9eb3b80(0000) GS:ffff88942fd00000(0000) knlGS:0000000000000000
          Jun 30 20:13:21 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000 CR3: 00000013f536c001 CR4: 00000000003606e0
          Jun 30 20:13:21 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          Jun 30 20:13:21 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Jun 30 20:13:21 Tower kernel: general protection fault, probably for non-canonical address 0x61ad8614c5ac1b00: 0000 [#2] SMP PTI
          Jun 30 20:13:21 Tower kernel: CPU: 20 PID: 5398 Comm: nvidia-smi Tainted: P D O 5.7.2-Unraid #1
          Jun 30 20:13:21 Tower kernel: Hardware name: Cisco Systems Inc UCSC-C220-M4S/UCSC-C220-M4S, BIOS C220M4.3.0.4g.0.1113190807 11/13/2019
          Jun 30 20:13:21 Tower kernel: RIP: 0010:_nv007414rm+0x2c/0x330 [nvidia]
          Jun 30 20:13:21 Tower kernel: Code: 48 85 d2 74 07 48 63 47 08 48 01 d0 48 8b 17 48 85 d2 75 16 e9 9d 02 00 00 0f 1f 44 00 00 48 8b 4a 10 48 85 c9 74 17 48 89 ca <48> 39 32 77 ef 0f 83 29 02 00 00 48 8b 4a 18 48 85 c9 75 e9 48 89
          Jun 30 20:13:21 Tower kernel: RSP: 0018:ffffc90020e97d40 EFLAGS: 00010006
          Jun 30 20:13:21 Tower kernel: RAX: ffffc90020e97dc8 RBX: ffffc90020e97d70 RCX: 61ad8614c5ac1b00
          Jun 30 20:13:21 Tower kernel: RDX: 61ad8614c5ac1b00 RSI: 0000000000001516 RDI: ffffffffa177a3d8
          Jun 30 20:13:21 Tower kernel: RBP: ffff8893f2c22ff0 R08: 0000000000000001 R09: ffffffffa0588903
          Jun 30 20:13:21 Tower kernel: R10: ffff889428430a00 R11: ffff889428430a00 R12: 675f65736e6f7073
          Jun 30 20:13:21 Tower kernel: R13: ffff889428433000 R14: ffffffffa1778c20 R15: ffff889428433000
          Jun 30 20:13:21 Tower kernel: FS: 0000000000000000(0000) GS:ffff88942fd00000(0000) knlGS:0000000000000000
          Jun 30 20:13:21 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000 CR3: 000000000200a002 CR4: 00000000003606e0
          Jun 30 20:13:21 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          Jun 30 20:13:21 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Jun 30 20:13:21 Tower kernel: Call Trace:
          Jun 30 20:13:21 Tower kernel: ? _nv036791rm+0xf1/0x1d0 [nvidia]
          Jun 30 20:13:21 Tower kernel: ? rm_free_unused_clients+0x41/0xe0 [nvidia]
          Jun 30 20:13:21 Tower kernel: ? _raw_spin_lock_irqsave+0x3a/0x66
          Jun 30 20:13:21 Tower kernel: ? nvidia_close+0xf3/0x25b [nvidia]
          Jun 30 20:13:21 Tower kernel: ? nvidia_frontend_close+0x2c/0x3e [nvidia]
          Jun 30 20:13:21 Tower kernel: ? __fput+0x107/0x1d0
          Jun 30 20:13:21 Tower kernel: ? task_work_run+0x70/0x81
          Jun 30 20:13:21 Tower kernel: ? do_exit+0x3f8/0x8f3
          Jun 30 20:13:21 Tower kernel: ? rewind_stack_do_exit+0x17/0x20
          Jun 30 20:13:21 Tower kernel: Modules linked in: iptable_nat xt_MASQUERADE nf_nat ip_tables wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic bonding ixgbe mdio igb i2c_algo_bit nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) crc32_pclmul intel_rapl_perf intel_uncore aesni_intel glue_helper crypto_simd ghash_clmulni_intel cryptd kvm_intel kvm drm_kms_helper intel_cstate coretemp mxm_wmi drm crct10dif_pclmul intel_powerclamp crc32c_intel sb_edac backlight syscopyarea sysfillrect sysimgblt fb_sys_fops x86_pkg_temp_thermal agpgart ipmi_si ahci input_leds megaraid_sas libahci ipmi_ssif i2c_core led_class wmi acpi_power_meter button acpi_pad [last unloaded: mdio]
          Jun 30 20:13:21 Tower kernel: ---[ end trace 2ead729f5369cb82 ]---
          Jun 30 20:13:21 Tower kernel: RIP: 0010:_nv025250rm+0x8/0x40 [nvidia]
          Jun 30 20:13:21 Tower kernel: Code: 1f 00 41 8b 4d 08 41 39 0a 4c 89 d6 0f 82 5b fe ff ff e9 69 fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 08 48 8b 42 48 <83> 00 01 c6 42 10 01 e8 5c f4 ff ff 85 c0 74 13 89 c2 be 00 10 57
          Jun 30 20:13:21 Tower kernel: RSP: 0018:ffffc90020e97a28 EFLAGS: 00010296
          Jun 30 20:13:21 Tower kernel: RAX: 0000000000000000 RBX: 000000000000001c RCX: ffff8893eed22e48
          Jun 30 20:13:21 Tower kernel: RDX: ffff8893f6963408 RSI: ffff8893e809e008 RDI: ffff8893eed64008
          Jun 30 20:13:21 Tower kernel: RBP: ffff8893eed22e40 R08: ffffffffa0a60930 R09: ffff8893eed229ec
          Jun 30 20:13:21 Tower kernel: R10: 0000000000001516 R11: 0000000000000000 R12: ffff8893eed64008
          Jun 30 20:13:21 Tower kernel: R13: ffff8893f68d4008 R14: ffff8893eed64008 R15: ffff8893f33e0008
          Jun 30 20:13:21 Tower kernel: FS: 0000000000000000(0000) GS:ffff88942fd00000(0000) knlGS:0000000000000000
          Jun 30 20:13:21 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000 CR3: 000000000200a002 CR4: 00000000003606e0
          Jun 30 20:13:21 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          Jun 30 20:13:21 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Jun 30 20:13:21 Tower kernel: Fixing recursive fault but reboot is needed!
          Jun 30 20:13:42 Tower rsyslogd: action 'action-3-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2002.0 try https://www.rsyslog.com/e/2359 ]

      The card as seen by lspci:

          root@Tower:~# lspci -v | grep VGA
          07:00.0 VGA compatible controller: NVIDIA Corporation GK106GL [Quadro K4000] (rev a1) (prog-if 00 [VGA controller])

      Does anyone have any suggestions on where to start looking for an answer? I tried my best google-fu and failed before posting here.
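
For anyone triaging logs like the two excerpts quoted in the posts above, here is a minimal Python sketch that pulls out the key facts: which microcode patch levels the guest tried to apply (from an ESXi vmware.log) and, for each kernel oops in a syslog, the fault line, the process, and the faulting instruction pointer. It assumes the logs are read from the original files with one entry per line. The function names and the regular expressions are my own illustration, derived only from the line formats quoted in these posts, so other kernel or ESXi versions may format things differently.

```python
import re

# Matches the vmware.log warning about a rejected guest microcode update.
# Assumption: the wording is the same as in the excerpt quoted above.
MICROCODE_RE = re.compile(
    r"update the microcode from patch level (\d+) \([0-9A-Fa-f]+h\) "
    r"to patch level (\d+) \([0-9A-Fa-f]+h\)"
)

# Matches the first line of each kernel fault seen in the syslog excerpt.
FAULT_RE = re.compile(r"kernel: (BUG: .*|general protection fault.*)$")
# Matches "CPU: 20 PID: 5398 Comm: nvidia-smi ..." lines.
COMM_RE = re.compile(r"kernel: CPU: \d+ PID: (\d+) Comm: (\S+)")
# Matches "RIP: 0010:symbol+0x8/0x40 [module]"; the module part is optional.
RIP_RE = re.compile(r"kernel: RIP: \w+:(\S+)(?: \[(\w+)\])?")


def find_microcode_attempts(lines):
    """Return a (from_level, to_level) tuple for every rejected update."""
    attempts = []
    for line in lines:
        match = MICROCODE_RE.search(line)
        if match:
            attempts.append((int(match.group(1)), int(match.group(2))))
    return attempts


def summarize_oopses(lines):
    """Return one dict per kernel fault: fault text, pid, comm, rip, module."""
    reports = []
    current = None
    for raw in lines:
        line = raw.rstrip()
        fault = FAULT_RE.search(line)
        if fault:
            # A new fault line starts a new report.
            current = {"fault": fault.group(1), "pid": None, "comm": None,
                       "rip": None, "module": None}
            reports.append(current)
            continue
        if current is None:
            continue
        comm = COMM_RE.search(line)
        if comm and current["comm"] is None:
            current["pid"] = int(comm.group(1))
            current["comm"] = comm.group(2)
            continue
        rip = RIP_RE.search(line)
        # Keep only the first RIP per fault; the kernel prints the register
        # dump again after the "end trace" marker.
        if rip and current["rip"] is None:
            current["rip"] = rip.group(1)
            current["module"] = rip.group(2)
    return reports
```

To use it, pass the lines of a log file, for example `summarize_oopses(open("syslog"))`, where the path is a placeholder for wherever your log is stored. A summary in which every fault has `nvidia` as the module and `nvidia-smi` as the process points at the driver or the card rather than at the rest of the system, which is consistent with how this thread ended up (a PCIe power problem).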