cardo Posted April 25, 2021

Hi All,

I have a bit of a quandary. I have two GPUs in my unRAID server. Previously I passed the primary GPU, an RTX 2060, through to a gaming VM and used the second, a 1660, for Plex HW transcoding. I haven't been using the 2060 lately since I've mostly been gaming on my laptop, so I decided to use the idle cycles to mine ETH, and I was hoping to use both GPUs for this.

What I found is that unless I keep my primary GPU bound to VFIO under Tools > System Devices, the server hangs as soon as I try to mount the array. I see the following in the syslog every 2-3 minutes:

Apr 24 23:40:07 MojoRyzen kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Apr 24 23:40:07 MojoRyzen kernel: rcu: 14-....: (960010 ticks this GP) idle=c8e/1/0x4000000000000000 softirq=5044/5044 fqs=227707
Apr 24 23:40:07 MojoRyzen kernel: (t=960016 jiffies g=13785 q=5495899)
Apr 24 23:40:07 MojoRyzen kernel: Sending NMI from CPU 14 to CPUs 4:
Apr 24 23:40:07 MojoRyzen kernel: NMI backtrace for cpu 4
Apr 24 23:40:07 MojoRyzen kernel: CPU: 4 PID: 5730 Comm: udevd Tainted: P D O 5.10.28-Unraid #1
Apr 24 23:40:07 MojoRyzen kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Taichi, BIOS P3.00 04/07/2020
Apr 24 23:40:07 MojoRyzen kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x134/0x18a
Apr 24 23:40:07 MojoRyzen kernel: Code: d3 c1 ee 12 83 e0 03 ff ce 48 c1 e0 05 48 63 f6 48 05 00 30 02 00 48 03 04 f5 00 99 df 81 48 89 10 8b 42 08 85 c0 75 04 f3 90 <eb> f5 48 8b 32 48 85 f6 74 03 0f 0d 0e 8b 07 66 85 c0 74 04 f3 90
Apr 24 23:40:07 MojoRyzen kernel: RSP: 0018:ffffc90000cafb80 EFLAGS: 00000046
Apr 24 23:40:07 MojoRyzen kernel: RAX: 0000000000000000 RBX: ffff888100cb30a0 RCX: 0000000000140000
Apr 24 23:40:07 MojoRyzen kernel: RDX: ffff888ffe923000 RSI: 000000000000000d RDI: ffff888100cb30a0
Apr 24 23:40:07 MojoRyzen kernel: RBP: ffff888100cb3080 R08: 0000000000000000 R09: 0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000046
Apr 24 23:40:07 MojoRyzen kernel: R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: FS: 000014dfca02fbc0(0000) GS:ffff888ffe900000(0000) knlGS:0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 24 23:40:07 MojoRyzen kernel: CR2: 0000153564b375a0 CR3: 000000010148e000 CR4: 0000000000350ee0
Apr 24 23:40:07 MojoRyzen kernel: Call Trace:
Apr 24 23:40:07 MojoRyzen kernel: queued_spin_lock_slowpath+0x7/0xa
Apr 24 23:40:07 MojoRyzen kernel: _raw_spin_lock_irqsave+0x23/0x29
Apr 24 23:40:07 MojoRyzen kernel: __wake_up_common_lock+0x5d/0xaf
Apr 24 23:40:07 MojoRyzen kernel: ep_poll_callback+0x119/0x1a0
Apr 24 23:40:07 MojoRyzen kernel: __wake_up_common+0xa7/0x12f
Apr 24 23:40:07 MojoRyzen kernel: __wake_up_common_lock+0x77/0xaf
Apr 24 23:40:07 MojoRyzen kernel: sock_def_readable+0x29/0x41
Apr 24 23:40:07 MojoRyzen kernel: unix_dgram_sendmsg+0x4a3/0x51f
Apr 24 23:40:07 MojoRyzen kernel: sock_sendmsg_nosec+0x32/0x3c
Apr 24 23:40:07 MojoRyzen kernel: sock_write_iter+0x84/0xaf
Apr 24 23:40:07 MojoRyzen kernel: new_sync_write+0x7a/0xb2
Apr 24 23:40:07 MojoRyzen kernel: vfs_write+0xd7/0x121
Apr 24 23:40:07 MojoRyzen kernel: ksys_write+0x71/0xba
Apr 24 23:40:07 MojoRyzen kernel: do_syscall_64+0x5d/0x6a
Apr 24 23:40:07 MojoRyzen kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 24 23:40:07 MojoRyzen kernel: RIP: 0033:0x14dfca680833
Apr 24 23:40:07 MojoRyzen kernel: Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
Apr 24 23:40:07 MojoRyzen kernel: RSP: 002b:00007ffc38face68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Apr 24 23:40:07 MojoRyzen kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014dfca680833
Apr 24 23:40:07 MojoRyzen kernel: RDX: 0000000000000000 RSI: 00007ffc38facf90 RDI: 0000000000000008
Apr 24 23:40:07 MojoRyzen kernel: RBP: 00007ffc38facf90 R08: 000000000064704a R09: 0000000000000006
Apr 24 23:40:07 MojoRyzen kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
Apr 24 23:40:07 MojoRyzen kernel: R13: 0000000000000000 R14: 000014dfca02fb08 R15: ffffffffffffffff
Apr 24 23:40:07 MojoRyzen kernel: Sending NMI from CPU 14 to CPUs 13:
Apr 24 23:40:07 MojoRyzen kernel: NMI backtrace for cpu 13
Apr 24 23:40:07 MojoRyzen kernel: CPU: 13 PID: 5731 Comm: udevd Tainted: P D O 5.10.28-Unraid #1
Apr 24 23:40:07 MojoRyzen kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Taichi, BIOS P3.00 04/07/2020
Apr 24 23:40:07 MojoRyzen kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x134/0x18a
Apr 24 23:40:07 MojoRyzen kernel: Code: d3 c1 ee 12 83 e0 03 ff ce 48 c1 e0 05 48 63 f6 48 05 00 30 02 00 48 03 04 f5 00 99 df 81 48 89 10 8b 42 08 85 c0 75 04 f3 90 <eb> f5 48 8b 32 48 85 f6 74 03 0f 0d 0e 8b 07 66 85 c0 74 04 f3 90
Apr 24 23:40:07 MojoRyzen kernel: RSP: 0018:ffffc90000cb7d00 EFLAGS: 00000046
Apr 24 23:40:07 MojoRyzen kernel: RAX: 0000000000000000 RBX: ffff888100cb30a0 RCX: 0000000000380000
Apr 24 23:40:07 MojoRyzen kernel: RDX: ffff888ffeb63000 RSI: 0000000000000066 RDI: ffff888100cb30a0
Apr 24 23:40:07 MojoRyzen kernel: RBP: ffff888100cb3080 R08: 0000000000000000 R09: 0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: R10: 0000000000000000 R11: ffff888102ffcc50 R12: 0000000000000046
Apr 24 23:40:07 MojoRyzen kernel: R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: FS: 000014dfca02fbc0(0000) GS:ffff888ffeb40000(0000) knlGS:0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 24 23:40:07 MojoRyzen kernel: CR2: 000014dfca7efed4 CR3: 000000016bb20000 CR4: 0000000000350ee0
Apr 24 23:40:07 MojoRyzen kernel: Call Trace:
Apr 24 23:40:07 MojoRyzen kernel: queued_spin_lock_slowpath+0x7/0xa
Apr 24 23:40:07 MojoRyzen kernel: _raw_spin_lock_irqsave+0x23/0x29
Apr 24 23:40:07 MojoRyzen kernel: __wake_up_common_lock+0x5d/0xaf
Apr 24 23:40:07 MojoRyzen kernel: ep_poll_callback+0x119/0x1a0
Apr 24 23:40:07 MojoRyzen kernel: __wake_up_common+0xa7/0x12f
Apr 24 23:40:07 MojoRyzen kernel: __wake_up_common_lock+0x77/0xaf
Apr 24 23:40:07 MojoRyzen kernel: fsnotify_add_event+0xc2/0xe4
Apr 24 23:40:07 MojoRyzen kernel: inotify_handle_inode_event+0xd7/0x10d
Apr 24 23:40:07 MojoRyzen kernel: inotify_ignored_and_remove_idr+0x1c/0x3a
Apr 24 23:40:07 MojoRyzen kernel: __do_sys_inotify_rm_watch+0x74/0x98
Apr 24 23:40:07 MojoRyzen kernel: do_syscall_64+0x5d/0x6a
Apr 24 23:40:07 MojoRyzen kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 24 23:40:07 MojoRyzen kernel: RIP: 0033:0x14dfca691157
Apr 24 23:40:07 MojoRyzen kernel: Code: 73 01 c3 48 8b 0d 39 7d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 ff 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 09 7d 0c 00 f7 d8 64 89 01 48
Apr 24 23:40:07 MojoRyzen kernel: RSP: 002b:00007ffc38faca18 EFLAGS: 00000293 ORIG_RAX: 00000000000000ff
Apr 24 23:40:07 MojoRyzen kernel: RAX: ffffffffffffffda RBX: 000000000064fef0 RCX: 000014dfca691157
Apr 24 23:40:07 MojoRyzen kernel: RDX: 0000000000000007 RSI: 0000000000000058 RDI: 0000000000000005
Apr 24 23:40:07 MojoRyzen kernel: RBP: 0000000000000058 R08: 000000000064704a R09: 0000000000000006
Apr 24 23:40:07 MojoRyzen kernel: R10: 0000000000647010 R11: 0000000000000293 R12: 00007ffc38faca20
Apr 24 23:40:07 MojoRyzen kernel: R13: 0000000003938700 R14: 0000000000645700 R15: 0000000000645740
Apr 24 23:40:07 MojoRyzen kernel: NMI backtrace for cpu 14
Apr 24 23:40:07 MojoRyzen kernel: CPU: 14 PID: 8849 Comm: mount Tainted: P D O 5.10.28-Unraid #1
Apr 24 23:40:07 MojoRyzen kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Taichi, BIOS P3.00 04/07/2020
Apr 24 23:40:07 MojoRyzen kernel: Call Trace:
Apr 24 23:40:07 MojoRyzen kernel: <IRQ>
Apr 24 23:40:07 MojoRyzen kernel: dump_stack+0x6b/0x83
Apr 24 23:40:07 MojoRyzen kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Apr 24 23:40:07 MojoRyzen kernel: nmi_cpu_backtrace+0x7d/0x8f
Apr 24 23:40:07 MojoRyzen kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Apr 24 23:40:07 MojoRyzen kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Apr 24 23:40:07 MojoRyzen kernel: rcu_sched_clock_irq+0x1ec/0x543
Apr 24 23:40:07 MojoRyzen kernel: ? trigger_load_balance+0x5a/0x1ca
Apr 24 23:40:07 MojoRyzen kernel: update_process_times+0x50/0x6e
Apr 24 23:40:07 MojoRyzen kernel: tick_sched_timer+0x36/0x64
Apr 24 23:40:07 MojoRyzen kernel: __hrtimer_run_queues+0xb7/0x10b
Apr 24 23:40:07 MojoRyzen kernel: ? tick_sched_do_timer+0x39/0x39
Apr 24 23:40:07 MojoRyzen kernel: hrtimer_interrupt+0x8d/0x15b
Apr 24 23:40:07 MojoRyzen kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Apr 24 23:40:07 MojoRyzen kernel: asm_call_irq_on_stack+0x12/0x20
Apr 24 23:40:07 MojoRyzen kernel: </IRQ>
Apr 24 23:40:07 MojoRyzen kernel: sysvec_apic_timer_interrupt+0x71/0x95
Apr 24 23:40:07 MojoRyzen kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Apr 24 23:40:07 MojoRyzen kernel: RIP: 0010:smp_call_function_many_cond+0x272/0x285
Apr 24 23:40:07 MojoRyzen kernel: Code: 4c 89 fe e8 b3 e2 28 00 3b 05 e2 9e 06 01 89 c7 73 1c 48 63 c7 49 8b 14 24 48 03 14 c5 00 99 df 81 8b 42 08 a8 01 74 04 f3 90 <eb> f5 eb d2 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f b6 c9
Apr 24 23:40:07 MojoRyzen kernel: RSP: 0018:ffffc9000159fb48 EFLAGS: 00000202
Apr 24 23:40:07 MojoRyzen kernel: RAX: 0000000000000011 RBX: 0000000000000001 RCX: 000000000000000d
Apr 24 23:40:07 MojoRyzen kernel: RDX: ffff888ffeb67c00 RSI: 0000000000000000 RDI: 000000000000000d
Apr 24 23:40:07 MojoRyzen kernel: RBP: 0000000000000100 R08: 0000000000000000 R09: 000000000000000d
Apr 24 23:40:07 MojoRyzen kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff888ffeba33c0
Apr 24 23:40:07 MojoRyzen kernel: R13: 0000000000000000 R14: ffffffff81180ce4 R15: ffff888ffeba33c8
Apr 24 23:40:07 MojoRyzen kernel: ? put_bh+0x5/0x5
Apr 24 23:40:07 MojoRyzen kernel: ? smp_call_function_many_cond+0x250/0x285
Apr 24 23:40:07 MojoRyzen kernel: ? brelse+0x8/0x8
Apr 24 23:40:07 MojoRyzen kernel: ? brelse+0x8/0x8
Apr 24 23:40:07 MojoRyzen kernel: ? put_bh+0x5/0x5
Apr 24 23:40:07 MojoRyzen kernel: on_each_cpu_cond_mask+0x2a/0x6e
Apr 24 23:40:07 MojoRyzen kernel: invalidate_bdev+0x15/0x44
Apr 24 23:40:07 MojoRyzen kernel: btrfs_get_bdev_and_sb+0x60/0x94
Apr 24 23:40:07 MojoRyzen kernel: open_fs_devices+0x89/0x252
Apr 24 23:40:07 MojoRyzen kernel: btrfs_mount_root+0x1ac/0x3f1
Apr 24 23:40:07 MojoRyzen kernel: ? _cond_resched+0x1b/0x1e
Apr 24 23:40:07 MojoRyzen kernel: legacy_get_tree+0x22/0x3b
Apr 24 23:40:07 MojoRyzen kernel: vfs_get_tree+0x19/0x86
Apr 24 23:40:07 MojoRyzen kernel: fc_mount+0x9/0x2b
Apr 24 23:40:07 MojoRyzen kernel: vfs_kern_mount.part.0+0x3d/0x7d
Apr 24 23:40:07 MojoRyzen kernel: btrfs_mount+0x141/0x406
Apr 24 23:40:07 MojoRyzen kernel: ? vfs_parse_fs_string+0x55/0x9c
Apr 24 23:40:07 MojoRyzen kernel: ? legacy_parse_param+0x23/0x200
Apr 24 23:40:07 MojoRyzen kernel: ? legacy_get_tree+0x22/0x3b
Apr 24 23:40:07 MojoRyzen kernel: legacy_get_tree+0x22/0x3b
Apr 24 23:40:07 MojoRyzen kernel: vfs_get_tree+0x19/0x86
Apr 24 23:40:07 MojoRyzen kernel: path_mount+0x674/0x750
Apr 24 23:40:07 MojoRyzen kernel: do_mount+0x57/0x84
Apr 24 23:40:07 MojoRyzen kernel: __do_sys_mount+0xfa/0x122
Apr 24 23:40:07 MojoRyzen kernel: do_syscall_64+0x5d/0x6a
Apr 24 23:40:07 MojoRyzen kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 24 23:40:07 MojoRyzen kernel: RIP: 0033:0x145f4263d1ba
Apr 24 23:40:07 MojoRyzen kernel: Code: 48 8b 0d d9 7c 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a6 7c 0c 00 f7 d8 64 89 01 48
Apr 24 23:40:07 MojoRyzen kernel: RSP: 002b:00007ffd25cd42b8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
Apr 24 23:40:07 MojoRyzen kernel: RAX: ffffffffffffffda RBX: 0000145f427bffa4 RCX: 0000145f4263d1ba
Apr 24 23:40:07 MojoRyzen kernel: RDX: 0000000000432be0 RSI: 000000000040e540 RDI: 0000000000436720
Apr 24 23:40:07 MojoRyzen kernel: RBP: 000000000040e2f0 R08: 000000000040e580 R09: 0000000000000002
Apr 24 23:40:07 MojoRyzen kernel: R10: 0000000000000400 R11: 0000000000000206 R12: 0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: R13: 0000000000436720 R14: 0000000000432be0 R15: 000000000040e2f0

I have a Ryzen 3900X and an ASRock X570 Taichi motherboard. I do have a cable connecting the GPU to a display, but the display is not turned on.

I am clearly doing something wrong here, but I can't figure out why it isn't working. I also notice that several of the cores are pegged, and it takes a hard reboot to get the system back. Is there any way for me to boot without binding the primary GPU?

Thanks in advance for any ideas!
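When debugging a setup like this, it can help to confirm which kernel driver currently holds each GPU (for example `vfio-pci` versus `nvidia`) by reading sysfs directly. The sketch below is illustrative and not part of the original post: the function name `list_gpu_drivers` and its optional directory argument are made up here, and it assumes the standard Linux sysfs layout, where each PCI device directory has a `class` file and, when a driver is bound, a `driver` symlink.

```shell
#!/bin/bash
# Sketch: print the kernel driver bound to every display-class PCI device.
# list_gpu_drivers is a hypothetical helper name. The optional argument
# (default /sys/bus/pci/devices) only exists so the function can be pointed
# at a different directory tree.
list_gpu_drivers() {
    local root="${1:-/sys/bus/pci/devices}"
    local dev class driver
    for dev in "$root"/*; do
        [ -r "$dev/class" ] || continue
        class=$(cat "$dev/class")
        # PCI base class 0x03 is "display controller" (VGA, 3D, ...).
        case "$class" in
            0x03*) ;;
            *) continue ;;
        esac
        if [ -e "$dev/driver" ]; then
            # The driver entry is a symlink to /sys/bus/pci/drivers/<name>.
            driver=$(basename "$(readlink "$dev/driver")")
        else
            driver="(none)"
        fi
        printf '%s %s\n' "$(basename "$dev")" "$driver"
    done
}
```

Running `list_gpu_drivers` with no argument prints one line per GPU, giving the PCI address followed by the bound driver or `(none)`. Comparing the output with and without the VFIO binding shows which driver claims the primary card at the point where the array mount stalls.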
uek2wooF Posted April 27, 2021

I have a Taichi X570 and just one GPU. In order to pass it through to a VM I have to:

Bind it to vfio in Tools -> System Devices

Also I have to do this:

(echo 0 > /sys/class/vtconsole/vtcon0/bind) 2>/dev/null
(echo 0 > /sys/class/vtconsole/vtcon1/bind) 2>/dev/null
(echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind) 2>/dev/null

I have that in a user script that runs at array start. I don't know if that works with 2 GPUs. You can also run it from the command line before the VM starts. If you are trying to prevent Unraid from using your GPUs for something so you can pass them through, try that. (You will no longer have a console.)
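The three writes in the reply above can also be wrapped in a small function that skips any sysfs node that is missing, rather than relying only on `2>/dev/null`. This is just a sketch of that idea, not the poster's actual script: the function name `release_console` and its optional root argument are invented for illustration, and unlike the original it loops over every `vtcon*` entry instead of naming `vtcon0` and `vtcon1` explicitly. The paths are the ones quoted in the reply, and the same caveat applies: the host loses its local console once this runs.

```shell
#!/bin/bash
# Sketch: detach the virtual consoles and the EFI framebuffer from the GPU.
# release_console is a hypothetical helper name; the optional argument
# (default /sys) only exists so the function can be tried on a scratch tree.
release_console() {
    local root="${1:-/sys}"
    local vtcon fb_unbind
    # Unbind every virtual console that is present (vtcon0, vtcon1, ...).
    for vtcon in "$root"/class/vtconsole/vtcon*/bind; do
        [ -w "$vtcon" ] || continue
        echo 0 > "$vtcon"
    done
    # Release the EFI framebuffer only if that driver directory exists.
    fb_unbind="$root/bus/platform/drivers/efi-framebuffer/unbind"
    if [ -w "$fb_unbind" ]; then
        # The write fails harmlessly if the framebuffer is already unbound.
        echo efi-framebuffer.0 > "$fb_unbind" 2>/dev/null || true
    fi
}
```

Calling `release_console` with no argument from an array-start user script would act on `/sys`, which is equivalent in intent to the commands quoted above. Whether this helps with two GPUs is untested here, as the reply itself notes.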
cardo Posted April 28, 2021

Thanks for the reply. I was actually hoping not to pass the card through and to use it in my mining container instead, but I can only mount the array with it bound to VFIO. What I did instead was set up a Windows 10 VM and run the mining software there. It is working well, so I'm happy with it for now.
uek2wooF Posted April 28, 2021

Oh, you mean in Docker, not a VM. Did you try the Nvidia driver plugin discussed here?