6.9.2: Array Won't Start Unless GPU is Bound in Tools > System Devices



Hi All,

 

I have a bit of a quandary. I have two GPUs in my Unraid server. Previously I passed the primary GPU, an RTX 2060, through to a gaming VM and used the second GPU, a 1660, for Plex HW transcoding.

 

I haven't been using the 2060 lately because I've mostly been gaming on my laptop, so I decided to use the idle cycles to mine ETH, and I was hoping to use both GPUs for this. What I found is that unless I keep my primary GPU bound to VFIO under Tools > System Devices, the server hangs as soon as I try to mount the array.
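When comparing the working and hanging configurations, it can help to confirm which kernel driver actually owns each card after boot. The following is a minimal sketch, assuming the standard sysfs layout in which each PCI device has a `driver` symlink; the PCI address in the usage line and the `SYSFS_ROOT` override are hypothetical and are not taken from this system.

```shell
#!/bin/bash
# Print the kernel driver currently bound to a PCI device, or "none".
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a fake directory tree instead of the real /sys.
pci_driver() {
    local addr="$1"
    local link="${SYSFS_ROOT:-/sys}/bus/pci/devices/${addr}/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Hypothetical address; list the real ones with: lspci -nn | grep -i vga
# pci_driver 0000:0b:00.0
```

A card bound under Tools > System Devices would be expected to report `vfio-pci` here; a card left to the host would report whatever driver claimed it, or `none`.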

 

I see the following in the syslog every 2-3 minutes:

 

Apr 24 23:40:07 MojoRyzen kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Apr 24 23:40:07 MojoRyzen kernel: rcu: 	14-....: (960010 ticks this GP) idle=c8e/1/0x4000000000000000 softirq=5044/5044 fqs=227707 
Apr 24 23:40:07 MojoRyzen kernel: 	(t=960016 jiffies g=13785 q=5495899)
Apr 24 23:40:07 MojoRyzen kernel: Sending NMI from CPU 14 to CPUs 4:
Apr 24 23:40:07 MojoRyzen kernel: NMI backtrace for cpu 4
Apr 24 23:40:07 MojoRyzen kernel: CPU: 4 PID: 5730 Comm: udevd Tainted: P      D    O      5.10.28-Unraid #1
Apr 24 23:40:07 MojoRyzen kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Taichi, BIOS P3.00 04/07/2020
Apr 24 23:40:07 MojoRyzen kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x134/0x18a
Apr 24 23:40:07 MojoRyzen kernel: Code: d3 c1 ee 12 83 e0 03 ff ce 48 c1 e0 05 48 63 f6 48 05 00 30 02 00 48 03 04 f5 00 99 df 81 48 89 10 8b 42 08 85 c0 75 04 f3 90 <eb> f5 48 8b 32 48 85 f6 74 03 0f 0d 0e 8b 07 66 85 c0 74 04 f3 90
Apr 24 23:40:07 MojoRyzen kernel: RSP: 0018:ffffc90000cafb80 EFLAGS: 00000046
Apr 24 23:40:07 MojoRyzen kernel: RAX: 0000000000000000 RBX: ffff888100cb30a0 RCX: 0000000000140000
Apr 24 23:40:07 MojoRyzen kernel: RDX: ffff888ffe923000 RSI: 000000000000000d RDI: ffff888100cb30a0
Apr 24 23:40:07 MojoRyzen kernel: RBP: ffff888100cb3080 R08: 0000000000000000 R09: 0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000046
Apr 24 23:40:07 MojoRyzen kernel: R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: FS:  000014dfca02fbc0(0000) GS:ffff888ffe900000(0000) knlGS:0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 24 23:40:07 MojoRyzen kernel: CR2: 0000153564b375a0 CR3: 000000010148e000 CR4: 0000000000350ee0
Apr 24 23:40:07 MojoRyzen kernel: Call Trace:
Apr 24 23:40:07 MojoRyzen kernel: queued_spin_lock_slowpath+0x7/0xa
Apr 24 23:40:07 MojoRyzen kernel: _raw_spin_lock_irqsave+0x23/0x29
Apr 24 23:40:07 MojoRyzen kernel: __wake_up_common_lock+0x5d/0xaf
Apr 24 23:40:07 MojoRyzen kernel: ep_poll_callback+0x119/0x1a0
Apr 24 23:40:07 MojoRyzen kernel: __wake_up_common+0xa7/0x12f
Apr 24 23:40:07 MojoRyzen kernel: __wake_up_common_lock+0x77/0xaf
Apr 24 23:40:07 MojoRyzen kernel: sock_def_readable+0x29/0x41
Apr 24 23:40:07 MojoRyzen kernel: unix_dgram_sendmsg+0x4a3/0x51f
Apr 24 23:40:07 MojoRyzen kernel: sock_sendmsg_nosec+0x32/0x3c
Apr 24 23:40:07 MojoRyzen kernel: sock_write_iter+0x84/0xaf
Apr 24 23:40:07 MojoRyzen kernel: new_sync_write+0x7a/0xb2
Apr 24 23:40:07 MojoRyzen kernel: vfs_write+0xd7/0x121
Apr 24 23:40:07 MojoRyzen kernel: ksys_write+0x71/0xba
Apr 24 23:40:07 MojoRyzen kernel: do_syscall_64+0x5d/0x6a
Apr 24 23:40:07 MojoRyzen kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 24 23:40:07 MojoRyzen kernel: RIP: 0033:0x14dfca680833
Apr 24 23:40:07 MojoRyzen kernel: Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
Apr 24 23:40:07 MojoRyzen kernel: RSP: 002b:00007ffc38face68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Apr 24 23:40:07 MojoRyzen kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014dfca680833
Apr 24 23:40:07 MojoRyzen kernel: RDX: 0000000000000000 RSI: 00007ffc38facf90 RDI: 0000000000000008
Apr 24 23:40:07 MojoRyzen kernel: RBP: 00007ffc38facf90 R08: 000000000064704a R09: 0000000000000006
Apr 24 23:40:07 MojoRyzen kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
Apr 24 23:40:07 MojoRyzen kernel: R13: 0000000000000000 R14: 000014dfca02fb08 R15: ffffffffffffffff
Apr 24 23:40:07 MojoRyzen kernel: Sending NMI from CPU 14 to CPUs 13:
Apr 24 23:40:07 MojoRyzen kernel: NMI backtrace for cpu 13
Apr 24 23:40:07 MojoRyzen kernel: CPU: 13 PID: 5731 Comm: udevd Tainted: P      D    O      5.10.28-Unraid #1
Apr 24 23:40:07 MojoRyzen kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Taichi, BIOS P3.00 04/07/2020
Apr 24 23:40:07 MojoRyzen kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x134/0x18a
Apr 24 23:40:07 MojoRyzen kernel: Code: d3 c1 ee 12 83 e0 03 ff ce 48 c1 e0 05 48 63 f6 48 05 00 30 02 00 48 03 04 f5 00 99 df 81 48 89 10 8b 42 08 85 c0 75 04 f3 90 <eb> f5 48 8b 32 48 85 f6 74 03 0f 0d 0e 8b 07 66 85 c0 74 04 f3 90
Apr 24 23:40:07 MojoRyzen kernel: RSP: 0018:ffffc90000cb7d00 EFLAGS: 00000046
Apr 24 23:40:07 MojoRyzen kernel: RAX: 0000000000000000 RBX: ffff888100cb30a0 RCX: 0000000000380000
Apr 24 23:40:07 MojoRyzen kernel: RDX: ffff888ffeb63000 RSI: 0000000000000066 RDI: ffff888100cb30a0
Apr 24 23:40:07 MojoRyzen kernel: RBP: ffff888100cb3080 R08: 0000000000000000 R09: 0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: R10: 0000000000000000 R11: ffff888102ffcc50 R12: 0000000000000046
Apr 24 23:40:07 MojoRyzen kernel: R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: FS:  000014dfca02fbc0(0000) GS:ffff888ffeb40000(0000) knlGS:0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 24 23:40:07 MojoRyzen kernel: CR2: 000014dfca7efed4 CR3: 000000016bb20000 CR4: 0000000000350ee0
Apr 24 23:40:07 MojoRyzen kernel: Call Trace:
Apr 24 23:40:07 MojoRyzen kernel: queued_spin_lock_slowpath+0x7/0xa
Apr 24 23:40:07 MojoRyzen kernel: _raw_spin_lock_irqsave+0x23/0x29
Apr 24 23:40:07 MojoRyzen kernel: __wake_up_common_lock+0x5d/0xaf
Apr 24 23:40:07 MojoRyzen kernel: ep_poll_callback+0x119/0x1a0
Apr 24 23:40:07 MojoRyzen kernel: __wake_up_common+0xa7/0x12f
Apr 24 23:40:07 MojoRyzen kernel: __wake_up_common_lock+0x77/0xaf
Apr 24 23:40:07 MojoRyzen kernel: fsnotify_add_event+0xc2/0xe4
Apr 24 23:40:07 MojoRyzen kernel: inotify_handle_inode_event+0xd7/0x10d
Apr 24 23:40:07 MojoRyzen kernel: inotify_ignored_and_remove_idr+0x1c/0x3a
Apr 24 23:40:07 MojoRyzen kernel: __do_sys_inotify_rm_watch+0x74/0x98
Apr 24 23:40:07 MojoRyzen kernel: do_syscall_64+0x5d/0x6a
Apr 24 23:40:07 MojoRyzen kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 24 23:40:07 MojoRyzen kernel: RIP: 0033:0x14dfca691157
Apr 24 23:40:07 MojoRyzen kernel: Code: 73 01 c3 48 8b 0d 39 7d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 ff 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 09 7d 0c 00 f7 d8 64 89 01 48
Apr 24 23:40:07 MojoRyzen kernel: RSP: 002b:00007ffc38faca18 EFLAGS: 00000293 ORIG_RAX: 00000000000000ff
Apr 24 23:40:07 MojoRyzen kernel: RAX: ffffffffffffffda RBX: 000000000064fef0 RCX: 000014dfca691157
Apr 24 23:40:07 MojoRyzen kernel: RDX: 0000000000000007 RSI: 0000000000000058 RDI: 0000000000000005
Apr 24 23:40:07 MojoRyzen kernel: RBP: 0000000000000058 R08: 000000000064704a R09: 0000000000000006
Apr 24 23:40:07 MojoRyzen kernel: R10: 0000000000647010 R11: 0000000000000293 R12: 00007ffc38faca20
Apr 24 23:40:07 MojoRyzen kernel: R13: 0000000003938700 R14: 0000000000645700 R15: 0000000000645740
Apr 24 23:40:07 MojoRyzen kernel: NMI backtrace for cpu 14
Apr 24 23:40:07 MojoRyzen kernel: CPU: 14 PID: 8849 Comm: mount Tainted: P      D    O      5.10.28-Unraid #1
Apr 24 23:40:07 MojoRyzen kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Taichi, BIOS P3.00 04/07/2020
Apr 24 23:40:07 MojoRyzen kernel: Call Trace:
Apr 24 23:40:07 MojoRyzen kernel: <IRQ>
Apr 24 23:40:07 MojoRyzen kernel: dump_stack+0x6b/0x83
Apr 24 23:40:07 MojoRyzen kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Apr 24 23:40:07 MojoRyzen kernel: nmi_cpu_backtrace+0x7d/0x8f
Apr 24 23:40:07 MojoRyzen kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Apr 24 23:40:07 MojoRyzen kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Apr 24 23:40:07 MojoRyzen kernel: rcu_sched_clock_irq+0x1ec/0x543
Apr 24 23:40:07 MojoRyzen kernel: ? trigger_load_balance+0x5a/0x1ca
Apr 24 23:40:07 MojoRyzen kernel: update_process_times+0x50/0x6e
Apr 24 23:40:07 MojoRyzen kernel: tick_sched_timer+0x36/0x64
Apr 24 23:40:07 MojoRyzen kernel: __hrtimer_run_queues+0xb7/0x10b
Apr 24 23:40:07 MojoRyzen kernel: ? tick_sched_do_timer+0x39/0x39
Apr 24 23:40:07 MojoRyzen kernel: hrtimer_interrupt+0x8d/0x15b
Apr 24 23:40:07 MojoRyzen kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Apr 24 23:40:07 MojoRyzen kernel: asm_call_irq_on_stack+0x12/0x20
Apr 24 23:40:07 MojoRyzen kernel: </IRQ>
Apr 24 23:40:07 MojoRyzen kernel: sysvec_apic_timer_interrupt+0x71/0x95
Apr 24 23:40:07 MojoRyzen kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Apr 24 23:40:07 MojoRyzen kernel: RIP: 0010:smp_call_function_many_cond+0x272/0x285
Apr 24 23:40:07 MojoRyzen kernel: Code: 4c 89 fe e8 b3 e2 28 00 3b 05 e2 9e 06 01 89 c7 73 1c 48 63 c7 49 8b 14 24 48 03 14 c5 00 99 df 81 8b 42 08 a8 01 74 04 f3 90 <eb> f5 eb d2 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f b6 c9
Apr 24 23:40:07 MojoRyzen kernel: RSP: 0018:ffffc9000159fb48 EFLAGS: 00000202
Apr 24 23:40:07 MojoRyzen kernel: RAX: 0000000000000011 RBX: 0000000000000001 RCX: 000000000000000d
Apr 24 23:40:07 MojoRyzen kernel: RDX: ffff888ffeb67c00 RSI: 0000000000000000 RDI: 000000000000000d
Apr 24 23:40:07 MojoRyzen kernel: RBP: 0000000000000100 R08: 0000000000000000 R09: 000000000000000d
Apr 24 23:40:07 MojoRyzen kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff888ffeba33c0
Apr 24 23:40:07 MojoRyzen kernel: R13: 0000000000000000 R14: ffffffff81180ce4 R15: ffff888ffeba33c8
Apr 24 23:40:07 MojoRyzen kernel: ? put_bh+0x5/0x5
Apr 24 23:40:07 MojoRyzen kernel: ? smp_call_function_many_cond+0x250/0x285
Apr 24 23:40:07 MojoRyzen kernel: ? brelse+0x8/0x8
Apr 24 23:40:07 MojoRyzen kernel: ? brelse+0x8/0x8
Apr 24 23:40:07 MojoRyzen kernel: ? put_bh+0x5/0x5
Apr 24 23:40:07 MojoRyzen kernel: on_each_cpu_cond_mask+0x2a/0x6e
Apr 24 23:40:07 MojoRyzen kernel: invalidate_bdev+0x15/0x44
Apr 24 23:40:07 MojoRyzen kernel: btrfs_get_bdev_and_sb+0x60/0x94
Apr 24 23:40:07 MojoRyzen kernel: open_fs_devices+0x89/0x252
Apr 24 23:40:07 MojoRyzen kernel: btrfs_mount_root+0x1ac/0x3f1
Apr 24 23:40:07 MojoRyzen kernel: ? _cond_resched+0x1b/0x1e
Apr 24 23:40:07 MojoRyzen kernel: legacy_get_tree+0x22/0x3b
Apr 24 23:40:07 MojoRyzen kernel: vfs_get_tree+0x19/0x86
Apr 24 23:40:07 MojoRyzen kernel: fc_mount+0x9/0x2b
Apr 24 23:40:07 MojoRyzen kernel: vfs_kern_mount.part.0+0x3d/0x7d
Apr 24 23:40:07 MojoRyzen kernel: btrfs_mount+0x141/0x406
Apr 24 23:40:07 MojoRyzen kernel: ? vfs_parse_fs_string+0x55/0x9c
Apr 24 23:40:07 MojoRyzen kernel: ? legacy_parse_param+0x23/0x200
Apr 24 23:40:07 MojoRyzen kernel: ? legacy_get_tree+0x22/0x3b
Apr 24 23:40:07 MojoRyzen kernel: legacy_get_tree+0x22/0x3b
Apr 24 23:40:07 MojoRyzen kernel: vfs_get_tree+0x19/0x86
Apr 24 23:40:07 MojoRyzen kernel: path_mount+0x674/0x750
Apr 24 23:40:07 MojoRyzen kernel: do_mount+0x57/0x84
Apr 24 23:40:07 MojoRyzen kernel: __do_sys_mount+0xfa/0x122
Apr 24 23:40:07 MojoRyzen kernel: do_syscall_64+0x5d/0x6a
Apr 24 23:40:07 MojoRyzen kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 24 23:40:07 MojoRyzen kernel: RIP: 0033:0x145f4263d1ba
Apr 24 23:40:07 MojoRyzen kernel: Code: 48 8b 0d d9 7c 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a6 7c 0c 00 f7 d8 64 89 01 48
Apr 24 23:40:07 MojoRyzen kernel: RSP: 002b:00007ffd25cd42b8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
Apr 24 23:40:07 MojoRyzen kernel: RAX: ffffffffffffffda RBX: 0000145f427bffa4 RCX: 0000145f4263d1ba
Apr 24 23:40:07 MojoRyzen kernel: RDX: 0000000000432be0 RSI: 000000000040e540 RDI: 0000000000436720
Apr 24 23:40:07 MojoRyzen kernel: RBP: 000000000040e2f0 R08: 000000000040e580 R09: 0000000000000002
Apr 24 23:40:07 MojoRyzen kernel: R10: 0000000000000400 R11: 0000000000000206 R12: 0000000000000000
Apr 24 23:40:07 MojoRyzen kernel: R13: 0000000000436720 R14: 0000000000432be0 R15: 000000000040e2f0
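
A dump like the one above is easier to triage once it is reduced to which processes were caught in the backtraces. The sketch below pulls the `Comm:` field out of each `CPU: ... PID: ... Comm: ...` line and counts occurrences; the function name is made up for illustration, and `/var/log/syslog` in the usage line is an assumption about where the log lives. Note also that the `D` in `Tainted: P      D    O` means the kernel had already hit an oops or BUG before these stalls were reported, so the earliest trace in the log is probably the more informative one.

```shell
#!/bin/bash
# Read syslog text on stdin and print "<process name> <count>" for every
# process that appears in a "CPU: N PID: N Comm: name" backtrace header.
stalled_comms() {
    sed -n 's/.*kernel: CPU: [0-9]* PID: [0-9]* Comm: \([^ ]*\) .*/\1/p' \
        | sort | uniq -c | awk '{print $2, $1}'
}

# Usage (path is an assumption; adjust to wherever the syslog is stored):
# stalled_comms < /var/log/syslog
```

Run against the excerpt above, this would list `mount` once and `udevd` twice, which matches the picture of the mount call waiting on other CPUs while two `udevd` workers spin on a lock.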

 

I have a Ryzen 3900X and an ASRock X570 Taichi motherboard. I do have a cable running from the GPU to a display, but the display is not turned on. I am clearly doing something wrong here, but I can't figure out why it isn't working. I also notice that several of the cores are pegged when this happens, and it takes a hard reboot to get the system back.

 

Any way for me to boot without binding the primary GPU?

 

Thanks in advance for any ideas!


 

I have an X570 Taichi and just one GPU. In order to pass it through to a VM I have to:

 

Bind it to VFIO in Tools > System Devices

 

I also have to run this:

(echo 0 > /sys/class/vtconsole/vtcon0/bind) 2>/dev/null
(echo 0 > /sys/class/vtconsole/vtcon1/bind) 2>/dev/null
(echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind) 2>/dev/null

 

I have that in a user script that runs at array start. I don't know if it works with two GPUs. You can also run it from the command line before the VM starts.

 

If you are trying to stop Unraid from using your GPUs for its own console so that you can pass them through, try that. (You will no longer have a local console.)
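
The three commands above silence errors with `2>/dev/null`, so it is hard to tell which of them actually did anything. A slightly more defensive variant is sketched below: it performs the same writes but checks that each sysfs node exists first and reports what it unbound. Unlike the original it loops over every `vtcon*` entry rather than only `vtcon0` and `vtcon1`. The function name and the `SYSFS_ROOT` override are hypothetical, added only so the script can be dry-run against a fake tree.

```shell
#!/bin/bash
# Release the host console and EFI framebuffer so the GPU is free for a VM.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
release_console() {
    local root="${SYSFS_ROOT:-/sys}"
    local vt
    for vt in "$root"/class/vtconsole/vtcon*; do
        # Skip the unexpanded glob when no vtconsole entries exist.
        [ -e "$vt/bind" ] || continue
        echo 0 > "$vt/bind"
        echo "unbound $(basename "$vt")"
    done
    local efifb="$root/bus/platform/drivers/efi-framebuffer"
    # The device entry is only present while efi-framebuffer is bound.
    if [ -e "$efifb/efi-framebuffer.0" ]; then
        echo efi-framebuffer.0 > "$efifb/unbind"
        echo "unbound efi-framebuffer.0"
    fi
}

# Run as root before starting the VM; the local console is lost afterwards.
# release_console
```

As with the original commands, whether this is enough on a two-GPU system is untested here.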

 


Thanks for the reply. I was actually hoping not to pass the card through at all and to use it in my mining container instead, but I can only mount the array with the card bound to VFIO. What I did instead was set up a Windows 10 VM and run the mining software there. It is working well, so I'm happy with it for now.
