Having an issue after the computer reboots when it was shut down via NUT


0xjams

Hi:

I've had frequent blackouts recently, so I started using NUT to shut down my server properly. However, this week the server has been booting into an inconsistent state, but only when it powers itself back on after power returns.

 

To fix the problem I have to physically turn it off with the power button, boot it, run a parity check, shut down, and boot normally again. It's a super tedious process, and I don't want to do it every time there's a blackout.

 

I got the logs from a successful boot and a failed one. Let's start with the failed one:

 

 

Dec 15 18:02:39 groudon rc.docker: amcrest2mqtt: started succesfully!
Dec 15 18:02:39 groudon kernel: ACPI Warning: \_SB.PC00.PEG1.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20220331/nsarguments-61)
Dec 15 18:02:39 groudon kernel: br0.20: port 2(vnet0) entered blocking state
Dec 15 18:02:39 groudon kernel: br0.20: port 2(vnet0) entered disabled state
Dec 15 18:02:39 groudon kernel: device vnet0 entered promiscuous mode
Dec 15 18:02:39 groudon kernel: br0.20: port 2(vnet0) entered blocking state
Dec 15 18:02:39 groudon kernel: br0.20: port 2(vnet0) entered forwarding state
Dec 15 18:02:40 groudon kernel: br0.20: port 3(vnet1) entered blocking state
Dec 15 18:02:40 groudon kernel: br0.20: port 3(vnet1) entered disabled state
Dec 15 18:02:40 groudon kernel: device vnet1 entered promiscuous mode
Dec 15 18:02:40 groudon kernel: br0.20: port 3(vnet1) entered blocking state
Dec 15 18:02:40 groudon kernel: br0.20: port 3(vnet1) entered forwarding state
Dec 15 18:02:40 groudon kernel: br0: port 2(vnet2) entered blocking state
Dec 15 18:02:40 groudon kernel: br0: port 2(vnet2) entered disabled state
Dec 15 18:02:40 groudon kernel: device vnet2 entered promiscuous mode
Dec 15 18:02:40 groudon kernel: br0: port 2(vnet2) entered blocking state
Dec 15 18:02:40 groudon kernel: br0: port 2(vnet2) entered forwarding state
Dec 15 18:02:40 groudon kernel: nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
Dec 15 18:02:40 groudon kernel: nvidia-uvm: Loaded the UVM driver, major device number 238.
Dec 15 18:02:40 groudon kernel: NVRM: GPU 0000:01:00.0: request_irq() failed (-22)
Dec 15 18:02:40 groudon kernel: NVRM: GPU 0000:01:00.0: request_irq() failed (-22)
Dec 15 18:02:40 groudon kernel: NVRM: GPU 0000:01:00.0: request_irq() failed (-22)
Dec 15 18:02:40 groudon kernel: NVRM: GPU 0000:01:00.0: request_irq() failed (-22)
Dec 15 18:02:40 groudon kernel: BUG: unable to handle page fault for address: 000000000001000f
Dec 15 18:02:40 groudon kernel: #PF: supervisor read access in kernel mode
Dec 15 18:02:40 groudon kernel: #PF: error_code(0x0000) - not-present page
Dec 15 18:02:40 groudon kernel: PGD 0 P4D 0 
Dec 15 18:02:40 groudon kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Dec 15 18:02:40 groudon kernel: CPU: 6 PID: 346 Comm: kworker/u64:8 Tainted: P           O       6.1.64-Unraid #1
Dec 15 18:02:40 groudon kernel: Hardware name: Intel(R) Client Systems NUC13RNGi9/NUC13SBBi9, BIOS SBRPL579.0047.2022.1006.1728 10/06/2022
Dec 15 18:02:40 groudon kernel: Workqueue: loop2 loop_rootcg_workfn
Dec 15 18:02:40 groudon kernel: RIP: 0010:__kmem_cache_alloc_node+0xb5/0x147
Dec 15 18:02:40 groudon kernel: Code: 48 c1 e9 3a 41 39 cf 74 1a 45 89 e8 4c 89 f1 44 89 fa 44 89 e6 48 89 ef e8 ce e9 ff ff 48 89 04 24 eb 25 8b 4d 28 48 8b 7d 00 <48> 8b 1c 08 48 8d 8a 00 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74
Dec 15 18:02:40 groudon kernel: RSP: 0018:ffffc90004e17860 EFLAGS: 00010246
Dec 15 18:02:40 groudon kernel: RAX: 000000000000ffff RBX: ffff888100042400 RCX: 0000000000000010
Dec 15 18:02:40 groudon kernel: RDX: 0000000000186806 RSI: 0000000000008d40 RDI: 0000000000030a70
Dec 15 18:02:40 groudon kernel: RBP: ffff888100042400 R08: 0000000000008d40 R09: ffffc90004e17928
Dec 15 18:02:40 groudon kernel: R10: 0000000000000402 R11: ffffc90004e17928 R12: 0000000000008d40
Dec 15 18:02:40 groudon kernel: R13: 0000000000000018 R14: ffffffff8129ebce R15: 00000000ffffffff
Dec 15 18:02:40 groudon kernel: FS:  0000000000000000(0000) GS:ffff88905f380000(0000) knlGS:0000000000000000
Dec 15 18:02:40 groudon kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 15 18:02:40 groudon kernel: CR2: 000000000001000f CR3: 000000013ce82000 CR4: 0000000000752ee0
Dec 15 18:02:40 groudon kernel: PKRU: 55555554
Dec 15 18:02:40 groudon kernel: Call Trace:
Dec 15 18:02:40 groudon kernel: <TASK>

 

I noticed that everything started to fail right after the NVIDIA driver was loaded. However, the following is a trace of a successful boot, and that same NVIDIA message is there.

 

Dec 15 18:50:23 groudon rc.docker: amcrest2mqtt: started succesfully!
Dec 15 18:50:23 groudon kernel: br-af7630112533: port 1(vethfae6309) entered blocking state
Dec 15 18:50:23 groudon kernel: br-af7630112533: port 1(vethfae6309) entered disabled state
Dec 15 18:50:23 groudon kernel: device vethfae6309 entered promiscuous mode
Dec 15 18:50:23 groudon kernel: br-af7630112533: port 1(vethfae6309) entered blocking state
Dec 15 18:50:23 groudon kernel: br-af7630112533: port 1(vethfae6309) entered forwarding state
Dec 15 18:50:23 groudon kernel: IPv6: ADDRCONF(NETDEV_CHANGE): br-af7630112533: link becomes ready
Dec 15 18:50:23 groudon kernel: br-af7630112533: port 1(vethfae6309) entered disabled state
Dec 15 18:50:23 groudon kernel: ACPI Warning: \_SB.PC00.PEG1.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20220331/nsarguments-61)
Dec 15 18:50:23 groudon kernel: br0.20: port 2(vnet0) entered blocking state
Dec 15 18:50:23 groudon kernel: br0.20: port 2(vnet0) entered disabled state
Dec 15 18:50:23 groudon kernel: device vnet0 entered promiscuous mode
Dec 15 18:50:23 groudon kernel: br0.20: port 2(vnet0) entered blocking state
Dec 15 18:50:23 groudon kernel: br0.20: port 2(vnet0) entered forwarding state
Dec 15 18:50:24 groudon kernel: br0.20: port 3(vnet1) entered blocking state
Dec 15 18:50:24 groudon kernel: br0.20: port 3(vnet1) entered disabled state
Dec 15 18:50:24 groudon kernel: device vnet1 entered promiscuous mode
Dec 15 18:50:24 groudon kernel: br0.20: port 3(vnet1) entered blocking state
Dec 15 18:50:24 groudon kernel: br0.20: port 3(vnet1) entered forwarding state
Dec 15 18:50:24 groudon kernel: br0: port 2(vnet2) entered blocking state
Dec 15 18:50:24 groudon kernel: br0: port 2(vnet2) entered disabled state
Dec 15 18:50:24 groudon kernel: device vnet2 entered promiscuous mode
Dec 15 18:50:24 groudon kernel: br0: port 2(vnet2) entered blocking state
Dec 15 18:50:24 groudon kernel: br0: port 2(vnet2) entered forwarding state
Dec 15 18:50:24 groudon kernel: nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
Dec 15 18:50:24 groudon kernel: nvidia-uvm: Loaded the UVM driver, major device number 238.
Dec 15 18:50:25 groudon kernel: eth0: renamed from vethbc07e43
Dec 15 18:50:25 groudon kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethd6f9f97: link becomes ready
Dec 15 18:50:25 groudon kernel: br-67d1cd50b67d: port 1(vethd6f9f97) entered blocking state
Dec 15 18:50:25 groudon kernel: br-67d1cd50b67d: port 1(vethd6f9f97) entered forwarding state
Dec 15 18:50:25 groudon kernel: IPv6: ADDRCONF(NETDEV_CHANGE): br-67d1cd50b67d: link becomes ready
Dec 15 18:50:25 groudon rc.docker: plex: started succesfully!
Dec 15 18:50:25 groudon rc.docker: plex: wait 600 seconds
Dec 15 18:50:25 groudon kernel: eth0: renamed from veth28a0bbd
Dec 15 18:50:25 groudon kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethfae6309: link becomes ready
Dec 15 18:50:25 groudon kernel: br-af7630112533: port 1(vethfae6309) entered blocking state
Dec 15 18:50:25 groudon kernel: br-af7630112533: port 1(vethfae6309) entered forwarding state
Dec 15 18:50:25 groudon unassigned.devices: Mounting 'Auto Mount' Remote Shares...

 

Any ideas of what I could do?

 

Link to comment

It's possible the frequent blackouts before you started using a UPS with NUT fried something in your server.

It seems to happen around the time your GPU drivers would normally be loaded, so perhaps it's a memory (RAM) or GPU issue.

 

I'd first run an extensive memtest on the system to see if your memory (RAM) is OK.

Blackouts can wreak havoc on system components, so making sure everything is OK hardware-wise would be the first step.

 

A quick Google search for "Oops: 0000 [#1] PREEMPT SMP NOPTI" brings up a lot of NVIDIA-related results.

Do you have any custom graphics driver plugins installed for your GPU? Is the GPU working normally otherwise?

 

Please post here a diagnostics package, so we can take a look at the software-side of things in the meantime.
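
While waiting for the diagnostics, one way to compare a failed boot against a good one is to scan each syslog for kernel fault markers. The following is a minimal Python sketch, not an Unraid tool: the function names are made up for illustration, and the marker list is only an assumption based on the lines visible in the excerpts posted in this thread.

```python
import re

# Markers taken from the log excerpts in this thread (assumption: extend as needed).
FAULT_PATTERNS = [
    re.compile(r"BUG: "),
    re.compile(r"Oops: "),
    re.compile(r"general protection fault"),
    re.compile(r"NVRM: .*failed"),
]


def find_fault_lines(lines):
    """Return (line_number, text) pairs for lines matching any fault marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in FAULT_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def first_fault(lines):
    """Return the earliest matching (line_number, text) pair, or None."""
    hits = find_fault_lines(lines)
    return hits[0] if hits else None
```

Calling `first_fault(open(path, errors="replace"))` on a saved copy of each syslog shows where the first fault appears. In the failed boot posted above, that would be the first `NVRM: ... request_irq() failed (-22)` line, which the successful boot excerpt does not contain.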

 

Edited by Rysz
Link to comment
10 hours ago, 0xjams said:

I've had frequent blackouts recently, so I started using NUT to shut down my server properly. However, this week the server has been booting into an inconsistent state, but only when it powers itself back on after power returns.

 

First Question:

 

Are you saying that the computer restarts after power is restored to the UPS?

 

Depending on your UPS powerdown settings, this is a bad idea! The problem is that your UPS battery may not have enough reserve left to achieve a second clean powerdown. The best practice is to leave the server powered down until you are sure that the power has been properly fixed.

 

Normally, the restart-on-power setting is in the BIOS of the motherboard. I can't tell you exactly where, as every BIOS is different, but the motherboard's manual may help.

 

In your situation, I would actually set the UPS to power down after a fixed time on battery. Determine this time by figuring out how long an outage has to last before you can assume the power will not be restored any time soon. (In my case, I have determined that after thirty seconds on battery, the power will be out for hours!) This setting should leave enough reserve for a second clean shutdown for most servers and UPSes. Remember that a full recharge of the UPS battery will take several hours after power is restored.

 

Second Question:

 

Do you get a clean shutdown when you start it from the MAIN tab of the GUI? If you don't, you have another problem that is not related to the UPS settings.

 

Link to comment

I agree with the advice given above about UPS settings; enough battery reserve should be considered.

But first and foremost you need to figure out why you're sometimes getting kernel panics and getting stuck when booting up.

An unclean shutdown (or several) caused by a NUT/UPS misconfiguration might explain a triggered parity check, but not that.

 

I'd stress test the GPU and memtest the memory (RAM) to figure out any hardware-side problems first.

Then I'd attempt to eliminate driver-related errors by updating any driver plugins to their latest versions.

If that doesn't help then I'd attempt to deactivate any driver plugins and see if that resolves the issue for you.

 

In any case we'll need a diagnostics package as well, so we can help you identify any configuration problems.

 

Edited by Rysz
Link to comment

Hi everyone

 

Thank you for your answers. So, I have Unraid's NUT settings configured to start powering off after 4 minutes on battery. The UPS has about 20 minutes of autonomy.

 

After extensive reading, I noticed that NUT's shutdown command is /sbin/poweroff, but I read somewhere that a clean shutdown is done with the powerdown script, even though the script says it's deprecated.

 

I also read that there's a setting that defines how long a shutdown should wait before it is forced. I increased that value (it was 60 seconds).

 

I've manually powered off and rebooted my server a few times, and this inconsistent state only happens when it turns on automatically when the UPS has power again right after a blackout (I used the BIOS setting so that it turns on when power returns).

 

I have no idea if I'm barking up the wrong tree here, though.

Link to comment

4 minutes seems reasonable considering your UPS's estimated runtime of 20 minutes.

The poweroff command is fine; it results in a graceful shutdown of your system.

The powerdown script would also work, but it's deprecated now and probably should no longer be used.

 

It's possible 60 seconds wasn't enough and the forced shutdown left your GPU drivers in an inconsistent state.

Increasing it is probably a good idea. I use 120 seconds myself and factored this in when choosing the NUT on-battery timeout.
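
To make the "factored this in" part concrete, here is a rough Python sketch of the arithmetic using the numbers mentioned in this thread (about 20 minutes of runtime, a 4-minute on-battery timer, a 120-second forced-shutdown timeout). The function name is hypothetical, and the runtime is only the UPS's own estimate, which shrinks with load and battery age.

```python
def battery_margin_seconds(runtime_s, on_battery_timer_s, force_shutdown_timeout_s):
    """Seconds of estimated UPS runtime left after one worst-case shutdown."""
    return runtime_s - (on_battery_timer_s + force_shutdown_timeout_s)


runtime_s = 20 * 60   # estimated runtime from the thread (an estimate, not a guarantee)
timer_s = 4 * 60      # NUT on-battery timer before shutdown starts
timeout_s = 120       # time allowed before the shutdown is forced

margin = battery_margin_seconds(runtime_s, timer_s, timeout_s)
print(margin)  # 840

# Would a second full cycle fit if the power drops again before any recharge?
second_cycle_fits = margin >= timer_s + timeout_s
print(second_cycle_fits)  # True
```

With these figures there is headroom for a second shutdown cycle on paper, but only if the battery really delivers its estimated runtime.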

Link to comment
28 minutes ago, 0xjams said:

I also read that there's a setting that defines how long a shutdown should wait before it is forced. I increased that value (it was 60 seconds).

 

One way to determine what the setting should be is to determine the time required to stop the array.  This is one way to do that.  Make sure that all of the Dockers and VMs that you normally use are running.  With a stopwatch (or some other timer), click on the 'STOP' button on the MAIN tab in the 'ARRAY OPERATIONS' section/tab.  Measure the time required for the array to stop.  Add about 25% to that number and use that for the Force-Shutdown setting.
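
As a quick illustration of that rule of thumb, a small Python sketch follows. The function name and the example measurement of 96 seconds are made up; only the "add about 25%" margin comes from the post above.

```python
import math


def suggested_force_shutdown_timeout(measured_stop_s, margin=0.25):
    """Measured array stop time plus a safety margin, rounded up to whole seconds."""
    return math.ceil(measured_stop_s * (1 + margin))


# Example: the array took 96 seconds to stop with all Dockers and VMs running.
print(suggested_force_shutdown_timeout(96))  # 120
```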

 

If you are still having problems with unclean shutdowns, be sure that you have read this section of the manual:

 

      https://docs.unraid.net/unraid-os/manual/troubleshooting/#unclean-shutdowns

 

There are several suggestions there for making sure that things are being stopped. Pay attention to the one on using Tips and Tweaks to stop BASH and SSH sessions.

Link to comment

I replicated the event in a controlled way. I was in front of the server at the moment it booted up, and the last message before it fails relates to a NUT-related PID.

 

In the image you can see the prompt. However, I can't type anything and the cursor doesn't even blink.

 

In this state I have no way of shutting down cleanly, because neither SSH, HTTP, nor the command prompt (with a monitor and a keyboard) works.

 

Does the boot up text provide any further clues?

 

20231216_120022.jpg

Link to comment

The NUT messages are all normal and part of the regular NUT startup process (it's checking for duplicate instances before starting up the services). In fact, there seems to be some kind of transfer (not by NUT) happening after NUT startup has already completed; this curl output before the login prompt is caused by something other than NUT. I'll check your diagnostics package to see if I can find some clues.

Link to comment

I just checked your diagnostics package; the next thing in line after the NUT installation is the NVIDIA driver.

Which brings us back to your GPU: something happening with the GPU or the GPU driver hangs your system.

 

When you replicated this, was your network also down? I ask because there are some "network unreachable" errors in the logs.

Is it possible you also have network equipment connected to the UPS that's not 100% up yet when your server reboots?

It's possible one of your plugins keeps trying to pull files from the internet while the network is still down and hangs up there.

 

That'd be the only explanation that makes sense to me, considering it only happens when recovering from power loss and not otherwise.

It would be interesting to see what happens after a power-loss reboot into Unraid safe mode (no plugins) instead of regular Unraid, and whether it still happens there.
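
If the "network not up yet" theory holds, the general remedy is to wait for connectivity before anything tries to download. The Python sketch below only illustrates that idea: the function names, the 300-second limit, and the probe target are assumptions, and this thread does not establish where such a wait could be hooked into the Unraid boot sequence.

```python
import socket
import time


def tcp_probe(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def wait_for_network(probe, max_wait_s=300.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() repeatedly until it returns True or max_wait_s has elapsed."""
    deadline = clock() + max_wait_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass something like `lambda: tcp_probe("192.0.2.1", 443)` as the probe, where the address is a placeholder for a host that is only reachable once the router and switch are up.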

 

Edited by Rysz
Link to comment

That makes total sense. I read somewhere that the Nvidia driver gets downloaded every time upon reboot.

 

So whenever power comes back maybe the server starts faster than the router or the switch the computer is connected to. Could the problem be a race condition between the server and the switch the server is connected to?

 

Is there a way to delay startup a bit? A parameter somewhere?

Link to comment

Not in that sense, as that's usually a hardware configuration feature directly on the UPS (APC calls them Outlet Groups, I think).

Some UPS devices offer this kind of configuration through the display on the UPS itself, my Eaton device does for example.

Alternatively, if your UPS has a network card you might be able to do this configuration via the UPS web interface.

 

Did you ever try waiting when it's stuck, to see what happens after, say, 10 or 20 minutes?

It would be interesting to see whether some kind of stuck operation eventually times out and gives an error message.

That would help to narrow down which plugin (if any) causes the operation that's stuck... 🙂

 

Did you ever attempt to replicate the situation with the NVIDIA plugin disabled?

It might also be worth asking here, that's the support topic for the NVIDIA driver:


This report also looks interesting and a bit (but not entirely) similar to your case:

 

Edited by Rysz
Link to comment

Hi

 

I finally had another event today. I increased the shutdown timeout by a lot, but the problem persisted.

 

I also left the server running to see if it normalizes or if I get more log entries. Here's the syslog. There are no more log entries after that; nothing else gets written.

 

Another interesting thing is that I was lucky enough to get an SSH session early on: right now I can no longer open another SSH session, but the one I initially got still works.

 

Dec 21 18:10:02 groudon kernel: ? slab_post_alloc_hook+0x4d/0x15e
Dec 21 18:10:02 groudon kernel: ? percpu_ref_init+0x6d/0xf1
Dec 21 18:10:02 groudon kernel: kernfs_new_node+0x44/0x68
Dec 21 18:10:02 groudon kernel: __kernfs_create_file+0x2c/0x99
Dec 21 18:10:02 groudon kernel: cgroup_addrm_files+0x14c/0x28d
Dec 21 18:10:02 groudon kernel: css_populate_dir+0x58/0x107
Dec 21 18:10:02 groudon kernel: cgroup_mkdir+0x2fd/0x3be
Dec 21 18:10:02 groudon kernel: kernfs_iop_mkdir+0x56/0x72
Dec 21 18:10:02 groudon kernel: vfs_mkdir+0x82/0xbb
Dec 21 18:10:02 groudon kernel: do_mkdirat+0x82/0xda
Dec 21 18:10:02 groudon kernel: __x64_sys_mkdir+0x22/0x2a
Dec 21 18:10:02 groudon kernel: do_syscall_64+0x68/0x81
Dec 21 18:10:02 groudon kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Dec 21 18:10:02 groudon kernel: RIP: 0033:0x146ee31847d7
Dec 21 18:10:02 groudon kernel: Code: 44 00 00 48 8b 05 41 46 0e 00 bb ff ff ff ff 64 c7 00 16 00 00 00 e9 71 ff ff ff 0f 1f 84 00 00 00 00 00 b8 53 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 11 46 0e 00 f7 d8 64 89 02 b8
Dec 21 18:10:02 groudon kernel: RSP: 002b:00007ffe13498458 EFLAGS: 00000206 ORIG_RAX: 0000000000000053
Dec 21 18:10:02 groudon kernel: RAX: ffffffffffffffda RBX: 00000000004531f6 RCX: 0000146ee31847d7
Dec 21 18:10:02 groudon kernel: RDX: 0000000000000000 RSI: 00000000000001ed RDI: 0000000000450e40
Dec 21 18:10:02 groudon kernel: RBP: 000000000042d11c R08: 000000000000000e R09: 0000000000000000
Dec 21 18:10:02 groudon kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
Dec 21 18:10:02 groudon kernel: R13: 00007ffe13498610 R14: 00007ffe13498538 R15: 0000000000000001
Dec 21 18:10:02 groudon kernel: </TASK>
Dec 21 18:10:02 groudon kernel: Modules linked in: nvidia_uvm(PO) xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge 8021q garp mrp stp llc bonding tls igc atlantic nvidia_drm(PO) nvidia_modeset(PO) intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel nvidia(PO) kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 btusb sha1_ssse3 btrtl mei_pxp drm_kms_helper aesni_intel mei_hdcp btbcm crypto_simd cryptd rapl btintel intel_cstate bluetooth drm wmi_bmof ahci i2c_i801 mei_me pl2303
Dec 21 18:10:02 groudon kernel: syscopyarea nvme i2c_smbus input_leds sysfillrect ecdh_generic intel_uncore thunderbolt sysimgblt i2c_core mei nvme_core libahci joydev led_class usbserial ecc fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_pad acpi_tad button unix [last unloaded: igc]
Dec 21 18:10:02 groudon kernel: ---[ end trace 0000000000000000 ]---
Dec 21 18:10:02 groudon kernel: RIP: 0010:__kmem_cache_alloc_node+0xb5/0x147
Dec 21 18:10:02 groudon kernel: Code: 48 c1 e9 3a 41 39 cf 74 1a 45 89 e8 4c 89 f1 44 89 fa 44 89 e6 48 89 ef e8 ce e9 ff ff 48 89 04 24 eb 25 8b 4d 28 48 8b 7d 00 <48> 8b 1c 08 48 8d 8a 00 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74
Dec 21 18:10:02 groudon kernel: RSP: 0018:ffffc90009dd7bc0 EFLAGS: 00010246
Dec 21 18:10:02 groudon kernel: RAX: dead000000000122 RBX: ffff888100042400 RCX: 0000000000000010
Dec 21 18:10:02 groudon kernel: RDX: 000000000020610e RSI: 0000000000000dc0 RDI: 0000000000030a70
Dec 21 18:10:02 groudon kernel: RBP: ffff888100042400 R08: 0000000000000dc0 R09: ffffffff8297b8c0
Dec 21 18:10:02 groudon kernel: R10: 0000000000000000 R11: 0000000000032140 R12: 0000000000000dc0
Dec 21 18:10:02 groudon kernel: R13: 0000000000000020 R14: ffffffff811fa649 R15: 00000000ffffffff
Dec 21 18:10:02 groudon kernel: FS:  0000146ee3053c80(0000) GS:ffff88905f580000(0000) knlGS:0000000000000000
Dec 21 18:10:02 groudon kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 21 18:10:02 groudon kernel: CR2: 0000151612737540 CR3: 000000010720c000 CR4: 0000000000752ee0
Dec 21 18:10:02 groudon kernel: PKRU: 55555554
Dec 21 18:10:02 groudon sshd[13790]: pam_elogind(sshd:session): Failed to create session: Message recipient disconnected from message bus without replying
Dec 21 18:10:02 groudon sshd[13790]: Starting session: command for root from 10.30.101.114 port 52384 id 0
Dec 21 18:10:02 groudon sshd[13790]: Close session: user root from 10.30.101.114 port 52384 id 0
Dec 21 18:10:02 groudon sshd[13790]: Received disconnect from 10.30.101.114 port 52384:11: disconnected by user
Dec 21 18:10:02 groudon sshd[13790]: Disconnected from user root 10.30.101.114 port 52384
Dec 21 18:10:02 groudon sshd[13790]: pam_unix(sshd:session): session closed for user root
Dec 21 18:10:02 groudon kernel: general protection fault, probably for non-canonical address 0xdead000000000132: 0000 [#95] PREEMPT SMP NOPTI
Dec 21 18:10:02 groudon kernel: CPU: 14 PID: 13793 Comm: sshd Tainted: P      D W  O       6.1.64-Unraid #1
Dec 21 18:10:02 groudon kernel: Hardware name: Intel(R) Client Systems NUC13RNGi9/NUC13SBBi9, BIOS SBRPL579.0047.2022.1006.1728 10/06/2022
Dec 21 18:10:02 groudon kernel: RIP: 0010:__kmem_cache_alloc_node+0xb5/0x147
Dec 21 18:10:02 groudon kernel: Code: 48 c1 e9 3a 41 39 cf 74 1a 45 89 e8 4c 89 f1 44 89 fa 44 89 e6 48 89 ef e8 ce e9 ff ff 48 89 04 24 eb 25 8b 4d 28 48 8b 7d 00 <48> 8b 1c 08 48 8d 8a 00 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74
Dec 21 18:10:02 groudon kernel: RSP: 0018:ffffc9000a8a7d10 EFLAGS: 00010246
Dec 21 18:10:02 groudon kernel: RAX: dead000000000122 RBX: ffff888100042400 RCX: 0000000000000010
Dec 21 18:10:02 groudon kernel: RDX: 000000000020610e RSI: 0000000000000cc0 RDI: 0000000000030a70
Dec 21 18:10:02 groudon kernel: RBP: ffff888100042400 R08: 0000000000000cc0 R09: ffff888147dc4fe0
Dec 21 18:10:02 groudon kernel: R10: ffffc9000a8a7cc0 R11: ffffc9000a8a79c0 R12: 0000000000000cc0
Dec 21 18:10:02 groudon kernel: R13: 000000000000001c R14: ffffffff81295b88 R15: 00000000ffffffff
Dec 21 18:10:02 groudon kernel: FS:  00001516126ac740(0000) GS:ffff88905f580000(0000) knlGS:0000000000000000
Dec 21 18:10:02 groudon kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 21 18:10:02 groudon kernel: CR2: 0000151612737540 CR3: 000000020d42e000 CR4: 0000000000752ee0
Dec 21 18:10:02 groudon kernel: PKRU: 55555554
Dec 21 18:10:02 groudon kernel: Call Trace:
Dec 21 18:10:02 groudon kernel: <TASK>
Dec 21 18:10:02 groudon kernel: ? __die_body+0x1a/0x5c
Dec 21 18:10:02 groudon kernel: ? die_addr+0x38/0x51
Dec 21 18:10:02 groudon kernel: ? exc_general_protection+0x30f/0x345
Dec 21 18:10:02 groudon kernel: ? asm_exc_general_protection+0x22/0x30
Dec 21 18:10:02 groudon kernel: ? load_elf_binary+0xd2/0x13ce
Dec 21 18:10:02 groudon kernel: ? __kmem_cache_alloc_node+0xb5/0x147
Dec 21 18:10:02 groudon kernel: ? load_elf_binary+0xd2/0x13ce
Dec 21 18:10:02 groudon kernel: __kmalloc+0x81/0xaf
Dec 21 18:10:02 groudon kernel: load_elf_binary+0xd2/0x13ce
Dec 21 18:10:02 groudon kernel: ? __kernel_read+0xf7/0x13c
Dec 21 18:10:02 groudon kernel: ? __kernel_read+0xf7/0x13c
Dec 21 18:10:02 groudon kernel: bprm_execve+0x237/0x52b
Dec 21 18:10:02 groudon kernel: do_execveat_common.isra.0+0x1a6/0x1cf
Dec 21 18:10:02 groudon kernel: __x64_sys_execve+0x38/0x44
Dec 21 18:10:02 groudon kernel: do_syscall_64+0x68/0x81
Dec 21 18:10:02 groudon kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Dec 21 18:10:02 groudon kernel: RIP: 0033:0x1516127d7a87
Dec 21 18:10:02 groudon kernel: Code: 48 8d 3d fc 3f 11 00 5b 5d 41 5c 41 5d 41 5e 41 5f e9 0d bd fa ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 61 c3 10 00 f7 d8 64 89 01 48
Dec 21 18:10:02 groudon kernel: RSP: 002b:00007ffedb96c268 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
Dec 21 18:10:02 groudon kernel: RAX: ffffffffffffffda RBX: 00001516126ac6a0 RCX: 00001516127d7a87
Dec 21 18:10:02 groudon kernel: RDX: 000056156aa25300 RSI: 000056156aa3db40 RDI: 000056156aa252e0
Dec 21 18:10:02 groudon kernel: RBP: 000056156aa3db40 R08: 0000000000000000 R09: 0000000000000000
Dec 21 18:10:02 groudon kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000056156aa1fda0
Dec 21 18:10:02 groudon kernel: R13: 0000000000000005 R14: 0000000000000001 R15: 0000000000000005
Dec 21 18:10:02 groudon kernel: </TASK>
Dec 21 18:10:02 groudon kernel: Modules linked in: nvidia_uvm(PO) xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge 8021q garp mrp stp llc bonding tls igc atlantic nvidia_drm(PO) nvidia_modeset(PO) intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel nvidia(PO) kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 btusb sha1_ssse3 btrtl mei_pxp drm_kms_helper aesni_intel mei_hdcp btbcm crypto_simd cryptd rapl btintel intel_cstate bluetooth drm wmi_bmof ahci i2c_i801 mei_me pl2303
Dec 21 18:10:02 groudon kernel: syscopyarea nvme i2c_smbus input_leds sysfillrect ecdh_generic intel_uncore thunderbolt sysimgblt i2c_core mei nvme_core libahci joydev led_class usbserial ecc fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_pad acpi_tad button unix [last unloaded: igc]
Dec 21 18:10:02 groudon kernel: ---[ end trace 0000000000000000 ]---
Dec 21 18:10:02 groudon kernel: RIP: 0010:__kmem_cache_alloc_node+0xb5/0x147
Dec 21 18:10:02 groudon kernel: Code: 48 c1 e9 3a 41 39 cf 74 1a 45 89 e8 4c 89 f1 44 89 fa 44 89 e6 48 89 ef e8 ce e9 ff ff 48 89 04 24 eb 25 8b 4d 28 48 8b 7d 00 <48> 8b 1c 08 48 8d 8a 00 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74
Dec 21 18:10:02 groudon kernel: RSP: 0018:ffffc90009dd7bc0 EFLAGS: 00010246
Dec 21 18:10:02 groudon kernel: RAX: dead000000000122 RBX: ffff888100042400 RCX: 0000000000000010
Dec 21 18:10:02 groudon kernel: RDX: 000000000020610e RSI: 0000000000000dc0 RDI: 0000000000030a70
Dec 21 18:10:02 groudon kernel: RBP: ffff888100042400 R08: 0000000000000dc0 R09: ffffffff8297b8c0
Dec 21 18:10:02 groudon kernel: R10: 0000000000000000 R11: 0000000000032140 R12: 0000000000000dc0
Dec 21 18:10:02 groudon kernel: R13: 0000000000000020 R14: ffffffff811fa649 R15: 00000000ffffffff
Dec 21 18:10:02 groudon kernel: FS:  00001516126ac740(0000) GS:ffff88905f580000(0000) knlGS:0000000000000000
Dec 21 18:10:02 groudon kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 21 18:10:02 groudon kernel: CR2: 0000151612737540 CR3: 000000020d42e000 CR4: 0000000000752ee0
Dec 21 18:10:02 groudon kernel: PKRU: 55555554
Dec 21 18:10:03 groudon kernel: BUG: Bad rss-counter state mm:00000000d4e77f0a type:MM_ANONPAGES val:1
Dec 21 18:10:03 groudon sshd[13806]: Connection from 10.30.101.114 port 52389 on 10.0.101.3 port 22 rdomain ""
Dec 21 18:10:03 groudon sshd[13806]: Accepted key RSA SHA256:gXcM2ytff7bWSoMQe/YBvyYHcA4TUJBY8tIZjA8eT4E found at /root/.ssh/authorized_keys:1
Dec 21 18:10:03 groudon sshd[13806]: Postponed publickey for root from 10.30.101.114 port 52389 ssh2 [preauth]
Dec 21 18:10:03 groudon sshd[13806]: Accepted key RSA SHA256:gXcM2ytff7bWSoMQe/YBvyYHcA4TUJBY8tIZjA8eT4E found at /root/.ssh/authorized_keys:1
Dec 21 18:10:03 groudon sshd[13806]: Accepted publickey for root from 10.30.101.114 port 52389 ssh2: RSA SHA256:gXcM2ytff7bWSoMQe/YBvyYHcA4TUJBY8tIZjA8eT4E
Dec 21 18:10:03 groudon sshd[13806]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Dec 21 18:10:04 groudon kernel: general protection fault, probably for non-canonical address 0xdead000000000132: 0000 [#98] PREEMPT SMP NOPTI
Dec 21 18:10:04 groudon kernel: CPU: 14 PID: 13808 Comm: sshd Tainted: P      D W  O       6.1.64-Unraid #1
Dec 21 18:10:04 groudon kernel: Hardware name: Intel(R) Client Systems NUC13RNGi9/NUC13SBBi9, BIOS SBRPL579.0047.2022.1006.1728 10/06/2022
Dec 21 18:10:04 groudon kernel: RIP: 0010:__kmem_cache_alloc_node+0xb5/0x147
Dec 21 18:10:04 groudon kernel: Code: 48 c1 e9 3a 41 39 cf 74 1a 45 89 e8 4c 89 f1 44 89 fa 44 89 e6 48 89 ef e8 ce e9 ff ff 48 89 04 24 eb 25 8b 4d 28 48 8b 7d 00 <48> 8b 1c 08 48 8d 8a 00 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74
Dec 21 18:10:04 groudon kernel: RSP: 0018:ffffc9000a8dfd10 EFLAGS: 00010246
Dec 21 18:10:04 groudon kernel: RAX: dead000000000122 RBX: ffff888100042400 RCX: 0000000000000010
Dec 21 18:10:04 groudon kernel: RDX: 000000000020610e RSI: 0000000000000cc0 RDI: 0000000000030a70
Dec 21 18:10:04 groudon kernel: RBP: ffff888100042400 R08: 0000000000000cc0 R09: ffff888147dc4fe0
Dec 21 18:10:04 groudon kernel: R10: ffffc9000a8dfcc0 R11: ffffc9000a8df9c0 R12: 0000000000000cc0
Dec 21 18:10:04 groudon kernel: R13: 000000000000001c R14: ffffffff81295b88 R15: 00000000ffffffff
Dec 21 18:10:04 groudon kernel: FS:  00001516126ac740(0000) GS:ffff88905f580000(0000) knlGS:0000000000000000
Dec 21 18:10:04 groudon kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 21 18:10:04 groudon kernel: CR2: 0000151612737540 CR3: 00000001f3c8c000 CR4: 0000000000752ee0
Dec 21 18:10:04 groudon kernel: PKRU: 55555554
Dec 21 18:10:04 groudon kernel: Call Trace:
Dec 21 18:10:04 groudon kernel: <TASK>
Dec 21 18:10:04 groudon kernel: ? __die_body+0x1a/0x5c
Dec 21 18:10:04 groudon kernel: ? die_addr+0x38/0x51
Dec 21 18:10:04 groudon kernel: ? exc_general_protection+0x30f/0x345
Dec 21 18:10:04 groudon kernel: ? asm_exc_general_protection+0x22/0x30
Dec 21 18:10:04 groudon kernel: ? load_elf_binary+0xd2/0x13ce
Dec 21 18:10:04 groudon kernel: ? __kmem_cache_alloc_node+0xb5/0x147
Dec 21 18:10:04 groudon kernel: ? load_elf_binary+0xd2/0x13ce
Dec 21 18:10:04 groudon kernel: __kmalloc+0x81/0xaf
Dec 21 18:10:04 groudon kernel: load_elf_binary+0xd2/0x13ce
Dec 21 18:10:04 groudon kernel: ? __kernel_read+0xf7/0x13c
Dec 21 18:10:04 groudon kernel: ? __kernel_read+0xf7/0x13c
Dec 21 18:10:04 groudon kernel: bprm_execve+0x237/0x52b
Dec 21 18:10:04 groudon kernel: do_execveat_common.isra.0+0x1a6/0x1cf
Dec 21 18:10:04 groudon kernel: __x64_sys_execve+0x38/0x44
Dec 21 18:10:04 groudon kernel: do_syscall_64+0x68/0x81
Dec 21 18:10:04 groudon kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Dec 21 18:10:04 groudon kernel: RIP: 0033:0x1516127d7a87
Dec 21 18:10:04 groudon kernel: Code: 48 8d 3d fc 3f 11 00 5b 5d 41 5c 41 5d 41 5e 41 5f e9 0d bd fa ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 61 c3 10 00 f7 d8 64 89 01 48
Dec 21 18:10:04 groudon kernel: RSP: 002b:00007ffedb96c268 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
Dec 21 18:10:04 groudon kernel: RAX: ffffffffffffffda RBX: 00001516126ac6a0 RCX: 00001516127d7a87
Dec 21 18:10:04 groudon kernel: RDX: 000056156aa25300 RSI: 000056156aa3db40 RDI: 000056156aa252e0
Dec 21 18:10:04 groudon kernel: RBP: 000056156aa3db40 R08: 0000000000000000 R09: 0000000000000000
Dec 21 18:10:04 groudon kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000056156aa1fda0
Dec 21 18:10:04 groudon kernel: R13: 0000000000000005 R14: 0000000000000001 R15: 0000000000000005
Dec 21 18:10:04 groudon kernel: </TASK>
Dec 21 18:10:04 groudon kernel: Modules linked in: nvidia_uvm(PO) xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge 8021q garp mrp stp llc bonding tls igc atlantic nvidia_drm(PO) nvidia_modeset(PO) intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel nvidia(PO) kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 btusb sha1_ssse3 btrtl mei_pxp drm_kms_helper aesni_intel mei_hdcp btbcm crypto_simd cryptd rapl btintel intel_cstate bluetooth drm wmi_bmof ahci i2c_i801 mei_me pl2303
Dec 21 18:10:04 groudon kernel: syscopyarea nvme i2c_smbus input_leds sysfillrect ecdh_generic intel_uncore thunderbolt sysimgblt i2c_core mei nvme_core libahci joydev led_class usbserial ecc fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_pad acpi_tad button unix [last unloaded: igc]
Dec 21 18:10:04 groudon kernel: ---[ end trace 0000000000000000 ]---
Dec 21 18:10:04 groudon kernel: RIP: 0010:__kmem_cache_alloc_node+0xb5/0x147
Dec 21 18:10:04 groudon kernel: Code: 48 c1 e9 3a 41 39 cf 74 1a 45 89 e8 4c 89 f1 44 89 fa 44 89 e6 48 89 ef e8 ce e9 ff ff 48 89 04 24 eb 25 8b 4d 28 48 8b 7d 00 <48> 8b 1c 08 48 8d 8a 00 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74
Dec 21 18:10:04 groudon kernel: RSP: 0018:ffffc90009dd7bc0 EFLAGS: 00010246
Dec 21 18:10:04 groudon kernel: RAX: dead000000000122 RBX: ffff888100042400 RCX: 0000000000000010
Dec 21 18:10:04 groudon kernel: RDX: 000000000020610e RSI: 0000000000000dc0 RDI: 0000000000030a70
Dec 21 18:10:04 groudon kernel: RBP: ffff888100042400 R08: 0000000000000dc0 R09: ffffffff8297b8c0
Dec 21 18:10:04 groudon kernel: R10: 0000000000000000 R11: 0000000000032140 R12: 0000000000000dc0
Dec 21 18:10:04 groudon kernel: R13: 0000000000000020 R14: ffffffff811fa649 R15: 00000000ffffffff
Dec 21 18:10:04 groudon kernel: FS:  00001516126ac740(0000) GS:ffff88905f580000(0000) knlGS:0000000000000000
Dec 21 18:10:04 groudon kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 21 18:10:04 groudon kernel: CR2: 0000151612737540 CR3: 00000001f3c8c000 CR4: 0000000000752ee0
Dec 21 18:10:04 groudon kernel: PKRU: 55555554
Dec 21 18:10:04 groudon kernel: </TASK>
Dec 21 18:10:04 groudon kernel: Modules linked in: nvidia_uvm(PO) xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge 8021q garp mrp stp llc bonding tls igc atlantic nvidia_drm(PO) nvidia_modeset(PO) intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel nvidia(PO) kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 btusb sha1_ssse3 btrtl mei_pxp drm_kms_helper aesni_intel mei_hdcp btbcm crypto_simd cryptd rapl btintel intel_cstate bluetooth drm wmi_bmof ahci i2c_i801 mei_me pl2303
Dec 21 18:10:04 groudon kernel: syscopyarea nvme i2c_smbus input_leds sysfillrect ecdh_generic intel_uncore thunderbolt sysimgblt i2c_core mei nvme_core libahci joydev led_class usbserial ecc fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_pad acpi_tad button unix [last unloaded: igc]
Dec 21 18:10:04 groudon kernel: ---[ end trace 0000000000000000 ]---
Dec 21 18:10:04 groudon kernel: RIP: 0010:__kmem_cache_alloc_node+0xb5/0x147
Dec 21 18:10:04 groudon kernel: Code: 48 c1 e9 3a 41 39 cf 74 1a 45 89 e8 4c 89 f1 44 89 fa 44 89 e6 48 89 ef e8 ce e9 ff ff 48 89 04 24 eb 25 8b 4d 28 48 8b 7d 00 <48> 8b 1c 08 48 8d 8a 00 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74
Dec 21 18:10:04 groudon kernel: RSP: 0018:ffffc90009dd7bc0 EFLAGS: 00010246
Dec 21 18:10:04 groudon kernel: RAX: dead000000000122 RBX: ffff888100042400 RCX: 0000000000000010
Dec 21 18:10:04 groudon kernel: RDX: 000000000020610e RSI: 0000000000000dc0 RDI: 0000000000030a70
Dec 21 18:10:04 groudon kernel: RBP: ffff888100042400 R08: 0000000000000dc0 R09: ffffffff8297b8c0
Dec 21 18:10:04 groudon kernel: R10: 0000000000000000 R11: 0000000000032140 R12: 0000000000000dc0
Dec 21 18:10:04 groudon kernel: R13: 0000000000000020 R14: ffffffff811fa649 R15: 00000000ffffffff
Dec 21 18:10:04 groudon kernel: FS:  00001516126ac740(0000) GS:ffff88905f580000(0000) knlGS:0000000000000000
Dec 21 18:10:04 groudon kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 21 18:10:04 groudon kernel: CR2: 0000151612737540 CR3: 00000001f3c8c000 CR4: 0000000000752ee0
Dec 21 18:10:04 groudon kernel: PKRU: 55555554

 

Any ideas?

Edited by 0xjams
Link to comment

Could the boot drive have something to do with this? I ordered a new one just in case. In the 2 seconds I had access to the web admin before the HTTP 500 errors started, I caught a glimpse of a message saying that it could not find the unique ID of the drive.

 

After the events in the log above, I tried shutting it down via SSH. It accepted the command but never finished shutting down. After more than 10 minutes I manually turned it off and on again.

 

As usual, this caused a parity check.

Link to comment

You'll need to run memtest a bit longer than 39 minutes to have a reliable diagnostic there.

I'd do at least 4 passes; depending on how much time you can sacrifice, the more the better.

 

But honestly it looks like something on your system is shot from these frequent blackouts.

NUT itself does nothing other than initiate a shutdown in a power loss scenario, so it wouldn't cause any of these problems.

 

There are a ton of kernel-related messages in your logs, and most of them seem to have something to do with your GPU.

Once again I'd advise posting in the NVIDIA driver support topic (linked above) so maybe @ich777 can chime in there.
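
To get an overview of those kernel messages before posting, something like the following could tally them. This is only a rough sketch: it assumes the syslog has the same line format as the excerpt posted above, the function name `summarize_faults` is made up for illustration, and it only looks for the two message types visible in that excerpt.

```python
import re
import sys
from collections import Counter

# Patterns taken from the log excerpt in this thread; other kernels or
# log formats may word these lines differently (assumption).
FAULT_RE = re.compile(r"kernel: (general protection fault|BUG: Bad rss-counter state)")
COMM_RE = re.compile(r"kernel: CPU: \d+ PID: \d+ Comm: (\S+)")
RIP_RE = re.compile(r"kernel: RIP: 0010:([^\s+]+)")


def summarize_faults(lines):
    """Count fault types, affected process names and kernel RIP symbols."""
    faults, comms, rips = Counter(), Counter(), Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            faults[m.group(1)] += 1
            continue
        m = COMM_RE.search(line)
        if m:
            comms[m.group(1)] += 1
            continue
        m = RIP_RE.search(line)
        if m:
            # The same RIP line is usually printed again after "end trace",
            # so this counts occurrences, not distinct faults.
            rips[m.group(1)] += 1
    return faults, comms, rips


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 summarize_faults.py <path-to-syslog>
    with open(sys.argv[1], errors="replace") as fh:
        for counter in summarize_faults(fh):
            print(counter.most_common(10))
```

If nearly every fault lands in the same kernel function regardless of which process triggered it, that is worth mentioning when you post, since it helps whoever reads the Diagnostics narrow things down.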

Link to comment
1 hour ago, Rysz said:

can chime in there.

Yes I can. ;)

 

7 hours ago, 0xjams said:

I have no idea how that makes it any different.

Correct me if I'm wrong, but you are using this NUC compute module with a PCIe interface on the bottom and have a GPU connected to it, correct?

 

What is your exact issue? If it's related to the Nvidia Driver plugin, I would strongly recommend that you create a post in the Nvidia Driver thread with fresh Diagnostics. Also check the /logs folder on your boot device, since a Diagnostics file is created there every time a shutdown fails.

Please also make sure that you have Above 4G Decoding and Resizable BAR enabled in your BIOS.

Also make sure that you are on the latest BIOS version.

Link to comment
6 hours ago, ich777 said:

What is your exact issue?

 

So, let me explain my situation with bullet points:

  • I have a NUC 13 Extreme with a GPU. It runs fine and nothing fails. I can even manually shut it down via the web interface and boot it again just fine.
  • A month or so ago I started having blackouts. Whenever one happened, I manually turned the server off during the extra time the UPS gave me, and this worked fine.
  • I started to automate the process. I configured NUT to turn off the computer, and it worked fine. I would turn it on manually after power was back.
  • I wanted to improve this by configuring the server's BIOS to turn it back on after power is restored. When this happened, the server would boot up in an inconsistent state with many errors. To fix this I would have to physically shut it down (sometimes not even keyboard access would work), then start it, do a parity check, and go back to business as usual.

What I've tried so far:

  • Increasing the time Unraid waits before forcing a shutdown. It was 150 seconds; I increased it to 200, but the problem persisted.
  • Checking for a race condition with network access: I manually shut the server down and made both the switch and the server start at the same time (without a blackout), but the problem did not show up.
  • Memtest (I need to run it for longer, though).

I have a blackout scheduled for today. This is my plan to test:

  • I've increased the time before forcing shutdown to 600 seconds to see how that goes.
  • I disabled autostart in the BIOS to see whether the problem goes away when I turn the server on manually after NUT has shut it down. From my experience, this should make the problem disappear, but we'll see.
  • I stopped all my VMs to see if maybe speeding up the shutdown process will help somehow.

If this doesn't work, I still want to try the following:

  • Disable the NVIDIA driver altogether and see if that fixes the problem even when shutdown and power-on both happen automatically.
  • Change the NVIDIA driver version. I'm using latest; maybe I could switch to stable.

Any other ideas I could try?

 

 

 

Link to comment
2 hours ago, 0xjams said:

I wanted to improve this by configuring the server's BIOS to turn it back on after power is restored. When this happened, the server would boot up in an inconsistent state with many errors. To fix this I would have to physically shut it down (sometimes not even keyboard access would work), then start it, do a parity check, and go back to business as usual.

So it started to be unreliable after a power outage, correct?

 

2 hours ago, 0xjams said:

Change the NVIDIA driver version. I'm using latest; maybe I could switch to stable.

This won't change much.

 

2 hours ago, 0xjams said:

Any other ideas I could try?

Shut down the server and wait for your power outage. After that, start the server again, let it run for a day and then post your Diagnostics again.

 

Have you looked at your USB boot device yet to see whether there are any Diagnostics in the /boot/logs folder? If so, please post them.
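
For a quick look at what is in that folder, a small script along these lines could list the files newest first. Treat it as a sketch: `/boot/logs` is the folder mentioned above, but the `*diagnostics*.zip` filename pattern and the function name `list_diagnostics` are assumptions, so adjust the pattern if your files are named differently.

```python
from datetime import datetime
from pathlib import Path


def list_diagnostics(log_dir="/boot/logs", pattern="*diagnostics*.zip"):
    """Return (filename, modified time) pairs, newest first.

    The glob pattern is an assumed naming scheme, not a confirmed one.
    """
    root = Path(log_dir)
    if not root.is_dir():
        return []
    files = sorted(root.glob(pattern), key=lambda p: p.stat().st_mtime, reverse=True)
    return [(p.name, datetime.fromtimestamp(p.stat().st_mtime)) for p in files]


if __name__ == "__main__":
    for name, modified in list_diagnostics():
        print(f"{modified:%Y-%m-%d %H:%M}  {name}")
```

Comparing the timestamps with the times of your blackouts should show which files belong to a failed shutdown.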

Link to comment
