Unraid server crash - happened when running Live Memory Tester & monitoring syslog


Hello,

 

I have experienced 4 random crashes in the last 2 weeks. Today I ran memtest86 for a few hours with no errors. Then I ran the Live Memory Tester for a few minutes, and the server crashed with these entries in the syslog:

 

Does this mean something to someone? 🙂 Why is nvidia mentioned? Thanks!

 

###

 

Aug 25 15:01:16 UnraidHippo ool www[6989]: /usr/local/emhttp/plugins/dwmemtester/scripts/start '1' '0' '' '' '' '' '8G' ''
Aug 25 15:01:16 UnraidHippo memtester-runner: memory testing started with parameters: 8G
Aug 25 15:07:18 UnraidHippo kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Aug 25 15:07:18 UnraidHippo kernel: #PF: supervisor read access in kernel mode
Aug 25 15:07:18 UnraidHippo kernel: #PF: error_code(0x0000) - not-present page
Aug 25 15:07:18 UnraidHippo kernel: PGD 80000003660bd067 P4D 80000003660bd067 PUD 374570067 PMD 0
Aug 25 15:07:18 UnraidHippo kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Aug 25 15:07:18 UnraidHippo kernel: CPU: 0 PID: 21127 Comm: nvidia-smi Tainted: P           O       6.1.99-Unraid #1
Aug 25 15:07:18 UnraidHippo kernel: Hardware name: Hewlett-Packard p6-2265eo/2ADA, BIOS 7.12 06/07/2012
Aug 25 15:07:18 UnraidHippo kernel: RIP: 0010:__cpa_process_fault+0x3ea/0x40e
Aug 25 15:07:18 UnraidHippo kernel: Code: d0 48 c1 e8 0c 49 89 44 24 30 31 db eb 2b 49 8b 7c 24 30 e8 d4 f3 ff ff 84 c0 75 18 49 8b 04 24 48 89 ee 48 c7 c7 f7 17 0c 82 <48> 8b 10 e8 13 c8 00 00 0f 0b bb f2 ff ff ff 48 83 c4 30 89 d8 5b
Aug 25 15:07:18 UnraidHippo kernel: RSP: 0018:ffffc9000070b720 EFLAGS: 00010246
Aug 25 15:07:18 UnraidHippo kernel: RAX: 0000000000000000 RBX: ffffc9000070b890 RCX: 0000000080000000
Aug 25 15:07:18 UnraidHippo kernel: RDX: 0000000000001001 RSI: ffff11030e3ee000 RDI: ffffffff820c17f7
Aug 25 15:07:18 UnraidHippo kernel: RBP: ffff11030e3ee000 R08: ffffc9000070b7de R09: ffffffff8220a110
Aug 25 15:07:18 UnraidHippo kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000070b890
Aug 25 15:07:18 UnraidHippo kernel: R13: 8000000231c07073 R14: 0000000000000000 R15: 0000000000000000
Aug 25 15:07:18 UnraidHippo kernel: FS:  000014880d7fe1c0(0000) GS:ffff88840ec00000(0000) knlGS:0000000000000000
Aug 25 15:07:18 UnraidHippo kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 25 15:07:18 UnraidHippo kernel: CR2: 0000000000000000 CR3: 00000003c880a001 CR4: 00000000001706f0
Aug 25 15:07:18 UnraidHippo kernel: Call Trace:
Aug 25 15:07:18 UnraidHippo kernel: <TASK>
Aug 25 15:07:18 UnraidHippo kernel: ? __die_body+0x1a/0x5c  
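
For anyone reading a trace like this: the `Comm:` field names the process that was running when the oops fired (here `nvidia-smi`), the `Tainted:` flags record kernel state (`P` means a proprietary module is loaded, `O` means an out-of-tree module is loaded), and `RIP:` names the kernel function that faulted. Below is a minimal Python sketch that pulls those three fields out of syslog lines. The function name `summarize_oops`, the regexes, and the `SAMPLE` list are illustrative assumptions, not part of any Unraid tool, and the patterns are only matched against the line layout shown in this log.

```python
import re

# Header line of an oops, in the layout seen in this syslog:
# "kernel: CPU: 0 PID: 21127 Comm: nvidia-smi Tainted: P   O   6.1.99-Unraid #1"
# Assumption: the kernel is tainted; a "Not tainted" header is not handled.
OOPS_HEADER = re.compile(
    r"kernel: CPU: \d+ PID: (?P<pid>\d+) Comm: (?P<comm>\S+)"
    r" Tainted: (?P<taint>[A-Z ]+?)\s+(?P<release>\d\S*)"
)

# "kernel: RIP: 0010:__cpa_process_fault+0x3ea/0x40e"
RIP_LINE = re.compile(r"kernel: RIP: [0-9a-f]+:(?P<symbol>\w+)\+")

# Only the two flags that appear in this log; the kernel defines more.
TAINT_MEANING = {
    "P": "proprietary module loaded",
    "O": "out-of-tree module loaded",
}


def summarize_oops(lines):
    """Collect process name, PID, taint flags and faulting symbol from oops lines."""
    summary = {}
    for line in lines:
        header = OOPS_HEADER.search(line)
        if header:
            flags = header.group("taint").split()
            summary["comm"] = header.group("comm")
            summary["pid"] = int(header.group("pid"))
            summary["taint"] = flags
            summary["taint_notes"] = [TAINT_MEANING.get(f, "unknown flag") for f in flags]
            summary["release"] = header.group("release")
            continue
        rip = RIP_LINE.search(line)
        if rip:
            summary["rip_symbol"] = rip.group("symbol")
    return summary


# Two lines copied from the syslog excerpt above.
SAMPLE = [
    "Aug 25 15:07:18 UnraidHippo kernel: CPU: 0 PID: 21127 Comm: nvidia-smi "
    "Tainted: P           O       6.1.99-Unraid #1",
    "Aug 25 15:07:18 UnraidHippo kernel: RIP: 0010:__cpa_process_fault+0x3ea/0x40e",
]

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

Note that `Comm:` only says which process was on the CPU at the time of the fault. It does not by itself prove that the Nvidia driver caused the crash, though the `P` and `O` flags do confirm that a proprietary, out-of-tree module was loaded.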

Link to comment

I am now fairly sure it is somehow related to the Nvidia driver/plugin. With the driver uninstalled, the server has not hung or crashed once. If I reinstall it, the problem usually reappears within hours. It crashes even when no transcoding is going on in the Plex server (which is the only thing I use the GPU for).

 

When it happens, I have to hold the power button for a few seconds to shut the server down. The flash boot drive got corrupted once, so I had to restore it from the cloud backup.

 

Any suggestions on what to try? Perhaps an older Nvidia driver version?

 

Thanks.
