[SOLVED][UNRAID 6.10 RC2] - Server "dies" after being left alone?

Squid · January 29, 2022

Does this consistently happen during a parity check? If so, I would restart in Safe Mode and disable all VMs and docker apps from running and try a parity check again.

Biomatrix · January 29, 2022

10 minutes ago, Squid said:

Does this consistently happen during a parity check? If so, I would restart in Safe Mode and disable all VMs and docker apps from running and try a parity check again.

You know what? I don't know.

I only recently built this server and have been chasing little issues (I had zero issues on my Dell). I can't say it consistently happens during a parity check, because sometimes I start a parity check, come back, and find a clean run...

But it's worth a shot for sure. I will try this and come back with more (or less?) information!

Squid · January 29, 2022

1 hour ago, Biomatrix said:

You know what? I don't know.

Trouble is that hard lockups / reboots are always a guessing game unless there's some error that precedes them. Booting in Safe Mode (since you're using the Nvidia plugin) and disabling VMs with passthrough significantly narrows down the possible causes.

WizADSL · January 30, 2022

You may also want to run a memory test.

Biomatrix · January 30, 2022

23 hours ago, Squid said:

Trouble is that hard lockups / reboots are always a guessing game unless there's some error that precedes them. Booting in Safe Mode (since you're using the Nvidia plugin) and disabling VMs with passthrough significantly narrows down the possible causes.

Yeah...
Here's a new syslog. This is where it crashes/reboots, after 5 hours of sitting idle:

Jan 30 05:24:16 Fermentor emhttpd: read SMART /dev/sde
Jan 30 05:25:16 Fermentor kernel: sd 11:0:3:0: attempting task abort!scmd(0x00000000de361d93), outstanding for 60341 ms & timeout 60000 ms
Jan 30 05:25:16 Fermentor kernel: sd 11:0:3:0: [sde] tag#1003 CDB: opcode=0x4d 4d 00 40 00 00 00 00 00 04 00
Jan 30 05:25:16 Fermentor kernel: scsi target11:0:3: handle(0x000d), sas_address(0x5000c50084f99f3d), phy(7)
Jan 30 05:25:16 Fermentor kernel: scsi target11:0:3: enclosure logical id(0x50030480091bdf7f), slot(7) 
Jan 30 05:25:16 Fermentor kernel: scsi target11:0:3: enclosure level(0x0000), connector name(     )
Jan 30 05:25:16 Fermentor kernel: sd 11:0:3:0: task abort: SUCCESS scmd(0x00000000de361d93)
Jan 30 05:30:14 Fermentor kernel: Linux version 5.14.15-Unraid (root@Develop) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP Thu Oct 28 09:56:33 PDT 2021
Jan 30 05:30:14 Fermentor kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
Jan 30 05:30:14 Fermentor kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Jan 30 05:30:14 Fermentor kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Jan 30 05:30:14 Fermentor kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Jan 30 05:30:14 Fermentor kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Jan 30 05:30:14 Fermentor kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
Jan 30 05:30:14 Fermentor kernel: signal: max sigframe size: 1776
Jan 30 05:30:14 Fermentor kernel: BIOS-provided physical RAM map:
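When a box reboots without warning, the interesting lines are the ones logged just before the next boot banner (`kernel: Linux version ...`), like the SCSI task abort on `sde` above. A minimal sketch for pulling those out of a saved syslog, assuming the log has already been copied to a file; the `lines_before_boot` helper and the `syslog.txt` file name are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the N lines logged immediately before each
# boot banner ("kernel: Linux version") in a saved syslog file, plus the banner.
# Usage: lines_before_boot <syslog-file> [N]   (N defaults to 5)
lines_before_boot() {
  local file="$1"
  local n="${2:-5}"
  # grep -B prints N lines of leading context; separate reboots are split by "--"
  grep -B "$n" 'kernel: Linux version' "$file"
}
```

For the excerpt above, `lines_before_boot syslog.txt 7` would print everything from the `read SMART /dev/sde` line down to the `Linux version 5.14.15-Unraid` banner.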

So... I'm going down the line; next I'm going to disable Nvidia.

14 hours ago, WizADSL said:

You may also want to run a memory test.

Memory tested good, all 384GB of it... LOL
I can run the test again, though.

Nothing changed when I went from the previous 64GB of RAM to this 384GB from the Dell; the crashes happened with the 64GB too.

I do appreciate everyone!

Biomatrix · February 1, 2022

UPDATE:
I have disabled 2 user scripts that I brought over from my older box, and disabling them seems to have fixed it (so far).
Nvidia was the last piece that wasn't really disabled, so I started with these scripts...

Are they even necessary anymore?

Nvidia-Persistance-First

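#!/bin/bash
# Enable persistence mode so the Nvidia driver stays initialized even when no process is using the GPU
nvidia-smi --persistence-mode=1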

Nvidia-Power-Reduction

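#!/bin/bash
# Current performance state of the GPU (P0 = maximum performance; higher numbers are lower-power states)
gpupstate=$(nvidia-smi --query-gpu="pstate" --format=csv,noheader);
# PIDs of compute processes currently using the GPU (empty if none)
gpupid=$(nvidia-smi --query-compute-apps="pid" --format=csv,noheader);
# If the GPU sits in P0 with no compute process, enable persistence mode and
# kill (fuser -k, SIGKILL by default) every process that has a /dev/nvidia* device open
if [ "$gpupstate" == "P0" ] && [ -z "$gpupid" ]; then nvidia-smi -pm 1; fuser -kv /dev/nvidia*; fi;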
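To see when `Nvidia-Power-Reduction` would actually act, its condition can be pulled out into a function that takes the two `nvidia-smi` outputs as plain text. This is a dry-run sketch; the `would_reset_gpu` name is hypothetical, and it only echoes what the original script would run instead of calling `nvidia-smi` or `fuser`:

```shell
#!/bin/bash
# Hypothetical dry-run of the Nvidia-Power-Reduction condition.
# $1: pstate text, as printed by: nvidia-smi --query-gpu="pstate" --format=csv,noheader
# $2: PID text, as printed by:    nvidia-smi --query-compute-apps="pid" --format=csv,noheader
# Prints what the original script would run; never touches the GPU.
would_reset_gpu() {
  local pstate="$1"
  local pids="$2"
  if [ "$pstate" = "P0" ] && [ -z "$pids" ]; then
    echo "would run: nvidia-smi -pm 1; fuser -kv /dev/nvidia*"
  else
    echo "no action"
  fi
}

would_reset_gpu "P0" ""                   # single GPU in P0, no compute apps
would_reset_gpu "P8" ""                   # GPU already in a lower-power state
would_reset_gpu "P0" "12345"              # a compute app is using the GPU
would_reset_gpu "$(printf 'P0\nP0')" ""   # two GPUs: multi-line text never equals "P0"
```

Because `$(...)` captures one line per GPU, on a machine with more than one card the `"P0"` comparison never matches and the original script does nothing. When it does match, `fuser -k` kills every process that has one of the `/dev/nvidia*` device files open, which can include processes that never show up under `--query-compute-apps`.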

Either way, I will come back in another day or two to report, if nothing else.

Biomatrix · February 2, 2022

So, I am closing this. I have not experienced even a stutter since I removed those 2 items.

Biomatrix · February 2, 2022

Changed Status to Solved
