Unraid server becomes unresponsive, have to do physical reset/reboot



I'm not sure why my Unraid server becomes unresponsive after some time (a day, sometimes a few days). All parts are new. I attached the diagnostics zip file, but it was generated after a reboot. I'm not sure how to capture a log file before the server becomes unresponsive.

 

i5-11600K (stock clocks)

RAM at 3200 MHz (XMP profile)

Using the iGPU

USB 2.0 flash drive

Gigabyte Z590 motherboard

 

Unraid version: 6.9.2

 

 

ur1-diagnostics-20220119-0822.zip

Link to comment

Sounds about right: buggy Linux Intel iGPU drivers plus Plex hardware transcoding. I don't really want to give up hardware transcoding with Plex. I guess I'll wait for a new Linux GPU driver? It could be Unraid related, but it sounds more like buggy Linux Intel GPU drivers. It probably also doesn't help that I'm on 11th-gen Intel, which isn't officially supported by Unraid yet either.

 

Do you know how to turn on some type of logging so I can try to produce an error report before it becomes unresponsive?
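For anyone who wants to try capturing something before a hang, here is a minimal sketch of one approach: periodically copy whatever is new in the in-RAM syslog to storage that survives a hard reset. It assumes Python 3 is available on the server, and the paths mentioned in the note after the code are assumptions, not something confirmed in this thread. It can only help if the kernel actually writes a message before the lockup.

```python
import os
import time


def mirror_new_bytes(src_path: str, dst_path: str, offset: int) -> int:
    """Append the bytes of src_path past `offset` to dst_path; return the new offset."""
    size = os.path.getsize(src_path)
    if size < offset:
        # The source shrank, so it was rotated or truncated: start from the top.
        offset = 0
    with open(src_path, "rb") as src:
        src.seek(offset)
        chunk = src.read()
    if chunk:
        with open(dst_path, "ab") as dst:
            dst.write(chunk)
            dst.flush()
            # Force the data to the device so it survives a hard reset.
            os.fsync(dst.fileno())
    return offset + len(chunk)


def follow(src_path: str, dst_path: str, interval_s: float = 5.0) -> None:
    """Mirror src_path into dst_path forever, polling every interval_s seconds."""
    offset = 0
    while True:
        try:
            offset = mirror_new_bytes(src_path, dst_path, offset)
        except OSError:
            pass  # the source may briefly vanish during rotation; retry next pass
        time.sleep(interval_s)
```

Hypothetical usage would be `follow("/var/log/syslog", "/boot/logs/syslog-mirror.log")`, with both paths adjusted to the actual system. Frequent writes add wear to a USB flash drive, so a longer interval or a different persistent target may be preferable.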

Edited by snailtrails
Link to comment
28 minutes ago, doobyns said:

Do you use Plex? If so, the latest versions are plagued by bugs; stick to version 1.24.2.4973, which is one of the last without the bug.

 

I am on Plex Version 1.25.3.5409

34 minutes ago, JorgeB said:

According to the bug report it just crashes without spitting anything to the log.

 

Ok thanks. I will just have to sit tight until then.

Edited by snailtrails
Link to comment
  • 2 weeks later...

I've been doing a bunch of testing after replacing my Xeon E5-2696v2 + GPU and Supermicro motherboard setup with a Z590 + i5-11400 combo using Quick Sync, and have arrived at a stable solution that allows me to hardware transcode with the latest Plex builds.

 

I have issues with every Unraid release from 6.10.0-RC1 onward, including RC2 and the latest test builds beyond that. Regardless of my setup, including blacklisting i915 in favour of GPU Top, I still get crashes when Plex uses hardware transcoding (including the current build, 1.25.5.5492). My server hangs (no video, keyboard, or mouse) and drops off the network. It requires a hard reset.

 

My solution:

Unraid: 6.9.2

Plex 1.25.5.5492

config/modprobe.d/i915.conf = options i915 force_probe=4c8b

no GPU top or GPU statistics installed

 

My force_probe=4c8b because when I tried 4c8a I got a message in dmesg that told me to use 4c8b instead. My CPU config is below for reference.
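For anyone unsure which ID applies to their own chip, a rough sketch of how the value could be derived: `lspci -nn` prints each device's `[vendor:device]` pair, and Intel's vendor ID is 8086. The helper names and the sample line below are made up for illustration; only the `options i915 force_probe=...` format comes from the post above.

```python
import re

# Matches the "[vendor:device]" pair printed by `lspci -nn`, Intel (8086) only.
_INTEL_ID = re.compile(r"\[8086:([0-9a-fA-F]{4})\]")


def intel_gpu_device_ids(lspci_output: str) -> list[str]:
    """Return the PCI device IDs of Intel graphics devices found in `lspci -nn` text."""
    ids = []
    for line in lspci_output.splitlines():
        # Integrated graphics show up under one of these two class names.
        if "VGA compatible controller" in line or "Display controller" in line:
            match = _INTEL_ID.search(line)
            if match:
                ids.append(match.group(1).lower())
    return ids


def i915_conf_line(device_id: str) -> str:
    """Build the modprobe option line in the format used in the post above."""
    return f"options i915 force_probe={device_id}"


if __name__ == "__main__":
    # Hypothetical sample line; real input would come from running `lspci -nn`.
    sample = (
        "00:02.0 VGA compatible controller [0300]: "
        "Intel Corporation Device [8086:4c8b] (rev 04)"
    )
    for dev in intel_gpu_device_ids(sample):
        print(i915_conf_line(dev))
```

If dmesg suggests a different ID, as it did for me, the dmesg suggestion is the one to use.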

 

Hope this helps someone - this was very frustrating going from an incredibly stable Xeon build to an i5-11400 and having unpredictable crashes without anything in the logs.

 

Cheers!

 

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          12
On-line CPU(s) list:             0-11
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           167
Model name:                      11th Gen Intel(R) Core(TM) i5-11400 @ 2.60GHz
Stepping:                        1
CPU MHz:                         4203.711
CPU max MHz:                     4400.0000
CPU min MHz:                     800.0000
BogoMIPS:                        5184.00
Virtualization:                  VT-x
L1d cache:                       288 KiB
L1i cache:                       192 KiB
L2 cache:                        3 MiB
L3 cache:                        12 MiB
NUMA node0 CPU(s):               0-11
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap avx512ifma clflushopt intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm md_clear flush_l1d arch_capabilities

 

Link to comment
On 2/6/2022 at 4:12 PM, akawoz said:

I've been doing a bunch of testing after replacing my Xeon E5-2696v2 + GPU and Supermicro motherboard setup with a Z590 + i5-11400 combo using Quick Sync, and have arrived at a stable solution that allows me to hardware transcode with the latest Plex builds.

 

I have issues with every Unraid release from 6.10.0-RC1 onward, including RC2 and the latest test builds beyond that. Regardless of my setup, including blacklisting i915 in favour of GPU Top, I still get crashes when Plex uses hardware transcoding (including the current build, 1.25.5.5492). My server hangs (no video, keyboard, or mouse) and drops off the network. It requires a hard reset.

 

My solution:

Unraid: 6.9.2

Plex 1.25.5.5492

config/modprobe.d/i915.conf = options i915 force_probe=4c8b

no GPU top or GPU statistics installed

 

My force_probe=4c8b because when I tried 4c8a I got a message in dmesg that told me to use 4c8b instead. My CPU config is below for reference.

 

Hope this helps someone - this was very frustrating going from an incredibly stable Xeon build to an i5-11400 and having unpredictable crashes without anything in the logs.

 

Cheers!

 


 


Unfortunately, this did not make a difference for me. Once I started hardware transcoding, the server locked up again after about 15 minutes.
