NOLA_DireWolff

Unraid 6.8.2 - 2 Hard Crashes and 1 Soft in 2 wks Syslog Attached


I have had two hard crashes and one soft crash in the last two weeks.  Before that, the server was 100% reliable for a year with no crashes.

 

There have been no hardware changes in months.  The most recent change was adding the Deluge docker, which doesn't run 24/7.  The ShinobiCCTV docker was the addition prior to that.  There were no crashes before Deluge, but I'd prefer to actually learn something rather than just guess.  Yesterday I experienced a hard crash while connected via WireGuard.  When I got home, the fans were blowing hard and there was no web access, no SSH, and no WinSCP.  I had to hard reset.

 

I restarted everything, including Shinobi and Deluge (for testing), and watched Plex.  I started logging the syslog to Cache before bed; there were no issues at that point.

 

I woke up this morning and the WebGUI is semi-responsive.  The Unraid loading animation is stuck over some sections and never finishes.  Unassigned Devices doesn't show me anything.  WinSCP doesn't work.  SSH doesn't work.  The Docker section shows the Unraid animation.  I only left Plex, Shinobi and Deluge running overnight.  The VM section won't populate, but nothing was running.  The terminal from inside the GUI does appear to work!  I want to reboot, but I'll wait to see if there are any other terminal commands you want me to run.

 

The syslog is attached.  I'm new to this, so your help is appreciated :)  I may not know much, but the sections below are repeated almost every minute and don't seem necessary.  Is this a hint?

If you are curious about my pinning:

 

I currently have 12 CPU threads.  Six are isolated and reserved for my Win10Gaming VM.  The other six are available to Unraid.  Shinobi is pinned to 2 and 7. 

 

Feb 13 23:52:08 GLADOS kernel: ? cpuidle_enter_state+0xbf/0x141
Feb 13 23:52:08 GLADOS kernel: do_idle+0x17e/0x1fc
Feb 13 23:52:08 GLADOS kernel: cpu_startup_entry+0x6a/0x6c
Feb 13 23:52:08 GLADOS kernel: start_secondary+0x197/0x1b2
Feb 13 23:52:08 GLADOS kernel: secondary_startup_64+0xa4/0xb0
Feb 13 23:55:05 GLADOS kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... 11-... } 1322496 jiffies s: 721 root: 0x801/.
Feb 13 23:55:05 GLADOS kernel: rcu: blocking rcu_node structures:
Feb 13 23:55:05 GLADOS kernel: Task dump for CPU 0:
Feb 13 23:55:05 GLADOS kernel: kworker/u24:3   R  running task        0 20413      2 0x80000008
Feb 13 23:55:05 GLADOS kernel: Workqueue: events_power_efficient gc_worker
Feb 13 23:55:05 GLADOS kernel: Call Trace:
Feb 13 23:55:05 GLADOS kernel: ? process_one_work+0x16e/0x24f
Feb 13 23:55:05 GLADOS kernel: ? worker_thread+0x1e2/0x2b8
Feb 13 23:55:05 GLADOS kernel: ? rescuer_thread+0x2a7/0x2a7
Feb 13 23:55:05 GLADOS kernel: ? kthread+0x10c/0x114
Feb 13 23:55:05 GLADOS kernel: ? kthread_park+0x89/0x89
Feb 13 23:55:05 GLADOS kernel: ? ret_from_fork+0x35/0x40
Feb 13 23:55:05 GLADOS kernel: Task dump for CPU 11:
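
To see how often these stall reports repeat and which CPUs they name, the attached syslog could be scanned with a small script. This is only a sketch: the function name `parse_rcu_stalls` is made up for illustration, and the pattern assumes every report looks like the `rcu: INFO: ... detected expedited stalls` line in the excerpt above.

```python
import re

# Assumes the report format seen in the excerpt above; other kernels may word it differently.
STALL_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+kernel: rcu: INFO: "
    r"(?P<flavor>\w+) detected (?:expedited )?stalls on CPUs/tasks: "
    r"\{(?P<cpus>[^}]*)\}\s+(?P<jiffies>\d+) jiffies"
)


def parse_rcu_stalls(lines):
    """Return one dict per RCU stall report found in an iterable of syslog lines."""
    stalls = []
    for line in lines:
        m = STALL_RE.match(line)
        if not m:
            continue
        # CPU entries look like "0-..." or "11-..."; keep only the leading number.
        cpus = [int(n) for n in re.findall(r"(\d+)-", m.group("cpus"))]
        stalls.append(
            {
                "time": m.group("ts"),
                "flavor": m.group("flavor"),
                "cpus": cpus,
                "jiffies": int(m.group("jiffies")),
            }
        )
    return stalls
```

Feeding it the lines of syslog.log (for example `parse_rcu_stalls(open("syslog.log"))`) would give a list that shows whether the same CPUs are reported every time and whether the jiffies count keeps growing between reports.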

 

 

What is this?

Feb 14 04:20:09 GLADOS kernel: RIP: 0010:nf_conntrack_tuple_taken+0x7f/0x221
Feb 14 04:20:09 GLADOS kernel: Code: fb 49 c1 ef 20 4b 8d 04 fe 49 c7 c6 f0 ff ff ff 48 8b 18 f6 c3 01 0f 85 8a 01 00 00 0f b6 43 37 4c 89 f7 48 89 c1 48 6b c0 38 <48> 29 c7 48 01 df 48 39 ef 0f 84 65 01 00 00 48 8b 05 12 f2 88 00
Feb 14 04:20:09 GLADOS kernel: RSP: 0000:ffff88903bac3a08 EFLAGS: 00000202
Feb 14 04:20:09 GLADOS kernel: RAX: 0000000000000038 RBX: ffff888dbb62c7c8 RCX: 0000000000000001
Feb 14 04:20:09 GLADOS kernel: RDX: 00000000000f020a RSI: 0000000000000000 RDI: fffffffffffffff0
Feb 14 04:20:09 GLADOS kernel: RBP: ffff888dbb62d2c0 R08: 0000000000000003 R09: ffffffff81c8a9c0
Feb 14 04:20:09 GLADOS kernel: R10: 0000000000000001 R11: ffff888ffdec4400 R12: ffff88903bac3a48
Feb 14 04:20:09 GLADOS kernel: R13: ffffffff81e91140 R14: fffffffffffffff0 R15: 000000000000cddc
Feb 14 04:20:09 GLADOS kernel: FS:  0000000000000000(0000) GS:ffff88903bac0000(0000) knlGS:0000000000000000
Feb 14 04:20:09 GLADOS kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 14 04:20:09 GLADOS kernel: CR2: 0000000000000000 CR3: 0000000001e0a001 CR4: 00000000003606e0
Feb 14 04:20:09 GLADOS kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 14 04:20:09 GLADOS kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 14 04:20:09 GLADOS kernel: Call Trace:
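
The `RIP:` line names the kernel function the CPU was executing when the report was written; here that is `nf_conntrack_tuple_taken`, which belongs to the kernel's netfilter connection-tracking code. A rough way to check whether the same function shows up in every report is to count the symbols on all `RIP:` lines in the syslog. The helper name `count_rip_symbols` is hypothetical, and the pattern assumes the `RIP: 0010:symbol+0xOFF/0xLEN` layout shown above.

```python
import re
from collections import Counter

# Assumes the "RIP: <segment>:<symbol>+0x<offset>/0x<length>" layout from the excerpt above.
RIP_RE = re.compile(
    r"kernel: RIP: [0-9a-f]{4}:(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def count_rip_symbols(lines):
    """Count how often each kernel symbol appears on a RIP line."""
    counts = Counter()
    for line in lines:
        m = RIP_RE.search(line)
        if m:
            counts[m.group("symbol")] += 1
    return counts
```

Calling `count_rip_symbols(open("syslog.log")).most_common(5)` would list the most frequent symbols. If one function dominates, that is a hint about where the kernel is spinning, not proof of the root cause.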

 

 

What's next?  :D  🔎

 

 

syslog.log

Edited by NOLA_DireWolff
Corrected Sentence :)


Digging around in other terminal stuff for high CPU items:

 

From ps -f -aux:

 

root     20413 96.3  0.0      0     0 ?        R    Feb13 505:32  \_ [kworker/u24:3+events_power_efficient]

 

and 

 

root      2184  0.0  0.0   2532  1636 ?        Ss   Feb13   0:00 /usr/sbin/crond
root     20187  0.0  0.0   3748  2860 ?        S    Feb13   0:00  \_ /bin/sh -c /usr/local/emhttp/plugins/dynamix.local.master/scripts/localmaster &> /dev/null
root     20190  0.0  0.0 104628 26088 ?        SL   Feb13   0:00  |   \_ /usr/bin/php -q /usr/local/emhttp/plugins/dynamix.local.master/scripts/localmaster
root     20196  0.0  0.0   3748  2904 ?        S    Feb13   0:00  |       \_ sh -c nmblookup -M -- - 2>/dev/null|grep -Pom1 '^\S+'
root     20197  0.0  0.0  34044 11728 ?        D    Feb13   0:00  |           \_ nmblookup -M -- -
root     20198  0.0  0.0   3332  1700 ?        S    Feb13   0:00  |           \_ grep -Pom1 ^\S+
root     20188  0.0  0.0   3748  2756 ?        S    Feb13   0:00  \_ /bin/sh -c /usr/local/emhttp/plugins/dynamix.system.stats/scripts/sa1 1 1 &> /dev/null
root     20191 99.8  0.0   4016  2080 ?        R    Feb13 501:07  |   \_ /usr/lib64/sa/sadc -F -L 1 1 /var/sa

 

and 

root     14044 31.8  0.0 1269016 19428 ?       Rl   Feb13 259:21 /usr/sbin/libvirtd -d -l -f /etc/libvirt/libvirtd.conf -p /var/run/libvirt/libvirtd.pid
nobody   14158  0.0  0.0   7028  1956 ?        S    Feb13   0:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root     14159  0.0  0.0   6896   240 ?        S    Feb13   0:00  \_ /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root     14861  0.0  0.0 101920 11684 ?        Ss   Feb13   0:00 php-fpm: master process (/etc/php-fpm/php-fpm.conf)
root      2901  0.0  0.0 105016 13160 ?        S    05:29   0:00  \_ php-fpm: pool www
root      2939  0.0  0.0   3840  2816 ?        S    05:29   0:00  |   \_ sh -c /usr/bin/timeout 5 /usr/bin/lsof '/mnt/disks/downloads' 2>/dev/null | /bin/sort -k8 | /bin/uniq -f7 | /bin/grep -c -e REG
root      2940  0.0  0.0   2644   816 ?        S    05:29   0:00  |       \_ /usr/bin/timeout 5 /usr/bin/lsof /mnt/disks/downloads
root      2944  0.0  0.0   2748  1864 ?        D    05:29   0:00  |       |   \_ /usr/bin/lsof /mnt/disks/downloads
root      2951  0.0  0.0      0     0 ?        Z    05:29   0:00  |       |       \_ [lsof] <defunct>
root      2941  0.0  0.0  19188   800 ?        S    05:29   0:00  |       \_ /bin/sort -k8
root      2942  0.0  0.0   2456   760 ?        S    05:29   0:00  |       \_ /bin/uniq -f7
root      2943  0.0  0.0   3268  1716 ?        S    05:29   0:00  |       \_ /bin/grep -c -e REG
root      3261  0.0  0.0 105276 13804 ?        S    05:30   0:00  \_ php-fpm: pool www
root      4459  0.0  0.0   3840  2920 ?        S    05:33   0:00  |   \_ sh -c sar 1 1 -u -b -r -n DEV|grep -a '^A'|tr -d '\0'|awk '$2=="all" {u=$3;n=$4;s=$5;}; $2=="tps" {getline;r=$5;w=$6;}; $2=="kbmemfree" {getline;f=$2;c=$6+$7;d=$4;}; $2=="eth0" {x=$5;y=$6;} END{pri
root      4460  0.0  0.0   2564   768 ?        S    05:33   0:00  |       \_ sar 1 1 -u -b -r -n DEV
root      4464 99.5  0.0   4016  2196 ?        R    05:33 141:57  |       |   \_ sadc 1 2 -Z -S A_NULL A_CPU A_IO A_MEMORY A_NET_DEV
root      4461  0.0  0.0   3268  1696 ?        S    05:33   0:00  |       \_ grep -a ^A
root      4462  0.0  0.0   2472   696 ?        S    05:33   0:00  |       \_ tr -d \0
root      4463  0.0  0.0  10032  2736 ?        S    05:33   0:00  |       \_ awk $2=="all" {u=$3;n=$4;s=$5;}; $2=="tps" {getline;r=$5;w=$6;}; $2=="kbmemfree" {getline;f=$2;c=$6+$7;d=$4;}; $2=="eth0" {x=$5;y=$6;} END{print u,n,s"\n"r,w"\n"f,c,d"\n"x,y}
root      3509  0.0  0.0 105016 13556 ?        S    05:31   0:00  \_ php-fpm: pool www
root      3857  0.0  0.0   3840  2920 ?        S    05:32   0:00  |   \_ sh -c /usr/bin/timeout 5 /usr/bin/lsof '/mnt/disks/downloads' 2>/dev/null | /bin/sort -k8 | /bin/uniq -f7 | /bin/grep -c -e REG
root      3860  0.0  0.0   2644   748 ?        S    05:32   0:00  |       \_ /usr/bin/timeout 5 /usr/bin/lsof /mnt/disks/downloads
root      3864  0.0  0.0   2748  1740 ?        D    05:32   0:00  |       |   \_ /usr/bin/lsof /mnt/disks/downloads
root      3865  0.0  0.0      0     0 ?        Z    05:32   0:00  |       |       \_ [lsof] <defunct>
root      3861  0.0  0.0  19188   800 ?        S    05:32   0:00  |       \_ /bin/sort -k8
root      3862  0.0  0.0   2456   760 ?        S    05:32   0:00  |       \_ /bin/uniq -f7
root      3863  0.0  0.0   3268  1684 ?        S    05:32   0:00  |       \_ /bin/grep -c -e REG
root      4419  0.0  0.0 105276 13748 ?        S    05:33   0:00  \_ php-fpm: pool www
root      4523  0.0  0.0   3840  2968 ?        S    05:33   0:00  |   \_ sh -c sar 1 1 -u -b -r -n DEV|grep -a '^A'|tr -d '\0'|awk '$2=="all" {u=$3;n=$4;s=$5;}; $2=="tps" {getline;r=$5;w=$6;}; $2=="kbmemfree" {getline;f=$2;c=$6+$7;d=$4;}; $2=="eth0" {x=$5;y=$6;} END{pri
root      4525  0.0  0.0   2564   764 ?        S    05:33   0:00  |       \_ sar 1 1 -u -b -r -n DEV
root      4566 99.9  0.0   4016  2272 ?        R    05:33 142:34  |       |   \_ sadc 1 2 -Z -S A_NULL A_CPU A_IO A_MEMORY A_NET_DEV
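
Picking the busy rows out of a long listing like the one above by eye is error-prone, so here is a minimal sketch that filters them. The function name `busy_processes` and the 50% threshold are arbitrary choices for illustration. It assumes the usual `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) and that the START column is a single token, as it is in this output.

```python
def busy_processes(ps_lines, min_cpu=50.0):
    """Return (pid, cpu_percent, command) for rows whose %CPU is at least min_cpu."""
    busy = []
    for line in ps_lines:
        # Split into the ten fixed columns plus the command, which may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        try:
            pid, cpu = int(fields[1]), float(fields[2])
        except ValueError:
            continue  # header row or a line that is not a process row
        if cpu >= min_cpu:
            # Strip the "|" and "\_" decoration that the forest view puts before the command.
            # Caveat: this would also strip a leading underscore from a real command name.
            command = fields[10].lstrip("|\\_ ").strip()
            busy.append((pid, cpu, command))
    # Highest CPU usage first.
    return sorted(busy, key=lambda row: row[1], reverse=True)
```

Run against the listing above, this would keep only the kworker and sadc rows; libvirtd at 31.8% would appear only if the threshold were lowered.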

 

 


And here are the PPIDs and other info for the top CPU users shown in htop:

 

ps -ef output:

 

root     14044         1 33 Feb13 ?        04:40:13 /usr/sbin/libvirtd -d -l -f /etc/libvirt/libvirtd.conf -p /var/run/libvirt/libvirtd.pid

root     20191 20188 99 Feb13 ?        08:31:31 /usr/lib64/sa/sadc -F -L 1 1 /var/sa

root      4464  4460 99 05:33 ?        02:32:21 sadc 1 2 -Z -S A_NULL A_CPU A_IO A_MEMORY A_NET_DEV

root      4566  4525 99 05:33 ?        02:33:02 sadc 1 2 -Z -S A_NULL A_CPU A_IO A_MEMORY A_NET_DEV

 

Another strange thing: PID 14176 shows as active in htop, but it doesn't show up in the ps output.  :/  I'm learning, but maybe I'm not understanding something.
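
One possible explanation, offered as an assumption and not a diagnosis: htop lists individual threads by default and puts each thread's ID in the PID column, while ps -ef lists only processes. On Linux, the text of /proc/<id>/status reports both the thread's own ID (`Pid`) and the ID of the process it belongs to (`Tgid`), so comparing the two would settle it. The sketch below parses that text; the function names are made up for illustration.

```python
def pid_and_tgid(status_text):
    """Parse the text of /proc/<id>/status and return (Pid, Tgid) as integers."""
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return int(fields["Pid"]), int(fields["Tgid"])


def is_thread(status_text):
    """True when the ID belongs to a thread and not to a process's main thread."""
    pid, tgid = pid_and_tgid(status_text)
    return pid != tgid
```

For example, `is_thread(open("/proc/14176/status").read())` returning True would mean 14176 is a thread, and the `Tgid` value would be the owning process to look for in the ps output.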
