
Kernel panic on boot



I've had quite a few problems with my install of Unraid 6.9 and 6.10 RC1-4 but lately it has been running quite smoothly. Until yesterday morning that is, when I realized my server had crashed again. Kernel panic at approx 5 AM, what a lovely way to start the day. I restarted the server and waited to see it come back up but no, there was a new kernel panic during boot (even during Safe boot). Can somebody please help me sort this out?

 

I can't really make anything of the information from the first KP. The second one seems to stem from udev and possibly some kind of disk failure during boot. Needless to say, I can't access the server now.

 

AMD Ryzen 3600X, Asus ROG Strix B450-F Gaming (BIOS updated in February 2022), 48 GB RAM and an LSI 9240-8i HBA in IT mode. I've had my fair share of panics due to power management issues, but it has been running without trouble for a month or so now.

 

Can anyone please shed some light on this?

IMG_20220518_091251.jpg

IMG_20220518_113337.jpg

Link to comment

Alright, I did what you suggested and installed 6.10 stable on my USB. It booted fine the first time, but once I copied my config folder onto the USB, it threw a series of error messages in quick succession before panicking again. It's far too fast for me to register any details, even if I knew what to look for.

 

I'll try removing all USB devices and then all disks to see if it helps. 

Link to comment

Ok, so no luck today. I tried removing all my USB devices but ended up in a kernel panic anyway. I tried removing all my HDDs too, with the same result. There are still two M.2 SSDs connected, but only because they're such a pain to remove. I can't even add the key file to the USB memory without it panicking.

 

I can't help suspecting hardware issues. Could my USB memory be corrupt even if I can write the Unraid image to it? A new one is only €12, so I might give that a try. Or should I ditch my AMD gear and spend another €300 on an Intel CPU and motherboard?

Link to comment
On 5/18/2022 at 7:39 PM, emilgil said:

Alright, I did what you suggested and installed 6.10 stable on my USB. It booted fine the first time, but once I copied my config folder onto the USB, it threw a series of error messages in quick succession before panicking again.

This suggests a problem with the current config. You might try copying just super.dat and the key, then reconfiguring the server. You can also try copying more config files, a few at a time, to see if you can find what's breaking it.
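
Copying the config back a few files at a time can be made less error-prone with a small helper that is run once per boot test. This is only a rough sketch in Python: the function name `copy_next_batch`, the batch size, and the example paths are assumptions for illustration, not an Unraid tool. It only looks at files directly inside the backup folder and ignores subfolders.

```python
import shutil
from pathlib import Path

def copy_next_batch(backup_dir, target_dir, batch_size=3):
    """Copy the next few files from the backup that the target does not have yet.

    Returns the names that were copied, so a crash on the following boot
    can be blamed on one small, known set of files.
    """
    backup, target = Path(backup_dir), Path(target_dir)
    # Files already present in the target were copied by an earlier run.
    pending = sorted(
        p for p in backup.iterdir()
        if p.is_file() and not (target / p.name).exists()
    )
    batch = pending[:batch_size]
    for src in batch:
        shutil.copy2(src, target / src.name)
    return [p.name for p in batch]

# Hypothetical usage (both paths are placeholders, adjust to your setup):
# print(copy_next_batch("/path/to/config-backup", "/path/to/flash/config"))
```

Run it, boot the server, and repeat while the boot stays clean. If a panic shows up, the last list it printed holds the suspects.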

Link to comment
14 minutes ago, JorgeB said:

You can't restore a config with a trial key, only a paid one.

Yeah, I realized that there was a separate key file for the paid license. I've managed to start the array, so it's time to run a parity check before I move on to restoring other settings and disks.

Link to comment
  • 2 weeks later...

Ok, so I had two more panics a few days ago. I didn't manage to catch any details on the first one, but after that I enabled the syslog server. Unfortunately it doesn't give much information. I have attached a screenshot of the second panic as well as a diagnostics dump taken between the two panics.

 

Quote

Jun  1 10:06:00 Tower avahi-daemon[15417]: Joining mDNS multicast group on interface veth33b83a0.IPv6 with address fe80::800a:7eff:fe57:dc32.
Jun  1 10:06:00 Tower avahi-daemon[15417]: New relevant interface veth33b83a0.IPv6 for mDNS.
Jun  1 10:06:00 Tower avahi-daemon[15417]: Registering new address record for fe80::800a:7eff:fe57:dc32 on veth33b83a0.*.
Jun  1 10:06:03 Tower kernel: br-5ad5dd5266c5: port 5(veth33b83a0) entered disabled state
Jun  1 10:06:03 Tower kernel: vethcc3a819: renamed from eth0
Jun  1 10:06:03 Tower avahi-daemon[15417]: Interface veth33b83a0.IPv6 no longer relevant for mDNS.
Jun  1 10:06:03 Tower avahi-daemon[15417]: Leaving mDNS multicast group on interface veth33b83a0.IPv6 with address fe80::800a:7eff:fe57:dc32.
Jun  1 10:06:03 Tower kernel: br-5ad5dd5266c5: port 5(veth33b83a0) entered disabled state
Jun  1 10:06:03 Tower kernel: device veth33b83a0 left promiscuous mode
Jun  1 10:06:03 Tower kernel: br-5ad5dd5266c5: port 5(veth33b83a0) entered disabled state
Jun  1 10:06:03 Tower avahi-daemon[15417]: Withdrawing address record for fe80::800a:7eff:fe57:dc32 on veth33b83a0.
Jun  1 18:01:41 Tower unassigned.devices: Mounting 'Auto Mount' Devices...
Jun  1 18:01:41 Tower unassigned.devices: Disk with ID 'HGST_HMS5C4040BLE640_PL1331LAHBZULH (sdf)' is not set to auto mount.
Jun  1 18:01:41 Tower unassigned.devices: Disk with ID 'ST4000DM000-1F2168_W300WSKA (sdg)' is not set to auto mount.
Jun  1 18:01:41 Tower unassigned.devices: Disk with ID 'WDC_WD80EZAZ-11TDBA0_7SJBREDU (sdi)' is not set to auto mount.
Jun  1 18:01:41 Tower emhttpd: Starting services...
Jun  1 18:01:41 Tower emhttpd: shcmd (277): /etc/rc.d/rc.samba restart
Jun  1 18:01:41 Tower wsdd2[4649]: 'Terminated' signal received.
Jun  1 18:01:41 Tower winbindd[4652]: [2022/06/01 18:01:41.868127,  0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jun  1 18:01:41 Tower winbindd[4654]: [2022/06/01 18:01:41.868127,  0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jun  1 18:01:41 Tower winbindd[4654]:   Got sig[15] terminate (is_parent=0)
Jun  1 18:01:41 Tower winbindd[4652]:   Got sig[15] terminate (is_parent=1)
Jun  1 18:01:41 Tower winbindd[4727]: [2022/06/01 18:01:41.868185,  0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jun  1 18:01:41 Tower wsdd2[4649]: terminating.
Jun  1 18:01:41 Tower winbindd[4727]:   Got sig[15] terminate (is_parent=0)
Jun  1 18:01:42 Tower rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="10782" x-info="https://www.rsyslog.com"] start
Jun  1 18:01:44 Tower root: Starting Samba:  /usr/sbin/smbd -D
Jun  1 18:01:44 Tower smbd[10875]: [2022/06/01 18:01:44.021879,  0] ../../source3/smbd/server.c:1734(main)
Jun  1 18:01:44 Tower smbd[10875]:   smbd version 4.15.7 started.
Jun  1 18:01:44 Tower smbd[10875]:   Copyright Andrew Tridgell and the Samba Team 1992-2021
Jun  1 18:01:44 Tower root:                  /usr/sbin/wsdd2 -d 
Jun  1 18:01:44 Tower wsdd2[10889]: starting.
Jun  1 18:01:44 Tower root:                  /usr/sbin/winbindd -D
Jun  1 18:01:44 Tower winbindd[10890]: [2022/06/01 18:01:44.085748,  0] ../../source3/winbindd/winbindd.c:1722(main)
Jun  1 18:01:44 Tower winbindd[10890]:   winbindd version 4.15.7 started.
Jun  1 18:01:44 Tower winbindd[10890]:   Copyright Andrew Tridgell and the Samba Team 1992-2021
Jun  1 18:01:44 Tower winbindd[10892]: [2022/06/01 18:01:44.089644,  0] ../../source3/winbindd/winbindd_cache.c:3085(initialize_winbindd_cache)
Jun  1 18:01:44 Tower winbindd[10892]:   initialize_winbindd_cache: clearing cache and re-creating with version number 2
Jun  1 18:01:44 Tower emhttpd: shcmd (281): /etc/rc.d/rc.avahidaemon restart
Jun  1 18:01:44 Tower root: Stopping Avahi mDNS/DNS-SD Daemon: stopped
Jun  1 18:01:44 Tower avahi-dnsconfd[4678]: read(): EOF
Jun  1 18:01:44 Tower root: Starting Avahi mDNS/DNS-SD Daemon:  /usr/sbin/avahi-daemon -D
Jun  1 18:01:44 Tower avahi-daemon[10909]: Found user 'avahi' (UID 61) and group 'avahi' (GID 214).
Jun  1 18:01:44 Tower avahi-daemon[10909]: Successfully dropped root privileges.
Jun  1 18:01:44 Tower avahi-daemon[10909]: avahi-daemon 0.8 starting up.
Jun  1 18:01:44 Tower avahi-daemon[10909]: Successfully called chroot().
Jun  1 18:01:44 Tower avahi-daemon[10909]: Successfully dropped remaining capabilities.
Jun  1 18:01:44 Tower avahi-daemon[10909]: Loading service file /services/sftp-ssh.service.
Jun  1 18:01:44 Tower avahi-daemon[10909]: Loading service file /services/smb.service.
Jun  1 18:01:44 Tower avahi-daemon[10909]: Loading service file /services/ssh.service.
Jun  1 18:01:44 Tower avahi-daemon[10909]: Joining mDNS multicast group on interface br0.IPv4 with address 192.168.1.61.
Jun  1 18:01:44 Tower avahi-daemon[10909]: New relevant interface br0.IPv4 for mDNS.
Jun  1 18:01:44 Tower avahi-daemon[10909]: Joining mDNS multicast group on interface lo.IPv6 with address ::1.
Jun  1 18:01:44 Tower avahi-daemon[10909]: New relevant interface lo.IPv6 for mDNS.
Jun  1 18:01:44 Tower avahi-daemon[10909]: Joining mDNS multicast group on interface lo.IPv4 with address 127.0.0.1.
Jun  1 18:01:44 Tower avahi-daemon[10909]: New relevant interface lo.IPv4 for mDNS.
Jun  1 18:01:44 Tower avahi-daemon[10909]: Network interface enumeration completed.
Jun  1 18:01:44 Tower avahi-daemon[10909]: Registering new address record for 192.168.1.61 on br0.IPv4.
Jun  1 18:01:44 Tower avahi-daemon[10909]: Registering new address record for ::1 on lo.*.
Jun  1 18:01:44 Tower avahi-daemon[10909]: Registering new address record for 127.0.0.1 on lo.IPv4.
Jun  1 18:01:44 Tower emhttpd: shcmd (282): /etc/rc.d/rc.avahidnsconfd restart
Jun  1 18:01:44 Tower root: Stopping Avahi mDNS/DNS-SD DNS Server Configuration Daemon: stopped
Jun  1 18:01:44 Tower root: Starting Avahi mDNS/DNS-SD DNS Server Configuration Daemon:  /usr/sbin/avahi-dnsconfd -D
Jun  1 18:01:44 Tower avahi-dnsconfd[10918]: Successfully connected to Avahi daemon.
Jun  1 18:01:44 Tower emhttpd: shcmd (291): /usr/local/sbin/mount_image '/mnt/user/system/docker/' /var/lib/docker 48
Jun  1 18:01:44 Tower emhttpd: shcmd (293): /etc/rc.d/rc.docker start
Jun  1 18:01:44 Tower root: starting dockerd ...
Jun  1 18:01:44 Tower kernel: Bridge firewalling registered
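
The capture above jumps from 10:06:03 straight to 18:01:41, so whatever happened in between never reached the log. In a longer capture, silent stretches like this can be located with a small script. The following is a minimal sketch in Python; `find_gaps`, the one-hour threshold, and the assumed year are illustrative choices, not anything Unraid provides.

```python
from datetime import datetime, timedelta

def find_gaps(lines, min_gap=timedelta(hours=1), year=2022):
    """Return (previous_time, next_time) pairs where the log went silent."""
    gaps = []
    previous = None
    for line in lines:
        try:
            # Classic syslog stamps ("Jun  1 10:06:03") carry no year,
            # so an assumed one is prepended before parsing.
            current = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None and current - previous >= min_gap:
            gaps.append((previous, current))
        previous = current
    return gaps

# Three lines taken from the capture above.
sample = [
    "Jun  1 10:06:03 Tower kernel: device veth33b83a0 left promiscuous mode",
    "Jun  1 10:06:03 Tower avahi-daemon[15417]: Withdrawing address record for fe80::800a:7eff:fe57:dc32 on veth33b83a0.",
    "Jun  1 18:01:41 Tower unassigned.devices: Mounting 'Auto Mount' Devices...",
]

for before, after in find_gaps(sample):
    print(before, "->", after, "silent for", after - before)
```

A long gap only shows where logging stopped. It may mark a hang or a reboot, but it could just as well be a quiet period, so it is a place to start looking rather than proof of a crash.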

 

IMG_20220601_175549.jpg

tower-diagnostics-20220531-2041.zip

Link to comment
11 hours ago, JorgeB said:

Enable the syslog server and post that after a crash, it might catch something.

Not really. This is from the last crash (earlier today):

 

Quote

Jun  9 12:19:17 Tower kernel: device vethbe5bcc7 entered promiscuous mode
Jun  9 12:19:17 Tower kernel: eth0: renamed from veth9cf5cec
Jun  9 12:19:17 Tower kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethbe5bcc7: link becomes ready
Jun  9 12:19:17 Tower kernel: br-7f0f8378f740: port 2(vethbe5bcc7) entered blocking state
Jun  9 12:19:17 Tower kernel: br-7f0f8378f740: port 2(vethbe5bcc7) entered forwarding state
Jun  9 12:19:19 Tower avahi-daemon[19767]: Joining mDNS multicast group on interface vethbe5bcc7.IPv6 with address fe80::10de:36ff:fe83:eba7.
Jun  9 12:19:19 Tower avahi-daemon[19767]: New relevant interface vethbe5bcc7.IPv6 for mDNS.
Jun  9 12:19:19 Tower avahi-daemon[19767]: Registering new address record for fe80::10de:36ff:fe83:eba7 on vethbe5bcc7.*.
Jun  9 12:19:21 Tower kernel: br-7f0f8378f740: port 2(vethbe5bcc7) entered disabled state
Jun  9 12:19:21 Tower kernel: veth9cf5cec: renamed from eth0
Jun  9 12:19:21 Tower avahi-daemon[19767]: Interface vethbe5bcc7.IPv6 no longer relevant for mDNS.
Jun  9 12:19:21 Tower avahi-daemon[19767]: Leaving mDNS multicast group on interface vethbe5bcc7.IPv6 with address fe80::10de:36ff:fe83:eba7.
Jun  9 12:19:21 Tower kernel: br-7f0f8378f740: port 2(vethbe5bcc7) entered disabled state
Jun  9 12:19:21 Tower kernel: device vethbe5bcc7 left promiscuous mode
Jun  9 12:19:21 Tower kernel: br-7f0f8378f740: port 2(vethbe5bcc7) entered disabled state
Jun  9 12:19:21 Tower avahi-daemon[19767]: Withdrawing address record for fe80::10de:36ff:fe83:eba7 on vethbe5bcc7.
Jun  9 19:44:22 Tower unassigned.devices: Mounting 'Auto Mount' Devices...
Jun  9 19:44:22 Tower unassigned.devices: Partition 'sdc1' does not have a file system and cannot be mounted.
Jun  9 19:44:22 Tower unassigned.devices: Adding partition 'sdc2'... 

Then it seems something happened while I was trying to log in:

Quote

Jun  9 19:44:22 Tower unassigned.devices: Mounting partition 'sdc2' at mountpoint '/mnt/disks/1000GB'...
Jun  9 19:44:23 Tower unassigned.devices: Mount drive command: /sbin/mount -t 'ntfs' -o rw,noatime,nodiratime,nodev,nosuid,nls=utf8,umask=000 '/dev/sdc2' '/mnt/disks/1000GB'
Jun  9 19:44:23 Tower ntfs-3g[27773]: Version 2021.8.22 integrated FUSE 27
Jun  9 19:44:23 Tower ntfs-3g[27773]: Mounted /dev/sdc2 (Read-Write, label "1000GB", NTFS 3.1)
Jun  9 19:44:23 Tower ntfs-3g[27773]: Cmdline options: rw,noatime,nodiratime,nodev,nosuid,nls=utf8,umask=000
Jun  9 19:44:23 Tower ntfs-3g[27773]: Mount options: nodiratime,nodev,nosuid,nls=utf8,allow_other,nonempty,noatime,rw,default_permissions,fsname=/dev/sdc2,blkdev,blksize=4096
Jun  9 19:44:23 Tower ntfs-3g[27773]: Global ownership and permissions enforced, configuration type 1
Jun  9 19:44:23 Tower unassigned.devices: Successfully mounted 'sdc2' on '/mnt/disks/1000GB'.
Jun  9 19:44:23 Tower unassigned.devices: Adding SMB share '1000GB'.
Jun  9 19:44:23 Tower unassigned.devices: Disk with ID 'HGST_HMS5C4040BLE640_PL1331LAHBZULH (sdf)' is not set to auto mount.
Jun  9 19:44:23 Tower unassigned.devices: Disk with ID 'ST4000DM000-1F2168_W300WSKA (sdg)' is not set to auto mount.
Jun  9 19:44:23 Tower unassigned.devices: Disk with ID 'WDC_WD80EZAZ-11TDBA0_7SJBREDU (sdi)' is not set to auto mount.
Jun  9 19:44:23 Tower emhttpd: Starting services...
Jun  9 19:44:23 Tower emhttpd: shcmd (5915): /etc/rc.d/rc.samba restart
Jun  9 19:44:23 Tower wsdd2[5095]: 'Terminated' signal received.
Jun  9 19:44:23 Tower winbindd[5098]: [2022/06/09 19:44:23.324858,  0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jun  9 19:44:23 Tower winbindd[5098]:   Got sig[15] terminate (is_parent=1)
Jun  9 19:44:23 Tower winbindd[5100]: [2022/06/09 19:44:23.324867,  0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jun  9 19:44:23 Tower winbindd[5100]:   Got sig[15] terminate (is_parent=0)
Jun  9 19:44:23 Tower wsdd2[5095]: terminating.
Jun  9 19:44:23 Tower winbindd[5176]: [2022/06/09 19:44:23.324937,  0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jun  9 19:44:23 Tower winbindd[5176]:   Got sig[15] terminate (is_parent=0)
Jun  9 19:44:23 Tower rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="27711" x-info="https://www.rsyslog.com"] start
Jun  9 19:44:25 Tower root: Starting Samba:  /usr/sbin/smbd -D
Jun  9 19:44:25 Tower smbd[27876]: [2022/06/09 19:44:25.480457,  0] ../../source3/smbd/server.c:1734(main)
Jun  9 19:44:25 Tower smbd[27876]:   smbd version 4.15.7 started.
Jun  9 19:44:25 Tower smbd[27876]:   Copyright Andrew Tridgell and the Samba Team 1992-2021
Jun  9 19:44:25 Tower root:                  /usr/sbin/wsdd2 -d
Jun  9 19:44:25 Tower wsdd2[27890]: starting.
Jun  9 19:44:25 Tower root:                  /usr/sbin/winbindd -D
Jun  9 19:44:25 Tower winbindd[27891]: [2022/06/09 19:44:25.550503,  0] ../../source3/winbindd/winbindd.c:1722(main)
Jun  9 19:44:25 Tower winbindd[27891]:   winbindd version 4.15.7 started.
Jun  9 19:44:25 Tower winbindd[27891]:   Copyright Andrew Tridgell and the Samba Team 1992-2021
Jun  9 19:44:25 Tower winbindd[27893]: [2022/06/09 19:44:25.554416,  0] ../../source3/winbindd/winbindd_cache.c:3085(initialize_winbindd_cache)
Jun  9 19:44:25 Tower winbindd[27893]:   initialize_winbindd_cache: clearing cache and re-creating with version number 2
Jun  9 19:47:17 Tower nginx: 2022/06/09 19:47:17 [error] 5175#5175: *5566 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.1.202, server: , request: "POST /update.htm HTTP/1.1", upstream: "http://unix:/var/run/emhttpd.socket:/update.htm", host: "192.168.1.61", referrer: "http://192.168.1.61/Main"
Jun  9 19:47:19 Tower kernel: general protection fault, probably for non-canonical address 0xfbff888157600a70: 0000 [#2] SMP NOPTI
Jun  9 19:47:19 Tower kernel: CPU: 9 PID: 29474 Comm: sh Tainted: G      D           5.15.43-Unraid #1
Jun  9 19:47:19 Tower kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B450-F GAMING, BIOS 4801 03/02/2022
Jun  9 19:47:19 Tower kernel: RIP: 0010:___slab_alloc+0x2e2/0x5de
Jun  9 19:47:19 Tower kernel: Code: 8b 6b 10 4d 85 ed 0f 85 95 02 00 00 48 8b 44 24 38 48 89 43 10 49 8b 04 24 48 83 c0 20 65 48 03 05 d3 f8 e2 7e 41 8b 44 24 28 <49> 8b 04 07 48 ff 43 08 48 89 03 49 8b 04 24 48 83 c0 20 65 48 03
Jun  9 19:47:19 Tower kernel: RSP: 0018:ffffc9000303fbb0 EFLAGS: 00010086
Jun  9 19:47:19 Tower kernel: RAX: 0000000000000070 RBX: ffff888c0ea70810 RCX: 0000000080200020
Jun  9 19:47:19 Tower kernel: RDX: 0000000000000200 RSI: ffffea00055d8000 RDI: 0000000044042000
Jun  9 19:47:19 Tower kernel: RBP: ffffc9000303fc70 R08: 0000000000000000 R09: 0000000080200020
Jun  9 19:47:19 Tower kernel: R10: 0000000000000202 R11: 0000000000000fe0 R12: ffff8881001cd100
Jun  9 19:47:19 Tower kernel: R13: 00000000ffffffff R14: 0000000000000dc0 R15: fbff888157600a00
Jun  9 19:47:19 Tower kernel: FS:  0000000000000000(0000) GS:ffff888c0ea40000(0000) knlGS:0000000000000000
Jun  9 19:47:19 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun  9 19:47:19 Tower kernel: CR2: 00007fff11310f80 CR3: 000000015a2e2000 CR4: 0000000000350ee0
Jun  9 19:47:19 Tower kernel: Call Trace:
Jun  9 19:47:19 Tower kernel: <TASK>
Jun  9 19:47:19 Tower kernel: ? __alloc_file+0x26/0x9c
Jun  9 19:47:19 Tower kernel: ? rcu_segcblist_enqueue+0x12/0x33
Jun  9 19:47:19 Tower kernel: ? post_alloc_hook+0x21/0x4f
Jun  9 19:47:19 Tower kernel: ? kernel_init_free_pages.part.0+0x48/0x59
Jun  9 19:47:19 Tower kernel: ? prep_new_page+0x1c/0x48
Jun  9 19:47:19 Tower kernel: ? __alloc_file+0x26/0x9c
Jun  9 19:47:19 Tower kernel: kmem_cache_alloc+0x8c/0x175
Jun  9 19:47:19 Tower kernel: __alloc_file+0x26/0x9c
Jun  9 19:47:19 Tower kernel: alloc_empty_file+0x9e/0xd3
Jun  9 19:47:19 Tower kernel: path_openat+0x46/0x94f
Jun  9 19:47:19 Tower kernel: ? cgroup_rstat_updated+0x21/0xa1
Jun  9 19:47:19 Tower kernel: ? cgroup_rstat_updated+0x21/0xa1
Jun  9 19:47:19 Tower kernel: do_filp_open+0x53/0xb0
Jun  9 19:47:19 Tower kernel: ? get_page+0x5/0xa
Jun  9 19:47:19 Tower kernel: ? lru_cache_add+0x35/0x54
Jun  9 19:47:19 Tower kernel: ? set_pte+0x5/0x8
Jun  9 19:47:19 Tower kernel: ? slab_post_alloc_hook+0x50/0x157
Jun  9 19:47:19 Tower kernel: ? getname_flags+0x29/0x150
Jun  9 19:47:19 Tower kernel: ? kmem_cache_alloc+0xff/0x175
Jun  9 19:47:19 Tower kernel: do_sys_openat2+0x72/0xde
Jun  9 19:47:19 Tower kernel: do_sys_open+0x3b/0x58
Jun  9 19:47:19 Tower kernel: do_syscall_64+0x83/0xa5
Jun  9 19:47:19 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Jun  9 19:47:19 Tower kernel: RIP: 0033:0x148e08a24654
Jun  9 19:47:19 Tower kernel: Code: f9 41 89 f0 41 83 e2 40 75 2c 89 f0 25 00 00 41 00 3d 00 00 41 00 74 1e 44 89 c2 4c 89 ce bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2c c3 0f 1f 00 48 8d 44 24 08 c7 44 24 b8 10
Jun  9 19:47:19 Tower kernel: RSP: 002b:00007fff113109b8 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
Jun  9 19:47:19 Tower kernel: RAX: ffffffffffffffda RBX: 0000148e08a35190 RCX: 0000148e08a24654
Jun  9 19:47:19 Tower kernel: RDX: 0000000000080000 RSI: 0000148e08a2ad7d RDI: 00000000ffffff9c
Jun  9 19:47:19 Tower kernel: RBP: 00007fff11310b70 R08: 0000000000080000 R09: 0000148e08a2ad7d
Jun  9 19:47:19 Tower kernel: R10: 0000000000000000 R11: 0000000000000287 R12: ffffffffffffffff
Jun  9 19:47:19 Tower kernel: R13: 0000000000000001 R14: 0000148e08a34040 R15: 0000000000418bd8
Jun  9 19:47:19 Tower kernel: </TASK>
Jun  9 19:47:19 Tower kernel: Modules linked in: xfs md_mod efivarfs iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding igb mpt3sas i2c_algo_bit raid_class ch341 input_leds edac_mce_amd kvm_amd kvm wmi_bmof mxm_wmi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd i2c_piix4 cryptd rapl ahci ccp k10temp led_class i2c_core scsi_transport_sas usbserial tpm_crb libahci nvme tpm_tis tpm_tis_core nvme_core tpm wmi button acpi_cpufreq
Jun  9 19:47:19 Tower kernel: ---[ end trace 433c4d1883a374bf ]---
Jun  9 19:47:19 Tower kernel: RIP: 0010:___slab_alloc+0x2e2/0x5de
Jun  9 19:47:19 Tower kernel: Code: 8b 6b 10 4d 85 ed 0f 85 95 02 00 00 48 8b 44 24 38 48 89 43 10 49 8b 04 24 48 83 c0 20 65 48 03 05 d3 f8 e2 7e 41 8b 44 24 28 <49> 8b 04 07 48 ff 43 08 48 89 03 49 8b 04 24 48 83 c0 20 65 48 03
Jun  9 19:47:19 Tower kernel: RSP: 0018:ffffc90003e4fc70 EFLAGS: 00010082
Jun  9 19:47:19 Tower kernel: RAX: 0000000000000050 RBX: ffff888c0e9b0690 RCX: 0000000080150015
Jun  9 19:47:19 Tower kernel: RDX: 0000000000000200 RSI: ffffea000424bcc0 RDI: 0000000044042000
Jun  9 19:47:19 Tower kernel: RBP: ffffc90003e4fd30 R08: 0000000000000000 R09: 0000000080150015
Jun  9 19:47:19 Tower kernel: R10: 0000000000000202 R11: ffff88814e780020 R12: ffff8881001cc500
Jun  9 19:47:19 Tower kernel: R13: 00000000ffffffff R14: 0000000000000cc0 R15: fbff8881092f3780
Jun  9 19:47:19 Tower kernel: FS:  0000000000000000(0000) GS:ffff888c0ea40000(0000) knlGS:0000000000000000
Jun  9 19:47:19 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun  9 19:47:19 Tower kernel: CR2: 00007fff11310f80 CR3: 000000015a2e2000 CR4: 0000000000350ee0
 

 

Link to comment

Looks to me like a hardware issue. One more thing you can try is to boot the server in safe mode with Docker and all VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
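
One detail in the trace above is consistent with a hardware fault, though it does not prove one. The faulting address 0xfbff888157600a70 and the R15 values in both register dumps (fbff888157600a00 and fbff8881092f3780) start with fbff888, while the other kernel pointers in the same dumps (RBX, R12, GS) start with ffff888. The sketch below assumes 48-bit virtual addressing and assumes the intended pointer had the ffff prefix. Under those assumptions it checks that the logged address is non-canonical and differs from the assumed pointer by a single bit. The function names are made up for illustration.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63 down to (va_bits - 1) are all zeros or all ones."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones

def flipped_bits(seen: int, expected: int) -> list[int]:
    """Bit positions (0 = least significant) where the two values differ."""
    diff = seen ^ expected
    return [bit for bit in range(64) if (diff >> bit) & 1]

faulting = 0xFBFF888157600A70  # from the "general protection fault" line
assumed = 0xFFFF888157600A70   # assumption: the same address with an ffff prefix

print(is_canonical(faulting))           # False
print(is_canonical(assumed))            # True
print(flipped_bits(faulting, assumed))  # [58]
```

The same single bit differs in the R15 value of the second dump. One flipped bit in a pointer is the kind of damage bad RAM can cause, which makes a memory test a sensible next step, but a software bug can corrupt pointers too, so treat this as a hint only.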

Link to comment
3 minutes ago, JorgeB said:

Looks to me like a hardware issue. One more thing you can try is to boot the server in safe mode with Docker and all VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

Thanks, I'll give it a try. 

Link to comment
51 minutes ago, emilgil said:

My server crashed again as I was writing my previous post, and after a reboot it crashes immediately when I try to start the array. I'm guessing that some file(s) have been corrupted by all these crashes.

The OS runs in RAM from files extracted from the flash drive on boot.

Just to rule out that possibility, you can try replacing all the bz* files on the flash drive with the ones from the archive here: https://unraid.net/download

 

But I would first run several passes of Memtest in your place.

Link to comment
21 hours ago, ChatNoir said:

The OS runs in RAM from files extracted from the flash drive on boot.

Just to rule out that possibility, you can try replacing all the bz* files on the flash drive with the ones from the archive here: https://unraid.net/download

 

But I would first run several passes of Memtest in your place.

Ok, that changes things a bit. I just don't understand why my server crashes more frequently now than a few days ago. After reinstalling a few weeks ago it crashed after a week or so, then when I began installing my docker containers it became more and more common. The last week it has crashed daily or more. Last night it wouldn't even start the array without panicking.

 

I have ordered a new USB memory stick and once it arrives I'll run Memtest from it for a week or so. 

Link to comment
On 6/9/2022 at 10:45 PM, ChatNoir said:

But I would first run several passes of Memtest in your place.

I ran 4 passes of Memtest86 yesterday without errors. 

 

I guess I just have to reinstall Unraid again and let it sit for a week or two without installing any addons or Dockers.

Link to comment

Ok, new attempt. I ran a new session of Memtest with all four RAM sticks in place and had a truckload of errors. I then removed the smaller pair of RAM and restarted Memtest. This time it passed the test so I'll let it sit idle for a few days now before I re-enable Docker and my VMs. 

Link to comment
