emilgil Posted May 18, 2022 I've had quite a few problems with my install of Unraid 6.9 and 6.10 RC1-4 but lately it has been running quite smoothly. Until yesterday morning that is, when I realized my server had crashed again. Kernel panic at approx 5 AM, what a lovely way to start the day. I restarted the server and waited to see it come back up but no, there was a new kernel panic during boot (even during Safe boot). Can somebody please help me sort this out? I can't really make anything out of the information from the first KP; the second one seems to stem from udev and some kind of disk failure (?) during boot. Needless to say, I can't access the server now. AMD Ryzen 3600X, Asus ROG Strix B450-F Gaming (BIOS updated in February 2022), 48 GB RAM and an LSI 9240-8i HBA in IT mode. I've had my fair share of panics due to power management issues but it has been running for a month or so now. Can anyone please shed some light on this?
JorgeB Posted May 18, 2022 Check this if you haven't yet, and if issues persist enable the syslog server and post that after a crash, together with the diagnostics.
emilgil Posted May 18, 2022 Author 3 hours ago, JorgeB said: Check this if you haven't yet, and if issues persist enable the syslog server and post that after a crash, together with the diagnostics. I wish I could, I can't even boot into safe mode...
JorgeB Posted May 18, 2022 Sorry, I missed that it won't even boot now. Back up the current flash drive, download v6.10.0, recreate the flash drive with it and see if it boots. If it does, restore the config folder from the backup and check whether it still boots.
emilgil Posted May 18, 2022 Author (edited) Alright, I did what you suggested and installed 6.10 stable on my USB. It booted fine the first time but once I copy my config folder on to the USB, it throws a series of error messages in quick succession before panicking again. It's far too fast for me to register any details, even if I knew what to look for. I'll try removing all USB devices and then all disks to see if it helps. Edited May 18, 2022 by emilgil
emilgil Posted May 19, 2022 Author (edited) Ok, so no luck today. I tried removing all my USB devices but ended up in a kernel panic anyway. I tried removing all my HDDs too, with the same result. There are still two M.2 SSDs connected, but only because they're such a pain to remove. I can't even add the key file to the USB memory without it panicking. I can't help thinking about HW issues: could my USB memory be corrupt even if I can write the Unraid image to it? A new one is only €12 so I might give that a try. Or should I ditch my AMD gear and spend another €300 on an Intel CPU and motherboard? Edited May 19, 2022 by emilgil
JorgeB Posted May 20, 2022 On 5/18/2022 at 7:39 PM, emilgil said: Alright, I did what you suggested and installed 6.10 stable on my USB. It booted fine the first time but once I copy my config folder on to the USB, it throws a series of error messages in quick succession before panicking again. This suggests a problem with the current config. You might try copying just super.dat and the key, then reconfigure the server. You can also try copying more of the config files, but only a few at a time, to see if you can find what's breaking it.
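The "a few at a time" approach can be scripted so that each reboot test covers a known set of files. What follows is only a minimal sketch: the function names `plan_batches` and `copy_batch`, the batch size, and the assumption that the backup and the flash drive are both mounted on another computer are illustrative, not part of Unraid. The first batch holds only super.dat and any *.key file, as suggested above; you would copy one batch, boot the server, and continue only if it still boots.

```python
import shutil
from pathlib import Path


def plan_batches(backup_config: Path, batch_size: int = 3) -> list[list[Path]]:
    """Group backed-up config files into small batches for a staged restore.

    The first batch contains only super.dat and any *.key file; all other
    files follow in sorted order, batch_size files per batch.
    """
    files = sorted(
        p.relative_to(backup_config)
        for p in backup_config.rglob("*")
        if p.is_file()
    )
    essential = [p for p in files if p.name == "super.dat" or p.suffix == ".key"]
    rest = [p for p in files if p not in essential]
    batches = [essential] if essential else []
    batches += [rest[i:i + batch_size] for i in range(0, len(rest), batch_size)]
    return batches


def copy_batch(backup_config: Path, flash_config: Path, batch: list[Path]) -> None:
    """Copy one batch onto the flash drive, preserving sub-folders."""
    for rel in batch:
        target = flash_config / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(backup_config / rel, target)
```

If the server stops booting right after a given batch, the culprit is among at most `batch_size` files, which can then be restored one at a time.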
emilgil Posted May 20, 2022 Author (edited) Alright, after messing about with BIOS settings I finally managed to boot into Unraid. Super.dat seems to be working so let's see if we can spin up the drives now. Edited May 20, 2022 by emilgil
JorgeB Posted May 20, 2022 You can't restore a config with a trial key, only a paid one.
emilgil Posted May 20, 2022 Author 14 minutes ago, JorgeB said: You can't restore a config with a trial key, only a paid one. Yeah, I realized that there was a separate key file for the paid license. I've managed to start the array, so it's time to run a parity check before I move on to restoring other settings and disks.
emilgil Posted June 3, 2022 Author (edited) Ok, so I had two more panics a few days ago. I didn't manage to catch any details on the first one but after that I enabled the syslog server. Unfortunately it doesn't give much information. I have attached a screenshot of the second panic as well as a diagnostics dump taken between the two panics.
Jun 1 10:06:00 Tower avahi-daemon[15417]: Joining mDNS multicast group on interface veth33b83a0.IPv6 with address fe80::800a:7eff:fe57:dc32.
Jun 1 10:06:00 Tower avahi-daemon[15417]: New relevant interface veth33b83a0.IPv6 for mDNS.
Jun 1 10:06:00 Tower avahi-daemon[15417]: Registering new address record for fe80::800a:7eff:fe57:dc32 on veth33b83a0.*.
Jun 1 10:06:03 Tower kernel: br-5ad5dd5266c5: port 5(veth33b83a0) entered disabled state
Jun 1 10:06:03 Tower kernel: vethcc3a819: renamed from eth0
Jun 1 10:06:03 Tower avahi-daemon[15417]: Interface veth33b83a0.IPv6 no longer relevant for mDNS.
Jun 1 10:06:03 Tower avahi-daemon[15417]: Leaving mDNS multicast group on interface veth33b83a0.IPv6 with address fe80::800a:7eff:fe57:dc32.
Jun 1 10:06:03 Tower kernel: br-5ad5dd5266c5: port 5(veth33b83a0) entered disabled state
Jun 1 10:06:03 Tower kernel: device veth33b83a0 left promiscuous mode
Jun 1 10:06:03 Tower kernel: br-5ad5dd5266c5: port 5(veth33b83a0) entered disabled state
Jun 1 10:06:03 Tower avahi-daemon[15417]: Withdrawing address record for fe80::800a:7eff:fe57:dc32 on veth33b83a0.
Jun 1 18:01:41 Tower unassigned.devices: Mounting 'Auto Mount' Devices...
Jun 1 18:01:41 Tower unassigned.devices: Disk with ID 'HGST_HMS5C4040BLE640_PL1331LAHBZULH (sdf)' is not set to auto mount.
Jun 1 18:01:41 Tower unassigned.devices: Disk with ID 'ST4000DM000-1F2168_W300WSKA (sdg)' is not set to auto mount.
Jun 1 18:01:41 Tower unassigned.devices: Disk with ID 'WDC_WD80EZAZ-11TDBA0_7SJBREDU (sdi)' is not set to auto mount.
Jun 1 18:01:41 Tower emhttpd: Starting services...
Jun 1 18:01:41 Tower emhttpd: shcmd (277): /etc/rc.d/rc.samba restart
Jun 1 18:01:41 Tower wsdd2[4649]: 'Terminated' signal received.
Jun 1 18:01:41 Tower winbindd[4652]: [2022/06/01 18:01:41.868127, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jun 1 18:01:41 Tower winbindd[4654]: [2022/06/01 18:01:41.868127, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jun 1 18:01:41 Tower winbindd[4654]: Got sig[15] terminate (is_parent=0)
Jun 1 18:01:41 Tower winbindd[4652]: Got sig[15] terminate (is_parent=1)
Jun 1 18:01:41 Tower winbindd[4727]: [2022/06/01 18:01:41.868185, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jun 1 18:01:41 Tower wsdd2[4649]: terminating.
Jun 1 18:01:41 Tower winbindd[4727]: Got sig[15] terminate (is_parent=0)
Jun 1 18:01:42 Tower rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="10782" x-info="https://www.rsyslog.com"] start
Jun 1 18:01:44 Tower root: Starting Samba: /usr/sbin/smbd -D
Jun 1 18:01:44 Tower smbd[10875]: [2022/06/01 18:01:44.021879, 0] ../../source3/smbd/server.c:1734(main)
Jun 1 18:01:44 Tower smbd[10875]: smbd version 4.15.7 started.
Jun 1 18:01:44 Tower smbd[10875]: Copyright Andrew Tridgell and the Samba Team 1992-2021
Jun 1 18:01:44 Tower root: /usr/sbin/wsdd2 -d
Jun 1 18:01:44 Tower wsdd2[10889]: starting.
Jun 1 18:01:44 Tower root: /usr/sbin/winbindd -D
Jun 1 18:01:44 Tower winbindd[10890]: [2022/06/01 18:01:44.085748, 0] ../../source3/winbindd/winbindd.c:1722(main)
Jun 1 18:01:44 Tower winbindd[10890]: winbindd version 4.15.7 started.
Jun 1 18:01:44 Tower winbindd[10890]: Copyright Andrew Tridgell and the Samba Team 1992-2021
Jun 1 18:01:44 Tower winbindd[10892]: [2022/06/01 18:01:44.089644, 0] ../../source3/winbindd/winbindd_cache.c:3085(initialize_winbindd_cache)
Jun 1 18:01:44 Tower winbindd[10892]: initialize_winbindd_cache: clearing cache and re-creating with version number 2
Jun 1 18:01:44 Tower emhttpd: shcmd (281): /etc/rc.d/rc.avahidaemon restart
Jun 1 18:01:44 Tower root: Stopping Avahi mDNS/DNS-SD Daemon: stopped
Jun 1 18:01:44 Tower avahi-dnsconfd[4678]: read(): EOF
Jun 1 18:01:44 Tower root: Starting Avahi mDNS/DNS-SD Daemon: /usr/sbin/avahi-daemon -D
Jun 1 18:01:44 Tower avahi-daemon[10909]: Found user 'avahi' (UID 61) and group 'avahi' (GID 214).
Jun 1 18:01:44 Tower avahi-daemon[10909]: Successfully dropped root privileges.
Jun 1 18:01:44 Tower avahi-daemon[10909]: avahi-daemon 0.8 starting up.
Jun 1 18:01:44 Tower avahi-daemon[10909]: Successfully called chroot().
Jun 1 18:01:44 Tower avahi-daemon[10909]: Successfully dropped remaining capabilities.
Jun 1 18:01:44 Tower avahi-daemon[10909]: Loading service file /services/sftp-ssh.service.
Jun 1 18:01:44 Tower avahi-daemon[10909]: Loading service file /services/smb.service.
Jun 1 18:01:44 Tower avahi-daemon[10909]: Loading service file /services/ssh.service.
Jun 1 18:01:44 Tower avahi-daemon[10909]: Joining mDNS multicast group on interface br0.IPv4 with address 192.168.1.61.
Jun 1 18:01:44 Tower avahi-daemon[10909]: New relevant interface br0.IPv4 for mDNS.
Jun 1 18:01:44 Tower avahi-daemon[10909]: Joining mDNS multicast group on interface lo.IPv6 with address ::1.
Jun 1 18:01:44 Tower avahi-daemon[10909]: New relevant interface lo.IPv6 for mDNS.
Jun 1 18:01:44 Tower avahi-daemon[10909]: Joining mDNS multicast group on interface lo.IPv4 with address 127.0.0.1.
Jun 1 18:01:44 Tower avahi-daemon[10909]: New relevant interface lo.IPv4 for mDNS.
Jun 1 18:01:44 Tower avahi-daemon[10909]: Network interface enumeration completed.
Jun 1 18:01:44 Tower avahi-daemon[10909]: Registering new address record for 192.168.1.61 on br0.IPv4.
Jun 1 18:01:44 Tower avahi-daemon[10909]: Registering new address record for ::1 on lo.*.
Jun 1 18:01:44 Tower avahi-daemon[10909]: Registering new address record for 127.0.0.1 on lo.IPv4.
Jun 1 18:01:44 Tower emhttpd: shcmd (282): /etc/rc.d/rc.avahidnsconfd restart
Jun 1 18:01:44 Tower root: Stopping Avahi mDNS/DNS-SD DNS Server Configuration Daemon: stopped
Jun 1 18:01:44 Tower root: Starting Avahi mDNS/DNS-SD DNS Server Configuration Daemon: /usr/sbin/avahi-dnsconfd -D
Jun 1 18:01:44 Tower avahi-dnsconfd[10918]: Successfully connected to Avahi daemon.
Jun 1 18:01:44 Tower emhttpd: shcmd (291): /usr/local/sbin/mount_image '/mnt/user/system/docker/' /var/lib/docker 48
Jun 1 18:01:44 Tower emhttpd: shcmd (293): /etc/rc.d/rc.docker start
Jun 1 18:01:44 Tower root: starting dockerd ...
Jun 1 18:01:44 Tower kernel: Bridge firewalling registered
Attachment: tower-diagnostics-20220531-2041.zip
Edited June 3, 2022 by emilgil
JorgeB Posted June 3, 2022 Start here: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
emilgil Posted June 5, 2022 Author Right, so even before these last two panics I had the memory speed locked to 2400 MHz and Power Supply Idle Control set to Typical Current Idle. As far as I can remember, C-states were disabled but I'll double check.
emilgil Posted June 8, 2022 Author Another panic tonight. C-states are disabled, Power Supply Idle Control (PSIC) is set to Typical Current Idle, and RAM speed is 2666 MHz.
JorgeB Posted June 9, 2022 Enable the syslog server and post the log after a crash; it might catch something.
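Once the syslog server has saved a log, most of it is routine service chatter, and the kernel entries are the part worth reading first. A minimal sketch of a filter for those entries follows. The marker list is an assumption chosen for illustration (it is not an exhaustive catalogue of kernel failure messages), and the file name in the usage note is hypothetical.

```python
import re

# Substrings that often mark a kernel-level failure in a Linux syslog.
# Assumption: this short list is illustrative, not complete.
FAULT_MARKERS = re.compile(
    r"general protection fault|Kernel panic|BUG:|Oops|Call Trace|Machine Check",
    re.IGNORECASE,
)


def find_kernel_faults(lines):
    """Return (line_number, text) for kernel entries that match a fault marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Only entries logged by the kernel itself are considered.
        if " kernel: " in line and FAULT_MARKERS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To run it over a saved log, something like `with open("syslog-tower.log") as f: print(find_kernel_faults(f))` would do, with the path replaced by wherever the syslog server stores its file.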
emilgil Posted June 9, 2022 Author 11 hours ago, JorgeB said: Enable the syslog server and post that after a crash, it might catch something. Not really. This is from the last crash (earlier today):
Jun 9 12:19:17 Tower kernel: device vethbe5bcc7 entered promiscuous mode
Jun 9 12:19:17 Tower kernel: eth0: renamed from veth9cf5cec
Jun 9 12:19:17 Tower kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethbe5bcc7: link becomes ready
Jun 9 12:19:17 Tower kernel: br-7f0f8378f740: port 2(vethbe5bcc7) entered blocking state
Jun 9 12:19:17 Tower kernel: br-7f0f8378f740: port 2(vethbe5bcc7) entered forwarding state
Jun 9 12:19:19 Tower avahi-daemon[19767]: Joining mDNS multicast group on interface vethbe5bcc7.IPv6 with address fe80::10de:36ff:fe83:eba7.
Jun 9 12:19:19 Tower avahi-daemon[19767]: New relevant interface vethbe5bcc7.IPv6 for mDNS.
Jun 9 12:19:19 Tower avahi-daemon[19767]: Registering new address record for fe80::10de:36ff:fe83:eba7 on vethbe5bcc7.*.
Jun 9 12:19:21 Tower kernel: br-7f0f8378f740: port 2(vethbe5bcc7) entered disabled state
Jun 9 12:19:21 Tower kernel: veth9cf5cec: renamed from eth0
Jun 9 12:19:21 Tower avahi-daemon[19767]: Interface vethbe5bcc7.IPv6 no longer relevant for mDNS.
Jun 9 12:19:21 Tower avahi-daemon[19767]: Leaving mDNS multicast group on interface vethbe5bcc7.IPv6 with address fe80::10de:36ff:fe83:eba7.
Jun 9 12:19:21 Tower kernel: br-7f0f8378f740: port 2(vethbe5bcc7) entered disabled state
Jun 9 12:19:21 Tower kernel: device vethbe5bcc7 left promiscuous mode
Jun 9 12:19:21 Tower kernel: br-7f0f8378f740: port 2(vethbe5bcc7) entered disabled state
Jun 9 12:19:21 Tower avahi-daemon[19767]: Withdrawing address record for fe80::10de:36ff:fe83:eba7 on vethbe5bcc7.
Jun 9 19:44:22 Tower unassigned.devices: Mounting 'Auto Mount' Devices...
Jun 9 19:44:22 Tower unassigned.devices: Partition 'sdc1' does not have a file system and cannot be mounted.
Jun 9 19:44:22 Tower unassigned.devices: Adding partition 'sdc2'...
Then it seems something happened while I was trying to log in:
Jun 9 19:44:22 Tower unassigned.devices: Mounting partition 'sdc2' at mountpoint '/mnt/disks/1000GB'...
Jun 9 19:44:23 Tower unassigned.devices: Mount drive command: /sbin/mount -t 'ntfs' -o rw,noatime,nodiratime,nodev,nosuid,nls=utf8,umask=000 '/dev/sdc2' '/mnt/disks/1000GB'
Jun 9 19:44:23 Tower ntfs-3g[27773]: Version 2021.8.22 integrated FUSE 27
Jun 9 19:44:23 Tower ntfs-3g[27773]: Mounted /dev/sdc2 (Read-Write, label "1000GB", NTFS 3.1)
Jun 9 19:44:23 Tower ntfs-3g[27773]: Cmdline options: rw,noatime,nodiratime,nodev,nosuid,nls=utf8,umask=000
Jun 9 19:44:23 Tower ntfs-3g[27773]: Mount options: nodiratime,nodev,nosuid,nls=utf8,allow_other,nonempty,noatime,rw,default_permissions,fsname=/dev/sdc2,blkdev,blksize=4096
Jun 9 19:44:23 Tower ntfs-3g[27773]: Global ownership and permissions enforced, configuration type 1
Jun 9 19:44:23 Tower unassigned.devices: Successfully mounted 'sdc2' on '/mnt/disks/1000GB'.
Jun 9 19:44:23 Tower unassigned.devices: Adding SMB share '1000GB'.
Jun 9 19:44:23 Tower unassigned.devices: Disk with ID 'HGST_HMS5C4040BLE640_PL1331LAHBZULH (sdf)' is not set to auto mount.
Jun 9 19:44:23 Tower unassigned.devices: Disk with ID 'ST4000DM000-1F2168_W300WSKA (sdg)' is not set to auto mount.
Jun 9 19:44:23 Tower unassigned.devices: Disk with ID 'WDC_WD80EZAZ-11TDBA0_7SJBREDU (sdi)' is not set to auto mount.
Jun 9 19:44:23 Tower emhttpd: Starting services...
Jun 9 19:44:23 Tower emhttpd: shcmd (5915): /etc/rc.d/rc.samba restart
Jun 9 19:44:23 Tower wsdd2[5095]: 'Terminated' signal received.
Jun 9 19:44:23 Tower winbindd[5098]: [2022/06/09 19:44:23.324858, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jun 9 19:44:23 Tower winbindd[5098]: Got sig[15] terminate (is_parent=1)
Jun 9 19:44:23 Tower winbindd[5100]: [2022/06/09 19:44:23.324867, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jun 9 19:44:23 Tower winbindd[5100]: Got sig[15] terminate (is_parent=0)
Jun 9 19:44:23 Tower wsdd2[5095]: terminating.
Jun 9 19:44:23 Tower winbindd[5176]: [2022/06/09 19:44:23.324937, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jun 9 19:44:23 Tower winbindd[5176]: Got sig[15] terminate (is_parent=0)
Jun 9 19:44:23 Tower rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="27711" x-info="https://www.rsyslog.com"] start
Jun 9 19:44:25 Tower root: Starting Samba: /usr/sbin/smbd -D
Jun 9 19:44:25 Tower smbd[27876]: [2022/06/09 19:44:25.480457, 0] ../../source3/smbd/server.c:1734(main)
Jun 9 19:44:25 Tower smbd[27876]: smbd version 4.15.7 started.
Jun 9 19:44:25 Tower smbd[27876]: Copyright Andrew Tridgell and the Samba Team 1992-2021
Jun 9 19:44:25 Tower root: /usr/sbin/wsdd2 -d
Jun 9 19:44:25 Tower wsdd2[27890]: starting.
Jun 9 19:44:25 Tower root: /usr/sbin/winbindd -D
Jun 9 19:44:25 Tower winbindd[27891]: [2022/06/09 19:44:25.550503, 0] ../../source3/winbindd/winbindd.c:1722(main)
Jun 9 19:44:25 Tower winbindd[27891]: winbindd version 4.15.7 started.
Jun 9 19:44:25 Tower winbindd[27891]: Copyright Andrew Tridgell and the Samba Team 1992-2021
Jun 9 19:44:25 Tower winbindd[27893]: [2022/06/09 19:44:25.554416, 0] ../../source3/winbindd/winbindd_cache.c:3085(initialize_winbindd_cache)
Jun 9 19:44:25 Tower winbindd[27893]: initialize_winbindd_cache: clearing cache and re-creating with version number 2
Jun 9 19:47:17 Tower nginx: 2022/06/09 19:47:17 [error] 5175#5175: *5566 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.1.202, server: , request: "POST /update.htm HTTP/1.1", upstream: "http://unix:/var/run/emhttpd.socket:/update.htm", host: "192.168.1.61", referrer: "http://192.168.1.61/Main"
Jun 9 19:47:19 Tower kernel: general protection fault, probably for non-canonical address 0xfbff888157600a70: 0000 [#2] SMP NOPTI
Jun 9 19:47:19 Tower kernel: CPU: 9 PID: 29474 Comm: sh Tainted: G D 5.15.43-Unraid #1
Jun 9 19:47:19 Tower kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B450-F GAMING, BIOS 4801 03/02/2022
Jun 9 19:47:19 Tower kernel: RIP: 0010:___slab_alloc+0x2e2/0x5de
Jun 9 19:47:19 Tower kernel: Code: 8b 6b 10 4d 85 ed 0f 85 95 02 00 00 48 8b 44 24 38 48 89 43 10 49 8b 04 24 48 83 c0 20 65 48 03 05 d3 f8 e2 7e 41 8b 44 24 28 <49> 8b 04 07 48 ff 43 08 48 89 03 49 8b 04 24 48 83 c0 20 65 48 03
Jun 9 19:47:19 Tower kernel: RSP: 0018:ffffc9000303fbb0 EFLAGS: 00010086
Jun 9 19:47:19 Tower kernel: RAX: 0000000000000070 RBX: ffff888c0ea70810 RCX: 0000000080200020
Jun 9 19:47:19 Tower kernel: RDX: 0000000000000200 RSI: ffffea00055d8000 RDI: 0000000044042000
Jun 9 19:47:19 Tower kernel: RBP: ffffc9000303fc70 R08: 0000000000000000 R09: 0000000080200020
Jun 9 19:47:19 Tower kernel: R10: 0000000000000202 R11: 0000000000000fe0 R12: ffff8881001cd100
Jun 9 19:47:19 Tower kernel: R13: 00000000ffffffff R14: 0000000000000dc0 R15: fbff888157600a00
Jun 9 19:47:19 Tower kernel: FS: 0000000000000000(0000) GS:ffff888c0ea40000(0000) knlGS:0000000000000000
Jun 9 19:47:19 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 9 19:47:19 Tower kernel: CR2: 00007fff11310f80 CR3: 000000015a2e2000 CR4: 0000000000350ee0
Jun 9 19:47:19 Tower kernel: Call Trace:
Jun 9 19:47:19 Tower kernel: <TASK>
Jun 9 19:47:19 Tower kernel: ? __alloc_file+0x26/0x9c
Jun 9 19:47:19 Tower kernel: ? rcu_segcblist_enqueue+0x12/0x33
Jun 9 19:47:19 Tower kernel: ? post_alloc_hook+0x21/0x4f
Jun 9 19:47:19 Tower kernel: ? kernel_init_free_pages.part.0+0x48/0x59
Jun 9 19:47:19 Tower kernel: ? prep_new_page+0x1c/0x48
Jun 9 19:47:19 Tower kernel: ? __alloc_file+0x26/0x9c
Jun 9 19:47:19 Tower kernel: kmem_cache_alloc+0x8c/0x175
Jun 9 19:47:19 Tower kernel: __alloc_file+0x26/0x9c
Jun 9 19:47:19 Tower kernel: alloc_empty_file+0x9e/0xd3
Jun 9 19:47:19 Tower kernel: path_openat+0x46/0x94f
Jun 9 19:47:19 Tower kernel: ? cgroup_rstat_updated+0x21/0xa1
Jun 9 19:47:19 Tower kernel: ? cgroup_rstat_updated+0x21/0xa1
Jun 9 19:47:19 Tower kernel: do_filp_open+0x53/0xb0
Jun 9 19:47:19 Tower kernel: ? get_page+0x5/0xa
Jun 9 19:47:19 Tower kernel: ? lru_cache_add+0x35/0x54
Jun 9 19:47:19 Tower kernel: ? set_pte+0x5/0x8
Jun 9 19:47:19 Tower kernel: ? slab_post_alloc_hook+0x50/0x157
Jun 9 19:47:19 Tower kernel: ? getname_flags+0x29/0x150
Jun 9 19:47:19 Tower kernel: ? kmem_cache_alloc+0xff/0x175
Jun 9 19:47:19 Tower kernel: do_sys_openat2+0x72/0xde
Jun 9 19:47:19 Tower kernel: do_sys_open+0x3b/0x58
Jun 9 19:47:19 Tower kernel: do_syscall_64+0x83/0xa5
Jun 9 19:47:19 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Jun 9 19:47:19 Tower kernel: RIP: 0033:0x148e08a24654
Jun 9 19:47:19 Tower kernel: Code: f9 41 89 f0 41 83 e2 40 75 2c 89 f0 25 00 00 41 00 3d 00 00 41 00 74 1e 44 89 c2 4c 89 ce bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2c c3 0f 1f 00 48 8d 44 24 08 c7 44 24 b8 10
Jun 9 19:47:19 Tower kernel: RSP: 002b:00007fff113109b8 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
Jun 9 19:47:19 Tower kernel: RAX: ffffffffffffffda RBX: 0000148e08a35190 RCX: 0000148e08a24654
Jun 9 19:47:19 Tower kernel: RDX: 0000000000080000 RSI: 0000148e08a2ad7d RDI: 00000000ffffff9c
Jun 9 19:47:19 Tower kernel: RBP: 00007fff11310b70 R08: 0000000000080000 R09: 0000148e08a2ad7d
Jun 9 19:47:19 Tower kernel: R10: 0000000000000000 R11: 0000000000000287 R12: ffffffffffffffff
Jun 9 19:47:19 Tower kernel: R13: 0000000000000001 R14: 0000148e08a34040 R15: 0000000000418bd8
Jun 9 19:47:19 Tower kernel: </TASK>
Jun 9 19:47:19 Tower kernel: Modules linked in: xfs md_mod efivarfs iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding igb mpt3sas i2c_algo_bit raid_class ch341 input_leds edac_mce_amd kvm_amd kvm wmi_bmof mxm_wmi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd i2c_piix4 cryptd rapl ahci ccp k10temp led_class i2c_core scsi_transport_sas usbserial tpm_crb libahci nvme tpm_tis tpm_tis_core nvme_core tpm wmi button acpi_cpufreq
Jun 9 19:47:19 Tower kernel: ---[ end trace 433c4d1883a374bf ]---
Jun 9 19:47:19 Tower kernel: RIP: 0010:___slab_alloc+0x2e2/0x5de
Jun 9 19:47:19 Tower kernel: Code: 8b 6b 10 4d 85 ed 0f 85 95 02 00 00 48 8b 44 24 38 48 89 43 10 49 8b 04 24 48 83 c0 20 65 48 03 05 d3 f8 e2 7e 41 8b 44 24 28 <49> 8b 04 07 48 ff 43 08 48 89 03 49 8b 04 24 48 83 c0 20 65 48 03
Jun 9 19:47:19 Tower kernel: RSP: 0018:ffffc90003e4fc70 EFLAGS: 00010082
Jun 9 19:47:19 Tower kernel: RAX: 0000000000000050 RBX: ffff888c0e9b0690 RCX: 0000000080150015
Jun 9 19:47:19 Tower kernel: RDX: 0000000000000200 RSI: ffffea000424bcc0 RDI: 0000000044042000
Jun 9 19:47:19 Tower kernel: RBP: ffffc90003e4fd30 R08: 0000000000000000 R09: 0000000080150015
Jun 9 19:47:19 Tower kernel: R10: 0000000000000202 R11: ffff88814e780020 R12: ffff8881001cc500
Jun 9 19:47:19 Tower kernel: R13: 00000000ffffffff R14: 0000000000000cc0 R15: fbff8881092f3780
Jun 9 19:47:19 Tower kernel: FS: 0000000000000000(0000) GS:ffff888c0ea40000(0000) knlGS:0000000000000000
Jun 9 19:47:19 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 9 19:47:19 Tower kernel: CR2: 00007fff11310f80 CR3: 000000015a2e2000 CR4: 0000000000350ee0
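One detail in the trace above is worth a closer look: the faulting address 0xfbff888157600a70 is reported as non-canonical. On x86-64 with 48-bit virtual addresses, bits 63 down to 47 of a valid address must all be equal. The sketch below checks that rule and lists which single-bit flips would turn an address into a canonical one. It assumes 4-level paging (48-bit addresses), and the function names are made up for this illustration.

```python
VA_BITS = 48  # assumption: 4-level paging, i.e. 48-bit virtual addresses


def is_canonical(addr: int, va_bits: int = VA_BITS) -> bool:
    """True if bits 63..(va_bits - 1) are all zeros or all ones."""
    top = addr >> (va_bits - 1)                # bits 63..47 when va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits when va_bits == 48
    return top == 0 or top == all_ones


def single_bit_repairs(addr: int, va_bits: int = VA_BITS) -> list[tuple[int, int]]:
    """List (bit, repaired_address) pairs where flipping one bit makes addr canonical."""
    return [
        (bit, addr ^ (1 << bit))
        for bit in range(64)
        if is_canonical(addr ^ (1 << bit), va_bits)
    ]


for bit, fixed in single_bit_repairs(0xFBFF888157600A70):
    print(f"bit {bit}: {fixed:#018x}")
```

For the address in the log, the only repair is flipping bit 58, which gives 0xffff888157600a70, an address in the range the kernel documentation lists for the direct mapping of physical memory under 4-level paging. R15 carries the same 0xfbff prefix in both register dumps. A pointer that is exactly one bit away from a plausible value is consistent with a hardware fault such as bad RAM, but on its own it does not prove one.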
JorgeB Posted June 9, 2022 Looks to me like a hardware issue. One more thing you can try is to boot the server in safe mode with Docker and all VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
emilgil Posted June 9, 2022 Author 3 minutes ago, JorgeB said: Looks to me like a hardware issue. One more thing you can try is to boot the server in safe mode with Docker and all VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one. Thanks, I'll give it a try.
emilgil Posted June 9, 2022 Author My server crashed again as I was writing my previous post, and after a reboot it crashes immediately when I try to start the array. I'm guessing that some file(s) have been corrupted by all these crashes.
ChatNoir Posted June 9, 2022 51 minutes ago, emilgil said: My server crashed again as I was writing my previous post, and after a reboot it crashes immediately when I try to start the array. I'm guessing that some file(s) have been corrupted by all these crashes. The OS runs in RAM from files extracted from the flash drive on boot. Just to rule out that possibility, you can try replacing all the bz* files on the flash drive with the ones from the archive here: https://unraid.net/download But in your place I would first run several passes of Memtest.
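Before replacing the bz* files, it can be useful to know whether the copies on the flash drive actually differ from a freshly extracted archive. A minimal sketch follows; the function names and both directory arguments are hypothetical, and it assumes the flash drive and the extracted archive (same Unraid version) are both readable from another, known-good computer, since a machine with unstable memory could itself produce wrong checksums.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_bz_files(flash_root: Path, reference_root: Path) -> dict[str, str]:
    """Compare every bz* file in reference_root with its copy on the flash drive.

    Returns a mapping of file name to "ok", "MISMATCH", or "missing".
    """
    report = {}
    for ref in sorted(reference_root.glob("bz*")):
        if not ref.is_file():
            continue
        candidate = flash_root / ref.name
        if not candidate.is_file():
            report[ref.name] = "missing"
        elif sha256_of(candidate) == sha256_of(ref):
            report[ref.name] = "ok"
        else:
            report[ref.name] = "MISMATCH"
    return report
```

If every file reports "ok", corrupted OS files on the flash drive become an unlikely explanation, and the hardware checks move up the list.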
emilgil Posted June 10, 2022 Author 21 hours ago, ChatNoir said: The OS runs in RAM from files extracted from the flash drive on boot. Just to rule out that possibility, you can try replacing all the bz* files on the flash drive with the ones from the archive here: https://unraid.net/download But in your place I would first run several passes of Memtest. Ok, that changes things a bit. I just don't understand why my server crashes more frequently now than a few days ago. After reinstalling a few weeks ago it crashed after a week or so, then when I began installing my docker containers the crashes became more and more common. For the last week it has crashed at least once a day. Last night it wouldn't even start the array without panicking. I have ordered a new USB memory stick and once it arrives I'll run Memtest from it for a week or so.
emilgil Posted June 10, 2022 Author I just replaced all the bz* files with the ones from the original zip file and it only took a few minutes for my server to panic. I didn't even try to start the array.
emilgil Posted June 15, 2022 Author On 6/9/2022 at 10:45 PM, ChatNoir said: But I would first run several passes of Memtest in your place. I ran 4 passes of Memtest86 yesterday without errors. I guess I just have to reinstall Unraid again and let it sit for a week or two without installing any addons or Dockers.
emilgil Posted June 18, 2022 Author Still no cigar. Sometimes it works for a couple of hours, other times just a few minutes. Sometimes Docker is activated, sometimes not. Sometimes it crashes when I start the array, sometimes not. I'm going crazy over this!
emilgil Posted June 21, 2022 Author Ok, new attempt. I ran a new session of Memtest with all four RAM sticks in place and had a truckload of errors. I then removed the smaller pair of RAM sticks and restarted Memtest. This time it passed the test, so I'll let the server sit idle for a few days before I re-enable Docker and my VMs.