DingusKahn Posted August 4, 2023 Recently, my server has been becoming unresponsive, and all Docker containers and VMs disappear. If I try to reboot the server, nothing happens, and I have to shut it off manually. It also looks like most of the cores in my server are pegged at 100% usage despite the CPU temperature hovering around 38°C. Any ideas? I've attached diagnostics taken after I power cycled the server, since I could not get diagnostics while the server was unresponsive. Server Diagnostics .zip
DingusKahn Posted August 4, 2023 40 minutes later and it happened again... Here is the syslog: dingleberry-syslog-20230804-1459.zip
DingusKahn Posted August 5, 2023 Okay, so I've been messing around with this all day and haven't made much progress. I've removed all plugins and replaced the bz* files on the USB, and the server still doesn't work. In fact, it is now not even able to mount the disks; it crashes when it tries to do so. I've now got it up and running in safe mode and am running SMART extended tests on the disks to see if anything comes up, but that doesn't seem very likely to me. If nothing comes up, I guess my next step is to test the RAM to see if one or both sticks are causing the issue (XMP is not enabled). Does anyone have any ideas? I'd appreciate any tips at this point; I'm a little lost, out of my depth, and not sure where to go next. Thanks in advance!
itimpi Posted August 5, 2023 I would suggest you enable the syslog server with the option to mirror to flash set. Then, if it happens again, you should hopefully have a syslog file in the 'logs' folder on the flash drive that shows what happened leading up to the crash.
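Once a mirrored syslog exists, the lines just before the first kernel-level failure are usually the interesting part. Below is a minimal Python sketch of pulling those out. The marker list (`CRASH_MARKERS`) and the function name `context_before_crash` are assumptions made for illustration, not anything Unraid provides, and the sample lines are copied from the syslog excerpt posted later in this thread.

```python
import re

# Strings that often accompany a kernel-level failure.
# This list is an assumption for illustration, not an exhaustive set.
CRASH_MARKERS = re.compile(r"PANIC at|VERIFY\d*\(|Call Trace:|kernel BUG|Oops")


def context_before_crash(log_text, before=5):
    """Return up to `before` lines preceding the first crash marker,
    plus the marker line itself. Returns [] if no marker is found."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if CRASH_MARKERS.search(line):
            return lines[max(0, i - before): i + 1]
    return []


# Sample input: lines taken from the excerpt posted later in this thread.
sample = "\n".join([
    "Aug 6 16:55:47 Dingleberry emhttpd: mounting /mnt/cache",
    "Aug 6 16:55:47 Dingleberry emhttpd: shcmd (85526): mkdir -p /mnt/cache",
    "Aug 6 16:55:47 Dingleberry kernel: VERIFY3(range_tree_space(smla->smla_rt) + sme->sme_run <= smla->smla_sm->sm_size) failed (281451084271616 <= 8589934592)",
    "Aug 6 16:55:47 Dingleberry kernel: PANIC at space_map.c:405:space_map_load_callback()",
])

for line in context_before_crash(sample):
    print(line)
```

For a real log, read the mirrored file from the flash drive's 'logs' folder into a string and pass it in place of `sample`; the exact file name depends on the syslog server settings.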
DingusKahn Posted August 5, 2023 I ran the SMART test and nothing came up. I'll run memtest shortly and enable the syslog server as suggested to see if anything pops up. I hope RAM is the issue, because that would be quite an easy fix, but we shall see. Thanks!
DingusKahn Posted August 6, 2023 Server just crashed again.

Aug 6 16:55:47 Dingleberry emhttpd: mounting /mnt/cache
Aug 6 16:55:47 Dingleberry emhttpd: shcmd (85526): mkdir -p /mnt/cache
Aug 6 16:55:47 Dingleberry emhttpd: shcmd (85527): /usr/sbin/zpool import -N -o autoexpand=on -d /dev/nvme0n1p1 -d /dev/nvme1n1p1 13916249097028097922 cache
Aug 6 16:55:47 Dingleberry kernel: VERIFY3(range_tree_space(smla->smla_rt) + sme->sme_run <= smla->smla_sm->sm_size) failed (281451084271616 <= 8589934592)
Aug 6 16:55:47 Dingleberry kernel: PANIC at space_map.c:405:space_map_load_callback()
Aug 6 16:55:47 Dingleberry kernel: Showing stack for process 20456
Aug 6 16:55:47 Dingleberry kernel: CPU: 10 PID: 20456 Comm: metaslab_group_ Tainted: P O 6.1.38-Unraid #2
Aug 6 16:55:47 Dingleberry kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U, BIOS P4.20 04/14/2021
Aug 6 16:55:47 Dingleberry kernel: Call Trace:
Aug 6 16:55:47 Dingleberry kernel: <TASK>
Aug 6 16:55:47 Dingleberry kernel: dump_stack_lvl+0x44/0x5c

This appears to be the issue, as it also shows up as an error in the syslog. I've attached the full log to this message. Please let me know what this means if possible. Thanks! syslog.txt
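To put the two numbers in the failed VERIFY3 line into perspective, here is a small Python sketch that extracts them and converts them to binary units. The helper name `parse_verify_failure` and the regular expression are illustrative assumptions, not part of any ZFS tooling. The right-hand value, 8589934592, is exactly 8 GiB, while the left-hand value works out to roughly 256 TiB. A space total that far above the size limit it is checked against would be consistent with damaged space map metadata on the pool, though this arithmetic alone is not a diagnosis.

```python
import re

# Matches the "failed (LHS <= RHS)" tail of a ZFS VERIFY3 message.
# The pattern is an assumption based on the single line quoted in this thread.
VERIFY_RE = re.compile(r"failed \((\d+) <= (\d+)\)")

GIB = 2**30
TIB = 2**40


def parse_verify_failure(line):
    """Return (lhs, rhs) as integers, or None if the line does not match."""
    m = VERIFY_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


line = (
    "Aug 6 16:55:47 Dingleberry kernel: VERIFY3(range_tree_space(smla->smla_rt) "
    "+ sme->sme_run <= smla->smla_sm->sm_size) failed "
    "(281451084271616 <= 8589934592)"
)

lhs, rhs = parse_verify_failure(line)
print(f"limit (sm_size):  {rhs / GIB} GiB")        # 8.0 GiB
print(f"computed space:   {lhs / TIB:.1f} TiB")
print(f"assertion holds:  {lhs <= rhs}")           # False
```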
Solution DingusKahn Posted August 6, 2023 Okay, another update. Memtest passed with flying colors. It looks like the issue was the ZFS pool on the cache drives. I erased the drives (there wasn't much on there to begin with) and, lo and behold, the server came back online. I do believe that was the issue, but if someone smarter than me wants to confirm that by reading the syslog, please go right ahead. Someone linked this to me in another post, and it seems to be the way to do it if you don't want to go nuts and just delete the cache like I did.