Unraid webui becomes unresponsive and all dockers and VMs disappear


Solved by DingusKahn

Recently, my server has been becoming unresponsive, and all Docker containers and VMs disappear. If I try to reboot the server, nothing happens, and I have to shut it off manually. It also looks like most of the cores in my server are pegged at 100% usage, even though the CPU temperature hovers around 38°C. Any ideas? I've attached diagnostics taken after I power cycled the server, since I could not get diagnostics while the server was unresponsive.

Screenshot 2023-08-04 101629.png

Server Diagnostics .zip

Okay, so I've been messing around with this all day, and I haven't really made much progress. I've removed all plugins and replaced the bz* files on the USB drive, and the server still doesn't work. In fact, it is now not even able to mount disks; it crashes when it tries to do so. I've got it up and running in safe mode and am running SMART extended tests on the disks just to see if anything comes up, but that doesn't seem very likely to me. If nothing comes up, I guess my next step is to test the RAM in the system to see if one or both sticks are causing the issue (XMP is not enabled). Does anyone have any ideas? I'd appreciate any tips at this point. I'm a little lost and out of my depth, and not sure where to go next. Thanks in advance!

Server just crashed again.

Aug  6 16:55:47 Dingleberry emhttpd: mounting /mnt/cache
Aug  6 16:55:47 Dingleberry emhttpd: shcmd (85526): mkdir -p /mnt/cache
Aug  6 16:55:47 Dingleberry emhttpd: shcmd (85527): /usr/sbin/zpool import -N -o autoexpand=on  -d /dev/nvme0n1p1 -d /dev/nvme1n1p1 13916249097028097922 cache
Aug  6 16:55:47 Dingleberry kernel: VERIFY3(range_tree_space(smla->smla_rt) + sme->sme_run <= smla->smla_sm->sm_size) failed (281451084271616 <= 8589934592)
Aug  6 16:55:47 Dingleberry kernel: PANIC at space_map.c:405:space_map_load_callback()
Aug  6 16:55:47 Dingleberry kernel: Showing stack for process 20456
Aug  6 16:55:47 Dingleberry kernel: CPU: 10 PID: 20456 Comm: metaslab_group_ Tainted: P           O       6.1.38-Unraid #2
Aug  6 16:55:47 Dingleberry kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U, BIOS P4.20 04/14/2021
Aug  6 16:55:47 Dingleberry kernel: Call Trace:
Aug  6 16:55:47 Dingleberry kernel: <TASK>
Aug  6 16:55:47 Dingleberry kernel: dump_stack_lvl+0x44/0x5c

 

This appears to be the issue, as it is also shown as an error in the syslog. I've attached the full syslog to this message. Please let me know what this means if possible. Thanks!

syslog.txt
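For anyone digging through a long syslog for the same symptom, here is a minimal Python sketch that pulls out the two kernel messages from the excerpt above: the failed `VERIFY3(...)` assertion and the `PANIC at file:line:function()` line. The function name `find_zfs_panics` and the regular expressions are assumptions written for this illustration, matched only against the line format seen in this thread; they are not part of Unraid or OpenZFS. In this log, the failed check compares 281451084271616 against 8589934592, so the left-hand value is far larger than the limit it was supposed to stay under, which would be consistent with the pool problem described later in the thread.

```python
import re

# Hypothetical patterns, written to match the two kernel lines quoted in this thread.
VERIFY_RE = re.compile(
    r"VERIFY3\((?P<expr>.*)\) failed "
    r"\((?P<lhs>\d+) (?P<op><=|>=|==|!=|<|>) (?P<rhs>\d+)\)"
)
PANIC_RE = re.compile(r"PANIC at (?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)")


def find_zfs_panics(lines):
    """Return a list of dicts describing VERIFY3 failures and PANIC lines."""
    findings = []
    for text in lines:
        match = VERIFY_RE.search(text)
        if match:
            findings.append({
                "kind": "verify",
                "expr": match["expr"],
                "lhs": int(match["lhs"]),
                "op": match["op"],
                "rhs": int(match["rhs"]),
            })
            continue
        match = PANIC_RE.search(text)
        if match:
            findings.append({
                "kind": "panic",
                "file": match["file"],
                "line": int(match["line"]),
                "func": match["func"],
            })
    return findings


# The two relevant lines, copied from the syslog excerpt in this post.
SAMPLE = [
    "Aug  6 16:55:47 Dingleberry kernel: VERIFY3(range_tree_space(smla->smla_rt)"
    " + sme->sme_run <= smla->smla_sm->sm_size) failed"
    " (281451084271616 <= 8589934592)",
    "Aug  6 16:55:47 Dingleberry kernel: PANIC at"
    " space_map.c:405:space_map_load_callback()",
]

if __name__ == "__main__":
    for finding in find_zfs_panics(SAMPLE):
        print(finding)
```

To run it against a real file, something like `find_zfs_panics(open("syslog.txt"))` should work, since the function only needs an iterable of lines.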

  • Solution

Okay, another update. Memtest passed with flying colors. It looks like the issue was the ZFS pool on the cache drives. I erased the drives (there wasn't much on there to begin with) and, lo and behold, the server came back online. I do believe that was the issue, but if someone smarter than me wants to confirm that by reading the syslog, please go right ahead.

 

Someone linked this to me in another post, and it seems to be the way to do it if you don't want to go nuts and just delete the cache like I did.

 

 

 
