Reclaim Posted November 12, 2023
Hey, I've been struggling with unclean shutdowns for some time now. Unraid becomes completely unresponsive: no web UI, no ping, no SSH, no Caps Lock light on the keyboard. The only way to recover is to cut power. I had to cut power yesterday, and after that I enabled syslog. After 18 hours (Nov 12, 07:51) it crashed again. In the syslog I see:
Nov 12 07:51:31 unRAID kernel: kernel BUG at arch/x86/kvm/mmu/tdp_mmu.c:560!
After this unclean restart I changed the Docker network from macvlan to ipvlan. Since then the array does not start anymore; it is stuck at "Array Starting". Can someone help?
Attachments: unraid-diagnostics-20231112-1414.zip, syslog
Reclaim Posted November 12, 2023
Update: I waited 1 hour at "Array Starting". I tried the power button; the console says "shutting down" but nothing happens. Then I decided to cut power again. After booting in Safe Mode with Docker and VMs disabled, I was able to start the array again. Here is the new syslog: syslog (1)
After I last started the array, I see:
Nov 12 14:28:43 unRAID kernel: WARNING: CPU: 4 PID: 40 at kernel/kthread.c:141 free_kthread_struct+0x1e/0x43
Reclaim Posted November 12, 2023
Update: Now I can't start the array even when booted in Safe Mode. It gets stuck mounting the ZFS cache. This is in the syslog:
Nov 12 23:53:03 unRAID kernel: PANIC at dnode_sync.c:301:free_children()
Attachment: syslog (3)
JorgeB Posted November 13, 2023 (Solution)
There's likely an underlying hardware issue, but you can try importing the pool read-only:
zpool import -o readonly=on cache
If it works, back up the data and re-format the pool.
Reclaim Posted November 13, 2023
Thanks, I was able to import the pool as read-only. After the backup I will reformat the pool. Do you think it's still a hardware problem? How can I determine which component is at fault, or do you have a guess?
JorgeB Posted November 13, 2023
There are a lot of call traces logged, and the pool getting corrupted may also be a symptom. Start by running memtest.
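For anyone following along who wants to see how many such kernel errors are in a saved syslog, here is a minimal Python sketch. It is not an Unraid tool; the function names `scan_syslog` and `summarize` are made up for illustration, and the patterns only cover the three message types quoted in this thread, so other kernel errors would be missed.

```python
import re
import sys
from collections import Counter

# Patterns based only on the messages quoted in this thread (an assumption:
# other kernel errors are worded differently and will not be matched).
PATTERNS = {
    "bug": re.compile(r"kernel BUG at (\S+)"),
    "panic": re.compile(r"PANIC at (\S+)"),
    "warning": re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+)"),
}


def scan_syslog(lines):
    """Return a (kind, location, line) tuple for every matching log line."""
    hits = []
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                # Drop the trailing "!" that the kernel BUG message ends with.
                location = match.group(1).rstrip("!")
                hits.append((kind, location, line.rstrip("\n")))
                break  # one classification per line is enough
    return hits


def summarize(hits):
    """Count hits per kind, e.g. how many warnings versus panics."""
    return Counter(kind for kind, _, _ in hits)


if __name__ == "__main__":
    # Usage: python scan_syslog.py /path/to/syslog
    with open(sys.argv[1], errors="replace") as handle:
        found = scan_syslog(handle)
    for kind, location, _ in found:
        print(f"{kind:8} {location}")
    print(dict(summarize(found)))
```

A count like this only shows how often the kernel complained and where. It does not identify the faulty component, so memtest is still the first step.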
Reclaim Posted November 18, 2023
Update: I only formatted the cache pool, and since then the server has been stable. Uptime is now 4.5 days. Is it possible that an out-of-memory error caused the corruption? Once or twice I had "Out of memory errors detected on your server" in Notifications.
JorgeB Posted November 19, 2023
Reclaim said: "Is it possible that an out-of-memory error caused the corruption?"
It seems unlikely, but I won't say it's not possible.