kernel panic -> server crash every day or 2 days


This started happening after I upgraded my motherboard, CPU, and RAM.

Unraid version: 6.12.6

- At first I thought it was a plugin, so I removed and reinstalled all plugins and Docker containers -> it stopped crashing as soon as I started the array, but it started crashing (kernel panic) once or twice a day instead.

- Second: I lowered my new CPU's performance settings, disabled the DDR5 boost, and made the fans ramp up at lower temperatures (in case it was a temperature issue) -> it crashes less often, but still crashes after a few days.

- Third: I added a second NVMe to my cache pool to use RAID1, ran a btrfs scrub, and performed a full balance (I saw some btrfs errors in the kernel panic message when it crashed) -> I was hoping this would fix something, and I hope it isn't my cache1 NVMe that is failing.

- I replaced my USB key with a brand new one -> it still crashes after 1-2 days.

So now I need some help reading what the kernel is saying, lol.

Here is an older diagnostics file (I can't get a new one because the browser seems to freeze after some time) and my syslog.

I will post a new diagnostics file as soon as I am able to get one.

Thank you for your help :)

Best Regards

 

syslog-192.168.1.253.log tower-diagnostics-20240201-1753.zip

  • Community Expert

There are constant call traces logged. If the problem started after changing the hardware, I suggest starting by running memtest.
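As a rough way to gauge how often this happens in a saved syslog, something like the sketch below could help. It assumes each kernel trace is introduced by a line containing "Call Trace:" and that the log is a plain text file; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: count kernel call traces in a syslog file and show the first few.
# Assumption: every trace starts with a line containing "Call Trace:".

count_call_traces() {
    # grep -c prints the number of matching lines (prints 0 if there are none)
    grep -c 'Call Trace:' "$1"
}

first_call_traces() {
    # Print the first N (default 5) lines that start a trace,
    # so the timestamps show when the traces began
    grep 'Call Trace:' "$1" | head -n "${2:-5}"
}

# Usage (the path is an assumption; point it at your own syslog copy):
#   count_call_traces /var/log/syslog
#   first_call_traces /var/log/syslog 3
```

Comparing the count from a log taken in safe mode against one taken with everything enabled gives a quick first hint at whether a plugin or service is involved.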

  • Author

I forgot to mention:

- memtest done and PASSED

- BIOS -> I disabled the memory auto boost and the XMP boost, or whatever else in the BIOS can overclock the RAM

 

  • Community Expert

One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.

  • Author

Tried that too, lol.

I found a culprit -> the Nvidia plugin and the script for the keylase patch.

I reinstalled all Docker containers and plugins and stopped running the keylase patch -> it ran smoothly for a few days, then started crashing again.

But I found out yesterday that it may be my VM. I have stopped running it since then, and it hasn't crashed yet.

The VMs are the only thing I haven't reinstalled yet.

So, given the btrfs error in the kernel panic and the fact that my VMs live on the cache NVMe SSD, could it really be my NVMe SSD that is failing, or could it be something else?

Am I wrong, or could it also be the RAM? I ask because both tests came back clean:

MemTest = PASSED

SMART extended test = Completed without error

SMART error log:

Error Information (NVMe Log 0x01, 16 of 63 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS  Message
  0        452     0  0xe002  0x4004  0x004            0     1     -  Invalid Field in Command
  1        451     0  0x0014  0x4004  0x028            0     0     -  Invalid Field in Command

 

 

Thanks for taking the time to read this and help me, JorgeB.

  • Community Expert
19 hours ago, FTW said:

Am I wrong or it could also be the Memory Ram?

It can still be the RAM; memtest is only definitive if it finds an error. You can try running with just a couple of RAM sticks and, if the result is the same, try the other two. That will basically rule out the RAM.

  • 2 weeks later...
  • Author

Hello, after a few tests, it isn't the RAM.

I attached a new diagnostics file and syslog.

-> I'm using a brand new Intel i9-14900K CPU and, thanks to Google, found out that it could be the cause of all those kernel panics. It seems to be an E-core or temperature issue, even with water cooling. I'm now running with the E-cores disabled, and it has seemed stable since then with no crashes, but the logs still show some kernel issues.

If anyone can look at it and has a clue about my issue, please do!

Thanks a lot for your help.

tower-syslog-20240214-0808.zip tower-diagnostics-20240214-0306.zip

  • Community Expert

There are still multiple call traces. Did you try this?

 

On 2/4/2024 at 12:39 PM, JorgeB said:

One thing you can try is to boot the server in safe mode with all docker/VMs disabled, let it run as a basic NAS for a few days, if it still crashes it's likely a hardware problem, if it doesn't start turning on the other services one by one. 

 

  • Author

Yes, I did, and it was stable (it seems to be a temperature or CPU issue, from what I found on Google).

But now I've got a bigger issue :(

 

unraid  Pool 'cache' has encountered an uncorrectable I/O failure and has been suspended.

 

I'm unable to mount my cache pool after a crash.

- I switched the NVMe to another slot: same issue.

The NVMe is brand new, less than a month old.

What should I do to fix this?

I can't scrub because the array won't start; it gets stuck at the cache pool. All the other HDDs are mounted.

I even tried downgrading the OS (6.12.4 has been stable on my other server), since I upgraded to the newer version (6.12.8) yesterday. No success.

 

Feb 20 03:21:17 Tower emhttpd: mounting /mnt/cache
Feb 20 03:21:17 Tower emhttpd: shcmd (256): mkdir -p /mnt/cache
Feb 20 03:21:17 Tower emhttpd: /usr/sbin/zpool import -d /dev/mapper/nvme0n1p1 2>&1
Feb 20 03:21:17 Tower emhttpd:    pool: cache
Feb 20 03:21:17 Tower emhttpd:      id: 13452574719722492999
Feb 20 03:21:17 Tower emhttpd: shcmd (257): /usr/sbin/zpool import -N -o autoexpand=on  -d /dev/mapper/nvme0n1p1 13452574719722492999 cache
Feb 20 03:21:17 Tower kernel: WARNING: Pool 'cache' has encountered an uncorrectable I/O failure and has been suspended.
Feb 20 03:21:17 Tower kernel: 
Feb 20 03:23:22 Tower shutdown[26121]: shutting down for system halt
Feb 20 03:23:22 Tower init: Switching to runlevel: 0
Feb 20 03:23:22 Tower init: Trying to re-exec init
Feb 20 03:23:28 Tower monitor: Stop running nchan processes
Feb 20 03:23:42 Tower ntpd[1803]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

 

Would it be better to format the NVMe in the cache pool? If so, how do I do that?

I only have one NVMe in the cache pool at the moment :/ I hadn't set it up as RAID yet; I was testing the new Gen 5 NVMe from Crucial.

Thanks.

If needed, I can post the diagnostics and the full syslog later, after work.

 

  • Community Expert

Post the output of:

zpool import

 

  • Author
12 hours ago, JorgeB said:

Post the output of:

zpool import

 

 

It says this after I start the array:

 

no pools available to import

 

  • Author

Here are my diagnostics from before starting the array.

The diagnostics collection after trying to start the array seems to get stuck at:

/usr/sbin/zpool status 2>/dev/null|todos >>'/tower-diagnostics-20240221-0302/system/zfs-info.txt'

 

So I can't get better diagnostics after starting the array.

I'm also unable to attach my syslog.txt (the upload failed), but every drive mounts successfully except the cache drive.

Plus this:

Feb 21 02:29:20 Tower ntpd[1818]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

 

which I don't know how to fix.

zpool import output = "no pools available to import", both before and after starting the array.

To summarize: the array fails to start because it gets stuck mounting the cache (NVMe) pool, which is suspended.

Feb 21 02:58:43 Tower kernel: WARNING: Pool 'cache' has encountered an uncorrectable I/O failure and has been suspended.

 

 

Oh, and my cache pool is ZFS.

tower-diagnostics-20240221-0229.zip


  • Community Expert
10 hours ago, FTW said:

it said this after I start the Array

And before array start?

  • Author

same thing Before and After

  • Community Expert

If zpool import doesn't find any pools to import, you can try some options, like -F (see here), but I'm afraid you will most likely need to destroy the pool and restore from a backup.

  • Author

Damn, OK.

So what is the correct command to try with -F?

zpool import -F my-nvme-id ?

And to destroy?

zpool import -D my-nvme-id or /mnt/cache?

  • Community Expert

First try

zpool import -F cache
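If you want to see what -F would do before it changes anything, zpool import also accepts -n together with -F (a dry run), and -o readonly=on for a read-only import. The sketch below only prints the commands in one reasonable order, least invasive first; it does not run anything. recovery_plan is a hypothetical helper name, and the pool name defaults to cache as in this thread.

```shell
#!/bin/sh
# Sketch: print (not run) a cautious sequence of zpool import attempts.
# Nothing here touches the pool; copy a line by hand when you are ready.
# recovery_plan is a made-up name; the pool name defaults to "cache".

recovery_plan() {
    pool="${1:-cache}"
    # 1. List the pools ZFS can currently see
    echo "zpool import"
    # 2. Dry run of recovery mode: reports whether -F could make the pool importable
    echo "zpool import -F -n ${pool}"
    # 3. Read-only import, useful for copying data off without writing to the pool
    echo "zpool import -o readonly=on ${pool}"
    # 4. Recovery mode proper: may discard the last few transactions
    echo "zpool import -F ${pool}"
}

# Usage:
#   recovery_plan          # plan for the pool named "cache"
#   recovery_plan mypool   # plan for another pool name
```

The ordering is a judgment call: the dry run and the read-only import leave the pool untouched, so they are worth trying before the real -F.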

 

  • Author
root@Tower:~# zpool import
no pools available to import
root@Tower:~# zpool import -F cache
cannot import 'cache': no such pool available
root@Tower:~# zpool import -D cache
cannot import 'cache': no such pool available
root@Tower:~# zpool destroy -f Cache
cannot open 'Cache': no such pool
root@Tower:~# zpool destroy -f cache
cannot open 'cache': no such pool

That's before I click Start on the array :/


  • Community Expert

I forgot to mention that since the pool is encrypted you need to start the array first, so that the device is decrypted.

  • Author

OK, but starting my array makes it get stuck mounting my cache pool because of the suspension.

My whole array is still offline, and at the moment zpool import -F cache in the terminal seems to be stuck: no response, and nothing in the logs.

  • Community Expert

That's why I don't recommend using encryption; it just complicates things when there's a problem. Post the syslog after array start so we can see the luks open command:

 

cp /var/log/syslog /boot/syslog.txt
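Once the copy is on the flash drive, the relevant line can be pulled out with grep. This is only a sketch: it assumes the command shows up in the log with "luks" somewhere in the line (matched case-insensitively), and luks_lines is a made-up helper name.

```shell
#!/bin/sh
# Sketch: show the syslog lines that mention LUKS, with their line numbers.
# Assumption: the open command is logged with "luks" somewhere in the line.

luks_lines() {
    # -i: ignore case, -n: prefix each match with its line number in the file
    grep -in 'luks' "$1"
}

# Usage:
#   luks_lines /boot/syslog.txt
```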

 

  • Author

Here is the syslog, and I do understand now why you don't recommend using encryption, uhuh 😪😅

  • Author

So what would be my best option for formatting my cache pool? Start the array with 'No Device' in the cache pool, or is there a special way to format it?

  • Community Expert

If the server is not responding, reboot; then, without starting the array, type:

 

cryptsetup luksOpen /dev/nvme0n1p1 nvme0n1p1 --allow-discards

 

You will need to enter the password. After that, try the zpool import commands again.
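To make that last step a bit more concrete: after luksOpen, the decrypted device should appear under /dev/mapper with the name given on the command line, and the syslog earlier in the thread shows Unraid pointing zpool at that device with -d. A hedged sketch follows; it only prints the commands, import_plan is a hypothetical helper, and whether -d is strictly needed here is an assumption based on that log.

```shell
#!/bin/sh
# Sketch: after luksOpen, point zpool at the decrypted mapper device.
# Only prints the commands; import_plan is a made-up helper name.
# Default device and pool names are taken from the commands in this thread.

import_plan() {
    mapper="${1:-/dev/mapper/nvme0n1p1}"
    pool="${2:-cache}"
    if [ ! -e "$mapper" ]; then
        # The mapper node only exists once luksOpen has succeeded
        echo "mapper device $mapper not found; luksOpen has not succeeded yet"
        return 1
    fi
    # List importable pools on the decrypted device
    echo "zpool import -d $mapper"
    # If the pool is listed but a normal import fails, try recovery mode
    echo "zpool import -F -d $mapper $pool"
}

# Usage:
#   import_plan                                # defaults from this thread
#   import_plan /dev/mapper/nvme0n1p1 cache
```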
