linenoise Posted January 12, 2021

This has been a recurring issue where Unraid crashes several times a week. The server becomes unresponsive and the console shows a crash report on screen, with the major error being a kernel panic; see the pictures below, taken from two different crashes. I have read through the other kernel panic threads here. Steps I have tried so far:

1. Ran Memtest86 (the latest version from the website, not the one on the Unraid boot menu): no issues.
2. Checked the memory speed. The memory is rated for 2133 MHz, but the auto-select in the BIOS down-clocked it to 1600. I tried setting it to 2133 manually, but then the machine wouldn't even boot.
3. All other overclocking is disabled; everything is set to auto-detect.
4. The power supply is brand new, so I'm not suspecting anything strange there.

Here are the syslogs. They were captured after two reboots, so I'm not sure how helpful they are.

tower-syslog-20210112-2124.zip
Hardware.xml

Thanks for your help,
Linenoise
JorgeB Posted January 13, 2021

Macvlan call traces are usually the result of having dockers with a custom IP address.
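For reference, a mitigation often suggested for this is to move the custom-IP containers off the macvlan path onto a user-defined ipvlan network. The sketch below is hedged: the network name (homelan), parent interface (br0), subnet, gateway, and container IP are all assumptions to adapt to your LAN, and DRY_RUN=1 makes it only print each command so nothing changes until you have reviewed it.

```shell
# Hedged sketch: put custom-IP containers on a user-defined ipvlan network
# instead of macvlan on br0. Names, subnet, and addresses are assumptions.
# DRY_RUN=1 prints each command instead of executing it.
DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"        # dry run: show the command
    else
        "$@"             # real run: execute it
    fi
}

run docker network create -d ipvlan \
    --subnet=10.0.0.0/24 --gateway=10.0.0.1 \
    -o parent=br0 homelan
run docker run -d --name pihole --network homelan --ip 10.0.0.53 pihole/pihole
```

Set DRY_RUN=0 only once the printed commands match your network.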
linenoise (Author) Posted January 13, 2021

10 hours ago, JorgeB said:
Macvlan call traces are usually the result of having dockers with a custom IP address.

Yes, I have several dockers with custom IP addresses on br0 (the same subnet as Unraid, 10.0.0.x/24):

Pi-hole (br0)
NginxProxyManager (br0)
Nextcloud (br0)

I also had Shinobi on br0 on a separate VLAN, but I disabled that a while ago and still got crashes. Pi-hole has been running for a few years without issue; Nextcloud and NginxProxyManager are the two I installed recently, around the time the crashes started.

On a similar note (not sure if this indicates a problem or is just a bug in Unraid): when I use a custom IP address with a docker and try to change the host port, it seems to ignore the change and still uses the default port.

I will shut down the Nextcloud docker and see if the network stabilizes. Thanks.
linenoise (Author) Posted January 25, 2021

On 1/13/2021 at 2:05 PM, linenoise said:
I will shut down the Nextcloud docker and see if the network stabilizes.

OK, this looks like it fixed the call trace issue, but now I have other errors, possibly due to corrupt data caused by the lockups from the original problem. These are the error messages I'm getting:

crash on Jan 18 2021
crash on Jan 19 2021
crash on Jan 20 2021
crash on Jan 25 2021

I see the message to run xfs_repair, but I'm not sure:
1. whether that is the main cause of the crashing, and
2. which drive to run it on.

The traces point at (dm-14) as the source. Following the xfs_repair instructions at https://wiki.unraid.net/index.php/Check_Disk_Filesystems#xfs_repair, I ran

xfs_repair -v /dev/sg14

using a device name I found in the /dev directory, shown here. I'm not getting any kind of status or indication that it is doing anything; see the screenshot below. I'll let it sit for a while.

Thanks,
Linenoise
JorgeB Posted January 25, 2021

1 hour ago, linenoise said:
Looking at (dm-14)

You need to run it on the disk, not the raw device, or parity will be invalid. dm-14 means you're using encryption; it should be disk 15 (or cache, if there isn't a disk 15), but I would need the diagnostics to be sure.
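For anyone else landing here: a dm-N number by itself doesn't say which Unraid disk it is. A small sketch, assuming the standard /dev/mapper layout for device-mapper (LUKS) targets, that lists each mapper name next to the node it resolves to, so a "(dm-14)" from a trace can be matched to the right array or cache device before repairing (the /dev/sgX nodes used earlier are SCSI generic devices and should never be the target of xfs_repair):

```shell
# Sketch (assumed layout): list device-mapper names and what they resolve
# to, so "(dm-14)" can be matched to the right encrypted device. Takes a
# directory argument for testing; defaults to /dev/mapper.
list_dm_names() {
    local dir="${1:-/dev/mapper}"
    local link
    for link in "$dir"/*; do
        [ -e "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

list_dm_names
# Then, with the array in maintenance mode, dry-run the repair against the
# mapper device (the name below is illustrative, not from this thread):
# xfs_repair -nv /dev/mapper/md15
```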
linenoise (Author) Posted January 25, 2021

Here is the diag file.

tower-diagnostics-20210125-1034.zip

Thanks
linenoise (Author) Posted January 25, 2021

I don't have a disk 15, so I am guessing it's the cache drive. I put the array in maintenance mode and ran the xfs_repair found in Main > Cache > Check Filesystem. The following is the output I got using the default settings, with the -n flag:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used. Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_ifree 91730, counted 91613
sb_fdblocks 216766133, counted 218441001
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad CRC for inode 797857
bad CRC for inode 797857, would rewrite
would have cleared inode 797857
        - agno = 1
        - agno = 2
Metadata corruption detected at 0x459730, xfs_dir3_block block 0x3a8aa5b8/0x1000
bad directory block magic # 0x58444433 in block 0 for directory inode 1078977760
corrupt block 0 in directory inode 1078977760
would junk block
no . entry for directory 1078977760
no .. entry for directory 1078977760
problem with directory contents in inode 1078977760
would have cleared inode 1078977760
imap claims in-use inode 1082884434 is free, would correct imap
imap claims in-use inode 1082884435 is free, would correct imap
imap claims in-use inode 1082884436 is free, would correct imap
imap claims in-use inode 1082884437 is free, would correct imap
imap claims in-use inode 1082884438 is free, would correct imap
imap claims in-use inode 1082884439 is free, would correct imap
imap claims in-use inode 1082884440 is free, would correct imap
imap claims in-use inode 1082884441 is free, would correct imap
....
I deleted a bunch of these lines, which were just incrementing through inodes, picking up again at the interesting bit:

imap claims in-use inode 1082924397 is free, would correct imap
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 3
        - agno = 2
        - agno = 0
bad CRC for inode 797857, would rewrite
would have cleared inode 797857
bad directory block magic # 0x58444433 in block 0 for directory inode 1078977760
corrupt block 0 in directory inode 1078977760
would junk block
no . entry for directory 1078977760
no .. entry for directory 1078977760
problem with directory contents in inode 1078977760
would have cleared inode 1078977760
entry "tmp" in shortform directory 546483779 references free inode 1078977760
would have junked entry "tmp" in directory inode 546483779
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
entry "tmp" in shortform directory inode 546483779 points to free inode 1078977760
would junk entry
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 1077444767, would move to lost+found
disconnected inode 1077520792, would move to lost+found
disconnected inode 1078977881, would move to lost+found
disconnected inode 1082643091, would move to lost+found
...

Ending with this:

disconnected inode 1082924396, would move to lost+found
disconnected inode 1082924397, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 546483779 nlinks from 3 to 2
No modify flag set, skipping filesystem flush and exiting.
JorgeB Posted January 25, 2021

Run it without -n, or nothing will actually be repaired.
linenoise (Author) Posted January 25, 2021

Just to document my steps for anyone else looking at this. Removed the -n flag and got this message:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed.
Mount the filesystem to replay the log, and unmount it before re-running xfs_repair.
If you are unable to mount the filesystem, then use the -L option to destroy the log
and attempt a repair. Note that destroying the log may cause corruption -- please
attempt a mount of the filesystem before doing this.

Ran with the -L flag and got this in return:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed.
Mount the filesystem to replay the log, and unmount it before re-running xfs_repair.
If you are unable to mount the filesystem, then use the -L option to destroy the log
and attempt a repair. Note that destroying the log may cause corruption -- please
attempt a mount of the filesystem before doing this.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_ifree 91730, counted 91613
sb_fdblocks 216766133, counted 218441001
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad CRC for inode 797857
bad CRC for inode 797857, will rewrite
cleared inode 797857
        - agno = 1
        - agno = 2
Metadata corruption detected at 0x459730, xfs_dir3_block block 0x3a8aa5b8/0x1000
bad directory block magic # 0x58444433 in block 0 for directory inode 1078977760
corrupt block 0 in directory inode 1078977760
will junk block
no . entry for directory 1078977760
no .. entry for directory 1078977760
problem with directory contents in inode 1078977760
cleared inode 1078977760
correcting imap
correcting imap
correcting imap
correcting imap
correcting imap
correcting imap
-------- deleted the repetitive "correcting imap" messages --------

Will spin up without maintenance mode and see if this fixes the crashes. Hope the log file wasn't too important, since I had it destroyed to continue with the repair.
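To summarize the sequence those messages enforce (dry-run first, replay the log by mounting, only then repair, with -L strictly as a last resort), here is a small helper that just prints the steps for a given device. The device path and mount point are placeholders, not values from this thread:

```shell
# Prints the xfs_repair order used above for a given device. It deliberately
# runs nothing itself; copy the steps out once the device is confirmed.
xfs_repair_plan() {
    local dev="$1"
    echo "xfs_repair -n $dev                       # 1. dry run, report only"
    echo "mount $dev /mnt/tmp && umount /mnt/tmp   # 2. mount once to replay the log"
    echo "xfs_repair $dev                          # 3. actual repair on a clean log"
    echo "xfs_repair -L $dev                       # 4. LAST resort: -L zeroes the log and can lose recent metadata"
}

xfs_repair_plan /dev/mapper/example
```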
superboki Posted January 27, 2021

Hi, did you solve your problem?
juchong Posted February 18, 2021

I'm having a similar issue as well... What's going on?!
linenoise (Author) Posted October 17, 2021

Closing the loop on this. It looks like the kernel panic was a result of using custom Docker IP addresses for some dockers, namely Nginx Proxy Manager, Nextcloud, and Pi-hole. It has something to do with the way Unraid manages and bridges custom IP addresses on a single Ethernet port. I got frustrated trying to get it to work inside Unraid and just moved those applications to another server running Portainer.

I am now building a new Unraid server, which will have 1 x 10 Gb fiber card and 4 x 1 Gb copper Ethernet ports. I'm getting ready to post about the best way to set up Docker to take advantage of the multiple ports, i.e., stream movies and share files over the 10 Gb port, use one 1 Gb port as the Unraid management port, and split the remaining three 1 Gb ports between dockers and VMs, but I'm not sure how to do this.

Another question I have: if I install Portainer and use it as the Docker manager, will that bypass the limitations of Unraid's built-in Docker management?

I am going to post this to a new thread and will link the thread back here after I create it.
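For what it's worth, here is the kind of per-port split I have in mind, sketched as plain Docker commands: one user-defined macvlan network per physical NIC, with each container attached to the port it should use. The interface names (eth1, eth2), subnets, container IPs, and images are all assumptions, and DRY_RUN=1 means the commands are only printed, not executed.

```shell
# Hedged sketch of a per-port split: one macvlan network per physical NIC,
# each container pinned to a port. Interfaces, subnets, and addresses are
# assumptions. DRY_RUN=1 prints the commands instead of executing them.
DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"        # dry run: show the command
    else
        "$@"             # real run: execute it
    fi
}

run docker network create -d macvlan -o parent=eth1 \
    --subnet=10.0.1.0/24 --gateway=10.0.1.1 port1net
run docker network create -d macvlan -o parent=eth2 \
    --subnet=10.0.2.0/24 --gateway=10.0.2.1 port2net
run docker run -d --name pihole --network port1net --ip 10.0.1.53 pihole/pihole
run docker run -d --name npm --network port2net --ip 10.0.2.80 jc21/nginx-proxy-manager
```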
linenoise (Author) Posted October 17, 2021

Here is the link to my new server build thread about the physical ports and Docker that I mentioned above, in case you're looking for it.