Another Kernel Panic crash



This has been a recurring issue where Unraid crashes several times a week.  The server becomes unresponsive and the console shows a crash report on screen, with the major error being a kernel panic; see the pictures below.  I took two pictures from two different crashes.  I read through the posts here concerning other kernel panic issues.  Steps I have tried so far:

 

1. Ran Memtest86 (the latest version from the website, not the one on the Unraid boot menu); no issues found.

2. Checked the memory speed.  The memory is rated for 2133 MHz, but the BIOS auto-select downclocked it to 1600.  When I tried setting it to 2133 the system wouldn't even boot.

3. All other overclocking is disabled; everything is set to auto-detect.

4. The power supply is brand new, so I'm not suspecting anything strange there.

 

Here are the syslogs; they were captured after two reboots, so I'm not sure how helpful they are.

 

Attachments: tower-syslog-20210112-2124.zip, 20210112_142225.jpg, 20201210_074050.jpg

 

Thanks for your help

 

Linenoise

Hardware.xml

10 hours ago, JorgeB said:

Macvlan call traces are usually the result of having dockers with a custom IP address:

 

Yes, I have several dockers with custom IP addresses:

br0 = same subnet as Unraid (10.0.0.x/24)

Pi-hole on br0

NginxProxyManager on br0

Nextcloud on br0

I also had Shinobi on br0 on a separate VLAN, but I disabled that a while ago and still got crashes.

 

I had Pi-hole running for a few years without issue.  Nextcloud and NginxProxyManager are the two I installed recently, around the time the crashes started.

 

On a similar note, not sure if this indicates an issue or is just a bug in Unraid, but when I give a docker a custom IP address and try to change the host port, it seems to ignore the change and still uses the default port.
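For context on that port behavior: when a container gets its own IP on br0, Docker does not apply host-port mappings at all; the container is reached directly at its IP on whatever ports the application itself listens on. A rough sketch of what such a container looks like from the command line (the container name, image, and IP below are only examples, assuming Unraid has already created the br0 network):

docker run -d \
  --name=pihole-test \
  --network=br0 \
  --ip=10.0.0.50 \
  pihole/pihole:latest

Any -p host:container mappings added to a command like this are ignored, which would explain why changing the host port appears to have no effect.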

 

I will shut down the Nextcloud docker and see if the network stabilizes.

 

thanks

 

 

  • 2 weeks later...
On 1/13/2021 at 2:05 PM, linenoise said:

Yes, I have several dockers with custom IP addresses on br0 (Pi-hole, NginxProxyManager, Nextcloud) ... I will shut down the Nextcloud docker and see if the network stabilizes.

OK, this looks like it fixed the call trace issue, but now I have other errors; these might be due to corrupt data caused by the lockups from the original issue.  These are the error messages I'm getting.

 

20210118_082116.jpg (crash on Jan 18, 2021)

20210119_075917.jpg (crash on Jan 19, 2021)

20210120_155100.jpg (crash on Jan 20, 2021)

20210125_085328.jpg (crash on Jan 25, 2021)

 

I see the message to run xfs_repair, but I'm not sure 1) whether that is the main cause of the crashing, or 2) which drive to run it on.

 

Looking at (dm-14) as the source, and following the xfs_repair instructions at https://wiki.unraid.net/index.php/Check_Disk_Filesystems#xfs_repair, I ran xfs_repair -v /dev/sg14, which I found in the /dev directory.  I'm not getting any kind of status or indication that it is doing anything; see the screenshot below.  I'll let it sit for a while.
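A note for anyone following these steps: /dev/sg14 is a SCSI generic pass-through device rather than a filesystem device, so xfs_repair has nothing to check there, which would explain the lack of output. One way to find out what device dm-14 actually refers to, and then dry-run the check against it while the array is in maintenance mode, is sketched below (the resolved name is whatever the first command reports; nothing here is specific to this server):

cat /sys/block/dm-14/dm/name     # device-mapper name behind dm-14
lsblk /dev/dm-14                 # size and type, to confirm which drive it is

xfs_repair -n /dev/mapper/<name-from-above>   # read-only check, makes no changes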

 

[screenshot: xfs_repair running with no visible output]

 

 

Thanks,

 

Linenoise


I don't have a disk 15, so I'm guessing it's the cache drive.  I put the array in maintenance mode and ran the xfs_repair found under Main > cache > Check Filesystem.  The following is the output I got using the default settings with the -n flag.

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_ifree 91730, counted 91613
sb_fdblocks 216766133, counted 218441001
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad CRC for inode 797857
bad CRC for inode 797857, would rewrite
would have cleared inode 797857
        - agno = 1
        - agno = 2
Metadata corruption detected at 0x459730, xfs_dir3_block block 0x3a8aa5b8/0x1000
bad directory block magic # 0x58444433 in block 0 for directory inode 1078977760
corrupt block 0 in directory inode 1078977760
    would junk block
no . entry for directory 1078977760
no .. entry for directory 1078977760
problem with directory contents in inode 1078977760
would have cleared inode 1078977760
imap claims in-use inode 1082884434 is free, would correct imap
imap claims in-use inode 1082884435 is free, would correct imap
imap claims in-use inode 1082884436 is free, would correct imap
imap claims in-use inode 1082884437 is free, would correct imap
imap claims in-use inode 1082884438 is free, would correct imap
imap claims in-use inode 1082884439 is free, would correct imap
imap claims in-use inode 1082884440 is free, would correct imap
imap claims in-use inode 1082884441 is free, would correct imap

 

.... I deleted a bunch of these lines, which were just incrementing through inodes.

 

Picking up with an interesting bit:

 

imap claims in-use inode 1082924397 is free, would correct imap
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 3
        - agno = 2
        - agno = 0
bad CRC for inode 797857, would rewrite
would have cleared inode 797857
bad directory block magic # 0x58444433 in block 0 for directory inode 1078977760
corrupt block 0 in directory inode 1078977760
    would junk block
no . entry for directory 1078977760
no .. entry for directory 1078977760
problem with directory contents in inode 1078977760
would have cleared inode 1078977760
entry "tmp" in shortform directory 546483779 references free inode 1078977760
would have junked entry "tmp" in directory inode 546483779
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
entry "tmp" in shortform directory inode 546483779 points to free inode 1078977760
would junk entry
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 1077444767, would move to lost+found
disconnected inode 1077520792, would move to lost+found
disconnected inode 1078977881, would move to lost+found
disconnected inode 1082643091, would move to lost+found

 

...  Ending with this

disconnected inode 1082924396, would move to lost+found
disconnected inode 1082924397, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 546483779 nlinks from 3 to 2
No modify flag set, skipping filesystem flush and exiting.

 

 

 


Just to document my steps for anyone else looking at this.

 

Removed the -n flag and got this message.

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
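For anyone retracing these steps, that message is pointing at a gentler path first: mounting the filesystem once replays the journal, after which xfs_repair can usually run without -L. A minimal sketch of that approach, assuming the cache filesystem lives on /dev/sdX1 (a placeholder device name) and the array is in maintenance mode:

mkdir -p /mnt/test
mount /dev/sdX1 /mnt/test    # mounting replays the XFS log
umount /mnt/test

xfs_repair -v /dev/sdX1      # -L should no longer be required if the mount succeeded

In this case the mount may not have been possible, which is what -L exists for, but it is worth attempting before destroying the log.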

 

Ran with the -L flag and got this in return:

 


Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_ifree 91730, counted 91613
sb_fdblocks 216766133, counted 218441001
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad CRC for inode 797857
bad CRC for inode 797857, will rewrite
cleared inode 797857
        - agno = 1
        - agno = 2
Metadata corruption detected at 0x459730, xfs_dir3_block block 0x3a8aa5b8/0x1000
bad directory block magic # 0x58444433 in block 0 for directory inode 1078977760
corrupt block 0 in directory inode 1078977760
    will junk block
no . entry for directory 1078977760
no .. entry for directory 1078977760
problem with directory contents in inode 1078977760
cleared inode 1078977760
correcting imap
correcting imap
correcting imap
correcting imap
correcting imap
correcting imap

 

-------- deleted repetitive messages ------

 

I will spin up the array without maintenance mode and see if this fixes the crashes.  Hope the log wasn't too important, since I had to destroy it to continue with the repair.
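One follow-up worth doing after a repair like this (my suggestion, based on the -n run above reporting disconnected inodes that would be moved to lost+found): once the array is started normally, check the lost+found directory at the top of the repaired filesystem for recovered files. Assuming the repaired filesystem is the cache pool:

ls -la /mnt/cache/lost+found    # recovered (orphaned) files end up here, named by inode number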

 

 

 

  • 3 weeks later...
  • 7 months later...

Closing the loop on this.  It looks like the kernel panics were the result of using custom docker IP addresses for some dockers, namely Nginx Proxy Manager, Nextcloud, and Pi-hole.  It has something to do with the way Unraid manages and bridges custom IP addresses on a single Ethernet port.

I got frustrated trying to get it to work inside Unraid and just moved the applications to another server running Portainer.  I am now building a new Unraid server that will have one 10 Gb fiber card and four 1 Gb copper Ethernet ports.  I'm getting ready to post about the best way to set up docker to take advantage of the multiple ports, i.e., stream movies and share files over the 10 Gb port, use one 1 Gb port for Unraid management, and split the remaining three 1 Gb ports between dockers and VMs, but I'm not sure how to do this (see the sketch below for one possible approach).  Another question I have is, if I install Portainer and use that as the docker manager, will that bypass the limitations of Unraid's built-in docker?  I am going to post this in a new thread and will link it back here after I create it.
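For the multi-port question, one way to do this outside the Unraid GUI (only a sketch, with made-up interface names, subnets, and container choices) is to create a separate macvlan network per physical NIC and attach each container to the network whose port should carry its traffic:

docker network create -d macvlan \
  --subnet=10.0.1.0/24 --gateway=10.0.1.1 \
  -o parent=eth1 \
  br1                                        # network bound to a second 1 Gb port

docker run -d --name=npm --network=br1 --ip=10.0.1.10 jc21/nginx-proxy-manager:latest

The same macvlan caveats from earlier in the thread still apply, so this is something to test rather than a guaranteed fix.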

 

