linenoise Posted January 12, 2021

This has been a recurring issue where Unraid crashes several times a week. The server becomes unresponsive and the console shows a crash report on screen, with the major error being a kernel panic; see the pictures below, taken from two different crashes. I have read through the other kernel panic threads here. Steps I have tried so far:

1. Ran Memtest86 (the latest version from the website, not the one on the Unraid boot menu): no issues.
2. Checked the memory speed. The memory is rated for 2133 MHz, but the auto-select in the BIOS down-clocked it to 1600. I tried setting it to 2133 manually, but then the machine wouldn't even boot.
3. All other overclocking is disabled; everything is set to auto-detect.
4. The power supply is brand new, so I'm not suspecting anything strange there.

Here are the syslogs. They were captured after two reboots, so I'm not sure how helpful they are.

tower-syslog-20210112-2124.zip
Hardware.xml

Thanks for your help,
Linenoise
JorgeB Posted January 13, 2021

Macvlan call traces are usually the result of having dockers with a custom IP address.
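For reference, a mitigation often suggested for this is to move the custom-IP containers off the macvlan path onto a user-defined ipvlan network. The sketch below is hedged: the network name (homelan), parent interface (br0), subnet, gateway, and container IP are all assumptions to adapt to your LAN, and DRY_RUN=1 makes it only print each command so nothing changes until you have reviewed it.

```shell
# Hedged sketch: put custom-IP containers on a user-defined ipvlan network
# instead of macvlan on br0. Names, subnet, and addresses are assumptions.
# DRY_RUN=1 prints each command instead of executing it.
DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"        # dry run: show the command
    else
        "$@"             # real run: execute it
    fi
}

run docker network create -d ipvlan \
    --subnet=10.0.0.0/24 --gateway=10.0.0.1 \
    -o parent=br0 homelan
run docker run -d --name pihole --network homelan --ip 10.0.0.53 pihole/pihole
```

Set DRY_RUN=0 only once the printed commands match your network.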
linenoise (Author) Posted January 13, 2021

10 hours ago, JorgeB said:
Macvlan call traces are usually the result of having dockers with a custom IP address.

Yes, I have several dockers with custom IP addresses on br0 (the same subnet as Unraid, 10.0.0.x/24):

Pi-hole (br0)
NginxProxyManager (br0)
Nextcloud (br0)

I also had Shinobi on br0 on a separate VLAN, but I disabled that a while ago and still got crashes. Pi-hole has been running for a few years without issue; Nextcloud and NginxProxyManager are the two I installed recently, around the time the crashes started.

On a similar note (not sure if this indicates a problem or is just a bug in Unraid): when I use a custom IP address with a docker and try to change the host port, it seems to ignore the change and still uses the default port.

I will shut down the Nextcloud docker and see if the network stabilizes. Thanks.
linenoise (Author) Posted January 25, 2021

On 1/13/2021 at 2:05 PM, linenoise said:
I will shut down the Nextcloud docker and see if the network stabilizes.

OK, this looks like it fixed the call trace issue, but now I have other errors, possibly due to corrupt data caused by the lockups from the original problem. These are the error messages I'm getting:

crash on Jan 18 2021
crash on Jan 19 2021
crash on Jan 20 2021
crash on Jan 25 2021

I see the message to run xfs_repair, but I'm not sure:
1. whether that is the main cause of the crashing, and
2. which drive to run it on.

The traces point at (dm-14) as the source. Following the xfs_repair instructions at https://wiki.unraid.net/index.php/Check_Disk_Filesystems#xfs_repair, I ran

xfs_repair -v /dev/sg14

using a device name I found in the /dev directory, shown here. I'm not getting any kind of status or indication that it is doing anything; see the screenshot below. I'll let it sit for a while.

Thanks,
Linenoise
JorgeB Posted January 25, 2021

1 hour ago, linenoise said:
Looking at (dm-14)

You need to run it on the disk, not the raw device, or parity will be invalid. dm-14 means you're using encryption; it should be disk 15 (or cache, if there isn't a disk 15), but I would need the diagnostics to be sure.
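For anyone else landing here: a dm-N number by itself doesn't say which Unraid disk it is. A small sketch, assuming the standard /dev/mapper layout for device-mapper (LUKS) targets, that lists each mapper name next to the node it resolves to, so a "(dm-14)" from a trace can be matched to the right array or cache device before repairing (the /dev/sgX nodes used earlier are SCSI generic devices and should never be the target of xfs_repair):

```shell
# Sketch (assumed layout): list device-mapper names and what they resolve
# to, so "(dm-14)" can be matched to the right encrypted device. Takes a
# directory argument for testing; defaults to /dev/mapper.
list_dm_names() {
    local dir="${1:-/dev/mapper}"
    local link
    for link in "$dir"/*; do
        [ -e "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

list_dm_names
# Then, with the array in maintenance mode, dry-run the repair against the
# mapper device (the name below is illustrative, not from this thread):
# xfs_repair -nv /dev/mapper/md15
```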
linenoise (Author) Posted January 25, 2021

Here is the diag file.

tower-diagnostics-20210125-1034.zip

Thanks
linenoise (Author) Posted January 25, 2021

I don't have a disk 15, so I am guessing it's the cache drive. I put the array in maintenance mode and ran the xfs_repair found in Main > Cache > Check Filesystem. The following is the output I got using the default settings, with the -n flag:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used. Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_ifree 91730, counted 91613
sb_fdblocks 216766133, counted 218441001
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad CRC for inode 797857
bad CRC for inode 797857, would rewrite
would have cleared inode 797857
        - agno = 1
        - agno = 2
Metadata corruption detected at 0x459730, xfs_dir3_block block 0x3a8aa5b8/0x1000
bad directory block magic # 0x58444433 in block 0 for directory inode 1078977760
corrupt block 0 in directory inode 1078977760
would junk block
no . entry for directory 1078977760
no .. entry for directory 1078977760
problem with directory contents in inode 1078977760
would have cleared inode 1078977760
imap claims in-use inode 1082884434 is free, would correct imap
imap claims in-use inode 1082884435 is free, would correct imap
imap claims in-use inode 1082884436 is free, would correct imap
imap claims in-use inode 1082884437 is free, would correct imap
imap claims in-use inode 1082884438 is free, would correct imap
imap claims in-use inode 1082884439 is free, would correct imap
imap claims in-use inode 1082884440 is free, would correct imap
imap claims in-use inode 1082884441 is free, would correct imap
....
I deleted a bunch of these lines, which were just incrementing through inodes, picking up again at the interesting bit:

imap claims in-use inode 1082924397 is free, would correct imap
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 3
        - agno = 2
        - agno = 0
bad CRC for inode 797857, would rewrite
would have cleared inode 797857
bad directory block magic # 0x58444433 in block 0 for directory inode 1078977760
corrupt block 0 in directory inode 1078977760
would junk block
no . entry for directory 1078977760
no .. entry for directory 1078977760
problem with directory contents in inode 1078977760
would have cleared inode 1078977760
entry "tmp" in shortform directory 546483779 references free inode 1078977760
would have junked entry "tmp" in directory inode 546483779
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
entry "tmp" in shortform directory inode 546483779 points to free inode 1078977760
would junk entry
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 1077444767, would move to lost+found
disconnected inode 1077520792, would move to lost+found
disconnected inode 1078977881, would move to lost+found
disconnected inode 1082643091, would move to lost+found
...

Ending with this:

disconnected inode 1082924396, would move to lost+found
disconnected inode 1082924397, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 546483779 nlinks from 3 to 2
No modify flag set, skipping filesystem flush and exiting.
JorgeB Posted January 25, 2021

Run it without -n, or nothing will actually be repaired.
linenoise (Author) Posted January 25, 2021

Just to document my steps for anyone else looking at this. Removed the -n flag and got this message:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed.
Mount the filesystem to replay the log, and unmount it before re-running xfs_repair.
If you are unable to mount the filesystem, then use the -L option to destroy the log
and attempt a repair. Note that destroying the log may cause corruption -- please
attempt a mount of the filesystem before doing this.

Ran with the -L flag and got this in return:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed.
Mount the filesystem to replay the log, and unmount it before re-running xfs_repair.
If you are unable to mount the filesystem, then use the -L option to destroy the log
and attempt a repair. Note that destroying the log may cause corruption -- please
attempt a mount of the filesystem before doing this.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_ifree 91730, counted 91613
sb_fdblocks 216766133, counted 218441001
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad CRC for inode 797857
bad CRC for inode 797857, will rewrite
cleared inode 797857
        - agno = 1
        - agno = 2
Metadata corruption detected at 0x459730, xfs_dir3_block block 0x3a8aa5b8/0x1000
bad directory block magic # 0x58444433 in block 0 for directory inode 1078977760
corrupt block 0 in directory inode 1078977760
will junk block
no . entry for directory 1078977760
no .. entry for directory 1078977760
problem with directory contents in inode 1078977760
cleared inode 1078977760
correcting imap
correcting imap
correcting imap
correcting imap
correcting imap
correcting imap
-------- deleted the repetitive "correcting imap" messages --------

Will spin up without maintenance mode and see if this fixes the crashes. Hope the log file wasn't too important, since I had it destroyed to continue with the repair.
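To summarize the sequence those messages enforce (dry-run first, replay the log by mounting, only then repair, with -L strictly as a last resort), here is a small helper that just prints the steps for a given device. The device path and mount point are placeholders, not values from this thread:

```shell
# Prints the xfs_repair order used above for a given device. It deliberately
# runs nothing itself; copy the steps out once the device is confirmed.
xfs_repair_plan() {
    local dev="$1"
    echo "xfs_repair -n $dev                       # 1. dry run, report only"
    echo "mount $dev /mnt/tmp && umount /mnt/tmp   # 2. mount once to replay the log"
    echo "xfs_repair $dev                          # 3. actual repair on a clean log"
    echo "xfs_repair -L $dev                       # 4. LAST resort: -L zeroes the log and can lose recent metadata"
}

xfs_repair_plan /dev/mapper/example
```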
superboki Posted January 27, 2021

Hi, did you solve your problem?
juchong Posted February 18, 2021

I'm having a similar issue as well... What's going on?!
linenoise (Author) Posted October 17, 2021

Closing the loop on this. It looks like the kernel panic was a result of using custom Docker IP addresses for some dockers, namely Nginx Proxy Manager, Nextcloud, and Pi-hole. It has something to do with the way Unraid manages and bridges custom IP addresses on a single Ethernet port. I got frustrated trying to get it to work inside Unraid and just moved those applications to another server running Portainer.

I am now building a new Unraid server, which will have 1 x 10 Gb fiber card and 4 x 1 Gb copper Ethernet ports. I'm getting ready to post about the best way to set up Docker to take advantage of the multiple ports, i.e., stream movies and share files over the 10 Gb port, use one 1 Gb port as the Unraid management port, and split the remaining three 1 Gb ports between dockers and VMs, but I'm not sure how to do this.

Another question I have: if I install Portainer and use it as the Docker manager, will that bypass the limitations of Unraid's built-in Docker management?

I am going to post this to a new thread and will link the thread back here after I create it.
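For what it's worth, here is the kind of per-port split I have in mind, sketched as plain Docker commands: one user-defined macvlan network per physical NIC, with each container attached to the port it should use. The interface names (eth1, eth2), subnets, container IPs, and images are all assumptions, and DRY_RUN=1 means the commands are only printed, not executed.

```shell
# Hedged sketch of a per-port split: one macvlan network per physical NIC,
# each container pinned to a port. Interfaces, subnets, and addresses are
# assumptions. DRY_RUN=1 prints the commands instead of executing them.
DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"        # dry run: show the command
    else
        "$@"             # real run: execute it
    fi
}

run docker network create -d macvlan -o parent=eth1 \
    --subnet=10.0.1.0/24 --gateway=10.0.1.1 port1net
run docker network create -d macvlan -o parent=eth2 \
    --subnet=10.0.2.0/24 --gateway=10.0.2.1 port2net
run docker run -d --name pihole --network port1net --ip 10.0.1.53 pihole/pihole
run docker run -d --name npm --network port2net --ip 10.0.2.80 jc21/nginx-proxy-manager
```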
linenoise (Author) Posted October 17, 2021

Here is the link to my new server build thread about the physical ports and Docker that I mentioned above, in case you're looking for it.