Gnur Posted April 4, 2021

Hi, I have been dealing with kernel panics since the 6.9.1 upgrade and need help identifying and fixing the issue. Strangely, it only happens during the night. Any help will be appreciated.

Best regards,
André Rung

tower-diagnostics-20210404-1044.zip
JorgeB Posted April 5, 2021

Macvlan call traces are usually the result of having Docker containers with a custom IP address, more info below. There may also be a fix for this in v6.9.2.
bonienl Posted April 5, 2021

Unraid 6.9.2 will include a kernel patch, which hopefully addresses these macvlan call traces.
Gnur Posted April 5, 2021

I just got a kernel panic again. Luckily I had enabled the syslog server, so I have the syslog; I hope it helps.

Best regards,
André Rung

syslog-192.168.0.175.log
bonienl Posted April 5, 2021

29 minutes ago, Gnur said:
I just got a kernel panic again. Luckily I had enabled the syslog server, so I have the syslog; I hope it helps.

You have a layer 2 loop in your Unraid connection. Likely two or more interfaces are configured in the same bridge group, causing the loop. Either disconnect the physical interfaces or reconfigure Unraid to use bonding when multiple interfaces are wanted.
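For anyone hitting the same diagnosis, a quick way to see which interfaces are grouped into each bridge (a diagnostic sketch, not from the thread; `brctl` and `ip` both ship with Unraid, and `br0` is the default Unraid bridge name):

```shell
# List all bridges and the interfaces enslaved to them. Two physical
# NICs in the same bridge group, both cabled to the same switch without
# bonding, form exactly the layer 2 loop described above.
brctl show

# The same information via iproute2, for the default bridge br0:
ip -br link show master br0
```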
Gnur Posted April 5, 2021

5 minutes ago, bonienl said:
You have a layer 2 loop in your Unraid connection. Likely two or more interfaces are configured in the same bridge group, causing the loop. Either disconnect the physical interfaces or reconfigure Unraid to use bonding when multiple interfaces are wanted.

I do have bonding configured; I may have set it up wrong...

Best regards,
André Rung
bonienl Posted April 5, 2021

balance-rr (round-robin) requires a switch that supports this mode. Better to change to another balance mode: choose either mode 5 (balance-tlb) or 6 (balance-alb), which work independently of the switch.
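To verify which mode is actually active after the change, the kernel exposes it under /proc (a sketch; `bond0` is the default Unraid bond name):

```shell
# Print the active bonding mode. After switching to mode 5 this should
# read "transmit load balancing (balance-tlb)".
grep "Bonding Mode" /proc/net/bonding/bond0
```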
Gnur Posted April 5, 2021

Ok, I have switched to option (5). Let's see if it makes any difference.

Thank you,
André Rung
Gnur Posted April 6, 2021

Hello, no luck... kernel panic again this morning.

Apr 6 02:00:14 Tower emhttpd: read SMART /dev/sdi
Apr 6 02:00:20 Tower emhttpd: read SMART /dev/sdh
Apr 6 02:00:27 Tower emhttpd: read SMART /dev/sdk
Apr 6 02:00:33 Tower emhttpd: read SMART /dev/sdd
Apr 6 02:00:42 Tower emhttpd: read SMART /dev/sde
Apr 6 02:00:49 Tower emhttpd: read SMART /dev/sdf
Apr 6 02:16:16 Tower emhttpd: spinning down /dev/sde
Apr 6 02:16:18 Tower emhttpd: spinning down /dev/sdd
Apr 6 02:16:18 Tower emhttpd: spinning down /dev/sdf
Apr 6 02:26:51 Tower emhttpd: spinning down /dev/sdk
Apr 6 02:28:42 Tower emhttpd: spinning down /dev/sdi
Apr 6 02:28:43 Tower emhttpd: spinning down /dev/sdh
Apr 6 02:58:10 Tower emhttpd: read SMART /dev/sdf
Apr 6 03:13:11 Tower emhttpd: spinning down /dev/sdf
Apr 6 03:31:07 Tower emhttpd: read SMART /dev/sdd
Apr 6 03:31:15 Tower kernel: XFS (md4): Metadata corruption detected at xfs_dinode_verify+0xa3/0x581 [xfs], inode 0x86ecaf8b dinode
Apr 6 03:31:15 Tower kernel: XFS (md4): Unmount and run xfs_repair
Apr 6 03:31:15 Tower kernel: XFS (md4): First 128 bytes of corrupted metadata buffer:
Apr 6 03:31:15 Tower kernel: 00000000: 49 4e 41 f8 03 01 00 00 00 00 00 63 00 00 00 64 INA........c...d
Apr 6 03:31:15 Tower kernel: 00000010: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 ................
Apr 6 03:31:15 Tower kernel: 00000020: 60 63 d6 e8 0e 05 5a fa 60 63 d6 e8 0e 33 21 ed `c....Z.`c...3!.
Apr 6 03:31:15 Tower kernel: 00000030: 60 63 d6 e8 0e 33 21 ed 00 00 00 00 00 00 00 1b `c...3!.........
Apr 6 03:31:15 Tower kernel: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Apr 6 03:31:15 Tower kernel: 00000050: 00 00 00 02 00 00 00 00 00 00 00 00 a5 a7 6c 21 ..............l!
Apr 6 03:31:15 Tower kernel: 00000060: ff ff ff ff 0a 41 8c 46 00 00 00 00 00 00 00 04 .....A.F........
Apr 6 03:31:15 Tower kernel: 00000070: 00 00 00 1b 00 1a f3 e3 00 00 00 00 00 00 00 00 ................
Apr 6 03:31:15 Tower emhttpd: read SMART /dev/sde
Apr 6 03:32:05 Tower emhttpd: read SMART /dev/sdi
Apr 6 03:32:11 Tower emhttpd: read SMART /dev/sdf
Apr 6 03:32:20 Tower emhttpd: read SMART /dev/sdj
Apr 6 03:32:22 Tower emhttpd: read SMART /dev/sdg
Apr 6 03:34:28 Tower emhttpd: read SMART /dev/sdh
Apr 6 03:40:09 Tower crond[1835]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Apr 6 03:46:16 Tower emhttpd: spinning down /dev/sdd
Apr 6 03:46:16 Tower emhttpd: spinning down /dev/sde
Apr 6 03:47:12 Tower emhttpd: spinning down /dev/sdf
Apr 6 03:56:41 Tower emhttpd: spinning down /dev/sdi
Apr 6 03:56:55 Tower emhttpd: spinning down /dev/sdj
Apr 6 03:56:55 Tower emhttpd: spinning down /dev/sdh
Apr 6 03:56:55 Tower emhttpd: spinning down /dev/sdg
Apr 6 10:20:24 Tower root: Delaying execution of fix common problems scan for 10 minutes

Is there any way I can work around this issue?

Best regards,
André Rung
JorgeB Posted April 6, 2021

9 minutes ago, Gnur said:
XFS (md4): Metadata corruption detected at

You need to check the filesystem on disk4.
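For reference, a sketch of the usual Unraid procedure (run with the array started in Maintenance mode; `/dev/md4` corresponds to disk4 in this thread, so substitute your own disk number):

```shell
# Read-only pass first: -n reports problems without changing anything.
xfs_repair -n /dev/md4

# If problems are reported, run the real repair (no -n).
# Only add -L if xfs_repair explicitly asks for it; -L zeroes the
# journal and can discard the most recent metadata updates.
xfs_repair /dev/md4
```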
Gnur Posted April 6, 2021

Ok, filesystem checked with -n flag:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
bad CRC for inode 2263658379
bad CRC for inode 2263658379, would rewrite
would have cleared inode 2263658379
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 1
        - agno = 2
bad CRC for inode 2263658379, would rewrite
would have cleared inode 2263658379
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
Metadata corruption detected at 0x469ae8, inode 0x86ecaf8b dinode
couldn't map inode 2263658379, err = 117
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 2263658380, would move to lost+found
Phase 7 - verify link counts...
Metadata corruption detected at 0x469ae8, inode 0x86ecaf8b dinode
couldn't map inode 2263658379, err = 117, can't compare link counts
No modify flag set, skipping filesystem flush and exiting.

What now? Should I run it again with a different check?

Best regards,
André
JorgeB Posted April 6, 2021

Run it again without -n or nothing will be done; if it asks for -L, use it.
Gnur Posted April 6, 2021

OK... done...

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
bad CRC for inode 2263658379
bad CRC for inode 2263658379, will rewrite
cleared inode 2263658379
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

Should I start the array now?

Best regards,
André Rung
JorgeB Posted April 6, 2021

1 minute ago, Gnur said:
Should I start the array now?

Yes.
Gnur Posted April 6, 2021

Array started. Any other check, or should I just wait for the next kernel panic and keep checking the filesystems?

Best regards,
André Rung
JorgeB Posted April 6, 2021

20 minutes ago, Gnur said:
keep checking the filesystems?

You should check a filesystem if there's an error about it in the syslog.
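One way to watch for such errors is to grep the syslog for XFS messages mentioning an array device (a hypothetical one-liner; the path assumes the local syslog, so adjust it if you mirror the log to the flash drive or a remote server as done earlier in this thread):

```shell
# Flag XFS corruption or error messages for any array disk (md1, md2, ...).
# A hit here means that disk's filesystem should be checked.
grep -iE "XFS \(md[0-9]+\).*(corruption|error)" /var/log/syslog
```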
Gnur Posted April 6, 2021

OK, thank you. I'll keep an eye on the syslog.

Best regards,
André Rung
trurl Posted April 6, 2021

8 hours ago, Gnur said:
moving disconnected inodes to lost+found

8 hours ago, Gnur said:
Array started, any other check

Did you check your new lost+found share that resulted from the repair?
Gnur Posted April 6, 2021

Nope, how do I do that? I checked /mnt/disk4 and didn't find anything... should I put it back in maintenance mode?
trurl Posted April 6, 2021

44 minutes ago, trurl said:
lost+found share

It is a user share just like all top-level folders on your disks. If you don't have a user share by that name, then I guess the repair didn't put anything there.
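To check every array disk at once rather than disk4 alone, a sketch (assumes the standard Unraid /mnt/diskN mount points):

```shell
# List any lost+found folder on any array disk.
# No output means the repair had nothing to reconnect.
ls -d /mnt/disk*/lost+found 2>/dev/null
```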
Gnur Posted April 6, 2021

Nope, there is no lost+found on /mnt/disk4.