Gnur Posted April 4, 2021

Hi, I have been dealing with kernel panics since the 6.9.1 upgrade and need help identifying and fixing the issue. Strangely, it only happens during the night. Any help will be appreciated.

Best regards,
André Rung

tower-diagnostics-20210404-1044.zip
JorgeB Posted April 5, 2021

Macvlan call traces are usually the result of having Docker containers with a custom IP address, more info below. There may also be a fix for this in v6.9.2.
bonienl Posted April 5, 2021

Unraid 6.9.2 will include a kernel patch, which hopefully addresses these macvlan call traces.
Gnur Posted April 5, 2021

I just got a kernel panic again. Luckily I had enabled the syslog server, so I have the syslog; I hope it helps.

Best regards,
André Rung

syslog-192.168.0.175.log
bonienl Posted April 5, 2021

29 minutes ago, Gnur said:
I just got a kernel panic again. Luckily I had enabled the syslog server, so I have the syslog; I hope it helps.

You have a layer 2 loop in your Unraid connection. Likely two or more interfaces are configured in the same bridge group, causing the loop. Either disconnect the physical interfaces or reconfigure Unraid to use bonding when multiple interfaces are wanted.
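For anyone hitting the same diagnosis, a quick way to see which interfaces are grouped into each bridge (a diagnostic sketch, not from the thread; `brctl` and `ip` both ship with Unraid, and `br0` is the default Unraid bridge name):

```shell
# List all bridges and the interfaces enslaved to them. Two physical
# NICs in the same bridge group, both cabled to the same switch without
# bonding, form exactly the layer 2 loop described above.
brctl show

# The same information via iproute2, for the default bridge br0:
ip -br link show master br0
```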
Gnur Posted April 5, 2021

5 minutes ago, bonienl said:
You have a layer 2 loop in your Unraid connection. Likely two or more interfaces are configured in the same bridge group, causing the loop. Either disconnect the physical interfaces or reconfigure Unraid to use bonding when multiple interfaces are wanted.

I do have bonding configured; I may have set it up wrong...

Best regards,
André Rung
bonienl Posted April 5, 2021

balance-rr (round-robin) requires a switch that supports this mode. Better to change to another balance mode: choose either mode 5 (balance-tlb) or 6 (balance-alb), which work independently of the switch.
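To verify which mode is actually active after the change, the kernel exposes it under /proc (a sketch; `bond0` is the default Unraid bond name):

```shell
# Print the active bonding mode. After switching to mode 5 this should
# read "transmit load balancing (balance-tlb)".
grep "Bonding Mode" /proc/net/bonding/bond0
```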
Gnur Posted April 5, 2021

Ok, I have switched to option (5). Let's see if it makes any difference.

Thank you,
André Rung
Gnur Posted April 6, 2021

Hello, no luck... kernel panic again this morning.

Apr 6 02:00:14 Tower emhttpd: read SMART /dev/sdi
Apr 6 02:00:20 Tower emhttpd: read SMART /dev/sdh
Apr 6 02:00:27 Tower emhttpd: read SMART /dev/sdk
Apr 6 02:00:33 Tower emhttpd: read SMART /dev/sdd
Apr 6 02:00:42 Tower emhttpd: read SMART /dev/sde
Apr 6 02:00:49 Tower emhttpd: read SMART /dev/sdf
Apr 6 02:16:16 Tower emhttpd: spinning down /dev/sde
Apr 6 02:16:18 Tower emhttpd: spinning down /dev/sdd
Apr 6 02:16:18 Tower emhttpd: spinning down /dev/sdf
Apr 6 02:26:51 Tower emhttpd: spinning down /dev/sdk
Apr 6 02:28:42 Tower emhttpd: spinning down /dev/sdi
Apr 6 02:28:43 Tower emhttpd: spinning down /dev/sdh
Apr 6 02:58:10 Tower emhttpd: read SMART /dev/sdf
Apr 6 03:13:11 Tower emhttpd: spinning down /dev/sdf
Apr 6 03:31:07 Tower emhttpd: read SMART /dev/sdd
Apr 6 03:31:15 Tower kernel: XFS (md4): Metadata corruption detected at xfs_dinode_verify+0xa3/0x581 [xfs], inode 0x86ecaf8b dinode
Apr 6 03:31:15 Tower kernel: XFS (md4): Unmount and run xfs_repair
Apr 6 03:31:15 Tower kernel: XFS (md4): First 128 bytes of corrupted metadata buffer:
Apr 6 03:31:15 Tower kernel: 00000000: 49 4e 41 f8 03 01 00 00 00 00 00 63 00 00 00 64 INA........c...d
Apr 6 03:31:15 Tower kernel: 00000010: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 ................
Apr 6 03:31:15 Tower kernel: 00000020: 60 63 d6 e8 0e 05 5a fa 60 63 d6 e8 0e 33 21 ed `c....Z.`c...3!.
Apr 6 03:31:15 Tower kernel: 00000030: 60 63 d6 e8 0e 33 21 ed 00 00 00 00 00 00 00 1b `c...3!.........
Apr 6 03:31:15 Tower kernel: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Apr 6 03:31:15 Tower kernel: 00000050: 00 00 00 02 00 00 00 00 00 00 00 00 a5 a7 6c 21 ..............l!
Apr 6 03:31:15 Tower kernel: 00000060: ff ff ff ff 0a 41 8c 46 00 00 00 00 00 00 00 04 .....A.F........
Apr 6 03:31:15 Tower kernel: 00000070: 00 00 00 1b 00 1a f3 e3 00 00 00 00 00 00 00 00 ................
Apr 6 03:31:15 Tower emhttpd: read SMART /dev/sde
Apr 6 03:32:05 Tower emhttpd: read SMART /dev/sdi
Apr 6 03:32:11 Tower emhttpd: read SMART /dev/sdf
Apr 6 03:32:20 Tower emhttpd: read SMART /dev/sdj
Apr 6 03:32:22 Tower emhttpd: read SMART /dev/sdg
Apr 6 03:34:28 Tower emhttpd: read SMART /dev/sdh
Apr 6 03:40:09 Tower crond[1835]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Apr 6 03:46:16 Tower emhttpd: spinning down /dev/sdd
Apr 6 03:46:16 Tower emhttpd: spinning down /dev/sde
Apr 6 03:47:12 Tower emhttpd: spinning down /dev/sdf
Apr 6 03:56:41 Tower emhttpd: spinning down /dev/sdi
Apr 6 03:56:55 Tower emhttpd: spinning down /dev/sdj
Apr 6 03:56:55 Tower emhttpd: spinning down /dev/sdh
Apr 6 03:56:55 Tower emhttpd: spinning down /dev/sdg
Apr 6 10:20:24 Tower root: Delaying execution of fix common problems scan for 10 minutes

Is there any way I can work around this issue?

Best regards,
André Rung
JorgeB Posted April 6, 2021

9 minutes ago, Gnur said:
XFS (md4): Metadata corruption detected at

You need to check the filesystem on disk4.
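For reference, a sketch of the usual Unraid procedure (run with the array started in Maintenance mode; `/dev/md4` corresponds to disk4 in this thread, so substitute your own disk number):

```shell
# Read-only pass first: -n reports problems without changing anything.
xfs_repair -n /dev/md4

# If problems are reported, run the real repair (no -n).
# Only add -L if xfs_repair explicitly asks for it; -L zeroes the
# journal and can discard the most recent metadata updates.
xfs_repair /dev/md4
```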
Gnur Posted April 6, 2021

Ok, filesystem checked with -n flag:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
bad CRC for inode 2263658379
bad CRC for inode 2263658379, would rewrite
would have cleared inode 2263658379
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 1
        - agno = 2
bad CRC for inode 2263658379, would rewrite
would have cleared inode 2263658379
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
Metadata corruption detected at 0x469ae8, inode 0x86ecaf8b dinode
couldn't map inode 2263658379, err = 117
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 2263658380, would move to lost+found
Phase 7 - verify link counts...
Metadata corruption detected at 0x469ae8, inode 0x86ecaf8b dinode
couldn't map inode 2263658379, err = 117, can't compare link counts
No modify flag set, skipping filesystem flush and exiting.

What now? Should I run it again with a different check?

Best regards,
André
JorgeB Posted April 6, 2021

Run it again without -n or nothing will be done; if it asks for -L, use it.
Gnur Posted April 6, 2021

OK... done...

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
bad CRC for inode 2263658379
bad CRC for inode 2263658379, will rewrite
cleared inode 2263658379
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

Should I start the array now?

Best regards,
André Rung
JorgeB Posted April 6, 2021

1 minute ago, Gnur said:
Should I start the array now?

Yes.
Gnur Posted April 6, 2021

Array started. Any other check, or should I just wait for the next kernel panic and keep checking the filesystems?

Best regards,
André Rung
JorgeB Posted April 6, 2021

20 minutes ago, Gnur said:
keep checking the filesystems?

You should check a filesystem if there's an error about it in the syslog.
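One way to watch for such errors is to grep the syslog for XFS messages mentioning an array device (a hypothetical one-liner; the path assumes the local syslog, so adjust it if you mirror the log to the flash drive or a remote server as done earlier in this thread):

```shell
# Flag XFS corruption or error messages for any array disk (md1, md2, ...).
# A hit here means that disk's filesystem should be checked.
grep -iE "XFS \(md[0-9]+\).*(corruption|error)" /var/log/syslog
```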
Gnur Posted April 6, 2021

OK, thank you. I'll keep an eye on the syslog.

Best regards,
André Rung
trurl Posted April 6, 2021

8 hours ago, Gnur said:
moving disconnected inodes to lost+found

8 hours ago, Gnur said:
Array started, any other check

Did you check your new lost+found share that resulted from the repair?
Gnur Posted April 6, 2021

Nope, how do I do that? I checked /mnt/disk4 and didn't find anything... should I put it back in maintenance mode?
trurl Posted April 6, 2021

44 minutes ago, trurl said:
lost+found share

It is a user share just like all top-level folders on your disks. If you don't have a user share by that name, then I guess the repair didn't put anything there.
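To check every array disk at once rather than disk4 alone, a sketch (assumes the standard Unraid /mnt/diskN mount points):

```shell
# List any lost+found folder on any array disk.
# No output means the repair had nothing to reconnect.
ls -d /mnt/disk*/lost+found 2>/dev/null
```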
Gnur Posted April 6, 2021

Nope, there is no lost+found on /mnt/disk4.