turnma Posted April 10, 2024

Last year I upgraded my server to 6.12.x and immediately suffered stability issues due to btrfs. I downgraded back to 6.11.x and the server has been online without interruption for another 8 months. On Monday I upgraded again, this time recreating the cache pool as zfs. The server was stable for about 24 hours but died in the early hours of this morning, although clearly not with a btrfs issue this time. I was unable to contact the server over the network and had to force a reboot. I've grabbed and attached diagnostics, although the syslog data is from post-reboot. I have a copy of the syslog data that I had been sending to an external server, so I've pulled about 10K lines from that and attached it here as well. Hopefully there's something in here to give pointers. Again, I'd stress that the server has been running without the slightest hiccup, on UPS, for well over a year, interrupted only by the issues I had during the last aborted attempt to move to 6.12. It would be really nice if I didn't have to downgrade again! Thanks.

tower-diagnostics-20240410-1022.zip syslogresults_20240410_102435.zip
JorgeB Posted April 10, 2024

There are macvlan-related call traces, and those will usually end up crashing the server. Switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right), then reboot.
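To confirm it's the same signature, you can grep the log for the traces; a quick sketch, assuming the default Unraid syslog location:

# find macvlan messages and the call traces around them
grep -i 'macvlan' /var/log/syslog
grep -iA20 'call trace' /var/log/syslog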
turnma (Author) Posted April 10, 2024

Thanks, I'll make that switch. I knew about that from the last upgrade, but the advice back then was also steering towards a second NIC (which I added at the time), so this time I didn't want to change anything until I was more sure it was necessary.
turnma (Author) Posted April 15, 2024

After making the ipvlan switch the server was stable until today. Today I spotted that my containers were effectively unreachable (web servers responsive but timing out or not returning content after logon). I found that if I tried to ls /mnt the terminal would hang, but the same on /mnt/disk1 was fine. There was nothing in syslog at the time the symptoms were seen, but there was an issue much earlier in the day (when things had still seemed okay), e.g.:

PANIC: zfs: removing nonexistent segment from range tree

I couldn't reboot the server because issuing a reboot would also hang, so eventually I had to do a hard reset. After the reset the array got stuck starting, with the cache pool the apparent culprit. I rebooted (which was now possible) with a plan to mount the cache read-only, but after the reboot the array started fine. This all feels like it's related to the cache pool (a single SSD), but again I'd had zero problems before the 6.12.x upgrade when on btrfs, and only moved to zfs to get around the apparent issues with btrfs on 6.12.x. So my question at this point, assuming that zfs doesn't like something about my hardware that btrfs on 6.11.x was fine with, is whether I should consider reformatting the cache pool to xfs instead. Thanks.
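For reference, this is roughly how the pool's state can be checked after a panic like this (assuming the pool is named "cache", the Unraid default):

zpool status -v cache   # show pool health and any reported data errors
zpool scrub cache       # if the pool imported cleanly, verify every checksum
zpool status cache      # check again once the scrub completes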
turnma (Author) Posted April 15, 2024

Just to add: the server has been back online for under an hour and the symptoms have returned, with /mnt/user access hanging (and the admin UI unavailable, etc.). I can't run diagnostics because that also hangs, and there's nothing new in syslog since the server/disk became unresponsive. In other circumstances I'd think this was likely a disk issue, but again it seems like a massive coincidence that I had no issues for the 6 months before the upgrade and the zfs change.
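When the hangs return, one way to confirm that processes are stuck in filesystem I/O (a generic sketch, no Unraid specifics assumed):

# processes in uninterruptible sleep (state D) are typically blocked on I/O;
# wchan shows the kernel function each one is waiting in
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'
# alternatively, dump all blocked tasks to the kernel log
echo w > /proc/sysrq-trigger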
JorgeB Posted April 15, 2024

37 minutes ago, turnma said:
PANIC: zfs: removing nonexistent segment from range tree

This suggests a problem with a zfs filesystem. Since there's no fsck for zfs, you would need to back up and recreate the pool. Do you have more than one zfs filesystem?
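A minimal sketch of the backup-and-recreate route, assuming the pool mounts at /mnt/cache and disk1 has enough free space:

# 1. copy everything off the pool to an array disk, preserving attributes
rsync -avX /mnt/cache/ /mnt/disk1/cache_backup/
# 2. stop the array and reformat the cache device from the GUI
# 3. copy the data back
rsync -avX /mnt/disk1/cache_backup/ /mnt/cache/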
turnma (Author) Posted April 15, 2024

No, just the one. I only created it when I moved from btrfs last week, so if zfs is only going to last a week at a time, am I better off recreating the pool as xfs?
JorgeB Posted April 15, 2024

If you changed from btrfs to zfs because you were having issues with btrfs, and now zfs also has issues, there may be an underlying hardware problem, but you can try xfs.
turnma (Author) Posted April 15, 2024

I did move for that reason, but btrfs was also problem-free for two years before the upgrade, so if there is a hardware issue it has only become apparent since the upgrade. Thanks, I'll try xfs.
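One consolation with xfs is that, unlike zfs, it has an offline checker should anything similar happen again; a sketch, with /dev/sdX1 standing in for the actual cache device (the filesystem must be unmounted first):

xfs_repair -n /dev/sdX1   # dry run: report problems without changing anything
xfs_repair /dev/sdX1      # real repair, only after reviewing the dry run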
turnma (Author) Posted April 22, 2024

One week stable, so hopefully looking positive for xfs. 🤞