turnma Posted April 10, 2024

Last year I upgraded my server to 6.12.x and immediately suffered stability issues due to btrfs. I downgraded back to 6.11.x and the server has been online without interruption for another 8 months. On Monday I upgraded again, this time recreating the cache pool as zfs. The server was stable for about 24 hours but died in the early hours of this morning, although clearly not with a btrfs issue this time. I was unable to contact the server over the network and had to force a reboot. I've grabbed and attached diagnostics, although the syslog data is from post-reboot. I have a copy of the syslog data that I had been sending to an external server, so I've pulled about 10K lines from that and attached it here as well. Hopefully there's something in here to give pointers. Again, I'd stress that the server has been running without the slightest hiccup, on UPS, for well over a year, interrupted only by the issues I had during the last aborted attempt to move to 6.12. It would be really nice if I didn't have to downgrade again! Thanks.

tower-diagnostics-20240410-1022.zip syslogresults_20240410_102435.zip
JorgeB Posted April 10, 2024

There are macvlan-related call traces, and those will usually end up crashing the server. Switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right), then reboot.
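To confirm it's the same signature, you can grep the log for the traces; a quick sketch, assuming the default Unraid syslog location:

# find macvlan messages and the call traces around them
grep -i 'macvlan' /var/log/syslog
grep -iA20 'call trace' /var/log/syslog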
turnma (Author) Posted April 10, 2024

Thanks, I'll make that switch. I knew about that from the last upgrade, but the advice back then was also steering towards a second NIC (which I added at the time), so this time I didn't want to change anything until I was more sure it was necessary.
turnma (Author) Posted April 15, 2024

After making the ipvlan switch the server was stable until today. Today I spotted that my containers were effectively unreachable (web servers responsive but timing out or not returning content after logon). I found that if I tried to ls /mnt the terminal would hang, but the same on /mnt/disk1 was fine. There was nothing in syslog at the time the symptoms were seen, but there was an issue much earlier in the day (when things had still seemed okay), e.g.:

PANIC: zfs: removing nonexistent segment from range tree

I couldn't reboot the server because issuing a reboot would also hang, so eventually I had to do a hard reset. After the reset the array got stuck starting, with the cache pool the apparent culprit. I rebooted (which was now possible) with a plan to mount the cache read-only, but after the reboot the array started fine. This all feels like it's related to the cache pool (a single SSD), but again I'd had zero problems before the 6.12.x upgrade when on btrfs, and only moved to zfs to get around the apparent issues with btrfs on 6.12.x. So my question at this point, assuming that zfs doesn't like something about my hardware that btrfs on 6.11.x was fine with, is whether I should consider reformatting the cache pool to xfs instead. Thanks.
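For reference, this is roughly how the pool's state can be checked after a panic like this (assuming the pool is named "cache", the Unraid default):

zpool status -v cache   # show pool health and any reported data errors
zpool scrub cache       # if the pool imported cleanly, verify every checksum
zpool status cache      # check again once the scrub completes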
turnma (Author) Posted April 15, 2024

Just to add: the server has been back online for under an hour and the symptoms have returned, with /mnt/user access hanging (and the admin UI unavailable, etc.). I can't run diagnostics because that also hangs, and there's nothing new in syslog since the server/disk became unresponsive. In other circumstances I'd think this was likely a disk issue, but again it seems like a massive coincidence that I had no issues for the 6 months before the upgrade and the zfs change.
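When the hangs return, one way to confirm that processes are stuck in filesystem I/O (a generic sketch, no Unraid specifics assumed):

# processes in uninterruptible sleep (state D) are typically blocked on I/O;
# wchan shows the kernel function each one is waiting in
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'
# alternatively, dump all blocked tasks to the kernel log
echo w > /proc/sysrq-trigger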
JorgeB Posted April 15, 2024

37 minutes ago, turnma said:
PANIC: zfs: removing nonexistent segment from range tree

This suggests a problem with a zfs filesystem. Since there's no fsck for zfs, you would need to back up and recreate the pool. Do you have more than one zfs filesystem?
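A minimal sketch of the backup-and-recreate route, assuming the pool mounts at /mnt/cache and disk1 has enough free space:

# 1. copy everything off the pool to an array disk, preserving attributes
rsync -avX /mnt/cache/ /mnt/disk1/cache_backup/
# 2. stop the array and reformat the cache device from the GUI
# 3. copy the data back
rsync -avX /mnt/disk1/cache_backup/ /mnt/cache/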
turnma (Author) Posted April 15, 2024

No, just the one. I only created it when I moved from btrfs last week, so if zfs is only going to last a week at a time, am I better off recreating the pool as xfs?
JorgeB Posted April 15, 2024

If you changed from btrfs to zfs because you were having issues with btrfs, and now zfs also has issues, there may be an underlying hardware problem, but you can try xfs.
turnma (Author) Posted April 15, 2024

I did move for that reason, but btrfs was also problem-free for two years before the upgrade, so if there is a hardware issue it has only become apparent since the upgrade. Thanks, I'll try xfs.
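One consolation with xfs is that, unlike zfs, it has an offline checker should anything similar happen again; a sketch, with /dev/sdX1 standing in for the actual cache device (the filesystem must be unmounted first):

xfs_repair -n /dev/sdX1   # dry run: report problems without changing anything
xfs_repair /dev/sdX1      # real repair, only after reviewing the dry run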
turnma (Author) Posted April 22, 2024

One week stable, so hopefully looking positive for xfs. 🤞