BTRFS error and Read-only cache since updating to 6.12



One more here: upgraded from 6.11.5 to 6.12.6 without a hiccup, for a few hours...

Quote

Dec  7 02:22:27 Tower kernel: BTRFS error (device loop2): block=837271552 write time tree block corruption detected
Dec  7 02:22:27 Tower kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2494: errno=-5 IO failure (Error while writing out transaction)
Dec  7 02:22:27 Tower kernel: BTRFS info (device loop2: state E): forced readonly
Dec  7 02:22:27 Tower kernel: BTRFS warning (device loop2: state E): Skipping commit of aborted transaction.
Dec  7 02:22:27 Tower kernel: BTRFS: error (device loop2: state EA) in cleanup_transaction:1992: errno=-5 IO failure

 

Will revert now :(

tower-diagnostics-20231207-0254.zip

Link to comment

That error indicates a corrupt docker image file and is very unlikely to be related to the upgrade.

 

Also, your diagnostics show that you are using macvlan for your Docker networking. This is known to cause crashes in the 6.12.x series. To avoid this, you should either switch to ipvlan or follow the directions in the 6.12.4 release notes to continue using macvlan (in particular, disabling bridging on eth0).
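If you want to confirm what your custom Docker networks are currently using before changing anything, something along these lines from the console will show it (a sketch only, assuming the standard docker CLI that a stock Unraid install has; br0 is just the usual custom network name, substitute your own):

docker network ls --filter driver=macvlan      # lists any networks created with the macvlan driver
docker network inspect br0 | grep -i driver    # shows the driver of a specific custom network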

 

 

Link to comment
2 hours ago, itimpi said:

That error indicates a corrupt docker image file and is very unlikely to be related to the upgrade.

 

Also, your diagnostics show that you are using macvlan for your Docker networking. This is known to cause crashes in the 6.12.x series. To avoid this, you should either switch to ipvlan or follow the directions in the 6.12.4 release notes to continue using macvlan (in particular, disabling bridging on eth0).

 

 

I thought that too, even with the suspicious timing, until I read this thread - others seem to be having the exact same issue after upgrading...

Just ran a btrfs scrub (back at 6.11.5), with no errors.
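For reference, this is roughly what I ran (paths assume the pool is mounted at /mnt/cache, the Unraid default; adjust if yours differs):

btrfs scrub start /mnt/cache     # start a scrub of the cache pool
btrfs scrub status /mnt/cache    # check progress and the error summary once it completes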

Could the new 6.12.6 kernel/btrfs version have stricter checks that don't exist in 6.11.5?

 

Anyway, if everything is ok for a few days, I'll try the upgrade again (disabling bridging).

I had macvlan issues in a previous upgrade (6.12.0), but not like these - this time everything was working wonderfully, up until the corruption errors :(

 

Thanks for taking the time to look into this!

Link to comment

Just wanted to add a +1 for me as well. Unraid seems to be running just fine, but the Docker containers all become unresponsive after upgrading to 6.12.x (currently running 6.12.6).

 

For those of you who downgraded to 6.11.5 a while ago, has that fixed the issue for you completely? My server runs fine after a restart, but it takes a few days before it stops responding, so I'm hoping for a little feedback from someone who has been running on the downgrade for a bit before I spend the time downgrading myself.

 

Thanks!

Link to comment
3 hours ago, JorgeB said:

There have been some possible "write time tree block corruption detected" false positives with the newer kernels. They seem to be less frequent since 6.12.4+, but it can still happen. Instead of downgrading, I suggest recreating the docker image and/or switching the pool to ZFS.

 

Thanks for the response - I just recreated the docker image and I'll see how that goes for a few days. I appreciate it!
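For anyone following along, recreating the image is mostly a GUI job (Settings > Docker: stop the service, delete the image file, re-enable), but from the console it boils down to something like this - assuming the default image location, which may differ on your setup:

# with the Docker service stopped (Settings > Docker > Enable Docker: No)
rm /mnt/user/system/docker/docker.img
# re-enable Docker in the GUI so it creates a fresh image, then re-add the
# containers from Apps > Previous Apps; templates and appdata are preserved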

Link to comment
  • 4 weeks later...

I have had this issue since the upgrade to 6.12. I tried everything. Unraid would lock up every 1-3 days, and it eventually resulted in database corruption in my containers. The only thing that solved it was changing the cache drives from BTRFS to ZFS.

Link to comment
8 minutes ago, Gareth321 said:

I have had this issue since the upgrade to 6.12. I tried everything. Unraid would lock up every 1-3 days, and it eventually resulted in database corruption in my containers. The only thing that solved it was changing the cache drives from BTRFS to ZFS.

You and at least a few dozen others.  I am one of them.

Link to comment

Also having issues with this.

 

I've tried:

  • Backing up my cache data, formatting cache, rebuilding cache - Same issue
  • Downgrading back to 6.11 - Same issue
  • Downgrading, formatting cache as xfs, rebuilding cache - Same issue, still btrfs issues, not sure why..

Now upgrading back to 6.12; going to reformat the cache as xfs again and rebuild my docker.img.

Absolutely ridiculous..

Link to comment
1 minute ago, ghesp said:

Also having issues with this.

 

I've tried:

  • Backing up my cache data, formatting cache, rebuilding cache - Same issue
  • Downgrading back to 6.11 - Same issue
  • Downgrading, formatting cache as xfs, rebuilding cache - Same issue, still btrfs issues, not sure why..

Now upgrading back to 6.12; going to reformat the cache as xfs again and rebuild my docker.img.

Absolutely ridiculous..


Try ZFS. Seems to have fixed this for many of us.
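Rough outline of what the switch involves, as a sketch only - it assumes a single pool named cache mounted at /mnt/cache and that an array disk (here /mnt/disk1) has room for a temporary copy; back up anything important first:

# 1. stop Docker and the VM service, then copy everything off the pool
rsync -avh /mnt/cache/ /mnt/disk1/cache_backup/

# 2. stop the array, change the pool's filesystem to zfs in the device settings,
#    then start the array and format the pool

# 3. copy the data back and restart Docker/VMs
rsync -avh /mnt/disk1/cache_backup/ /mnt/cache/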

Link to comment
  • 1 month later...

Hi. Just revisiting this to see if anything has been fixed in this area since last year. I've been trying to keep an eye on the release notes but haven't spotted anything relevant. As I posted last year, my server was rock solid before the upgrade to 6.12.x and then disastrous post-upgrade. After downgrading it's been fine again, 144 days of uptime so far. So obviously I'm nervous about updating, but I don't want to be stuck on an old version forever. I see mentions of using ZFS, but I'm not quite clear on the migration path. I can't use ZFS before the upgrade, and I couldn't keep the server up long enough last time to think about moving data around to allow a migration to ZFS post-upgrade.

 

Thanks

Link to comment
14 hours ago, turnma said:

and I couldn't keep the server up long enough last time to think about moving data around to allow a migration to ZFS post-upgrade. 

I haven't seen any btrfs kernel issues with the latest releases, which suggests you may have a different problem. Enable the syslog server and post the log after upgrading to 6.12.8 if it still crashes.
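The syslog server itself is just a toggle under Settings > Syslog Server (mirror to flash or send to another machine). Once something is being captured, a quick way to pull out the relevant lines is along these lines - the mirrored-log path is only an example, use wherever your copy ends up:

grep -iE 'btrfs.*(error|warning|corrupt)' /var/log/syslog     # current boot only, lost on reboot
grep -iE 'btrfs.*(error|warning|corrupt)' /boot/logs/syslog*  # mirrored copy on the flash drive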

Link to comment
On 3/1/2024 at 7:00 PM, turnma said:

Hi. Just revisiting this to see if anything has been fixed in this area since last year. I've been trying to keep an eye on the release notes but haven't spotted anything relevant. As I posted last year, my server was rock solid before the upgrade to 6.12.x and then disastrous post-upgrade. After downgrading it's been fine again, 144 days of uptime so far. So obviously I'm nervous about updating, but I don't want to be stuck on an old version forever. I see mentions of using ZFS, but I'm not quite clear on the migration path. I can't use ZFS before the upgrade, and I couldn't keep the server up long enough last time to think about moving data around to allow a migration to ZFS post-upgrade.

 

Thanks

 

Honestly, check the spaceinvader videos on upgrading the cache pool to ZFS. I'm using it right now after having a disastrous event with BTRFS like you, with 2TB + 1TB drives in raid0. Working flawlessly so far, and the performance is leaps and bounds ahead of BTRFS.

Link to comment
  • 1 month later...

Hi there,

 

One more affected over here.

 

The server has been running like a rock for years, with months of uptime between maintenance reboots. Currently running 6.12.6.

 

Note that I wasn't using the cache SSD for anything other than caching an SMB share for my main PC.

 

I had enough unused space on the cache SSD, so I configured my appdata folder, database folder, and docker image to be cached (array as secondary). Now, for the third day in a row, I've found the Docker service stopped when I check the server in the morning. The logs are full of btrfs errors.
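For what it's worth, besides the syslog lines, btrfs also keeps per-device error counters that give a quick read on whether the devices themselves are reporting problems (assuming the pool is mounted at /mnt/cache):

btrfs dev stats /mnt/cache   # write/read/flush/corruption/generation error counts per device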

 

I know Docker is off because I have Home Assistant automating some lights, and those lights are unresponsive in the morning; that's what prompts me to check the server status and confirm it.

 

I'm quite sad and frustrated to find these comments here about the btrfs errors, and the fact that switching to ZFS or downgrading seems to be the only solution. It's also disappointing to see that most of the moderators' replies suggest a hardware issue when it's clear that it has something to do with Unraid.

 

I'll do the ZFS conversion on the cache SSD and report back; I hope it solves the problem. Downgrading would be the last thing I'd do, since I don't want to miss the security patches in the latest versions.

 

I have a legacy key and was planning to buy some new keys to support the company's new pricing adventure, since I was really happy with the software, but with things like this… if the issue persists, I'm seriously thinking of switching to another option.

Link to comment
37 minutes ago, blinkito said:

when it's clear that it has something to do with Unraid.

Some cases have been hardware related, but there are some that may not be. Note that reformatting the pool as btrfs usually also fixes the problem, so there is no need to change to ZFS if you don't want to. And it's not an Unraid issue; at most it could be a kernel issue, possibly detecting some previously undetected corruption.

Link to comment
7 minutes ago, JorgeB said:

Note that reformatting the pool as btrfs usually also fixes the problem, so there is no need to change to ZFS if you don't want to


I would like to note that many of us reformatted to BTRFS and had the exact same corruption occur again. I formatted at least six times before someone suggested ZFS. I have since had zero issues on the same hardware, and there are dozens of reports of this same solution working for users. So the explanation that 6.12 revealed some previously undetected corruption is clearly not what’s happening here. It’s also clearly not a hardware issue in the vast majority of cases since users report no issues before 6.12, *and* report no issues once migrated to ZFS.

Link to comment
29 minutes ago, JorgeB said:

at most it could be a kernel issue, possibly detecting some previously undetected corruption.

 

Like I said before, I was only using the cache for an SMB share, and the corruption only appeared after I put docker.img and appdata on the cache. Before that I had never experienced data corruption.

 

23 minutes ago, Gareth321 said:


I would like to note that many of us reformatted to BTRFS and had the exact same corruption occur again. I formatted at least six times before someone suggested ZFS. I have since had zero issues on the same hardware, and there are dozens of reports of this same solution working for users.

 

That's why I'm going to take the short route and reformat the cache drive to ZFS. I'll do it in a couple of hours and post back tomorrow morning to report whether it's been stable through the night.

Link to comment
28 minutes ago, Gareth321 said:

I formatted at least six times before someone suggested ZFS. I have since had zero issues on the same hardware, and there are dozens of reports of this same solution working for users.

OK, in that case it suggests a kernel issue that for some reason only affects some users, or some kernel/hardware combination(s).

Link to comment
2 minutes ago, JorgeB said:

OK, in that case it suggests a kernel issue that for some reason only affects some users, or some kernel/hardware combination(s).


Agreed. Despite how many reports there are, we still must comprise only a small minority of users with BTRFS cache. So we have resigned ourselves to the fact that this problem is intermittent and difficult to track down, and will likely not be fixed for our hardware or container combination, or whatever causes it. The good news is we have a solution: ZFS.

Link to comment
