BTRFS error and Read-only cache since updating to 6.12



One more here: upgraded from 6.11.5 to 6.12.6 without a hiccup, for a few hours...

Quote

Dec  7 02:22:27 Tower kernel: BTRFS error (device loop2): block=837271552 write time tree block corruption detected
Dec  7 02:22:27 Tower kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2494: errno=-5 IO failure (Error while writing out transaction)
Dec  7 02:22:27 Tower kernel: BTRFS info (device loop2: state E): forced readonly
Dec  7 02:22:27 Tower kernel: BTRFS warning (device loop2: state E): Skipping commit of aborted transaction.
Dec  7 02:22:27 Tower kernel: BTRFS: error (device loop2: state EA) in cleanup_transaction:1992: errno=-5 IO failure

 

Will revert now :(

tower-diagnostics-20231207-0254.zip

Link to comment

That error indicates a corrupt docker image file and is very unlikely to be related to the upgrade.

 

Also, your diagnostics show that you are using macvlan for your Docker networking. This is known to cause crashes in the 6.12.x series. To avoid this, you should either switch to ipvlan or follow the directions in the 6.12.4 release notes to continue using macvlan (in particular, disabling bridging on eth0).
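If you want to confirm what your custom Docker networks are currently using before changing anything, something along these lines from the console will show it (a sketch only, assuming the standard docker CLI that a stock Unraid install has; br0 is just the usual custom network name, substitute your own):

docker network ls --filter driver=macvlan      # lists any networks created with the macvlan driver
docker network inspect br0 | grep -i driver    # shows the driver of a specific custom network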

 

 

Link to comment
2 hours ago, itimpi said:

That error indicates a corrupt docker image file and is very unlikely to be related to the upgrade.

 

Also, your diagnostics show that you are using macvlan for your Docker networking. This is known to cause crashes in the 6.12.x series. To avoid this, you should either switch to ipvlan or follow the directions in the 6.12.4 release notes to continue using macvlan (in particular, disabling bridging on eth0).

 

 

I thought that too, even with the suspicious timing, until I read this thread - others seem to be having the exact same issue after upgrading...

Just ran a btrfs scrub (back at 6.11.5), with no errors.
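For reference, this is roughly what I ran (paths assume the pool is mounted at /mnt/cache, the Unraid default; adjust if yours differs):

btrfs scrub start /mnt/cache     # start a scrub of the cache pool
btrfs scrub status /mnt/cache    # check progress and the error summary once it completes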

Could the new 6.12.6 kernel/btrfs version have stricter checks that don't exist in 6.11.5?

 

Anyway, if everything is ok for a few days, I'll try the upgrade again (disabling bridging).

I had macvlan issues in a previous upgrade (6.12.0), but not like these - this time everything was working wonderfully, up until the corruption errors :(

 

Thanks for taking the time to look into this!

Link to comment

Just wanted to add a +1 for me as well. Unraid seems to be running just fine, but the Docker containers all become unresponsive after upgrading to 6.12.x (currently running 6.12.6).

 

For those of you who downgraded to 6.11.5 a while ago, has that fixed the issue for you completely? My server runs fine after a restart, but it takes a few days before it stops responding, so I'm hoping for a little feedback from someone who has been running on the downgrade for a bit before I spend the time downgrading myself.

 

Thanks!

Link to comment
3 hours ago, JorgeB said:

There have been some possible "write time tree block corruption detected" false positives with the newer kernels. They seem to be less frequent since 6.12.4+, but it can still happen. Instead of downgrading, I suggest recreating the docker image and/or switching the pool to ZFS.

 

Thanks for the response - I just recreated the docker image and I'll see how that goes for a few days. I appreciate it!
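For anyone following along, recreating the image is mostly a GUI job (Settings > Docker: stop the service, delete the image file, re-enable), but from the console it boils down to something like this - assuming the default image location, which may differ on your setup:

# with the Docker service stopped (Settings > Docker > Enable Docker: No)
rm /mnt/user/system/docker/docker.img
# re-enable Docker in the GUI so it creates a fresh image, then re-add the
# containers from Apps > Previous Apps; templates and appdata are preserved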

Link to comment
  • 4 weeks later...

I have had this issue since the upgrade to 6.12. I tried everything. Unraid would lock up every 1-3 days, and it eventually resulted in database corruption in my containers. The only thing that solved it was changing the cache drives from BTRFS to ZFS.

Link to comment
8 minutes ago, Gareth321 said:

I have had this issue since the upgrade to 6.12. I tried everything. Unraid would lock up every 1-3 days, and it eventually resulted in database corruption in my containers. The only thing that solved it was changing the cache drives from BTRFS to ZFS.

You and at least a few dozen others.  I am one of them.

Link to comment

Also having issues with this.

 

I've tried:

  • Backing up my cache data, formatting cache, rebuilding cache - Same issue
  • Downgrading back to 6.11 - Same issue
  • Downgrading, formatting cache as xfs, rebuilding cache - Same issue, still btrfs issues, not sure why..

Now upgrading back to 6.12; going to reformat the cache as xfs again and rebuild my docker.img.

Absolutely ridiculous..

Link to comment
1 minute ago, ghesp said:

Also having issues with this.

 

I've tried:

  • Backing up my cache data, formatting cache, rebuilding cache - Same issue
  • Downgrading back to 6.11 - Same issue
  • Downgrading, formatting cache as xfs, rebuilding cache - Same issue, still btrfs issues, not sure why..

Now upgrading back to 6.12; going to reformat the cache as xfs again and rebuild my docker.img.

Absolutely ridiculous..


Try ZFS. Seems to have fixed this for many of us.
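Rough outline of what the switch involves, as a sketch only - it assumes a single pool named cache mounted at /mnt/cache and that an array disk (here /mnt/disk1) has room for a temporary copy; back up anything important first:

# 1. stop Docker and the VM service, then copy everything off the pool
rsync -avh /mnt/cache/ /mnt/disk1/cache_backup/

# 2. stop the array, change the pool's filesystem to zfs in the device settings,
#    then start the array and format the pool

# 3. copy the data back and restart Docker/VMs
rsync -avh /mnt/disk1/cache_backup/ /mnt/cache/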

Link to comment
  • 1 month later...

Hi. Just revisiting this to see if anything has been fixed in this area since last year. I've been trying to keep an eye on the release notes but haven't spotted anything relevant. As I posted last year, my server was rock solid before the upgrade to 6.12.x and then disastrous post-upgrade. After downgrading it's been fine again, 144 days of uptime so far. So obviously I'm nervous about updating, but I don't want to be stuck on an old version forever. I see mentions of using ZFS, but I'm not quite clear on the migration path. I can't use ZFS before the upgrade, and I couldn't keep the server up long enough last time to think about moving data around to allow a migration to ZFS post-upgrade.

 

Thanks

Link to comment
14 hours ago, turnma said:

and I couldn't keep the server up long enough last time to think about moving data around to allow a migration to ZFS post-upgrade. 

I haven't seen any btrfs kernel issues with the latest releases, which suggests you may have a different problem. Enable the syslog server and post the log after upgrading to 6.12.8 if it still crashes.
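The syslog server itself is just a toggle under Settings > Syslog Server (mirror to flash or send to another machine). Once something is being captured, a quick way to pull out the relevant lines is along these lines - the mirrored-log path is only an example, use wherever your copy ends up:

grep -iE 'btrfs.*(error|warning|corrupt)' /var/log/syslog     # current boot only, lost on reboot
grep -iE 'btrfs.*(error|warning|corrupt)' /boot/logs/syslog*  # mirrored copy on the flash drive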

Link to comment
On 3/1/2024 at 7:00 PM, turnma said:

Hi. Just revisiting this to see if anything has been fixed in this area since last year. I've been trying to keep an eye on the release notes but haven't spotted anything relevant. As I posted last year, my server was rock solid before the upgrade to 6.12.x and then disastrous post-upgrade. After downgrading it's been fine again, 144 days of uptime so far. So obviously I'm nervous about updating, but I don't want to be stuck on an old version forever. I see mentions of using ZFS, but I'm not quite clear on the migration path. I can't use ZFS before the upgrade, and I couldn't keep the server up long enough last time to think about moving data around to allow a migration to ZFS post-upgrade.

 

Thanks

 

Honestly, check the spaceinvader videos on upgrading the cache pool to ZFS. I'm using it right now after having a disastrous event with BTRFS like you, with 2TB + 1TB drives in raid0. Working flawlessly so far, and the performance is leaps and bounds ahead of BTRFS.

Link to comment
  • 1 month later...

Hi there,

 

One more affected over here.

 

The server has been running like a rock for years, with months of uptime between maintenance reboots. Currently running 6.12.6.

 

Note that I wasn't using the cache SSD for anything other than caching an SMB share for my main PC.

 

I had enough unused space on the cache SSD, so I configured my appdata folder, database folder, and docker image to be cached (array as secondary). Now, for the third day in a row, I've found the Docker service stopped when I check the server in the morning. The logs are full of btrfs errors.
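For what it's worth, besides the syslog lines, btrfs also keeps per-device error counters that give a quick read on whether the devices themselves are reporting problems (assuming the pool is mounted at /mnt/cache):

btrfs dev stats /mnt/cache   # write/read/flush/corruption/generation error counts per device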

 

I know Docker is off because I have Home Assistant automating some lights, and those lights are unresponsive in the morning; that's what prompts me to check the server status and confirm it.

 

I'm quite sad and frustrated to find these comments here about the btrfs errors, and the fact that switching to ZFS or downgrading seems to be the only solution. It's also disappointing to see that most of the moderators' replies suggest a hardware issue when it's clear that it has something to do with Unraid.

 

I'll do the ZFS conversion on the cache SSD and report back; I hope it solves the problem. Downgrading would be the last thing I'd do, since I don't want to miss the security patches in the latest versions.

 

I have a legacy key and was planning to buy some new keys to support the company's new pricing adventure, since I was really happy with the software, but with things like this… if the issue persists, I'm seriously thinking of switching to another option.

Link to comment
37 minutes ago, blinkito said:

when it's clear that it has something to do with Unraid.

Some cases have been hardware related, but there are some that may not be. Note that reformatting the pool as btrfs usually also fixes the problem, so there is no need to change to ZFS if you don't want to. And it's not an Unraid issue; at most it could be a kernel issue, possibly detecting some previously undetected corruption.

Link to comment
7 minutes ago, JorgeB said:

Note that reformatting the pool as btrfs usually also fixes the problem, so there is no need to change to ZFS if you don't want to


I would like to note that many of us reformatted to BTRFS and had the exact same corruption occur again. I formatted at least six times before someone suggested ZFS. I have since had zero issues on the same hardware, and there are dozens of reports of this same solution working for users. So the explanation that 6.12 revealed some previously undetected corruption is clearly not what’s happening here. It’s also clearly not a hardware issue in the vast majority of cases since users report no issues before 6.12, *and* report no issues once migrated to ZFS.

Link to comment
29 minutes ago, JorgeB said:

at most it could be a kernel issue, possibly detecting some previously undetected corruption.

 

Like I said before, I was only using the cache for an SMB share, and the corruption only appeared after I put docker.img and appdata on the cache. Before that I had never experienced data corruption.

 

23 minutes ago, Gareth321 said:


I would like to note that many of us reformatted to BTRFS and had the exact same corruption occur again. I formatted at least six times before someone suggested ZFS. I have since had zero issues on the same hardware, and there are dozens of reports of this same solution working for users.

 

That's why I'm going to take the short route and reformat the cache drive to ZFS. I'll do it in a couple of hours and post back tomorrow morning to report whether it's been stable through the night.

Link to comment
28 minutes ago, Gareth321 said:

I formatted at least six times before someone suggested ZFS. I have since had zero issues on the same hardware, and there are dozens of reports of this same solution working for users.

OK, in that case it suggests a kernel issue that for some reason only affects some users, or some kernel/hardware combination(s).

Link to comment
2 minutes ago, JorgeB said:

OK, in that case it suggests a kernel issue that for some reason only affects some users, or some kernel/hardware combination(s).


Agreed. Despite how many reports there are, we still must comprise only a small minority of users with BTRFS cache. So we have resigned ourselves to the fact that this problem is intermittent and difficult to track down, and will likely not be fixed for our hardware or container combination, or whatever causes it. The good news is we have a solution: ZFS.

Link to comment
