BTRFS error and Read-only cache since updating to 6.12

Fl4v1en · June 25, 2023

Hello all.

I have used Unraid for nearly 2 years and it was working quite well, with no crashes.

Since the release of 6.12 and 6.12.1, my cache randomly crashes with a BTRFS error and switches to read-only.

It's always the same; it starts with:

Jun 25 14:15:33 Becky kernel: BTRFS error (device nvme0n1p1): block=762904576 write time tree block corruption detected
Jun 25 14:15:33 Becky kernel: BTRFS: error (device nvme0n1p1) in btrfs_commit_transaction:2460: errno=-5 IO failure (Error while writing out transaction)
Jun 25 14:15:33 Becky kernel: BTRFS info (device nvme0n1p1: state E): forced readonly
Jun 25 14:15:33 Becky kernel: BTRFS warning (device nvme0n1p1: state E): Skipping commit of aborted transaction.
Jun 25 14:15:33 Becky kernel: BTRFS: error (device nvme0n1p1: state EA) in cleanup_transaction:1958: errno=-5 IO failure
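
For anyone who wants to check their own syslog for this pattern, here is a minimal Python sketch (not from the original post) that scans kernel log lines and reports which btrfs device was forced read-only, together with the first error logged for it. The regular expression is an assumption based only on the five lines above, and `find_forced_readonly` is a hypothetical helper name.

```python
import re

# Assumed message layout, derived only from the sample lines in this thread:
#   "kernel: BTRFS[:] <level> (device <dev>[: state <flags>])[:] <message>"
BTRFS_LINE = re.compile(
    r"kernel: BTRFS:? (?P<level>error|warning|info) "
    r"\(device (?P<device>[^):\s]+)(?:: state (?P<state>[A-Z]+))?\):? "
    r"(?P<message>.*)"
)

SAMPLE_LOG = """\
Jun 25 14:15:33 Becky kernel: BTRFS error (device nvme0n1p1): block=762904576 write time tree block corruption detected
Jun 25 14:15:33 Becky kernel: BTRFS: error (device nvme0n1p1) in btrfs_commit_transaction:2460: errno=-5 IO failure (Error while writing out transaction)
Jun 25 14:15:33 Becky kernel: BTRFS info (device nvme0n1p1: state E): forced readonly
Jun 25 14:15:33 Becky kernel: BTRFS warning (device nvme0n1p1: state E): Skipping commit of aborted transaction.
Jun 25 14:15:33 Becky kernel: BTRFS: error (device nvme0n1p1: state EA) in cleanup_transaction:1958: errno=-5 IO failure
"""


def find_forced_readonly(lines):
    """Return a list of (device, first_error_message) for read-only events."""
    first_error = {}  # device -> first error message seen for that device
    events = []
    for line in lines:
        match = BTRFS_LINE.search(line)
        if match is None:
            continue  # not a BTRFS kernel message
        device = match["device"]
        message = match["message"].strip()
        if match["level"] == "error":
            first_error.setdefault(device, message)
        if message == "forced readonly":
            events.append((device, first_error.get(device)))
    return events


if __name__ == "__main__":
    for device, cause in find_forced_readonly(SAMPLE_LOG.splitlines()):
        print(f"{device} went read-only; first error: {cause}")
```

On the sample above this reports the "write time tree block corruption detected" line as the first error for nvme0n1p1. To run it against a real log, pass the lines of that file instead of `SAMPLE_LOG`.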

There is no need to reboot the hardware; just stopping and restarting the array brings my cache back online.

I ran 2 different memory tests (Passmark's and Memtest86+), and the hardware is stable. Temperatures are okay; the CPU never goes above 75°C. I even switched SSDs, from a SATA to an NVMe, with the same errors.

Does anyone have an idea? I'm really thinking of switching my cache from BTRFS to XFS.

becky-diagnostics-20230625-1604.zip

Edited June 25, 2023 by Fl4v1en

Fl4v1en · June 25, 2023

For now, I tried switching my dockers from Btrfs vdisk to Directory.

JorgeB · June 26, 2023

17 hours ago, Fl4v1en said:
write time tree block corruption detected

This usually means bad RAM or other kernel memory corruption. You could try redoing the pool or using zfs to see if it's more stable.

mhyclak · July 17, 2023

I am having the same symptoms after upgrading from 6.11.5 to 6.12.2, on a btrfs mirror of 2 NVMe drives. I reformatted it once after the upgrade to 6.12.2 and it triggered again sometime yesterday. Docker and VMs are running on a separate btrfs mirror (/mnt/user/virtualization). /mnt/user/cache is usually the one I've noticed going read-only; most of what goes through it is system backups (Time Machine, AOMEI Backupper and Windows Backup). I attached diags in case there's something similar. These issues only started after the upgrade; no other changes to hardware have been made.

phoenix-diagnostics-20230717-0759.zip

JorgeB · July 17, 2023

2 hours ago, mhyclak said:

I am having the same symptoms

Btrfs is detecting checksum errors, so the first thing is to run memtest.

Arron · July 25, 2023

I too am having this problem. The system was running fine until the upgrade to 6.12, and it is still happening after 6.12.3. I have two NVMe pools: cache and VMs. I didn't have any BTRFS errors with the VMs' NVMe drive, so I swapped the two, and now the drive that moved from VMs to cache is getting the same BTRFS errors. The old cache NVMe drive, now used for my VMs, hasn't had a single BTRFS error since the swap. I've successfully cleared the errors by doing a scrub and finding the corrupt data, but any time a new file is downloaded from nzbget I get the same BTRFS error and have to locate the corrupt data to clear the errors. This is a daily thing now. If I don't clear the errors daily, eventually the corrupt data gets to be too much, docker containers begin to fail, and I have to move all data on the cache pool onto the array, reformat the drive, and move the data back onto the cache pool. Pretty frustrating to say the least.

Jul 25 02:05:12 Unraid kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 81, gen 0
Jul 25 02:05:12 Unraid kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 855986 off 24414867456 csum 0x329e22d6 expected csum 0x43aeae62 mirror 1
Jul 25 02:05:12 Unraid kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 82, gen 0
Jul 25 02:05:12 Unraid kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 855986 off 24414867456 csum 0x329e22d6 expected csum 0x43aeae62 mirror 1
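
Since locating the corrupt data means pulling the inode numbers out of these messages, here is a minimal Python sketch (an illustration, not part of the original post) that collects the distinct inodes with checksum failures and the latest per-device error counters. The regular expressions are assumptions based only on the four lines above, and `summarize_csum_errors` is a hypothetical helper name.

```python
import re

# Assumed layouts, taken only from the log lines quoted in this post.
CSUM_FAILED = re.compile(
    r"BTRFS warning \(device (?P<device>[^):\s]+)\): csum failed "
    r"root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+)"
)
DEV_ERRS = re.compile(
    r"bdev (?P<bdev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

SAMPLE_LOG = """\
Jul 25 02:05:12 Unraid kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 81, gen 0
Jul 25 02:05:12 Unraid kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 855986 off 24414867456 csum 0x329e22d6 expected csum 0x43aeae62 mirror 1
Jul 25 02:05:12 Unraid kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 82, gen 0
Jul 25 02:05:12 Unraid kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 855986 off 24414867456 csum 0x329e22d6 expected csum 0x43aeae62 mirror 1
"""


def summarize_csum_errors(lines):
    """Return (inodes, counters) gathered from BTRFS kernel log lines.

    inodes:   {(device, root, ino): number of csum failures logged}
    counters: {bdev: latest error counters seen for that block device}
    """
    inodes = {}
    counters = {}
    for line in lines:
        match = CSUM_FAILED.search(line)
        if match:
            key = (match["device"], int(match["root"]), int(match["ino"]))
            inodes[key] = inodes.get(key, 0) + 1
            continue
        match = DEV_ERRS.search(line)
        if match:
            counters[match["bdev"]] = {
                name: int(match[name])
                for name in ("wr", "rd", "flush", "corrupt", "gen")
            }
    return inodes, counters


if __name__ == "__main__":
    inodes, counters = summarize_csum_errors(SAMPLE_LOG.splitlines())
    for (device, root, ino), hits in sorted(inodes.items()):
        print(f"{device}: root {root} inode {ino} failed checksum {hits} time(s)")
    for bdev, errs in counters.items():
        print(f"{bdev}: {errs}")
```

Each inode can then be mapped to a file path, for example with `btrfs inspect-internal inode-resolve <ino> <mountpoint>` or `find <mountpoint> -inum <ino>`. The mount point depends on the pool name, so treat any path such as /mnt/cache as an assumption.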

Edit: Added Diag files

unraid-diagnostics-20230731-0722.zip

Edited July 31, 2023 by Arron

DOM_EU · July 25, 2023

On 6/25/2023 at 4:21 PM, Fl4v1en said:
Since the release of 6.12 and 6.12.1, my cache randomly crashes with a BTRFS error and switches to read-only.

I have exactly the same problem, with the same error pattern.
The error has existed since the upgrade from v6.11.5 to v6.12.3.

I never had any problems with version 6.11.5.

I'll now also try changing from Btrfs vdisk to Directory.

JorgeB · July 31, 2023

Please post the diagnostics.

turnma · July 31, 2023

I'm getting the same since upgrading from 6.11.something to 6.12.3 - cache disk going read only after a period of time. Server had been rock-solid and running without issue or reboot for nearly 3 months before the upgrade. Now it's died with 5 or 6 separate issues over the last couple of days. I'm currently (unrelated, it seems) rebuilding the array disk and the cache has gone read only again. Once the array rebuilds then I'll have to sort the cache again, but the server is unusable in this state because it means that all the Docker containers go offline every day or so. Diagnostics attached, thanks.

tower-diagnostics-20230731-2309.zip

JorgeB · August 1, 2023

9 hours ago, turnma said:

cache disk going read only after a period of time.

You can downgrade back to v6.11.5 to see if the issue stops, in case it's a kernel/btrfs bug. If it remains, run memtest.

turnma · August 1, 2023

Thanks. Downgraded now, so I’ll see pretty quickly if it helps. It’s been a steady stream of issues since the upgrade, so if I get through 24 hours without issues then that will be a positive sign.

TimTaylor · August 2, 2023

Same problem with 6.12.2 and 6.12.3 ... my NAS is now broken because I can't get back to 6.11.

JorgeB · August 2, 2023

14 minutes ago, TimTaylor said:

my NAS is now broken because I can't get back to 6.11

Why not?

TimTaylor · August 2, 2023

Actually, I did revert to 6.11.5, but now dockers are not starting anymore.

Quote

Execution error

Server error

I'm going crazy with this.

JorgeB · August 2, 2023

25 minutes ago, TimTaylor said:

now dockers are not starting anymore

See the release notes; there's a procedure you must follow after downgrading.

TimTaylor · August 2, 2023

I don't get it. The only thing I see there is:

Quote

If you revert back from 6.12 to 6.11.5 or earlier, you have to force update all your Docker containers and start them manually after downgrading. This is necessary because of the underlying change to cgroup v2 starting with 6.12.0-rc1.

If I do this:

Quote

TOTAL DATA PULLED: 0 B

Removing container: OnlyOfficeDocumentServer

Error: Server error

Command execution
docker create
  --name='OnlyOfficeDocumentServer'
  --net='br0'
  --ip='192.168.50.252'
  -e TZ="Europe/Berlin"
  -e HOST_OS="Unraid"
  -e HOST_HOSTNAME="NAS"
  -e HOST_CONTAINERNAME="OnlyOfficeDocumentServer"
  -e 'TCP_PORT_80'='80'
  -e 'TCP_PORT_443'='443'
  -e 'JWT_SECRET'=''
  -l net.unraid.docker.managed=dockerman
  -l net.unraid.docker.webui='http://[IP]:[PORT:80]'
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/SiwatINC/unraid-ca-repository/master/icons/onlyoffice.png'
  -v '/mnt/user/appdata/onlyofficeds/logs':'/var/log/onlyoffice':'rw'
  -v '/mnt/user/appdata/onlyofficeds/Data':'/var/www/onlyoffice/Data':'rw'
  -v '/mnt/user/appdata/onlyofficeds/fonts':'/usr/share/fonts':'rw' 'onlyoffice/documentserver'

Error response from daemon: Conflict. The container name "/OnlyOfficeDocumentServer" is already in use by container "18f2c1ef672d6222dfe1362d0ea4702ad39448b70d8789f1554cc11c011d19c8". You have to remove (or rename) that container to be able to reuse that name.

The command failed.
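
The daemon message above already names the leftover container that blocks the re-create. As a minimal Python sketch (an illustration, not something from the release notes), the name and ID can be pulled out of that error text like this. The regular expression is an assumption based only on the message quoted above, and `parse_name_conflict` is a hypothetical helper name.

```python
import re

# Assumed wording, copied from the "Conflict" error quoted in this post.
NAME_CONFLICT = re.compile(
    r'The container name "/?(?P<name>[^"]+)" is already in use '
    r'by container "(?P<id>[0-9a-f]+)"'
)

SAMPLE_ERROR = (
    'Error response from daemon: Conflict. The container name '
    '"/OnlyOfficeDocumentServer" is already in use by container '
    '"18f2c1ef672d6222dfe1362d0ea4702ad39448b70d8789f1554cc11c011d19c8". '
    'You have to remove (or rename) that container to be able to reuse that name.'
)


def parse_name_conflict(error_text):
    """Return (container_name, container_id), or None if there is no conflict."""
    match = NAME_CONFLICT.search(error_text)
    if match is None:
        return None
    return match["name"], match["id"]


if __name__ == "__main__":
    conflict = parse_name_conflict(SAMPLE_ERROR)
    if conflict:
        name, container_id = conflict
        # Only prints a suggestion; nothing is removed by this script.
        print(f"'{name}' is held by {container_id[:12]}")
        print(f"candidate cleanup: docker rm {container_id}")
```

The script only prints the candidate `docker rm` command. Whether removing the old container is safe depends on the setup, so check that its data lives in the mapped appdata paths before running it.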

Edited August 2, 2023 by TimTaylor

mhyclak · August 3, 2023

On 7/17/2023 at 10:33 AM, JorgeB said:

Btrfs is detecting checksum errors, so the first thing is to run memtest.

Reverting to 6.11.5 has resolved the issues for me, so I suspect it must be something with the kernel or btrfs versions in 6.12.

JorgeB · August 3, 2023

It's a possibility since there have been more cases than usual, though for me it's been working fine.

local.bin · August 3, 2023

I have the same issue since upgrading to 6.12.

I have removed the dockers, deleted docker img file and recreated all the dockers, in the hope it solves the problem.

At least it's still running, as the other 6.12 server is now borked.

turnma · August 13, 2023

Just wanted to report back that since I downgraded (on the 1st) it's been rock solid again. Slightly nervous now about what this means for the future, because obviously this means that upgrades are effectively out of the question until the bugs are fixed.

JorgeB · August 14, 2023

If it's a kernel issue, and it may be, though still a corner case since most users are not affected, it should be fixed in an upcoming release. Try v6.13 once available, which will include a much newer kernel.

Shadowfita · August 26, 2023

On 8/3/2023 at 11:03 PM, mhyclak said:

Reverting to 6.11.5 has resolved the issues for me, so I suspect it must be something with the kernel or btrfs versions in 6.12.

I've made an account just to reply and say thank you very much to those suggesting the 6.11.5 rollback, as it has fixed my issues. I was experiencing the same btrfs errors discussed in this thread.

wayner · September 5, 2023

I will just add a ME TOO to this thread. My system was rock solid under 6.11.5 and since I upgraded a few days ago to 6.12.4 I am having btrfs issues. I will try rolling back to 6.11.5.

krh1009 · September 15, 2023

I'm adding a ME TOO. Same problem running 6.12.1

ramius87 · December 6, 2023

I did not know about this issue, but I am now experiencing the same after upgrading from 6.11.1 to 6.12.4. I will be converting my cache pool to zfs and reporting the results.
