
BTRFS Failures in Cache pool?


Go to solution · Solved by JorgeB

Recommended Posts

Posted (edited)

Hi all, thanks for taking the time.
I noticed some errors in a Docker container this morning and went to restart it. On trying to start it, I was greeted with:
'Execution error 403'

Since then, all Docker containers have failed to start, and I seem to have read-only access to my cache pool.

The following is visible in the logs:
 

May  7 01:39:34 Tower kernel: BTRFS error (device loop2: state EAL): bdev /dev/loop2 errs: wr 988, rd 0, flush 0, corrupt 2, gen 0
May  7 01:39:39 Tower kernel: loop: Write error at byte offset 3140636672, length 4096.
May  7 01:39:39 Tower kernel: I/O error, dev loop2, sector 6134056 op 0x1:(WRITE) flags 0x100000 phys_seg 4 prio class 2
May  7 01:39:39 Tower kernel: BTRFS error (device loop2: state EAL): bdev /dev/loop2 errs: wr 989, rd 0, flush 0, corrupt 2, gen 0
May  7 01:39:44 Tower kernel: loop: Write error at byte offset 3140636672, length 4096.
May  7 01:39:44 Tower kernel: I/O error, dev loop2, sector 6134056 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2
May  7 01:39:44 Tower kernel: BTRFS error (device loop2: state EAL): bdev /dev/loop2 errs: wr 990, rd 0, flush 0, corrupt 2, gen 0
May  7 01:39:45 Tower kernel: loop: Write error at byte offset 3140636672, length 4096.
May  7 01:39:45 Tower kernel: I/O error, dev loop2, sector 6134056 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
May  7 01:39:45 Tower kernel: BTRFS error (device loop2: state EAL): bdev /dev/loop2 errs: wr 991, rd 0, flush 0, corrupt 2, gen 0


I have seen no other errors or warnings.

Running Unraid 6.12.4.
My main array consists of 3x 20TB Seagate Exos drives: 2x data, 1x parity.
The cache pool with issues is 2x 2TB Lexar NM790 NVMe drives in a btrfs RAID 1.
There is also a secondary cache pool with a single SATA SSD; this one seems to be fine (it's also empty).

All disks have plenty of free space.

My understanding is that this is some sort of filesystem failure on the 2x 2TB cache pool? How would I go about recovering from here?

I've already started copying all non-replicated data off the cache disks; nothing is critical, of course.

Appreciate the help, as I'm a bit lost on what the actual issue is or the way forward.
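For anyone reading those kernel lines: the `errs: wr N, rd N, flush N, corrupt N, gen N` fields are btrfs's cumulative per-device error counters (write, read, flush, checksum/corruption, and generation errors). A small sketch pulling the write-error count out of one of the lines above (the log line is hard-coded from this post, not live output):

```shell
# Extract the cumulative write-error counter from one of the BTRFS log lines.
# Note: these counters persist until reset with `btrfs device stats -z`,
# so a non-zero value can be historical rather than current.
line='May  7 01:39:45 Tower kernel: BTRFS error (device loop2: state EAL): bdev /dev/loop2 errs: wr 991, rd 0, flush 0, corrupt 2, gen 0'
wr=$(printf '%s\n' "$line" | sed -n 's/.*errs: wr \([0-9]*\),.*/\1/p')
echo "write errors: $wr"
```

The same counters can be read live with `btrfs device stats /mnt/cache` (assuming `/mnt/cache` is the pool's mount point).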

Edited by Jellepepe
Link to comment
  • Solution
May  6 22:48:27 Tower kernel: BTRFS error (device nvme0n1p1): block=1336104894464 write time tree block corruption detected
May  6 22:48:27 Tower kernel: BTRFS: error (device nvme0n1p1) in btrfs_commit_transaction:2466: errno=-5 IO failure (Error while writing out transaction)
May  6 22:48:27 Tower kernel: BTRFS info (device nvme0n1p1: state E): forced readonly

 

The pool filesystem went read-only. This can be caused by a hardware issue like bad RAM, or by a filesystem problem. I would start by running memtest; if nothing is found, back up and recreate the pool, using btrfs again or trying zfs.
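For anyone following along, the "back up and recreate" step can be sketched as below. These are illustrative raw commands only; the device names and paths are examples, and on Unraid the format itself is normally done through the GUI by stopping the array and changing the pool's filesystem. `DRY_RUN=1` prints the commands instead of executing them:

```shell
#!/bin/sh
# Sketch of the backup-and-recreate flow. Devices and paths are EXAMPLES;
# substitute your own pool members. DRY_RUN=1 prints instead of executing.
DRY_RUN=1
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# 1) Copy everything off the pool (preserving attributes) before touching it.
run rsync -aHAX /mnt/cache/ /mnt/disk1/cache-backup/

# 2) Recreate the pool as btrfs raid1 (DESTROYS all data on both devices).
run mkfs.btrfs -f -m raid1 -d raid1 /dev/nvme0n1p1 /dev/nvme1n1p1

# 3) Restore the data once the new pool is mounted.
run rsync -aHAX /mnt/disk1/cache-backup/ /mnt/cache/
```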

Link to comment
Posted (edited)


I appreciate it. Is it OK if I leave this unsolved until this is done, in case I encounter any other issues during the process?

Are there any downsides to switching to ZFS I should keep in mind?

7 minutes ago, JorgeB said:

if nothing is found backup and recreate the pool, using btrfs again or try zfs.

Edited by Jellepepe
Link to comment
10 hours ago, JorgeB said:
May  6 22:48:27 Tower kernel: BTRFS error (device nvme0n1p1): block=1336104894464 write time tree block corruption detected
May  6 22:48:27 Tower kernel: BTRFS: error (device nvme0n1p1) in btrfs_commit_transaction:2466: errno=-5 IO failure (Error while writing out transaction)
May  6 22:48:27 Tower kernel: BTRFS info (device nvme0n1p1: state E): forced readonly

 

Pool filesystem went read-only, this can be caused by a hardware issue like bad RAM or a filesystem problem, I would start by running memtest, if nothing is found backup and recreate the pool, using btrfs again or try zfs.

Quick update: I finished backups and started running memtest; currently 4 passes deep, but I will leave it running overnight.
Assuming this doesn't find anything, is there any way to identify what could have caused the corruption, or general best practices to keep in mind?
I've been super happy with Unraid, running almost exactly 6 months without any issues, but this is somewhat worrying, as I never had any corruption issues on my old Hyper-V setup. Wondering if I might be doing something wrong 😅

Link to comment

While this error usually meant a RAM problem, some users see it after upgrading from v6.11 to v6.12, so it is possibly a kernel issue. For those cases, re-formatting the filesystem as btrfs sometimes solves the issue; for others it doesn't, in which case it's best to use zfs. But if you also get problems with zfs, then it's likely a hardware issue.

Link to comment
14 hours ago, JorgeB said:

While this error usually meant a RAM problem, some users see it after upgrading from v6.11 to v6.12, so possibly some kernel issue, for those, sometimes re-formatting the filesystem btrfs solves the issue, for others it doesn't, in that case best to use zfs, but if you also get problems with zfs, then it's likely a hardware issue.

Alright, after running memtest overnight with no errors, I rebooted Unraid and found the btrfs cache pool mounted read-write again with 0 errors.
Ran a check & scrub with no failures too, same for the docker disk. Quite odd.
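For reference, the non-GUI equivalents of that check & scrub (assuming the pool is mounted at `/mnt/cache`) are `btrfs scrub start -B /mnt/cache` followed by `btrfs scrub status /mnt/cache`. A tiny sketch parsing the error count out of a status report (the status text below is a hard-coded sample, not live output):

```shell
# Parse the error count from a (sample) `btrfs scrub status` report.
status='scrub status for 00000000-0000-0000-0000-000000000000
        scrub started at Tue May  7 10:00:00 2024 and finished after 00:05:12
        total bytes scrubbed: 1.20TiB with 0 errors'
errors=$(printf '%s\n' "$status" | sed -n 's/.*with \([0-9]*\) errors.*/\1/p')
echo "scrub errors: $errors"
```

A scrub verifies checksums on all data and metadata, so "0 errors" here means no on-disk corruption was detected at that point.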

Either way, I reformatted it as a ZFS pool; that all went without issues.
Ended up updating to 6.12.10 while I was at it too.

I did notice a few call traces that seemed to be related to macvlan networking. Not sure why that was enabled, since I installed on 6.12.4 originally and I read it should default to ipvlan?
Either way, I've switched to ipvlan now, which seems not to have affected anything.

I wonder if a crash/error related to that might have triggered a btrfs failure state even though no actual filesystem issues were present? For now I will keep an eye on things, but it seems to be fine with no issues. Marked as solved :)

Link to comment
