Cache read only - BTRFS corrupted?


eribob


My cache pool went read-only ("read only file system") today, and the following message is repeated many times in the system log:

Nov 14 21:18:16 Monsterservern kernel: blk_update_request: I/O error, dev loop2, sector 6665664 op 0x1:(WRITE) flags 0x100000 phys_seg 4 prio class 0
Nov 14 21:18:16 Monsterservern kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 136, rd 0, flush 0, corrupt 0, gen 0
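
The counters at the end of the second line (wr, rd, flush, corrupt, gen) are cumulative per-device error totals, so it helps to see whether they keep climbing. A minimal sketch for pulling them out of syslog lines, assuming the exact line format shown above; the function name btrfs_errs is made up for illustration:

```shell
# Hypothetical helper: read kernel log lines on stdin and print the
# device plus its error counters for every line of the form
# "... bdev /dev/X errs: wr N, rd N, flush N, corrupt N, gen N".
btrfs_errs() {
    sed -n 's/.*bdev \([^ ]*\) errs: wr \([0-9]*\), rd \([0-9]*\), flush \([0-9]*\), corrupt \([0-9]*\), gen \([0-9]*\).*/\1 wr=\2 rd=\3 flush=\4 corrupt=\5 gen=\6/p'
}

# Usage (the syslog path is an assumption; adjust to your system):
# btrfs_errs < /var/log/syslog | tail -n 1
```

On a live system the same counters should also be available from "btrfs device stats" run against the mount point.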

Is the BTRFS corrupted? Why did it happen? How can I fix it?

 

The exact same thing actually happened to my other BTRFS pool today as well. I restarted the server and it works normally again, but after running a BTRFS scrub I get "uncorrectable errors" in that pool: 

UUID:             109edb7d-32a7-4c8c-9dfd-d8901216e5e1
Scrub started:    Sat Nov 14 09:47:37 2020
Status:           finished
Duration:         0:03:25
Total to scrub:   786.37GiB
Rate:             3.83GiB/s
Error summary:    csum=6
  Corrected:      0
  Uncorrectable:  6
  Unverified:     0

Attached diagnostics.

cache_filesystem_corrupted_201114.zip

Link to comment

Wow that is great help! I was wondering why these issues were building up. So since I run 4 RAM sticks I should limit them to 2667? I guess both pools need to be reformatted then. Is it worth trying to do a "btrfs check --repair" first? It seems that it can corrupt your pool, but I have nothing to lose if I am about to wipe it anyway? In that case, can you give me an example of how to run such a command?

 

Also, what is the easiest way to format the cache pool? 

 

Thanks! 

Erik

Link to comment
15 minutes ago, eribob said:

So since I run 4 RAM sticks I should limit them to 2667?

Yes.

 

15 minutes ago, eribob said:

Is it worth trying to do a "btrfs check --repair" first?

Unlikely to help, and it can't fix the data corruption; best to just re-format.

 

16 minutes ago, eribob said:

Also, what is the easiest way to format the cache pool? 

With the array stopped, wipe the SSDs with:

blkdiscard /dev/sdX

Then start the array and format the pool.
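
For anyone scripting the wipe step, here is a cautious sketch only. wipe_ssd is a hypothetical wrapper, and the mount check assumes device names appear at the start of lines in /proc/mounts. Since blkdiscard discards every block on the device, the wrapper refuses anything that is not a block device or that still looks mounted:

```shell
# Hypothetical wrapper around blkdiscard (destructive: all data is lost).
wipe_ssd() {
    dev=$1
    # Refuse anything that is not an existing block device.
    [ -b "$dev" ] || { echo "not a block device: $dev" >&2; return 1; }
    # Refuse if the device or one of its partitions is still mounted.
    if grep -q "^$dev" /proc/mounts; then
        echo "$dev (or a partition on it) is mounted; stop the array first" >&2
        return 1
    fi
    blkdiscard "$dev"
}

# Usage, with the array stopped (replace sdX with the actual SSD):
# wipe_ssd /dev/sdX
```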

Link to comment

It's interesting that my cache recently went read-only as well; I just found out today. O.o

So now I need to copy all my appdata off and reformat?

Nov 18 20:40:54 Tower kernel: loop: Write error at byte offset 2887852032, length 4096.
Nov 18 20:40:54 Tower kernel: print_req_error: I/O error, dev loop2, sector 5640336
Nov 18 20:40:54 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 195, rd 0, flush 0, corrupt 0, gen 0
Nov 18 20:40:59 Tower kernel: loop: Write error at byte offset 3727376384, length 4096.
Nov 18 20:40:59 Tower kernel: print_req_error: I/O error, dev loop2, sector 7280032
Nov 18 20:40:59 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 196, rd 0, flush 0, corrupt 0, gen 0
Nov 18 20:41:05 Tower kernel: loop: Write error at byte offset 2887852032, length 4096.
Nov 18 20:41:05 Tower kernel: print_req_error: I/O error, dev loop2, sector 5640336
Nov 18 20:41:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 197, rd 0, flush 0, corrupt 0, gen 0
Nov 18 20:41:05 Tower kernel: loop: Write error at byte offset 3727376384, length 4096.
Nov 18 20:41:05 Tower kernel: print_req_error: I/O error, dev loop2, sector 7280032
Nov 18 20:41:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 198, rd 0, flush 0, corrupt 0, gen 0
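
Each "Write error at byte offset" line and the "sector" line after it appear to describe the same failed write: the kernel counts block-layer sectors in 512-byte units, so the loop2 sector number times 512 gives the byte offset. A quick check of that arithmetic, with sector_to_bytes as a hypothetical helper name:

```shell
# Hypothetical helper: convert a block-layer sector number (512-byte
# units) into a byte offset.
sector_to_bytes() {
    echo $(( $1 * 512 ))
}

sector_to_bytes 5640336   # prints 2887852032
sector_to_bytes 7280032   # prints 3727376384
```

Both results match the offsets in the log, so only two locations in the loop2 image are failing repeatedly.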

 

Edited by tokra
Link to comment

Hi,

The solution worked for a couple of days (I had set my RAM to 2133 MHz, the "auto" setting in the BIOS), but just now one of my BTRFS pools went into read-only mode again.

 

The system log says the following: 

Nov 22 19:44:56 Monsterservern kernel: BTRFS error (device nvme0n1p1): block=1141445836800 write time tree block corruption detected
Nov 22 19:44:56 Monsterservern kernel: BTRFS: error (device nvme0n1p1) in btrfs_commit_transaction:2323: errno=-5 IO failure (Error while writing out transaction)
Nov 22 19:44:56 Monsterservern kernel: BTRFS info (device nvme0n1p1): forced readonly
Nov 22 19:44:56 Monsterservern kernel: BTRFS warning (device nvme0n1p1): Skipping commit of aborted transaction.
Nov 22 19:44:56 Monsterservern kernel: BTRFS: error (device nvme0n1p1) in cleanup_transaction:1894: errno=-5 IO failure

 

Diagnostics are attached.

 

What is the problem? It is really annoying now...

 

/Erik

monsterservern-diagnostics-20201122-1953.zip

Link to comment

Update! 

I ran Memtest and after about 15 minutes I got a lot of errors. So I removed my two oldest RAM sticks and re-ran the test for about 25 minutes without errors. I know that is a bit short (not even one pass, hehe), but I figured that since I got the errors so soon the first time, I would get them again if the remaining RAM sticks were the faulty ones.

 

So it was probably a memory issue? I just hope that I will not get any more corruption in my BTRFS now... fingers crossed. 

 

I also ran "btrfs check --readonly /dev/nvme0n1p1" and "btrfs check --readonly /dev/nvme1n1p1" (the two devices that make up the BTRFS pool in question) and got no errors. Can I then assume that the filesystem on that pool is intact?

 

BIG thanks!

 

/Erik

Link to comment
