
BTRFS error: corruption


Simom
Solved by Simom


Hello!

 

I am running a RAID1 SATA SSD Cache pool and I am getting some BTRFS errors:

Jan 29 23:57:12 Turing kernel: BTRFS warning (device sdd1): csum failed root 5 ino 3599 off 2499960832 csum 0x60341ddd expected csum 0x88e58ce3 mirror 2
Jan 29 23:57:12 Turing kernel: BTRFS error (device sdd1): bdev /dev/sdd1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Jan 29 23:57:12 Turing kernel: BTRFS warning (device sdd1): csum failed root 5 ino 3599 off 2499964928 csum 0x1470dccc expected csum 0x8188ffff mirror 2
Jan 29 23:57:12 Turing kernel: BTRFS error (device sdd1): bdev /dev/sdd1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Jan 29 23:57:12 Turing kernel: BTRFS warning (device sdd1): csum failed root 5 ino 3599 off 2499960832 csum 0x60341ddd expected csum 0x88e58ce3 mirror 1
Jan 29 23:57:12 Turing kernel: BTRFS error (device sdd1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Jan 29 23:57:12 Turing kernel: BTRFS warning (device sdd1): csum failed root 5 ino 3599 off 2499964928 csum 0x1470dccc expected csum 0x8188ffff mirror 1
Jan 29 23:57:12 Turing kernel: BTRFS error (device sdd1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Jan 29 23:57:12 Turing kernel: BTRFS warning (device sdd1): csum failed root 5 ino 3599 off 2499960832 csum 0x60341ddd expected csum 0x88e58ce3 mirror 2
Jan 29 23:57:12 Turing kernel: BTRFS error (device sdd1): bdev /dev/sdd1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Jan 29 23:57:12 Turing kernel: BTRFS warning (device sdd1): csum failed root 5 ino 3599 off 2499964928 csum 0x1470dccc expected csum 0x8188ffff mirror 2
Jan 29 23:57:12 Turing kernel: BTRFS error (device sdd1): bdev /dev/sdd1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0

 

As I had some issues with this pool (and another one) some time ago, with similar-looking logs, I decided to format both drives about 2 days ago. I recreated the cache pool and moved the data back, and less than 48 hours later I got the error above.
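For reference, the per-device error counters and the file behind "ino 3599" can be checked with the btrfs tools. This is only a rough sketch, assuming the pool is mounted at /mnt/cache (the mount point and inode number will differ on other setups):

btrfs device stats /mnt/cache                          # per-device wr/rd/flush/corrupt/gen counters, like in the log above
btrfs inspect-internal inode-resolve 3599 /mnt/cache   # map the "ino 3599" from the warnings to an actual file path
btrfs scrub start -B /mnt/cache                        # read and verify all data and metadata; -B waits and prints a summary
btrfs device stats -z /mnt/cache                       # optionally print and reset the counters once the cause is fixed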

 

I would be thankful if someone has an idea or can point me in a direction!
(Diagnostics are attached)

 

turing-diagnostics-20230130-0205.zip

Link to comment

Hey, thanks for the response. You are kind of right, but that error is from another cache pool (my NVMe one). (I am currently working on clearing the pool you mentioned so I can scrub and format it.)

As far as I understand, the error you mentioned shouldn't be connected to the one I posted about, because the devices are part of different pools. Or am I missing something?

Link to comment
31 minutes ago, JorgeB said:

unexpected csum errors can be the result of RAM issues

 

23 minutes ago, Simom said:

swapped CPU, MoBo and RAM

In which case, the errors could have been the result of previous bad RAM.

 

In any case,

32 minutes ago, JorgeB said:

suggest running memtest

 

Link to comment
18 hours ago, trurl said:

In which case, the errors could have been the result of previous bad RAM.

I think I didn't clearly state my previous troubleshooting steps, and this is leading to some confusion. So, in order:

  • I had some csum errors like the one in my first post
  • I swapped systems with a new CPU, MoBo and RAM
  • I unassigned both drives, formatted them, and created a new pool
  • Less than 48 hours later I got a new csum error

Since I created a new pool, this might still be a problem with my current RAM, but not with my old one. Or am I missing something?

 

Memtest is running; nothing found so far. Any advice on how long I should leave it running (I read 24-48 hours somewhere)?

Link to comment
  • 11 months later...
On 2/4/2023 at 2:16 AM, Simom said:

Thanks for the quick response! I will try that and see how it goes.

Hey, I am having similar issues with a RAID 1 config. How did you manage to move the data from the cache to the array and move it back again? I am planning to re-format the NVMe disks.

Link to comment
  • Solution

Just realized that I never followed up on this:
I switched from macvlan to ipvlan for my Docker containers, and that seems to have fixed it. No crashes, no corruption since then.

(I guess the macvlan issues led to kernel panics, which led to the corruption of the files; but I am no expert.)
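For anyone hitting the same thing: before the switch, my syslog showed macvlan-related kernel call traces shortly before the csum errors. A quick, rough way to check (assuming the default Unraid syslog location):

grep -iE 'macvlan|call trace' /var/log/syslog   # look for macvlan call traces preceding the corruption messages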

 

P.S.

I also read that there have been changes to macvlan.

Link to comment
3 hours ago, trurl said:

Nothing can move open files. Disable Docker and VM Manager in Settings. Dynamix File Manager will let you work directly with the disks and pools on the server.

So after I disable Docker and the VMs, I can move the files from my cache to the array? And after formatting the cache, I can move the files back and use the VMs and Docker without additional settings?

Link to comment

In theory, yes. You should also make sure that no one else is accessing the files over SMB, NFS or anything else before starting to move them.
But I would highly advise checking that the file system runs as expected before moving important data back to the cache.
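If you prefer the command line over Dynamix File Manager, something along these lines works too. This is purely an illustrative sketch with made-up paths (an appdata share on the cache, disk1 as the target), not a tested procedure for your setup:

rsync -avh --progress /mnt/cache/appdata/ /mnt/disk1/appdata/   # copy the share from the pool to an array disk first
btrfs scrub start -B /mnt/cache                                 # after re-creating the pool, scrub it to verify everything reads back clean before moving data back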

Link to comment
8 hours ago, Simom said:

In theory, yes. You should also make sure that no one else is accessing the files over SMB, NFS or anything else before starting to move them.
But I would highly advise checking that the file system runs as expected before moving important data back to the cache.

OK. Also, do I need to re-direct the VM vdisk directory and other things for the VM? Or can I just hit the play button afterwards?

Link to comment
