BTRFS error CSUM

Hey folks,

 

Recently I got an error on one device of my cache pool (2x2TB SSD).

 

Mar 8 14:29:12 Tower kernel: BTRFS warning (device sdi1): checksum error at logical 35017768960 on dev /dev/sdi1, physical 18878087168: metadata leaf (level 0) in tree 5

 

So I started a scrub and checked the box to correct errors where possible. The output in the logs is this:

 

Mar 8 14:29:12 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Mar 8 14:29:13 Tower kernel: BTRFS error (device sdi1): unable to fixup (regular) error at logical 35017768960 on dev /dev/sdh1
Mar 8 14:29:13 Tower kernel: BTRFS error (device sdi1): unable to fixup (regular) error at logical 35017768960 on dev /dev/sdi1

 

 

So, since I have a cache pool of two identical devices, why do I have an uncorrectable error? Is there any way to solve this?

 

Thank you in advance!

 

P.S. I don't know if it matters, but I deleted orphaned Docker images the day before I noticed it... And second, since I have a RAID1 cache pool with two devices, why can't the error be repaired?


  • Community Expert

Checksum errors mean btrfs is detecting data corruption. Make sure your RAM is not overclocked, since that is a known source of data corruption with Ryzen/Threadripper, and/or run memtest.
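
One way to read the scrub output in the first post: in a RAID1 pool, scrub can only repair a block if the other copy still passes its checksum. When the same logical address is reported as "unable to fixup" on both devices, neither copy verified, so there is nothing good to repair from. The following is a minimal Python sketch of that check, not part of Unraid or btrfs-progs. The function names and the `mirror_count` parameter are made up for illustration, and the regex assumes the kernel message format seen in this thread.

```python
import re
from collections import defaultdict

# Assumed message format, copied from the log lines in this thread:
# BTRFS error (device sdi1): unable to fixup (regular) error at logical N on dev /dev/sdX1
UNFIXABLE = re.compile(
    r"unable to fixup \(regular\) error at logical (\d+) on dev (\S+)"
)


def unfixable_by_logical(lines):
    """Map each logical address to the set of devices that could not be repaired."""
    result = defaultdict(set)
    for line in lines:
        match = UNFIXABLE.search(line)
        if match:
            result[int(match.group(1))].add(match.group(2))
    return dict(result)


def lost_on_all_mirrors(lines, mirror_count=2):
    """Return logical addresses reported as unfixable on every mirror.

    mirror_count is a hypothetical parameter: 2 for a two-device RAID1 pool.
    """
    return sorted(
        logical
        for logical, devices in unfixable_by_logical(lines).items()
        if len(devices) >= mirror_count
    )


sample = [
    "Mar 8 14:29:13 Tower kernel: BTRFS error (device sdi1): unable to fixup "
    "(regular) error at logical 35017768960 on dev /dev/sdh1",
    "Mar 8 14:29:13 Tower kernel: BTRFS error (device sdi1): unable to fixup "
    "(regular) error at logical 35017768960 on dev /dev/sdi1",
]

print(lost_on_all_mirrors(sample))  # prints [35017768960]
```

An address that shows up for only one device would not be in this list. In that case scrub would normally be expected to repair it from the other copy.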

  • Author

@JorgeB, the page you linked was my source for my late-night work yesterday. I have some really sporadic lockups, maybe once a month. They are caused by the Threadripper/RAM/C6 state problem. I really couldn't figure out a pattern for when it crashes, but it's always at a little more than idle. Yesterday I tried another approach to get rid of it.

So, what I did was manually bring the RAM back to the MHz your table suggested, set the power to "typical" and *enabled* the deep sleep feature, which could possibly lead to those crashes. I found this comment on Reddit:

 

Quote

Enabling deep sleep in bios, seems to have resulted in idle/sleep functioning properly under linux kernels 4.14 and 4.17 without having to disable any of the C6, S3, and Power Supply Idle functions in kernel or bios.

 

Disabling the C6 states should then not be necessary. For now the server runs as expected, but I'm syslogging to the array anyway, because the lockup can happen anytime within that month...

 

So, as you suspected, I think this is caused by some RAM error. All RAM sticks are working; I memtested them without any errors. Is there a way to get rid of the BTRFS errors then? Move all data to the array, erase the pool and put it back? Do I have to do that, although it is a cache pool with two disks in RAID1? Would the corrupted data also be copied, or is there a way to figure out which file is broken so I can delete it in advance?

 

Thank you for your patience!!

  • Community Expert
2 minutes ago, hundsboog said:

Disabling the C6 states should then not be necessary.

Correct, as mentioned in the link, just set the correct power supply idle control; only if that option doesn't exist should you disable C-states.

 

 

6 minutes ago, hundsboog said:

Is there a way to get rid of the BTRFS errors then?

After a scrub the corrupt file(s) will be listed in the syslog; delete them or replace them from backups.

 

  • Author

So these should be the culprits?

 

Mar  8 14:54:28 Tower kernel: BTRFS warning (device sdi1): checksum error at logical 908941897728 on dev /dev/sdi1, physical 752108482560, root 5, inode 6129850, offset 3596288, length 4096, links 1 (path: Nextcloud/appdata_ocies0pc9lnb/preview/b/b/1/7/d/1/6/109600/3398-3398-max.png)
Mar  8 14:54:28 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Mar  8 14:54:28 Tower kernel: BTRFS error (device sdi1): unable to fixup (regular) error at logical 908941897728 on dev /dev/sdi1
Mar  8 14:54:28 Tower kernel: BTRFS warning (device sdi1): checksum error at logical 908941897728 on dev /dev/sdh1, physical 752087511040, root 5, inode 6129850, offset 3596288, length 4096, links 1 (path: Nextcloud/appdata_ocies0pc9lnb/preview/b/b/1/7/d/1/6/109600/3398-3398-max.png)
Mar  8 14:54:28 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Mar  8 14:54:28 Tower kernel: BTRFS error (device sdi1): unable to fixup (regular) error at logical 908941897728 on dev /dev/sdh1
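
For a longer syslog, the affected files can be pulled out of checksum warnings like the ones above with a small script. This is only a sketch in Python: the `corrupt_paths` helper is a hypothetical name, and the regex assumes the `(path: ...)` suffix appears exactly as in these log lines. The paths are relative to the pool root, so on a pool mounted at `/mnt/cache` (an assumption here) the file would sit under that directory.

```python
import re

# Data checksum warnings in this thread end with "(path: <file>)".
# Metadata warnings carry no path and are deliberately not matched.
PATH_RE = re.compile(
    r"checksum error at logical \d+ on dev \S+,.*\(path: (.+)\)\s*$"
)


def corrupt_paths(lines):
    """Return the unique file paths named in checksum warnings, in first-seen order."""
    seen = []
    for line in lines:
        match = PATH_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    "Mar  8 14:54:28 Tower kernel: BTRFS warning (device sdi1): checksum error at "
    "logical 908941897728 on dev /dev/sdi1, physical 752108482560, root 5, inode "
    "6129850, offset 3596288, length 4096, links 1 (path: Nextcloud/"
    "appdata_ocies0pc9lnb/preview/b/b/1/7/d/1/6/109600/3398-3398-max.png)",
    "Mar  8 14:54:28 Tower kernel: BTRFS warning (device sdi1): checksum error at "
    "logical 908941897728 on dev /dev/sdh1, physical 752087511040, root 5, inode "
    "6129850, offset 3596288, length 4096, links 1 (path: Nextcloud/"
    "appdata_ocies0pc9lnb/preview/b/b/1/7/d/1/6/109600/3398-3398-max.png)",
]

for path in corrupt_paths(sample):
    print(path)
```

The same file is reported once per device, so the helper de-duplicates and the sample yields a single path.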

 

@JorgeB thank you very much! It was an awesome lesson for me and I learned a lot! Hopefully this thread will also give other people advice on how to fix BTRFS errors!! Thank you so much!

 

And I will report back on whether the server now runs stable with the config I set in the BIOS. Fingers crossed that this was the screw I had to fix to get the Mofo stable.... 😄

  • Community Expert
7 minutes ago, hundsboog said:

So these should be the culprits?

Yes.

  • Author

OK, I deleted those two preview files and started the scrub again, which did its thing. The errors belonging to those files were completely gone.

 

This is now the final result, where I still need some assistance:

 

Mar  8 16:41:31 Tower ool www[877]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/cache' ''
Mar  8 16:41:31 Tower kernel: BTRFS info (device sdi1): scrub: started on devid 1
Mar  8 16:41:31 Tower kernel: BTRFS info (device sdi1): scrub: started on devid 2
Mar  8 16:42:09 Tower kernel: BTRFS warning (device sdi1): checksum error at logical 35017768960 on dev /dev/sdh1, physical 18857115648: metadata leaf (level 0) in tree 5
Mar  8 16:42:09 Tower kernel: BTRFS warning (device sdi1): checksum error at logical 35017768960 on dev /dev/sdh1, physical 18857115648: metadata leaf (level 0) in tree 5
Mar  8 16:42:09 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Mar  8 16:42:09 Tower kernel: BTRFS error (device sdi1): unable to fixup (regular) error at logical 35017768960 on dev /dev/sdh1
Mar  8 16:42:09 Tower kernel: BTRFS warning (device sdi1): checksum error at logical 35017768960 on dev /dev/sdi1, physical 18878087168: metadata leaf (level 0) in tree 5
Mar  8 16:42:09 Tower kernel: BTRFS warning (device sdi1): checksum error at logical 35017768960 on dev /dev/sdi1, physical 18878087168: metadata leaf (level 0) in tree 5
Mar  8 16:42:09 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Mar  8 16:42:09 Tower kernel: BTRFS error (device sdi1): unable to fixup (regular) error at logical 35017768960 on dev /dev/sdi1

 

  • Community Expert

That's metadata corruption; for this it's best to back up and re-format the pool.
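
The two kinds of checksum warnings in this thread can be told apart by their tail: data errors name a file with `(path: ...)`, while metadata errors read `metadata leaf (level 0) in tree 5` and point at no file that could simply be deleted. A rough Python sketch of that distinction follows. The `classify` and `summarize` names are hypothetical, and matching `metadata node` in addition to `metadata leaf` is an assumption about the kernel's wording for non-leaf blocks.

```python
import re
from collections import Counter

# Assumed formats, taken from the log lines in this thread.
METADATA_RE = re.compile(r": metadata (leaf|node) \(level \d+\) in tree \d+")


def classify(line):
    """Return 'metadata', 'data', 'unknown', or None if not a checksum warning."""
    if "checksum error at logical" not in line:
        return None
    if METADATA_RE.search(line):
        return "metadata"
    if "(path: " in line:
        return "data"
    return "unknown"


def summarize(lines):
    """Count checksum warnings by kind, ignoring unrelated lines."""
    return Counter(kind for kind in map(classify, lines) if kind is not None)


sample = [
    "Mar  8 16:42:09 Tower kernel: BTRFS warning (device sdi1): checksum error at "
    "logical 35017768960 on dev /dev/sdh1, physical 18857115648: metadata leaf "
    "(level 0) in tree 5",
    "Mar  8 16:42:09 Tower kernel: BTRFS warning (device sdi1): checksum error at "
    "logical 35017768960 on dev /dev/sdi1, physical 18878087168: metadata leaf "
    "(level 0) in tree 5",
    "Mar  8 16:41:31 Tower kernel: BTRFS info (device sdi1): scrub: started on devid 1",
]

print(dict(summarize(sample)))  # prints {'metadata': 2}
```

If only `data` entries show up, deleting or restoring the named files and scrubbing again may be enough. Any `metadata` entry is the case where backing up and re-formatting the pool is advised above.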
