[SOLVED] 2 1TB ssd cache pool (btrfs) - uncorrectable csum errors?


My questions are: how, and why? I forked out $130 for an extra 1TB SSD to create a raid1 cache pool so I'd have redundancy and not have to worry about data corruption on my cache (where I keep my VM images), but it seems like I wasted my money? I never had issues with my cache SSD until I switched to a cache pool. I checked the pool with a scrub and got this:

Jul 10 01:13:38 MMPC1 ool www[25457]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/cache' ''
Jul 10 01:13:38 MMPC1 kernel: BTRFS info (device sdl1): scrub: started on devid 1
Jul 10 01:13:38 MMPC1 kernel: BTRFS info (device sdl1): scrub: started on devid 2
Jul 10 01:28:38 MMPC1 kernel: BTRFS warning (device sdl1): checksum error at logical 5525486284800 on dev /dev/sdm1, physical 197491273728, root 5, inode 279, offset 31180976128, length 4096, links 1 (path: custom/unraid/Windows10.img)
Jul 10 01:28:38 MMPC1 kernel: BTRFS error (device sdl1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 22, gen 0
Jul 10 01:28:38 MMPC1 kernel: BTRFS error (device sdl1): unable to fixup (regular) error at logical 5525486284800 on dev /dev/sdm1
Jul 10 01:28:41 MMPC1 kernel: BTRFS warning (device sdl1): checksum error at logical 5523090034688 on dev /dev/sdm1, physical 198316249088, root 5, inode 279, offset 63094951936, length 4096, links 1 (path: custom/unraid/Windows10.img)
Jul 10 01:28:41 MMPC1 kernel: BTRFS error (device sdl1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 23, gen 0
Jul 10 01:28:41 MMPC1 kernel: BTRFS error (device sdl1): unable to fixup (regular) error at logical 5523090034688 on dev /dev/sdm1
Jul 10 01:28:53 MMPC1 kernel: BTRFS warning (device sdl1): checksum error at logical 5525486284800 on dev /dev/sdl1, physical 197512245248, root 5, inode 279, offset 31180976128, length 4096, links 1 (path: custom/unraid/Windows10.img)
Jul 10 01:28:53 MMPC1 kernel: BTRFS error (device sdl1): bdev /dev/sdl1 errs: wr 0, rd 0, flush 0, corrupt 32, gen 0
Jul 10 01:28:53 MMPC1 kernel: BTRFS error (device sdl1): unable to fixup (regular) error at logical 5525486284800 on dev /dev/sdl1
Jul 10 01:28:57 MMPC1 kernel: BTRFS warning (device sdl1): checksum error at logical 5523090034688 on dev /dev/sdl1, physical 198337220608, root 5, inode 279, offset 63094951936, length 4096, links 1 (path: custom/unraid/Windows10.img)
Jul 10 01:28:57 MMPC1 kernel: BTRFS error (device sdl1): bdev /dev/sdl1 errs: wr 0, rd 0, flush 0, corrupt 33, gen 0
Jul 10 01:28:57 MMPC1 kernel: BTRFS error (device sdl1): unable to fixup (regular) error at logical 5523090034688 on dev /dev/sdl1
Jul 10 01:35:10 MMPC1 kernel: BTRFS warning (device sdl1): checksum error at logical 5690545643520 on dev /dev/sdm1, physical 280946253824, root 5, inode 279, offset 37611438080, length 4096, links 1 (path: custom/unraid/Windows10.img)
Jul 10 01:35:10 MMPC1 kernel: BTRFS error (device sdl1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 24, gen 0
Jul 10 01:35:11 MMPC1 kernel: BTRFS error (device sdl1): unable to fixup (regular) error at logical 5690545643520 on dev /dev/sdm1
Jul 10 01:35:20 MMPC1 kernel: BTRFS warning (device sdl1): checksum error at logical 5690545643520 on dev /dev/sdl1, physical 280967225344, root 5, inode 279, offset 37611438080, length 4096, links 1 (path: custom/unraid/Windows10.img)
Jul 10 01:35:20 MMPC1 kernel: BTRFS error (device sdl1): bdev /dev/sdl1 errs: wr 0, rd 0, flush 0, corrupt 34, gen 0
Jul 10 01:35:20 MMPC1 kernel: BTRFS error (device sdl1): unable to fixup (regular) error at logical 5690545643520 on dev /dev/sdl1
Jul 10 01:40:21 MMPC1 kernel: BTRFS info (device sdl1): scrub: finished on devid 2 with status: 0
Jul 10 01:40:32 MMPC1 kernel: BTRFS info (device sdl1): scrub: finished on devid 1 with status: 0

 

GRRR!  This is soooo annoying!!!!

Looks like the errors are on BOTH drives: the scrub shows 6 errors, but it's the same 3 blocks failing on sdl1 and on sdm1. Is there ANYTHING I can do, or is there another setup I can run that's more bulletproof???
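For anyone reading the scrub log above: btrfs raid1 can only repair a block when the other mirror still passes its checksum, so "unable to fixup" on both devices at the same logical address means neither copy is good. Here is a minimal Python sketch that groups the checksum warnings by logical address to make that visible. The function names and the `SAMPLE` variable are made up for illustration, and the regex assumes the exact kernel message layout shown in this thread.

```python
import re
from collections import defaultdict

# Assumes the "checksum error at logical ..." layout seen in the log above.
CSUM_RE = re.compile(
    r"checksum error at logical (?P<logical>\d+) on dev (?P<dev>[^,\s]+), "
    r".*?offset (?P<offset>\d+), length (?P<length>\d+)"
    r".*?\(path: (?P<path>[^)]+)\)"
)

def group_csum_errors(log_text):
    """Map each logical address to the devices that reported a bad checksum there."""
    by_logical = defaultdict(lambda: {"devs": set(), "offset": None, "path": None})
    for line in log_text.splitlines():
        m = CSUM_RE.search(line)
        if not m:
            continue  # not a checksum warning line
        entry = by_logical[int(m["logical"])]
        entry["devs"].add(m["dev"])
        entry["offset"] = int(m["offset"])  # byte offset inside the file
        entry["path"] = m["path"]
    return dict(by_logical)

def bad_on_all_mirrors(by_logical, mirrors=2):
    """Logical addresses where every mirror failed, so scrub has no good copy."""
    return sorted(addr for addr, e in by_logical.items() if len(e["devs"]) >= mirrors)

# Two lines copied from the log in this post (hypothetical variable name).
SAMPLE = """\
Jul 10 01:28:38 MMPC1 kernel: BTRFS warning (device sdl1): checksum error at logical 5525486284800 on dev /dev/sdm1, physical 197491273728, root 5, inode 279, offset 31180976128, length 4096, links 1 (path: custom/unraid/Windows10.img)
Jul 10 01:28:53 MMPC1 kernel: BTRFS warning (device sdl1): checksum error at logical 5525486284800 on dev /dev/sdl1, physical 197512245248, root 5, inode 279, offset 31180976128, length 4096, links 1 (path: custom/unraid/Windows10.img)
"""

if __name__ == "__main__":
    grouped = group_csum_errors(SAMPLE)
    for addr in bad_on_all_mirrors(grouped):
        e = grouped[addr]
        print(addr, sorted(e["devs"]), e["path"], "file offset", e["offset"])
```

Feeding it the full syslog instead of `SAMPLE` should list every logical address that failed on both devices, along with the affected file and the byte offset inside it.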

Edited by EArroyo
Link to comment

The parity check shows no errors, only these 3 checksum errors, and all 3 are on both SSDs. I did have some power spikes during this stupid storm that just passed (I live in Florida), and I also found out my UPS needs a battery replacement (go figure, everything going to crap all at once), so I ran a drive check inside the Windows 10 VM; it found no errors and the VM works fine. Is there any way I can copy without getting the input/output error but not messing up image integrity so I can reformat the cache pool? As a last resort, can I attach a new image to the VM and clone the main drive to the new image?

Link to comment
1 hour ago, EArroyo said:

Is there any way I can copy without getting the input/output error but not messing up image integrity so I can reformat the cache pool?

You can bypass the i/o error by using btrfs restore or by copying from one user share to another, but the file will still be corrupt.
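To illustrate the general idea of copying past an i/o error (this is not what btrfs restore or a user share copy does internally), here is a rough Python sketch that reads the source in 4096-byte blocks and writes zeros for any block that cannot be read. The function name, the `read_block` parameter and the paths in the usage note are made up for the example, and `os.pread` is only available on Unix-like systems. The zero-filled blocks are still lost data, so the copy is just as corrupt in those spots.

```python
import os

def copy_skipping_bad_blocks(src, dst, block_size=4096, read_block=os.pread):
    """Copy src to dst block by block, zero-filling blocks whose read raises OSError.

    Returns the byte offsets of the blocks that could not be read.
    read_block is a hypothetical hook (same signature as os.pread) so the
    failure path can be exercised without a damaged disk.
    """
    bad_offsets = []
    src_fd = os.open(src, os.O_RDONLY)
    try:
        size = os.fstat(src_fd).st_size
        with open(dst, "wb") as out:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = read_block(src_fd, want, offset)
                except OSError:
                    # Unreadable block: record it and substitute zeros.
                    bad_offsets.append(offset)
                    data = b"\0" * want
                if not data:
                    break  # unexpected end of file
                out.write(data)
                offset += len(data)
    finally:
        os.close(src_fd)
    return bad_offsets
```

Usage would look something like `copy_skipping_bad_blocks("/path/to/source.img", "/path/to/copy.img")` (placeholder paths), and the returned offsets can be compared against the "offset" values in the scrub log.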

Link to comment
7 hours ago, JorgeB said:

You can bypass the i/o error by using btrfs restore or by copying from one user share to another, but the file will still be corrupt.

 

Well, let's say the corruption is in sections of the image file that hold no data the VM uses. Would it be safe to do that then? Also, is there any qcow2 utility that can repair a corrupted image rather than mark it unusable (if that's what happens)?

EDIT:

I'm about to add a second image drive to my VM and try to clone the main drive to the new image file. Maybe that will save my VM (even though it works now, I don't want to hit a wall later).

I'm also very disappointed that a btrfs raid1 cache pool is not as safe as I expected. If the same event that corrupted it would corrupt a cache that isn't a btrfs raid1 pool in the same way, the pool feels like a waste of time and money, and I may as well spend the money on twice as much space...  🤔

Edited by EArroyo
Link to comment

Well, just an update... This looks very promising. I downloaded AOMEI Backupper Standard (free) and it's cloning from one image to the other. Hopefully the result boots when I set it as the main drive, and possibly ends up smaller in size?

[Attached image: aomei_backuper_standard_cloning.jpg]

[Attached image: w10_images.jpg]

Edited by EArroyo
Link to comment
  • EArroyo changed the title to [SOLVED] 2 1TB ssd cache pool (btrfs) - uncorrectable csum errors?
