BTRFS errors on cache pool


Solved by JorgeB


Hi,

I woke up this morning to a Fix Common Problems error.

This appears to be identical to this error:

https://forums.unraid.net/topic/94659-solved-btrfs-issues/

 

My cache pool is this:

[screenshot of the cache pool configuration]

 

I downloaded diagnostics, but (stupidly) did so from inside a VM, so I've attached the reports from the syslog folder instead. I was in a hurry to stop the server in case of corruption.

Prior to this, I ran

 btrfs dev stats /mnt/cache_nvme

It was full of errors. I tried a scrub, and since rebooting I've run it again; you will note there are still errors, but not nearly as many. I'm guessing the cache drives dropped out.


root@Nexus:~# btrfs dev stats /mnt/cache_nvme
ERROR: cannot check /mnt/cache_nvme: No such file or directory
ERROR: '/mnt/cache_nvme' is not a mounted btrfs device
root@Nexus:~# btrfs dev stats /mnt/cache_nvme
[/dev/nvme1n1p1].write_io_errs    0
[/dev/nvme1n1p1].read_io_errs     0
[/dev/nvme1n1p1].flush_io_errs    0
[/dev/nvme1n1p1].corruption_errs  0
[/dev/nvme1n1p1].generation_errs  4
[/dev/nvme2n1p1].write_io_errs    104169165
[/dev/nvme2n1p1].read_io_errs     31314052
[/dev/nvme2n1p1].flush_io_errs    4340
[/dev/nvme2n1p1].corruption_errs  0
[/dev/nvme2n1p1].generation_errs  0
root@Nexus:~# 
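For anyone triaging output like the above, here is a small shell sketch that filters `btrfs dev stats` output down to just the nonzero counters. It uses the figures pasted above as sample input so it is safe to run anywhere; on a live system you would pipe the real command in instead.

```shell
# Sample `btrfs dev stats` output (copied from the post above).
stats='[/dev/nvme1n1p1].write_io_errs    0
[/dev/nvme1n1p1].read_io_errs     0
[/dev/nvme1n1p1].flush_io_errs    0
[/dev/nvme1n1p1].corruption_errs  0
[/dev/nvme1n1p1].generation_errs  4
[/dev/nvme2n1p1].write_io_errs    104169165
[/dev/nvme2n1p1].read_io_errs     31314052
[/dev/nvme2n1p1].flush_io_errs    4340
[/dev/nvme2n1p1].corruption_errs  0
[/dev/nvme2n1p1].generation_errs  0'

# Show only the nonzero counters. On a live system the equivalent is:
#   btrfs dev stats /mnt/cache_nvme | awk '$2 != 0'
printf '%s\n' "$stats" | awk '$2 != 0 {print $1, $2}'
```

Worth knowing: these counters are cumulative and survive reboots, so old and new errors get mixed together; `btrfs dev stats -z <mount>` resets them, which makes it easy to tell whether errors are still occurring after a repair.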

I've attached Diagnostics.zip, which was captured before I stopped the array. After stopping the array, the system couldn't find nvme2n1. I did a reboot to clear the logs, as the system log was full.

I've attached full diagnostics after rebooting into maintenance mode.

I cannot restart docker or VM service, as they were on these drives.

Currently, the status is stuck on trying to unmount shares. I've had to kill a process to get it to shut down.

 

Could somebody please help? Fortunately, I did a lengthy sync of the important stuff last night to an unassigned drive. I'm wondering if this caused the issue in part?!

 

The last parity check was on 02/07/2022 and completed without errors. I remain unsure what the next steps are. Should I dd an image of the cache, which is 500 GB?
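On the dd question: a block-level image is a reasonable safety net before attempting any repair. A minimal sketch of the flow follows, with a scratch file standing in for the real device so it is safe to run anywhere; on the server the input would be the cache device (e.g. /dev/nvme1n1) and the output a file on a disk with roughly 500 GB free.

```shell
# Stand-in "device" so this sketch runs safely anywhere (4 MiB of random data).
src=/tmp/fake_cache.img
dst=/tmp/cache_backup.img
dd if=/dev/urandom of="$src" bs=1M count=4 2>/dev/null

# The actual backup step: block-for-block copy with progress reporting.
# conv=sync,noerror keeps going past read errors, padding unreadable blocks.
dd if="$src" of="$dst" bs=4M conv=sync,noerror status=progress

# Verify the image matches the source.
cmp "$src" "$dst" && echo "image verified"
```

For media that is actively throwing read errors, GNU ddrescue is generally a better tool than plain dd, since it retries bad regions and keeps a log of what it could and couldn't recover.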

 

I remain concerned about bitrot, corruption, etc. I've left the server shut down for now.

Diagnostics.zip nexus-diagnostics-20220711-1034.zip

Solution

Second

27 minutes ago, Geck0 said:

I've mounted the second drive

Jul 11 10:11:35 Nexus kernel: BTRFS: device fsid 653150f2-f281-4987-bb45-f3e553c14b9c devid 2 transid 534361 /dev/nvme2n1p1 scanned by udevd (1391)
Jul 11 10:11:35 Nexus kernel: BTRFS: device fsid 653150f2-f281-4987-bb45-f3e553c14b9c devid 1 transid 542551 /dev/nvme1n1p1 scanned by udevd (1402)

I assume you mean devid 2 (nvme2n1)? The other device will be out of sync.

 

If yes, it's always good to make sure backups are up to date. If you mounted devid 2 by itself read/write, the best way forward is to wipe the other device and then re-add it to the pool.
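For readers finding this later: on Unraid the GUI normally handles the wipe and re-add when you change the pool assignment, but the underlying btrfs steps look roughly like the sketch below. The device name and mount point are placeholders, and the raid1 balance at the end assumes the pool should be a two-device raid1; check your own layout before running anything destructive. (A pool missing a member typically also needs the `degraded` mount option before it will mount read/write at all.) The sketch is written to a file and only syntax-checked here, precisely because the real commands erase data.

```shell
# Sketch of the wipe-and-re-add flow (placeholders, not real device names).
# Written to a file and syntax-checked only -- do NOT run blindly.
cat > /tmp/readd_sketch.sh <<'EOF'
#!/bin/bash
set -euo pipefail
STALE_DEV=/dev/nvmeXn1p1    # placeholder: the member that was NOT mounted
POOL_MNT=/mnt/cache_nvme    # the surviving pool, mounted read/write

wipefs -a "$STALE_DEV"                     # erase the stale btrfs signature
btrfs device add "$STALE_DEV" "$POOL_MNT"  # re-add the blank device to the pool
# Restore two-copy redundancy (assumes the pool should be raid1):
btrfs balance start -dconvert=raid1 -mconvert=raid1 "$POOL_MNT"
EOF
bash -n /tmp/readd_sketch.sh && echo "sketch syntax OK"
```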

