
BTRFS corruption.errs



Hi,  

I just finished restoring my apps/containers after my main cache drive decided to go read-only. While keeping an eye on BTRFS errors with "btrfs dev stats /mnt/cache", I'm seeing that, mere hours after the restoration, the corruption error count on both disks in the pool is already at 1.
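For anyone following along, a small helper can turn the output of "btrfs dev stats" into something a script can act on. This is a sketch that assumes the usual one-counter-per-line layout (for example "[/dev/sdb1].corruption_errs   1"); the function names are hypothetical, and reading live stats needs root.

```python
import re
import subprocess

# Matches lines such as "[/dev/sdb1].corruption_errs   1" (assumed default layout).
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)\s*$")


def parse_dev_stats(text):
    """Return {(device, counter): value} for every counter line in the text."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats[(m["dev"], m["counter"])] = int(m["value"])
    return stats


def nonzero(stats):
    """Keep only the counters that have recorded at least one error."""
    return {key: value for key, value in stats.items() if value > 0}


def read_dev_stats(mountpoint="/mnt/cache"):
    """Run 'btrfs device stats' against a mounted filesystem (needs root)."""
    out = subprocess.run(
        ["btrfs", "device", "stats", mountpoint],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_dev_stats(out)


if __name__ == "__main__":
    sample = (
        "[/dev/sdb1].write_io_errs    0\n"
        "[/dev/sdb1].read_io_errs     0\n"
        "[/dev/sdb1].flush_io_errs    0\n"
        "[/dev/sdb1].corruption_errs  1\n"
        "[/dev/sdb1].generation_errs  0\n"
    )
    print(nonzero(parse_dev_stats(sample)))  # prints {('/dev/sdb1', 'corruption_errs'): 1}
```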

The two disks in the pool have no bad SMART results, show around 47-49% life remaining, and report no other errors.

 

Is an error count of 1 anything to worry about at this point, or should I start looking at replacing the drives already?

 

Thanks!


When's the last time you ran memtest for at least one full pass? Are you running your RAM beyond the limits of your CPU/board? Memory speed ratings are kind of like speed ratings on tires: you can fit 150+ MPH rated tires on a car that can only muster 120 going downhill with a tailwind. Even if you have 3200-rated memory, the CPU may only be stable at 2666 or lower.


After the memory was installed, the system ran perfectly fine, with no issues, for well over 300 days without a reboot.

The RAM is running within the documented limits of the motherboard/CPU, at 2933 MHz, with a slight overvolt (0.01 V) to compensate for the voltage drop with 4 DIMMs installed. Standard stuff.

I am 100% certain it is not the memory. I'm well versed in Ryzen memory requirements and limitations, especially on the older Zen architectures (it's a 3200G).

 

Anyway, from reading around on what BTRFS errors mean, the "corruption" stat doesn't indicate hardware problems (99% of the time); it indicates corruption of data.

By comparison, I/O errors are hardware issues 100% of the time.

With that in mind, the evidence points to data issues rather than hardware ones, although it is a bit strange given that the Docker image data was only hours old.
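The counter names reported by "btrfs dev stats" map onto the split described above: write_io_errs, read_io_errs and flush_io_errs count requests that the layers below the filesystem failed to complete, while corruption_errs counts checksum mismatches (or corrupted metadata headers) and generation_errs counts metadata blocks whose generation did not match what was expected. A hypothetical grouping helper, taking parsed counters keyed by (device, counter), might look like this; it only sums the numbers and makes no judgement about the root cause.

```python
# Counter names as printed by 'btrfs device stats'. The grouping mirrors the
# I/O-vs-corruption split discussed in this thread; it is not a diagnosis.
IO_COUNTERS = {"write_io_errs", "read_io_errs", "flush_io_errs"}
INTEGRITY_COUNTERS = {"corruption_errs", "generation_errs"}


def classify(stats):
    """Sum {(device, counter): value} into I/O, integrity and other totals per device."""
    summary = {}
    for (device, counter), value in stats.items():
        bucket = summary.setdefault(device, {"io": 0, "integrity": 0, "other": 0})
        if counter in IO_COUNTERS:
            bucket["io"] += value
        elif counter in INTEGRITY_COUNTERS:
            bucket["integrity"] += value
        else:
            bucket["other"] += value  # unknown counters are kept, not dropped
    return summary


if __name__ == "__main__":
    stats = {
        ("/dev/sdb1", "read_io_errs"): 0,
        ("/dev/sdb1", "corruption_errs"): 1,
        ("/dev/sdc1", "corruption_errs"): 1,
    }
    print(classify(stats))
```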

 

Regardless, some extra steps/notes since last night:

The second cache pool had been racking up extra errors every second (getting up to 8,700); however, these stopped once the Docker service was shut down, which is a little strange.
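A counter that keeps climbing, like the one described here, is easier to pin down by comparing successive snapshots than by reading absolute values (for example, stopping Docker and watching whether the increments stop). The sketch below assumes snapshots shaped as {(device, counter): value}; read_snapshot is a hypothetical callable you would back with "btrfs device stats", and counter decreases (for example after a reset) are simply ignored.

```python
import time


def growing_counters(before, after):
    """Return {key: increase} for counters that went up between two snapshots."""
    return {
        key: after[key] - before.get(key, 0)
        for key in after
        if after[key] > before.get(key, 0)
    }


def watch(read_snapshot, interval=1.0, rounds=10):
    """Poll read_snapshot() and yield the counters that grew during each interval.

    read_snapshot is any zero-argument callable returning {(device, counter): value};
    in practice it would wrap 'btrfs device stats <mountpoint>'.
    """
    previous = read_snapshot()
    for _ in range(rounds):
        time.sleep(interval)
        current = read_snapshot()
        grown = growing_counters(previous, current)
        if grown:
            yield grown
        previous = current


if __name__ == "__main__":
    # Fake snapshots standing in for real 'btrfs device stats' readings.
    snapshots = iter([
        {("/dev/sdd1", "corruption_errs"): 100},
        {("/dev/sdd1", "corruption_errs"): 103},
        {("/dev/sdd1", "corruption_errs"): 103},
    ])
    for grown in watch(lambda: next(snapshots), interval=0, rounds=2):
        print(grown)  # prints {('/dev/sdd1', 'corruption_errs'): 3}
```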

 

Following this, I shut the system down, reseated all the SATA and power connections on the drives and motherboard, blew out the accumulated dust (not much, in all fairness), then booted into UEFI, applied a long-awaited update, and booted into Unraid.

Everything came up fine, with no errors anywhere (btrfs status, syslog, btrfs scrub, etc.) other than on the main cache pool, where a few corruption errors appeared whenever a specific VM did anything, with scrub reporting the same 2 errors each time. Checking the output of dmesg, I was able to confirm that this VM's vhd had become corrupted.
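Tracing the errors back to a file can be partly automated by scanning the kernel log. This sketch assumes two message shapes seen on recent kernels: scrub warnings that end in "(path: ...)", and "csum failed root N ino N" lines that name only an inode (which "btrfs inspect-internal inode-resolve" can map back to a path). Exact wording varies between kernel versions, so treat the patterns as a starting point; affected_files is a hypothetical helper name.

```python
import re
import subprocess

# Assumed message shapes; the exact wording differs between kernel versions.
SCRUB_PATH = re.compile(
    r"BTRFS \w+ \(device (?P<dev>[^)]+)\): checksum error at .*\(path: (?P<path>.+)\)\s*$"
)
CSUM_FAILED = re.compile(
    r"BTRFS \w+ \(device (?P<dev>[^)]+)\): csum failed root (?P<root>\d+) ino (?P<ino>\d+)"
)


def affected_files(log_text):
    """Collect file paths and (root, inode) pairs named in btrfs checksum messages."""
    paths, inodes = set(), set()
    for line in log_text.splitlines():
        m = SCRUB_PATH.search(line)
        if m:
            paths.add(m["path"])
            continue
        m = CSUM_FAILED.search(line)
        if m:
            inodes.add((int(m["root"]), int(m["ino"])))
    return paths, inodes


def read_kernel_log():
    """Return the kernel ring buffer as text (the same data 'dmesg' prints)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    sample = (
        "BTRFS warning (device sdb1): checksum error at logical 123 on dev /dev/sdb1, "
        "physical 456, root 5, inode 257, offset 0, length 4096, links 1 "
        "(path: domains/vm1/vdisk1.img)\n"
        "BTRFS warning (device sdb1): csum failed root 5 ino 258 off 4096 mirror 1\n"
    )
    print(affected_files(sample))
```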

I deleted the affected vhd, restored the VM from an earlier backup, re-scrubbed the file system, and got no further errors.
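The re-scrub step can also be scripted. Passing -B keeps "btrfs scrub start" in the foreground so its exit status can be checked once it finishes; the mapping below follows the exit codes listed in the btrfs-scrub man page, and the wrapper function names are hypothetical.

```python
import subprocess

# Exit statuses as listed in the btrfs-scrub man page.
SCRUB_EXIT = {
    0: "scrub completed successfully",
    1: "scrub could not be performed",
    2: "there was nothing to resume",
    3: "scrub found uncorrectable errors",
}


def describe_scrub_exit(code):
    """Translate a 'btrfs scrub' exit status into a short message."""
    return SCRUB_EXIT.get(code, f"unexpected exit status {code}")


def foreground_scrub(mountpoint="/mnt/cache"):
    """Run 'btrfs scrub start -B', which blocks until the scrub ends (needs root).

    Returns (exit status, description, statistics printed by btrfs at the end).
    """
    result = subprocess.run(
        ["btrfs", "scrub", "start", "-B", mountpoint],
        capture_output=True,
        text=True,
    )
    return result.returncode, describe_scrub_exit(result.returncode), result.stdout
```

An exit status of 3 after a restore would suggest the pool still holds damage that btrfs could not repair from the other copy.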

After finishing the boot/reconfiguration of all my containers/VMs and leaving it overnight, I see no further errors anywhere.

 

If it behaves itself between now and the end of the week, I'll likely shut down completely and run another memtest and an offline stress test, since I updated the UEFI.

 

The next step after this will be to work out the syntax for sending notifications to the notification agents from the CLI. I want a scheduled script that checks whether it can write to the cache drives (not being able to would indicate they are in read-only mode again, the original source of all this) and, if it can't, sends an alert.
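For the check itself, one option is to look at the mount options in /proc/mounts and then attempt a real write, since a filesystem that btrfs has forced read-only rejects writes (typically with EROFS). This is a minimal sketch assuming the pool is mounted at /mnt/cache; mount_options, is_writable and cache_problem are hypothetical names.

```python
import os
import tempfile


def mount_options(mountpoint, mounts_text=None):
    """Return the mount options for mountpoint as a list, or None if it is not mounted."""
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) >= 4 and fields[1] == mountpoint:
            options = fields[3].split(",")  # last entry wins if mounts are stacked
    return options


def is_writable(directory):
    """Create, fsync and delete a small probe file; any OSError counts as not writable."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory, prefix=".write-probe-") as probe:
            probe.write(b"ok")
            probe.flush()
            os.fsync(probe.fileno())
        return True
    except OSError:
        return False


def cache_problem(mountpoint="/mnt/cache"):
    """Return a short description of what is wrong, or None if the pool looks healthy."""
    options = mount_options(mountpoint)
    if options is None:
        return f"{mountpoint} is not mounted"
    if "ro" in options:
        return f"{mountpoint} is mounted read-only"
    if not is_writable(mountpoint):
        return f"write test on {mountpoint} failed"
    return None
```

Run on a schedule, cache_problem() returns None while everything is fine, so an alert only needs to be sent when it returns a message.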

 

I'll report back any further findings regardless, for anyone else's future diagnosis.


Following up: no further issues (touch wood!) have been experienced.

The drive partitions are writing (and reading) without error, and the monthly parity check (run on the 1st of each month) also completed without error.

 

Next up, sending alerts from CLI... 
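For the alert side, Unraid includes a helper script that forwards a message to whatever notification agents are configured. The path and flags used below (/usr/local/emhttp/webGui/scripts/notify with -e, -s, -d and -i, and importance levels normal/warning/alert) are assumptions based on how the script is commonly invoked; confirm them against the script on your own server before relying on this. notify_command and send_alert are hypothetical names.

```python
import subprocess

# Assumed path and flags of Unraid's notification helper; verify on your server.
NOTIFY = "/usr/local/emhttp/webGui/scripts/notify"
IMPORTANCE = {"normal", "warning", "alert"}


def notify_command(subject, description, importance="alert", event="Cache check"):
    """Build the argument list for one notification (passed as a list, so no shell quoting)."""
    if importance not in IMPORTANCE:
        raise ValueError(f"importance must be one of {sorted(IMPORTANCE)}")
    return [NOTIFY, "-e", event, "-s", subject, "-d", description, "-i", importance]


def send_alert(subject, description, importance="alert"):
    """Hand the message to the notify script, which forwards it to the configured agents."""
    subprocess.run(notify_command(subject, description, importance), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect (or log) before anything is actually sent.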

Edited by boomam
