Cache BTRFS errors


November 7, 2024

Hello,

I've been running Unraid for a few months and have had no issues. This morning, I received this error from Unraid:

/var/log is getting full (currently 82 % used)

When I took a look at the log files, I saw syslog was very large. Viewing the syslog shows lots of BTRFS errors such as these...

Nov  7 11:36:06 NAS2 kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1 (-5)
Nov  7 11:36:06 NAS2 kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 27450484, rd 49269, flush 322751, corrupt 0, gen 0
Nov  7 11:36:06 NAS2 kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 1
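For anyone sifting through a large syslog like this, the per-device counters in the `errs:` line can be extracted programmatically. The following is a minimal Python sketch, assuming the log format matches the lines quoted above; the function name `parse_btrfs_errs` and the regular expression are illustrative, not part of any Unraid or btrfs tooling.

```python
import re

# Matches the per-device counter summary seen in the kernel log lines above.
# The pattern is an assumption based on those lines, not an official format spec.
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return (device, counters) for a btrfs 'errs:' log line, or None."""
    match = ERRS_RE.search(line)
    if match is None:
        return None
    counters = {
        name: int(value)
        for name, value in match.groupdict().items()
        if name != "dev"
    }
    return match.group("dev"), counters


line = (
    "Nov  7 11:36:06 NAS2 kernel: BTRFS error (device nvme0n1p1): "
    "bdev /dev/nvme0n1p1 errs: wr 27450484, rd 49269, flush 322751, "
    "corrupt 0, gen 0"
)
print(parse_btrfs_errs(line))
```

Here the write and flush counters are very high while `corrupt` and `gen` are zero, which is at least consistent with I/O failing to reach the device rather than the device returning bad data.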

After rebooting the server, the array didn't start because Unraid reported nvme0n1p1 as missing:

Cache pool BTRFS missing device(s)

I unplugged the server, re-seated the SSD, and rebooted, and the drive appeared again. I ran short and extended SMART self-tests, which completed without errors. However, Docker will no longer start, apparently because the cache is in read-only mode:

Unable to write to cache
Unable to write to Docker Image

Is my nvme0n1p1 drive failing or corrupted? I'm a bit lost on where to go and any help would be greatly appreciated. Thank you!

nas2-diagnostics-20241107-1135.zip
Samsung_SSD_990_PRO_with_Heatsink_2TB-20241107-1205-SMART.txt

Edited November 7, 2024 by projectsunset


Solved by JorgeB

November 8, 2024


November 7, 2024

Author

I've got the array and Docker back online by removing the nvme0n1 drive from the RAID1 cache pool.

I still don't know how to diagnose what's wrong with the drive, or whether it's corrupted or failing.

Should I format it and re-add it to the cache? I'm nervous about re-adding it without knowing what the problem is.


November 8, 2024

Community Expert
Solution

The syslog rotated, so the NVMe device was already offline in the diags. Power cycle the server and it should come back.

It would be better to see when it dropped, but in some cases this helps:

Some NVMe devices have issues with power states on Linux, so try this: on the main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off

Reboot and see if it makes a difference.
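After the reboot, one way to confirm the parameters actually reached the kernel is to look at the running kernel's command line, which Linux exposes in `/proc/cmdline`. Below is a minimal Python sketch of that check; the helper name `missing_params` is hypothetical, and the sample string is a stand-in built from the append line above rather than output captured from this server.

```python
# Parameters suggested in this thread for NVMe power-state issues.
REQUIRED = (
    "nvme_core.default_ps_max_latency_us=0",
    "pcie_aspm=off",
    "pcie_port_pm=off",
)


def missing_params(cmdline, required=REQUIRED):
    """Return the required parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


# Stand-in for the contents of /proc/cmdline (assumption, not real output).
sample = (
    "initrd=/bzroot nvme_core.default_ps_max_latency_us=0 "
    "pcie_aspm=off pcie_port_pm=off"
)
print(missing_params(sample))  # an empty list means nothing is missing

# On the server itself, the real command line could be checked with:
#   missing_params(open("/proc/cmdline").read())
```

An empty list only shows the parameters were passed at boot; whether they stop the drive from dropping still has to be judged from the logs over time.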


November 8, 2024

Author

Thank you, Jorge. I've added that to the "Unraid OS" Syslinux Config and rebooted my server. I've also added your "monitor a btrfs or zfs pool for errors" script to my User Scripts, set to run hourly.

Since the cache has been running from a single drive for the past 17+ hours, should I wipe the nvme0n1p1 drive before re-adding it to the cache pool? Or is it safe to add back as is?

Thank you so much for the help!
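For readers who want a sense of what an hourly pool check like the one mentioned above might do, here is a rough Python sketch that scans the text printed by `btrfs device stats` for non-zero counters. This is not JorgeB's script. The function name `nonzero_stats` is hypothetical, and the sample text is constructed from the counters in the syslog lines earlier in the thread, not captured from the server.

```python
def nonzero_stats(stats_output):
    """Return {counter_name: value} for every non-zero counter in the text."""
    bad = {}
    for raw in stats_output.splitlines():
        parts = raw.split()
        # Each stats line is expected to be "<[device].counter> <value>".
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        name, value = parts[0], int(parts[1])
        if value != 0:
            bad[name] = value
    return bad


# Illustrative sample in the layout `btrfs device stats` prints (assumption:
# values copied from the log counters quoted in the first post).
sample = """\
[/dev/nvme0n1p1].write_io_errs    27450484
[/dev/nvme0n1p1].read_io_errs     49269
[/dev/nvme0n1p1].flush_io_errs    322751
[/dev/nvme0n1p1].corruption_errs  0
[/dev/nvme0n1p1].generation_errs  0
"""

problems = nonzero_stats(sample)
if problems:
    print("pool errors detected:", problems)

# On a live system the text could come from something like:
#   subprocess.run(["btrfs", "device", "stats", "/mnt/cache"],
#                  capture_output=True, text=True).stdout
# where /mnt/cache is an assumed mount point for the cache pool.
```

Any non-zero counter is worth a notification, since these counters persist across reboots until they are reset.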


November 8, 2024

Community Expert

It should be OK to add as is; Unraid will wipe the drive when it's added to the pool.


November 8, 2024

Author

Thanks, Jorge. I've re-added the drive to the cache pool and everything is looking good so far.

I'll mark this as solved and keep a closer eye on the logs to see if the problem resurfaces.

Thank you so much for your help, I really appreciate it!
