Cache error - best way forward to preserve data?

Hi all,

On Sunday morning I awoke to see notifications on my phone saying:

Warning: crc error count is 1

Warning: crc error count is 2

Warning: crc error count is 5

(there have been none since this initial escalation)

And my BTRFS script was also producing "ERRORS on cache pool" (I've since disabled its hourly schedule).

I wasn't able to attend to the problem until late this morning (Monday), and the system log has filled up in that time :/

Now when I go into SMART, my 2nd Cache disk has the following message:

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

There's also no temperature showing on the front page for that disk (the other cache disk does show temp).
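For anyone following along: the "crc error count" warnings correspond to SMART attribute 199 (UDMA CRC error count), which counts transfer errors on the link between the disk and the controller. Below is a minimal sketch for reading it from the command line, assuming an ATA device at the placeholder path /dev/sdX. The helper name crc_count is hypothetical, and -T permissive is the option the error message itself suggests.

```shell
#!/bin/sh
# crc_count: hypothetical helper. Reads `smartctl -A` output on stdin and
# prints the raw value (10th column) of attribute 199, if present.
crc_count() {
    awk '$1 == 199 { print $10 }'
}

# Usage on a live system (/dev/sdX is a placeholder, not a device from this thread):
#   smartctl -A /dev/sdX | crc_count
# If smartctl exits with "A mandatory SMART command failed", retry as it suggests:
#   smartctl -A -T permissive /dev/sdX | crc_count
```

A count that stops rising after a cable or port change suggests the link was at fault, not the disk itself.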

My drives are well beyond their rated write endurance (the warranty covers 75TB written, and I've written ~130TB), so I assumed they're dying and have ordered 2x 1TB replacements, arriving today.

To replace them, I am planning on following this guide:

That is, adding one of the new disks to the chassis, assigning it in place of the 2nd cache drive, and starting the array. Then once that's finished, shutting down, removing the dead drive, connecting the other new 1TB SSD, then assigning that new disk in place of the final old disk and starting the array.

Seems easy.

However, before I start, I note on the cache disk page that it says under "Balance Status":

Current usage ratio: 44.9 % --- Full Balance recommended

Is that because my second disk has dropped out? Or should I perform a balance before I attach the new disk?

I've attached my diagnostics for your perusal.

Are there any problems with the steps I plan to take? Am I at risk of data loss?

Also: should I have just shut down and tried re-seating my SATA cable instead of assuming a dead disk? I.e., is the disk definitely dead?

Thanks for your help and insight.

 

EDIT:

I should also add that I ran this command before I started typing up this post, and again afterwards. As you can see, the error counts are increasing:

root@Percy:~# btrfs dev stats /mnt/cache
[/dev/sdb1].write_io_errs    2469391
[/dev/sdb1].read_io_errs     1065906
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
[/dev/sdc1].write_io_errs    0
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
root@Percy:~# btrfs dev stats /mnt/cache
[/dev/sdb1].write_io_errs    2488143
[/dev/sdb1].read_io_errs     1067924
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
[/dev/sdc1].write_io_errs    0
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
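To make the growth between two runs easier to see, the two outputs can be saved to files and compared. This is a rough sketch only: stats_delta is a hypothetical helper name (not a btrfs command), and the /tmp paths in the usage comment are assumptions.

```shell
#!/bin/sh
# stats_delta: hypothetical helper. Compares two saved `btrfs dev stats`
# outputs and prints each counter that grew, with the size of the increase.
#   $1 = earlier snapshot file, $2 = later snapshot file
stats_delta() {
    awk '
        # First file: remember every counter value.
        NR == FNR { before[$1] = $2; next }
        # Second file: report only counters that increased.
        /^\[/ && ($1 in before) && ($2 + 0) > (before[$1] + 0) {
            printf "%s +%d\n", $1, $2 - before[$1]
        }
    ' "$1" "$2"
}

# Usage on a live system (file paths are assumptions):
#   btrfs dev stats /mnt/cache > /tmp/stats.before
#   btrfs dev stats /mnt/cache > /tmp/stats.after    # some time later
#   stats_delta /tmp/stats.before /tmp/stats.after
```

Applied to the two outputs above, this would report +18752 for write_io_errs and +2018 for read_io_errs on /dev/sdb1, and nothing for /dev/sdc1.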

percy-diagnostics-20230206-1131.zip

Edited by jademonkee
Added btrfs dev stats command output

Solved by JorgeB

  • Community Expert

Replace both cables for both devices and post new diags after array start.

  • Author

Not so easy to just swap the cable (as it's a SAS > x4 SATA cable), so I placed the old disks on the two spare SATA connections on my expansion card.

After I entered my encryption key to start the array, Firefox said it would have to resend the information to display the page. I hit OK, but now all the disk slots are selectable and there's no option to start/stop the array, only to shut down (see attached).

Diagnostics attached.

 

ArrayScreenshot.png

percy-diagnostics-20230206-1347.zip

  • Community Expert
22 minutes ago, jademonkee said:

After entering my encryption key to start the array, Firefox said that it will have to resend the info to show the page, I hit ok, but now all the disk slots are selectable

That's a known Firefox problem, reboot and use a different browser (or don't hit resend).

  • Author

🤦‍♂️

Ok, have rebooted.

Latest diags attached.

I should note that last time I booted, I received a warning that the crc error count was now 18.

percy-diagnostics-20230206-1422.zip

  • Community Expert

Pool is mounting and everything looks good so far, run a correcting scrub and post new diags if there are uncorrectable errors.

  • Author

Result of scrub:

UUID:             54142ec0-63e0-4706-afde-ebb28ee3d5d1
Scrub started:    Mon Feb  6 15:14:53 2023
Status:           finished
Duration:         0:02:26
Total to scrub:   104.18GiB
Rate:             730.69MiB/s
Error summary:    verify=6869 csum=314706
  Corrected:      321575
  Uncorrectable:  0
  Unverified:     0
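In the result above, the verify and csum errors (6869 + 314706) add up to the 321575 corrected errors, with nothing left uncorrectable. For reference, here is a minimal sketch of running the same check from the command line. It assumes the pool is mounted at /mnt/cache, and scrub_uncorrectable is a hypothetical helper name.

```shell
#!/bin/sh
# scrub_uncorrectable: hypothetical helper. Reads `btrfs scrub status` text
# on stdin and prints the Uncorrectable count (0 if the line is absent).
scrub_uncorrectable() {
    awk '$1 == "Uncorrectable:" { n = $2 } END { print n + 0 }'
}

# Usage on a live system:
#   btrfs scrub start -B /mnt/cache    # -B: stay in the foreground until done
#   btrfs scrub status /mnt/cache | scrub_uncorrectable
```

Any value other than 0 would be the case where new diagnostics were requested.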

 

Under Balance Status it still says:

Current usage ratio: 44.8 % --- Full Balance recommended

Should I balance it? I'm not entirely sure what it does...

I'll reset the error count on the User Script and reschedule it to run hourly.

 

And now I have to decide whether I'll keep or return the new SSD, too...

  • Author

Hrmm. I cleared the errors using:

root@Percy:~# btrfs dev stats -z /mnt/cache
[/dev/sde1].write_io_errs    2539524
[/dev/sde1].read_io_errs     1083292
[/dev/sde1].flush_io_errs    0
[/dev/sde1].corruption_errs  314706
[/dev/sde1].generation_errs  6869
[/dev/sdd1].write_io_errs    0
[/dev/sdd1].read_io_errs     0
[/dev/sdd1].flush_io_errs    0
[/dev/sdd1].corruption_errs  0
[/dev/sdd1].generation_errs  0

And re-scheduled the hourly script to check the pool for errors.
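Note that -z prints the current values and then resets them to zero, so the non-zero numbers in the output above are the old counts being cleared. The hourly script itself isn't shown in this thread. The sketch below is one way such a check could look, not the script actually in use: nonzero_counters and check_pool are hypothetical names, and the mount point is an assumption.

```shell
#!/bin/sh
# nonzero_counters: hypothetical helper. Reads `btrfs dev stats` output on
# stdin and prints every counter whose value is not zero.
nonzero_counters() {
    awk '/^\[/ && ($2 + 0) != 0 { print $1, $2 }'
}

# check_pool: hypothetical wrapper. Prints a warning and returns 1 if any
# counter on the given mount point is non-zero, otherwise returns 0.
#   $1 = mount point, e.g. /mnt/cache
check_pool() {
    bad=$(btrfs dev stats "$1" | nonzero_counters)
    if [ -n "$bad" ]; then
        echo "ERRORS on pool $1:"
        echo "$bad"
        return 1
    fi
    return 0
}

# Usage from an hourly schedule (mount point is an assumption):
#   check_pool /mnt/cache || echo "send a notification here"
```

Because the counters persist until reset, a check like this keeps warning every hour until the stats are cleared with -z.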

Then I went and re-enabled the Docker service, but now I get the error on the Docker page:

Docker Service failed to start.

Diags attached.

percy-diagnostics-20230206-1528.zip

  • Community Expert
  • Solution

No need to balance for now, and the devices look OK, more likely it was a cable/connection problem, keep monitoring the stats, and you need to recreate the docker image.

  • Author

Great stuff, thanks.

I've now recreated the Docker image (as well as the custom Docker network for Swag etc), and everything seems to be working well.

I'll keep an eye on everything over the coming days.

Thanks so much for your help.
