Cache drive showed as unassigned? Docker won't start now

July 22, 2019

Got an alert that one of my cache drives was missing, but it showed as an unassigned device.

I rebooted the server and the drive is detected as part of the cache again, but the Docker service is now failing to start.

What's the proper way to solve an issue like this? Is the rest of the data in cache pool likely to be corrupt?

July 22, 2019

Community Expert

3 hours ago, CorneliousJD said:

Is the rest of the data in cache pool likely to be corrupt?

The Docker image is likely corrupt, and so is any other data on NOCOW shares. The rest should be fixable with a scrub, more info here.

July 22, 2019

Author

3 hours ago, johnnie.black said:

The Docker image is likely corrupt, and so is any other data on NOCOW shares. The rest should be fixable with a scrub, more info here.

Thanks, the NOCOW detail was key. My /system/ and /vms/ shares are NOCOW (both were this way by default from what I recall).
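For anyone wanting to confirm which directories are NOCOW, here is a minimal sketch. It assumes the shares live at /mnt/cache/system and /mnt/cache/vms (paths not confirmed in this thread) and relies on lsattr, which marks No_COW directories with a C in the flag column. The helper names has_nocow_flag and report_nocow are made up for illustration.

```shell
#!/bin/bash
# Return success if an lsattr output line carries the 'C' (No_COW) attribute.
# The first space-separated field of lsattr output is the flag string.
has_nocow_flag() {
    local flags="${1%% *}"
    case "$flags" in
        *C*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Print COW or NOCOW for each directory given on the command line.
report_nocow() {
    local dir line
    for dir in "$@"; do
        line="$(lsattr -d "$dir" 2>/dev/null)" || {
            echo "$dir: cannot read attributes"
            continue
        }
        if has_nocow_flag "$line"; then
            echo "$dir: NOCOW"
        else
            echo "$dir: COW"
        fi
    done
}

# Example (paths are assumptions): report_nocow /mnt/cache/system /mnt/cache/vms
```

Anything reported as NOCOW has no checksums, so a scrub cannot verify or repair it.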

docker.img is rebuilt and I'm running btrfs scrub start -B /mnt/cache/ now. So far no output, not sure how long to expect it to take, or if I can continue to rebuild Docker containers while it runs.

The odd thing is this same disk blinked offline before, but it was OVER a year ago (early April of 2018).

Strange....

July 22, 2019

Author

3 hours ago, johnnie.black said:

The Docker image is likely corrupt, and so is any other data on NOCOW shares. The rest should be fixable with a scrub, more info here.

Scrubbing didn't take long, but btrfs dev stats /mnt/cache now shows corruption errors when it didn't before the scrub. Not sure what to make of this.

[Attached screenshot: image.png]

July 22, 2019

Community Expert

2 minutes ago, CorneliousJD said:

both were this way by default from what I recall

They are.

3 minutes ago, CorneliousJD said:

not sure how long to expect it to take

Depends on how much data there is and how fast the cache is. Since you used -B, the scrub runs in the foreground, so you can check progress from another terminal with:

btrfs scrub status /mnt/cache

And you can continue to use the cache normally, just with reduced performance.
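If retyping the status command gets tedious, a small polling loop can do it. This is only a sketch: it assumes the status text contains the word "running" while a scrub is in progress, and the exact wording may differ between btrfs-progs versions. The function names scrub_is_running and wait_for_scrub and the 30-second default interval are made up for illustration.

```shell
#!/bin/bash
# Return success while the status text says a scrub is still in progress.
# Assumption: in-progress status output contains the word "running".
scrub_is_running() {
    case "$1" in
        *running*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Poll `btrfs scrub status` until the scrub stops, printing each snapshot.
# Arguments: pool mount point (default /mnt/cache), poll interval in seconds.
wait_for_scrub() {
    local pool="${1:-/mnt/cache}" interval="${2:-30}" status
    while status="$(btrfs scrub status "$pool")" && scrub_is_running "$status"; do
        printf '%s\n\n' "$status"
        sleep "$interval"
    done
    # Show the final snapshot, which includes the error summary.
    printf '%s\n' "$status"
}

# Example, from a second terminal while the scrub runs: wait_for_scrub /mnt/cache 30
```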

July 22, 2019

Community Expert

1 minute ago, CorneliousJD said:

but btrfs dev stats /mnt/cache now shows corruption errors when it didn't before the scrub

That's normal. What's important is that there were no uncorrectable errors, so all data on COW shares is now fine and correctly mirrored. Any data on NOCOW shares can still be corrupt. As mentioned in the link, there's no way to check or correct it since it's not checksummed.

You should also reset current stats and use a script to monitor for the future.
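A rough sketch of what such a monitor could look like is below. It is not the script from the linked post. It assumes the pool is mounted at /mnt/cache and that each line of btrfs dev stats output ends in a numeric counter (for example `[/dev/sdb1].corruption_errs 12`). The names sum_dev_stats and check_pool are made up, and the warning is just printed to stderr, so hooking it into a notification system is left open.

```shell
#!/bin/bash
# Sum every error counter in `btrfs dev stats` output read from stdin.
# Each line is expected to end in a number, e.g. [/dev/sdb1].read_io_errs 0
sum_dev_stats() {
    awk 'NF >= 2 && $NF ~ /^[0-9]+$/ { total += $NF } END { print total + 0 }'
}

# Warn and return 1 if the pool has logged any errors; return 2 if the
# pool cannot be checked at all. The mount point defaults to /mnt/cache.
check_pool() {
    local pool="${1:-/mnt/cache}" stats total
    mountpoint -q "$pool" || { echo "$pool is not mounted" >&2; return 2; }
    stats="$(btrfs dev stats "$pool")" || return 2
    total="$(printf '%s\n' "$stats" | sum_dev_stats)"
    if [ "$total" -gt 0 ]; then
        echo "WARNING: $total btrfs errors logged on $pool" >&2
        return 1
    fi
    echo "No btrfs errors logged on $pool"
}

# Example: check_pool /mnt/cache
# After a clean scrub, counters can be zeroed with: btrfs dev stats -z /mnt/cache
```

Resetting first matters here: the counters are cumulative, so without the -z reset the check would keep warning about the errors that the scrub already fixed.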

July 22, 2019

Author

1 minute ago, johnnie.black said:

That's normal. What's important is that there were no uncorrectable errors, so all data on COW shares is now fine and correctly mirrored. Any data on NOCOW shares can still be corrupt. As mentioned in the link, there's no way to check or correct it since it's not checksummed.

You should also reset current stats and use a script to monitor for the future.

Awesome, good to know I can at least get back to normal work mode for today. I'll circle back on the scripts in a few hours.

It was awesome waking up to the fix. Thanks @johnnie.black, you've helped me before as well, much appreciated!

July 22, 2019

Author

So I'm working on setting up the scripts for ease of use. The one that generates a warning seemed to work once, but now it doesn't. I had deleted the scripts so I could recreate them with better names and descriptions (and so the folders would get named correctly in /flash/config/plugins/user.scripts/scripts/).

Now when I copy/paste your script in I get the following error, and it doesn't actually generate the warning.

/tmp/user.scripts/tmpScripts/Cache - Error Check/script: line 4: syntax error: unexpected end of file

The script is only 3 lines long, so there's nothing on line 4.

Is there a way I should be closing out this script to run cleanly?
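For reference, "unexpected end of file" generally means bash reached the end of the script while still waiting for a closing keyword such as fi or done, or for a closing quote. A frequent trigger after copy/paste is Windows-style CRLF line endings, which stop bash from recognizing those keywords. The thread never confirms which cause applied here, so the following is only a diagnostic sketch, and diagnose_script is a made-up helper name.

```shell
#!/bin/bash
# Report two common causes of "unexpected end of file" in a shell script.
# Returns 0 if the script looks clean, 1 otherwise.
diagnose_script() {
    local script="$1" status=0 cr
    cr="$(printf '\r')"
    # Carriage returns (Windows line endings) hide keywords such as `fi` from bash.
    if grep -q "$cr" "$script"; then
        echo "CRLF line endings found; convert the file to LF and retry"
        status=1
    fi
    # bash -n parses the file without executing anything in it.
    if ! bash -n "$script" 2>/dev/null; then
        echo "bash -n reports a syntax error (look for an unclosed if, loop, or quote)"
        status=1
    fi
    return "$status"
}

# Example, using the path from the error message above:
# diagnose_script "/tmp/user.scripts/tmpScripts/Cache - Error Check/script"
```

If CRLF endings turn out to be the cause, GNU sed can strip them in place with `sed -i 's/\r$//'` followed by the script path.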

July 22, 2019

Community Expert

If you're using the User Scripts plugin, there's nothing else you need to do.

July 22, 2019

Author

1 minute ago, johnnie.black said:

If you're using the User Scripts plugin, there's nothing else you need to do.

Yep, I am, just thought it was odd that it was the only one showing that error - the rest of the scripts I have exit cleanly.

I'll go ahead and schedule that for hourly runs. I created a scrub script and a script to reset error counts in case this happens again.

The SSD is on a backplane, so I can't really change any cables, and it hadn't happened for over a year until now. I'll be prepared this time, though, with the new scripts in place in case it happens again.

Thank you again for your time and info!
