SSD Cache drives write errors


I have a pair of Samsung SSDs for my cache, running the btrfs file system. These primarily run my Dockers, Plex, Sonarr, Radarr, nzbget & tautulli. This setup had been running well for over a year with no problems. A month or so ago I installed the UniFi Controller and UniFi Video dockers. I'm not sure if this was the cause, but it is certainly where my troubles began. I have since removed the UniFi dockers and migrated over to a Cloud Key.

 

The issue is that some past write errors have caused csum errors on the drives. I initially started a thread under apps/docker, as I thought the problem was with the docker image.

With some help I ran the btrfs stats command and could see the write errors:

[/dev/sdl1].write_io_errs 7
[/dev/sdl1].read_io_errs 0
[/dev/sdl1].flush_io_errs 0
[/dev/sdl1].corruption_errs 0
[/dev/sdl1].generation_errs 0
[/dev/sdk1].write_io_errs 9
[/dev/sdk1].read_io_errs 0
[/dev/sdk1].flush_io_errs 0
[/dev/sdk1].corruption_errs 0
[/dev/sdk1].generation_errs 0
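
For anyone wanting to check these counters automatically, here is a minimal sketch that parses output in the format shown above and reports only the non-zero counters. The function names and the `SAMPLE` constant are my own, for illustration only; the sample text is copied from the output in this post.

```python
import re
from collections import defaultdict

# Matches lines such as "[/dev/sdl1].write_io_errs 7"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text):
    """Return {device: {counter: value}} parsed from the stats output."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match["dev"]][match["counter"]] = int(match["value"])
    return dict(stats)


def nonzero_counters(stats):
    """Keep only devices and counters whose value is above zero."""
    return {
        dev: {name: value for name, value in counters.items() if value}
        for dev, counters in stats.items()
        if any(counters.values())
    }


# Sample copied from the output posted above.
SAMPLE = """\
[/dev/sdl1].write_io_errs 7
[/dev/sdl1].read_io_errs 0
[/dev/sdl1].flush_io_errs 0
[/dev/sdl1].corruption_errs 0
[/dev/sdl1].generation_errs 0
[/dev/sdk1].write_io_errs 9
[/dev/sdk1].read_io_errs 0
[/dev/sdk1].flush_io_errs 0
[/dev/sdk1].corruption_errs 0
[/dev/sdk1].generation_errs 0
"""

if __name__ == "__main__":
    print(nonzero_counters(parse_dev_stats(SAMPLE)))
```

An empty result would mean every counter is zero; anything else is worth investigating.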

 

Advice was to look at the following thread and follow the suggestions;

 

From there I could see the write errors, and I replaced the cable for the SSDs. This is the point where I get a bit stuck on how to fix the issue. Each time I run a scrub, either via the terminal or via the GUI, it aborts immediately.

 

I have actually now purchased a new pair of SSDs, but I am assuming this is a btrfs issue, as no errors are reported on the filesystem, so I want to avoid it in the future. I would really like to fix this though.

 

Latest diag report attached.

 

media-diagnostics-20190416-1446.zip

Link to comment

These write errors are a hardware problem. Btrfs only reports them; it is not the cause of them:

Apr 16 14:15:17 Media kernel: BTRFS info (device sdk1): bdev /dev/sdk1 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
Apr 16 14:15:17 Media kernel: BTRFS info (device sdk1): bdev /dev/sdl1 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0

Unless you reset them, these counters persist for the life of the filesystem; they don't reset on reboot. More info here:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582
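
As a rough sketch of what the reset looks like when scripted: `btrfs device stats` accepts `-z` (`--reset`), which prints the counters and then zeroes them. The mount point `/mnt/cache` is an assumption here (the usual Unraid cache mount), and the helper names are hypothetical. By default the function only returns the command it would run.

```python
import subprocess


def reset_stats_command(mount_point):
    """Build the argv that prints and then zeroes the btrfs error counters."""
    # -z (--reset): print the current stats, then reset the values to zero.
    return ["btrfs", "device", "stats", "-z", mount_point]


def reset_stats(mount_point, dry_run=True):
    """Return the command as text (dry run) or run it and return its output."""
    cmd = reset_stats_command(mount_point)
    if dry_run:
        return " ".join(cmd)
    # Needs root and a mounted btrfs filesystem at mount_point.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # /mnt/cache is assumed; adjust it to wherever the pool is mounted.
    print(reset_stats("/mnt/cache"))
```

Resetting only clears the counters so that new errors stand out. It does not repair anything.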

 

 

Now, besides those errors, and likely because of them, you now have filesystem corruption:

Apr 16 14:16:25 Media kernel: BTRFS critical (device sdk1): corrupt node: root=7 block=93667393536 slot=86, bad key order, current (18446744073709551606 128 986823262208) next (18446744073707454454 128 986826604544)
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Apr 16 14:16:25 Media kernel: BTRFS: error (device sdk1) in btrfs_finish_ordered_io:3074: errno=-5 IO failure
Apr 16 14:16:25 Media kernel: BTRFS info (device sdk1): forced readonly
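
If you want to spot this pattern in a syslog without reading it line by line, a small sketch like the one below can help. The regular expressions are written against the exact lines quoted above and may need adjusting for other kernel versions; the labels and function name are made up for illustration.

```python
import re

# Patterns modelled on the three log lines quoted above.
PATTERNS = {
    "corruption": re.compile(r"BTRFS critical \(device (\w+)\): corrupt"),
    "io_failure": re.compile(r"BTRFS: error \(device (\w+)\).*errno=-5"),
    "forced_readonly": re.compile(r"BTRFS info \(device (\w+)\): forced readonly"),
}


def classify(lines):
    """Return {device: set of problem labels} for the given syslog lines."""
    found = {}
    for line in lines:
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                found.setdefault(match.group(1), set()).add(label)
    return found


# Sample lines copied from the syslog excerpt above.
SAMPLE = [
    "Apr 16 14:16:25 Media kernel: BTRFS critical (device sdk1): corrupt node: "
    "root=7 block=93667393536 slot=86, bad key order, current "
    "(18446744073709551606 128 986823262208) next "
    "(18446744073707454454 128 986826604544)",
    "Apr 16 14:16:25 Media kernel: BTRFS: error (device sdk1) in "
    "btrfs_finish_ordered_io:3074: errno=-5 IO failure",
    "Apr 16 14:16:25 Media kernel: BTRFS info (device sdk1): forced readonly",
]

if __name__ == "__main__":
    for device, problems in classify(SAMPLE).items():
        print(device, sorted(problems))
```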

With btrfs, the best way forward is to back up the cache, format it, and restore the data.
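
For the backup step, a copy that keeps going when individual files fail to read is useful on a corrupted filesystem. The following is only a sketch of that idea in Python, not an Unraid tool: the function name is hypothetical, and the paths in the usage comment are assumptions.

```python
import os
import shutil


def copy_tree_best_effort(src, dst):
    """Copy src into dst, recording unreadable files instead of stopping.

    Returns a list of (path, error message) for files that could not be copied.
    Symlinked directories are not followed.
    """
    failed = []
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            try:
                # copy2 also preserves timestamps and permission bits.
                shutil.copy2(source, os.path.join(target_dir, name))
            except OSError as exc:
                # An I/O error on a damaged file lands here; keep copying the rest.
                failed.append((source, str(exc)))
    return failed


# Example call (both paths are assumptions, adjust them to your system):
# failures = copy_tree_best_effort("/mnt/cache", "/mnt/disk1/cache_backup")
```

Any files listed in the return value would have to come from an older backup instead.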

Link to comment

Ok thanks.

 

The reset was partly where I was getting confused. I was thinking that I needed to resolve the issues via the scrub and then reset. Obviously I misunderstood, my bad.

 

I was just looking into backup now so at least I'm doing something right... :)

 

To be clear then, do I reset the drives before the backup, format and restore?

 

Many thanks

Link to comment

So it's not going particularly well, which is to be expected given the corruption, I guess. I have set appdata and system to not use the cache drive and kicked off the mover, expecting it to then move the data off the cache drive. This doesn't work and I see warnings in the log, so I am assuming that moving the data off is a non-starter. I'm going to rely on backups instead, as I'm using CA backup for appdata.

 

Therefore I think what I need to do now is:

1) Re-format the cache drives

2) Re-create the docker image

3) Restore the appdata via the CA backup

4) Reinstall the docker apps

 

Is that the right process?

 

I'm also failing to see how to format the cache drives. Sorry, probably being dumb here again...

Link to comment

I think I have found the re-format answer:

Stop the array.

Main, Cache Device, change the format type to what you want. If it's already what you want and you just want to reformat it, then:

Change it to anything else.

Start the array.

Format the drive

Stop the array

Change the format type to what you want

Start the array

Format the drive

 

However, when I stop the array and go to change the format, I only have auto, btrfs and btrfs encrypted. Should I choose auto or encrypted? I'm worried that encrypted will take ages or be impossible to back out of.

Link to comment

Setting appdata to Use Cache = No does not achieve what you want. You actually need Use Cache = Yes for that. If you turn on the Help in the GUI, this might become clearer, as it describes how the various Use Cache options interact with mover.
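
To make that interaction concrete, here is a small sketch of how I understand the GUI help text for the four Use Cache settings. Treat the table as my reading of that help text for Unraid 6.x, not as an official reference, and check the built-in Help on your own version. The names in the code are illustrative only.

```python
# Direction in which mover transfers a share's files for each setting,
# as (source, destination). None means mover takes no action.
# This table is an assumption based on the GUI help text.
MOVER_ACTION = {
    "No": None,                    # new files go to the array; mover does nothing
    "Yes": ("cache", "array"),     # new files go to cache; mover moves them to the array
    "Only": None,                  # new files go to cache; mover does nothing
    "Prefer": ("array", "cache"),  # new files go to cache; mover pulls files onto cache
}


def settings_that_move(source, destination):
    """Return the settings under which mover moves files from source to destination."""
    return [
        setting
        for setting, action in MOVER_ACTION.items()
        if action == (source, destination)
    ]


if __name__ == "__main__":
    # Which setting empties the cache onto the array?
    print(settings_that_move("cache", "array"))
```

This is why No leaves the existing files sitting on the cache: mover simply skips the share.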

 

Regarding the format options: if you have more than one cache drive, you are only going to be offered BTRFS options. You get offered other options if you have a single cache drive. You can safely change the format to BTRFS encrypted and format the drive. After that, stop the array, change it back to BTRFS (assuming that is what you want), restart the array and run the format again.

Edited by itimpi
Link to comment
