Power failure while writing to SSD cache - what to do?

fersal · April 30, 2015

I was in the process of copying a large number of files to my UnRAID v6.0b15 box, and what I suspect to be a faulty UPS cut off power to the server right in the middle of the transfer. I'm pretty sure that all the writes were made to my SSD cache, since I checked the status a couple of times, and during this time all the hard drives were spun down.

I've been thinking about my best course of action at this point to protect my existing data, since it's likely that the data on the SSD may be corrupted. What should I do? Should I start the server again as if nothing happened, or should I unplug the SSD and start the server without it to see what happens?

jonp · April 30, 2015

What filesystem is your cache device formatted with?

fersal · April 30, 2015

I'm pretty sure it was btrfs.
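
For anyone who would rather confirm the filesystem than rely on memory, here is a minimal sketch that looks it up in the kernel's mount table. The function name `fstype_of` is made up for this example, and it assumes the cache is already mounted at /mnt/cache (the path used later in this thread).

```shell
#!/bin/sh
# Print the filesystem type of a mount point by reading a mounts table.
# $1 = mount point (e.g. /mnt/cache)
# $2 = mounts file, defaults to /proc/mounts (a parameter so it can be tested)
fstype_of() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    # Fields in the mounts table: device mountpoint fstype options dump pass.
    # Keep the last matching line, since later mounts shadow earlier ones.
    # Exit non-zero when the mount point is not listed at all.
    awk -v mp="$mountpoint" '
        $2 == mp { t = $3 }
        END { if (t != "") print t; else exit 1 }
    ' "$mounts_file"
}
```

Usage: `fstype_of /mnt/cache` should print the filesystem type (for example `btrfs`) if the cache is mounted there, and return a non-zero status if it is not mounted.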

fersal · April 30, 2015

... so any ideas regarding this? I'm terrified of screwing it up! I'll appreciate any advice.

gfjardim · April 30, 2015

First you need to mount the drive, then run a read-only scrub with this command:

/sbin/btrfs scrub start -B -R -d -r /path/to/your/mount/

PS: if it's your cache drive, change the path above to /mnt/cache

Then post the result here so we can see if it has any errors.

fersal · April 30, 2015

OK thanks, here's what I got:

Linux 3.19.4-unRAID.
root@Tower:~#
root@Tower:~# /sbin/btrfs scrub start -B -R -d -r /mnt/cache
scrub device /dev/sde1 (id 1) done
        scrub started at Thu Apr 30 15:04:24 2015 and finished after 448 seconds
        data_extents_scrubbed: 921696
        tree_extents_scrubbed: 2145
        data_bytes_scrubbed: 59669508096
        tree_bytes_scrubbed: 35143680
        read_errors: 0
        csum_errors: 0
        verify_errors: 0
        no_csum: 13113216
        csum_discards: 3071
        super_errors: 0
        malloc_errors: 0
        uncorrectable_errors: 0
        unverified_errors: 0
        corrected_errors: 0
        last_physical: 118132572160
root@Tower:~#

gfjardim · April 30, 2015

Seems clean to me: every error counter is 0.
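
For anyone who wants to check a scrub report without reading it line by line, here is a rough sketch that scans the output for non-zero `*_errors` counters. The function name `scrub_is_clean` is made up for this example, and it only assumes the `name: value` layout shown in the output posted above. Counters whose names do not end in `_errors` (such as `no_csum` and `csum_discards`) are ignored, which matches how the output was read in this thread.

```shell
#!/bin/sh
# Read raw scrub statistics on stdin and report any *_errors counter
# that is not zero. Returns 0 when all such counters are zero, 1 otherwise.
scrub_is_clean() {
    awk -F': *' '
        /_errors:/ {
            gsub(/^[ \t]+/, "", $1)          # strip leading indentation
            if ($2 + 0 != 0) {               # compare the value numerically
                print $1 "=" $2
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Usage, with the same scrub flags as above: `/sbin/btrfs scrub start -B -R -d -r /mnt/cache | scrub_is_clean && echo "no errors reported"`. Any offending counter is printed as `name=value`.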

fersal · April 30, 2015

It looks OK, but I listed the contents of the cache drive and I see a bunch of media folders and files, many of which are present on the hard drive shares from before... aren't they supposed to be deleted from the cache drive once they're copied to the array?

Also I'm pretty sure the file transfer was still under way when the power went down. Would it be OK to delete the cache contents and start the copy over just to be sure, or would I be messing something up by doing this?

Thanks for all your help!

trurl · April 30, 2015

Are you sure mover ran? I sort of had the impression that you had your server off during that period.

fersal · April 30, 2015

What seems strange is that some of the files that are still in the cache were copied to disk by mover in a prior run the night before, but they were never deleted. At what point does mover delete the files it copies? Is it immediate, or does it do it at the end in batch mode? Because it's possible that something went wrong in the middle of the night because of the flaky UPS.

At this point would it be OK for me to delete the files in the cache so they're not copied over again, or should I just let it go and see what happens?

fersal · April 30, 2015

To make things clear, all of the files on the cache drive seem to be present on the disk shares, even though mover never ran... is it possible that UnRAID is smart enough to list the contents of the disk shares to include the cache files as if they had already been copied to their corresponding folders?

JonathanM · April 30, 2015

User shares (/mnt/user/folder)? Yes, definitely, that's one of the features. /mnt/diskx? No. If it shows up under the /mnt/disk1 path, it's on disk1. Everything under /mnt/user/ is a virtual pointer back to the real file that exists on one of the /mnt/diskx or /mnt/cache paths.

You can configure UnRAID to share any of the above, so be sure you are actually looking in the correct place. I would telnet in or use the local console to verify the files exist on the /mnt/disk paths before erasing the cache drive.
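
The check described above can be sketched from the console roughly as follows. This is only an illustration: the function name `cache_only_files` is made up, and the disk paths in the usage line are examples. It tests only whether a file exists at the same relative path on a disk, not its size or content, so a partially copied file would still count as present. File names containing newlines are not handled.

```shell
#!/bin/sh
# List files under a cache root that have no copy at the same relative
# path under any of the given disk roots.
# Usage: cache_only_files CACHE_ROOT DISK_ROOT [DISK_ROOT...]
cache_only_files() {
    cache_root=$1
    shift
    # Enumerate cache files as paths relative to the cache root.
    ( cd "$cache_root" && find . -type f ) | while IFS= read -r rel; do
        rel=${rel#./}
        found=no
        # Look for the same relative path on each real disk.
        for disk in "$@"; do
            if [ -f "$disk/$rel" ]; then
                found=yes
                break
            fi
        done
        # Print only the files that exist on the cache and nowhere else.
        [ "$found" = yes ] || printf '%s\n' "$rel"
    done
}
```

Usage: `cache_only_files /mnt/cache /mnt/disk1 /mnt/disk2`. Anything it prints exists only on the cache, so it has not been moved to the array yet and should not be deleted.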

fersal · April 30, 2015

Got it! I was fooled into thinking that the files were on disk by looking at the user shares rather than disk1, disk2 etc. I'll let mover run tonight and do its thing. Thanks!
