Power failure while writing to SSD cache - what to do?


fersal

I was in the process of copying a large number of files to my UnRAID v6.0b15 box, and what I suspect to be a faulty UPS cut off power to the server right in the middle of the transfer.  I'm pretty sure that all the writes were made to my SSD cache, since I checked the status a couple of times, and during this time all the hard drives were spun down.

 

I've been thinking about my best course of action at this point to protect my existing data, since the data on the SSD may be corrupted. What should I do? Should I start the server again as if nothing happened, or should I unplug the SSD and start the server without it to see what happens?

Link to comment

What filesystem is your cache device formatted with?

Link to comment

... so any ideas regarding this?  I'm terrified of screwing it up!  I'll appreciate any advice.

 

First you need to mount the drive, then run a read-only scrub with this command:

 

PS: if it's your cache drive, change the path below to /mnt/cache

 

/sbin/btrfs scrub start -B -R -d -r /path/to/your/mount/

 

Then post the result here so we can see if it has any errors.
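Since the scrub only works on a mounted btrfs filesystem, it can help to confirm both before running it. Below is a minimal sketch, assuming a Linux-style mounts table (device, mount point, filesystem type in the first three columns); `fstype_of` is a hypothetical helper name, not something unRAID ships.

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type mounted at MOUNTPOINT.
# Usage: fstype_of MOUNTPOINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts; returns non-zero if nothing is mounted there.
fstype_of() {
    awk -v mp="$1" '
        $2 == mp { t = $3 }          # keep the last match (the most recent mount)
        END { if (t == "") exit 1; print t }
    ' "${2:-/proc/mounts}"
}

# Example (not run here): only scrub if the cache is mounted as btrfs.
# if [ "$(fstype_of /mnt/cache)" = btrfs ]; then
#     /sbin/btrfs scrub start -B -R -d -r /mnt/cache
# fi
```

If the helper prints nothing and returns non-zero, the drive is not mounted at that path, and the scrub command would have nothing to check.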

Link to comment

OK thanks, here's what I got:

 

Linux 3.19.4-unRAID.
root@Tower:~# /sbin/btrfs scrub start -B -R -d -r /mnt/cache
scrub device /dev/sde1 (id 1) done
        scrub started at Thu Apr 30 15:04:24 2015 and finished after 448 seconds
        data_extents_scrubbed: 921696
        tree_extents_scrubbed: 2145
        data_bytes_scrubbed: 59669508096
        tree_bytes_scrubbed: 35143680
        read_errors: 0
        csum_errors: 0
        verify_errors: 0
        no_csum: 13113216
        csum_discards: 3071
        super_errors: 0
        malloc_errors: 0
        uncorrectable_errors: 0
        unverified_errors: 0
        corrected_errors: 0
        last_physical: 118132572160
root@Tower:~#
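For anyone who would rather not read the counters by eye, here is a rough sketch that filters the raw scrub statistics and prints only the counters whose names end in `_errors` and whose values are non-zero. `scrub_errors` is a made-up helper name, and the sketch assumes the `name: value` layout shown in the output above. Counters such as `no_csum` and `csum_discards` do not end in `_errors`, so this filter ignores them.

```shell
#!/bin/sh
# Hypothetical helper: read raw scrub statistics on stdin and print every
# "*_errors" counter that is non-zero, as name=value.
# Returns 0 when all such counters are zero, 1 otherwise.
scrub_errors() {
    awk '
        $1 ~ /_errors:$/ && $2 + 0 != 0 {
            sub(/:$/, "", $1)        # drop the trailing colon from the name
            print $1 "=" $2
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Example (not run here):
# /sbin/btrfs scrub start -B -R -d -r /mnt/cache | scrub_errors && echo "no errors reported"
```

Fed the output posted above, the filter would print nothing, since every `_errors` counter there is 0.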

 

Link to comment

It looks OK, but I listed the contents of the cache drive and I see a bunch of media folders and files, many of which are present on the hard drive shares from before... aren't they supposed to be deleted from the cache drive once they're copied to the array?

 

Also I'm pretty sure the file transfer was still under way when the power went down.  Would it be OK to delete the cache contents and start the copy over just to be sure, or would I be messing something up by doing this?

 

Thanks for all your help!

Link to comment

Are you sure mover ran? I sort of had the impression that you had your server off during that period.
Link to comment

What seems strange is that some of the files that are still in the cache were copied to disk by mover in a prior run the night before, but they were never deleted.  At what point does mover delete the files it copies?  Is it immediate, or does it do it at the end in batch mode?  Because it's possible that something went wrong in the middle of the night because of the flaky UPS.

 

At this point would it be OK for me to delete the files in the cache so they're not copied over again, or should I just let it go and see what happens?

Link to comment

To make things clear, all of the files on the cache drive seem to be present on the disk shares, even though mover never ran... is it possible that UnRAID is smart enough to list the contents of the disk shares to include the cache files as if they had already been copied to their corresponding folders?

Link to comment

For user shares (/mnt/user/folder), yes, definitely; that's one of the features. For /mnt/diskx, no: if a file shows up under the /mnt/disk1 path, it's on disk1. Everything under /mnt/user/ is a virtual pointer back to the real file, which exists on one of the /mnt/diskx or /mnt/cache paths.

 

You can configure UnRAID to share any of the above, so be sure you are actually looking in the correct place. I would telnet in or use the local console to verify that the files exist on the /mnt/diskx paths before erasing the cache drive.
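One way to do that check from the console is sketched below. It assumes the same relative path is used on the cache and on the array disks, and that file names contain no newlines; `missing_on_disks` is a hypothetical helper name. It lists every file under the cache root that has no byte-identical copy (compared with `cmp`) on any of the disk roots you pass in, so a partially written file from the interrupted transfer would be reported rather than treated as safely copied.

```shell
#!/bin/sh
# Hypothetical helper: list files under CACHE_ROOT that have no
# byte-identical copy at the same relative path on any DISK_ROOT.
# Usage: missing_on_disks CACHE_ROOT DISK_ROOT...
# Pass CACHE_ROOT without a trailing slash.
missing_on_disks() {
    cache=$1
    shift
    find "$cache" -type f | while IFS= read -r f; do
        rel=${f#"$cache"/}           # path relative to the cache root
        found=no
        for d in "$@"; do
            # A copy counts only if it exists and its contents match exactly.
            if [ -f "$d/$rel" ] && cmp -s "$f" "$d/$rel"; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || printf '%s\n' "$rel"
    done
}

# Example (not run here; list every array disk you have):
# missing_on_disks /mnt/cache /mnt/disk1 /mnt/disk2
```

Anything it prints exists only on the cache, or differs from the array copy, so it is worth a second look before deleting. Note that it checks the /mnt/diskx paths directly and not /mnt/user, for the reason given above.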

Link to comment

Archived

This topic is now archived and is closed to further replies.
