[SOLVED] File corruption on my BTRFS cache drive



Hello :)

 

It seems that I have some file corruption on my cache drive after a system reboot.

So far, none of the affected data seems irreplaceable; I could recreate all of it given some time.

But if there is any chance to recover those files I would gladly take that and also try to find out what actually happened on that drive.

 

My system

  • Unraid 6.8.2 (initially installed 6.8.3, downgraded to 6.8.2 Unraid DVB Edition later with plugin Unraid DVB for using my TV cards)
  • Mainboard: Supermicro X11SPi-TF
  • CPU: Intel Xeon Silver 4210
  • RAM: 2x 32GB Samsung M393A4K40CB1-CRC DDR4-2400 regECC DIMM CL17 Single
  • Cache drive: 1000GB Samsung 970 Evo Plus M.2 2280 PCIe 3.0 (BTRFS - encrypted)
  • HDDs (Array): 4x WD 8TB Red (XFS - encrypted)
  • UPS: APC Back-UPS 700 VA (no unsafe shutdowns have happened at any time!)

 

I just started using Unraid recently (a couple of weeks ago). So there have been some hardware/software changes in that time of course, but not for about a week or so now.

Unfortunately I have not yet set up automatic backups for my cache drive, so all affected data would probably be gone if it is not recoverable.

I do not know if my diagnostics.zip will show any relevant errors, as the described problems happened around the reboot (either before or right after).

 

 

What happened before

  • Everything was running fine: nothing had happened beforehand that hinted at any of these errors
  • NO hardware changes in the last days
  • NO reboots in the last days
  • Stopped my array for changing some network settings (regarding my IPv6 configuration)
  • Rebooted the system
  • That is where the following problems occurred

 

Problems

  • After reboot, some dockers have not started -> docker.txt logfile shows an error when mounting the appdata/oscam folder for my oscam docker
  • My cache drive seems to have some file corruption now (as you can see in the rsync-log.txt file)
    • /mnt/user/appdata/oscam/ (which is on the cache)
    • /mnt/user/appdata/ddns-route53/ (also on the cache)
    • 2 files in /mnt/user/appdata/letsencrypt/... (also on the cache)
    • libvirt.img in /mnt/user/system/libvirt (also on the cache) -> This file seems to exist on my parity-protected array! (see error_system-libvirt.png)
  • I can open the appdata-share over SMB, but no files/folders are shown
  • When clicking on "View files" at the "Shares" (Unraid WebGUI), no files are shown as well (see gui_appdata-error.png)
  • Same error happens with the system share: No files in /mnt/user/system/libvirt (see gui_system-libvirt-error.png)
    • Although this file seems to exist on my HDD, as stated before
  • I cannot access the VMs tab in the Unraid WebGUI, as libvirt.img seems to be affected (none of the VM images seem to be affected by corruption)

 

What I did after

  • Backed up all content on my cache drive to my shares (HDD array) to prevent any further file corruption
    • See rsync-log.txt for report
  • Checked the "Fix Common Problems" plugin for any errors -> No errors found
  • There were definitely files written to my cache drive again after the problem occurred (due to copying data to my shares with cache-setting yes and due to Tvheadend writing its timeshift file to the cache - Tvheadend is stopped for now though!)
  • I have not rebooted my system since.
  • I have not tried to use any recovery methods for my cache drive yet, as I did not want to make anything worse for now.

 

 

My questions for now to possibly recover the affected corrupted files

  1. Is there any chance to rebuild the corrupted files/directories at all?
  2. Should I reboot again, to potentially fix any errors?
  3. Should I use (and is it safe to use) BTRFS scrub on my cache to recover potential errors?
  4. Can these methods help to recover my files/folders, if SCRUB will not help? https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-543490
  5. If not, is it safe to overwrite the corrupt file /mnt/cache/system/libvirt/libvirt.img with the (seemingly) working file /mnt/disk1/system/libvirt/libvirt.img to be able to use VMs again?
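
For question 5, one cautious check before overwriting anything is to read each copy end to end. BTRFS verifies data checksums on read and reports a mismatch as an I/O error, so a full read that fails points at a damaged file. A read that succeeds is a weaker signal: on the XFS array disk there are no data checksums, so it only shows that the file could be read without device errors. This is a minimal sketch using the two paths from the question; `check_readable` is a made-up helper name, not an Unraid or BTRFS tool.

```shell
#!/bin/bash
# check_readable is a hypothetical helper, introduced only for illustration.
# It reads every byte of a file and discards the data. A checksum failure on
# BTRFS surfaces as a read (I/O) error, which makes cat exit non-zero.
check_readable() {
    local file="$1"
    if cat -- "$file" > /dev/null 2>&1; then
        echo "OK: $file"
    else
        echo "READ ERROR: $file"
        return 1
    fi
}

# Paths taken from the question above; each is checked only if it exists.
for f in /mnt/cache/system/libvirt/libvirt.img /mnt/disk1/system/libvirt/libvirt.img; do
    if [ -e "$f" ]; then
        check_readable "$f" || true
    fi
done
```

The check only reads, so it does not change either copy.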

 

 

I still have some hope that my files can be recovered, but if not, the situation does not seem too bad for now, as it did not affect all my files on the cache (and most importantly, not the files that took me a lot more time to configure 😅).

As I am still new to Unraid, I did not want to try too much yet and potentially lose the chance to get those files back at all.

And of course, from now on backing up my files will definitely get a higher priority!

 

Thanks in advance 🙂

jnk

 

rsync-log.txt

screenshots.zip

Edited by jnk22

I was able to restore all my data using btrfs restore! :) 

 

I restored all data to another HDD (as an unassigned device), formatted the cache drive afterwards (so that I have a clean drive again), and copied all data back to it.

 

In case anyone else finds this helpful: I used "btrfs restore -xmSv" so that my file attributes are restored and kept as well (see https://btrfs.wiki.kernel.org/index.php/Restore). They all seem to have been restored properly.
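
For reference, a rough sketch of how such a call can be put together. The device and destination paths below are placeholders, not the ones I used (an encrypted cache would be the unlocked /dev/mapper/... device), and `restore_cmd` is just an illustrative helper name. By default the script only prints the command; it runs it when RUN=1 is set.

```shell
#!/bin/bash
# Both paths are placeholders (assumptions), not the values from this thread.
SRC_DEV="${SRC_DEV:-/dev/mapper/EXAMPLE_CACHE}"
DEST_DIR="${DEST_DIR:-/mnt/disks/EXAMPLE_RESTORE}"

# Prints the restore command for a given device and destination.
# -x restores extended attributes, -m restores owner/mode/times,
# -S restores symbolic links, -v lists files as they are restored.
restore_cmd() {
    printf 'btrfs restore -xmSv %s %s\n' "$1" "$2"
}

# Show the command by default; set RUN=1 to actually execute it.
# The source filesystem should be unmounted while btrfs restore runs.
if [ "${RUN:-0}" = "1" ]; then
    mkdir -p "$DEST_DIR"
    btrfs restore -xmSv "$SRC_DEV" "$DEST_DIR"
else
    restore_cmd "$SRC_DEV" "$DEST_DIR"
fi
```

btrfs restore only reads from the source device and writes the recovered files to the destination, so the destination needs enough free space for everything being restored.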

 

Thanks a lot for your quick reply and help!
