(SOLVED) Loss of VMs, Dockers, Plugins



Hi

 

This morning my place had a power cut, and after rebooting I am no longer able to mount my system_cache.

 

My system cache had:

  • appdata
  • domains
  • isos
  • system

It was mirrored to protect it against the loss of a drive.

 

I only transferred across to the new setup on Monday, and I spent yesterday sorting the remaining files and Plex.

The only backups I managed were libvirt.img and my USB drive; backing up the rest was on my to-do list for today.

 

I really don't want to lose those VMs. One had home automation software which I had fully registered.

 

Brand new server and this happens, I am gutted 😔

 

Can someone please give advice? 

 


Hi @JorgeB

 

Forgive me, my ability isn't there.  Here's what I tried:

 

mkdir /x

mount -o degraded,usebackuproot,ro /dev/nvme0n1 /x

 

Before I move on to the "BTRFS restore (safe to use)" step, I just want to check that I've entered the commands correctly.

 

I'm happy to pay for your time on this, with the help you've provided - you deserve a couple of pints anyway for it all.

 

 


Oh thank goodness; honestly, my nerves! I'm copying everything across in MC (as per your instructions) into a new usr/share now.

 

What would you suggest as my next step?

 

Should I move on to step 2, "BTRFS restore (safe to use)"?

Or should I format, which I think is what I need to do, then move everything back once it's formatted and restart the server?

 

Please PM me or let me know your PayPal email too. I'm buying you a few pints; it's the least I can do for not losing my VMs.

10 minutes ago, SmokeyColes said:

What would you suggest as my next step?

After everything is copied you can try to init the log tree:

 

First unmount filesystem with:

umount /x

 

Then:

btrfs rescue zero-log /dev/nvme0n1p1

 

Start the array normally and see if the pool mounts. If it does and all looks fine, nothing more should be needed, but make sure you do regular backups of anything important.


I have spoken too soon.

 

The copy of appdata and system completed with no issues.

Only 1 of 3 VMs transferred; the 2 I really needed did not.

2 of the 3 ISOs did not copy either, but in fairness I can redownload those (I'm not excessively worried). I would like to recover the VMs, though.

 

Should I move onto init log tree, or should I try something else?

 


Yes, they are failing checksums. I also noticed data corruption was detected in your other pool, so you likely have bad RAM or another hardware problem.

You can use btrfs restore, since it won't check for that. The files will still be corrupt, but depending on the severity they might still work.


I'm having a really bad day.

 

Neither VM worked, so I have decided to format. For now the server is out of action and won't be switched on until new memory modules are bought.

I'm purely to blame for this; I bought second-hand DDR4 and am now paying for it! 30208 errors at 21%.

 

 

 

I wouldn't mind if you could just tell me what you meant about corrupt data in the other pool.  I sent over a number of TV shows to the download_cache then ran mover to the array.

 

Is it safe to say that my array now has some corrupt data as well?

It sounds to me like I need to delete the files I sent across to my data drives, and format both system_cache and download_cache again (after replacing the RAM).

 

Once I hear back, I'll mark this as SOLVED and want to thank you for your help.  

 

 

 

15 minutes ago, SmokeyColes said:

Is it safe to say that my array now has some corrupt data as well?

It's suspect, but not proven corrupt until you compare checksums. Unless you already have a solution in place to do that, it's often faster to assume corruption and recopy.

 

There are many ways of doing checksum comparisons of files, almost as many as there are methods of copying the files in the first place. I recommend doing some research and getting a handle on at least one way of doing it, so that you won't get caught flat-footed again.

 

Here's the 10,000-foot overview.

You can compare files bit by bit; there are programs that read all the bits in each file in parallel and compare as they go. That method is SLOW, especially over a network.

Alternatively, you can compare checksums of the files. A checksum is a short string that, depending on the method used to obtain it, is almost guaranteed to be different if the file is different. That means once you know the name, size, and checksum of each file, you can transfer just that information to the other system and compare it to the list generated from the copied files. If everything matches, you are 99.999% sure you don't have corruption. The number of 9s in that confidence level can be huge; the chances of a hash collision are pretty much 0 for any decent-sized files.

 

Generating the list of checksums for both the original and copy, plus comparing the list, can be time consuming, so if the amount of data is small, recopying is faster.
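
As a rough illustration of the checksum-list approach described above, here is a minimal sketch using GNU sha256sum. The function names are made up for this example, and sha256sum is just one reasonable choice of hash tool, not something specified in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not from this thread.

# make_manifest SRC_DIR MANIFEST_FILE
# Writes one "checksum  ./relative/path" line per file under SRC_DIR.
# Keep MANIFEST_FILE outside SRC_DIR so the manifest does not list itself.
make_manifest() {
    (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) > "$2"
}

# verify_copy MANIFEST_FILE COPY_DIR
# Re-hashes the files under COPY_DIR and compares them to the manifest.
# Prints only the files that fail and returns non-zero if any file
# differs or is missing.
verify_copy() {
    (cd "$2" && sha256sum --quiet -c -) < "$1"
}
```

You would run make_manifest against the original files, copy the small manifest to wherever the copies live, and run verify_copy there. Note that this only checks files named in the manifest; extra files that exist only in the copy are not reported.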

24 minutes ago, SmokeyColes said:

I wouldn't mind if you could just tell me what you meant about corrupt data in the other pool.  I sent over a number of TV shows to the download_cache then ran mover to the array.

You can run a scrub on that pool to look for corrupt files; any that are found will be listed in the syslog. Files corrupted on btrfs are detected, and you get an I/O error when trying to move or copy them, like the ones you got above. However, some files might have been corrupted after leaving btrfs, at the time they were written to the XFS array disks, and there is no way to find those.
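
For anyone who prefers the command line, a scrub can be sketched roughly as below. The function names are made up for this example, and both the default log path and the grep pattern are assumptions about how the kernel words these messages, so adjust them to what your syslog actually shows.

```shell
#!/usr/bin/env bash
# Sketch only: the log path and match pattern are assumptions,
# not confirmed anywhere in this thread.

# scrub_pool MOUNT_POINT
# Runs a scrub in the foreground (-B waits until it finishes),
# then prints the summary for that filesystem.
scrub_pool() {
    btrfs scrub start -B "$1"
    btrfs scrub status "$1"
}

# list_checksum_errors [LOG_FILE]
# Prints log lines that mention a btrfs checksum error.
# Defaults to /var/log/syslog (assumed location).
list_checksum_errors() {
    grep -i 'BTRFS.*checksum error' "${1:-/var/log/syslog}"
}
```

Usage would be something like scrub_pool followed by the pool's mount point (for example /mnt/download_cache, assuming the pool is mounted under its own name), then list_checksum_errors to see which files were flagged.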

5 minutes ago, jonathanm said:

If the files were corrupted in RAM before being written to the cache, BTRFS would have no way of determining that the incoming data was bad.

Correct. It can detect some corruption if there are errors during a read, i.e. the checksum calculated for the stored block fails to match the written one due to RAM errors. I believe it's also possible for either the block or the checksum to be corrupted before being stored, and those cases can be detected too, since they won't match. But if the corruption happens before the data is written, and the checksum is calculated for the already corrupt block, then it would go undetected, unless there were more errors during reads.
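
The two cases above can be shown with a toy example. This is only a sketch of the idea, not how btrfs is implemented: sha256sum stands in for the filesystem's block checksum, and the function names and sample strings are invented for illustration.

```shell
#!/usr/bin/env bash
# Toy model only: sha256sum stands in for the filesystem's block checksum.

# block_checksum DATA: prints the checksum of a "block" of data.
block_checksum() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# block_is_valid DATA STORED_CHECKSUM: succeeds only if they still match.
block_is_valid() {
    [ "$(block_checksum "$1")" = "$2" ]
}

# Case 1: the data was good when its checksum was stored and changed later.
good="tv-show-frame-0001"
stored_sum=$(block_checksum "$good")
read_back="tv-show-frame-0O01"   # one character flipped after the checksum was stored
block_is_valid "$read_back" "$stored_sum" || echo "case 1: mismatch detected"

# Case 2: the data was already corrupt in RAM before the checksum was calculated.
already_bad="tv-show-frame-0O01"
stored_sum_bad=$(block_checksum "$already_bad")
block_is_valid "$already_bad" "$stored_sum_bad" && echo "case 2: checksum matches, corruption goes unnoticed"
```

In case 2 the checksum faithfully describes the bad data, which is why only a comparison against the original source can catch it.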


Bad RAM is bad juju. You really never know what it is or isn't going to corrupt, since pretty much every operation uses RAM to move data. Honestly, best case scenario is RAM so bad the machine just crashes hard, that way you at least can't get much done before figuring out you have a major issue.


Hi, just an update before I close the post. I was fairly lucky, as a lot of the files I had transferred to the cache had not yet been moved from the cache to the array as I thought.

I was easily able to identify the corrupted stuff, as the files the move left behind on the cache were the corrupted ones (the move from cache to array was done with the new RAM in).

So I have no idea, but it's sorted now. I lost about 40GB overall.

I have slowly been working my way through repairing the damage to the corrupt VMs, taking regular backups as I go; this is where it hit me hardest (in time and effort).

 

 

Unfortunately I received a new error tonight, which I'm no doubt going to raise with you pros again (I'm not having much luck!)

Thank you all for your kind help and support.
