[SOLVED] Data Loss - I really need help


Hi folks

 

I need help. I have lost about 8TB of data.

 

It started yesterday morning, when I received a new Nvidia T400 GPU (for transcoding). I added it to the server and noticed I couldn't get to the web interface. I thought that maybe the new PCIe device had reordered my stubbed PCIe devices, so to cut a long story short I removed everything from the VFIO config on the flash. I still couldn't get to the web interface.

 

I then thought a script could have been hanging the boot (the unlock Nvidia script), so I disabled that on the flash as well. I still couldn't get in. My only way in was via safe mode with no plugins and the array off. When the array started it immediately locked me out again. I shut down, pulled out the Nvidia GPU and rebooted to safe mode.

 

I then noticed my disk 4 had an error (device disabled, contents emulated). I shut down, swapped the SATA cable and rebooted to safe mode, but nothing changed.

 

I checked the forums and there were mentions of doing a SMART check on the disk, so I did that and the disk came back in perfect health. The forums then mentioned that the only way to rebuild it was to stop the array, remove the disk, start the array, stop the array, attach the disk, start the array and rebuild. So that's what I did, and I could see all the drives spin up for reads and disk 4 max out on writes. Cut forward about 18 hours and the process is complete, but the disk is only showing around 100GB of data on it.

 

Last night my Docker system was functioning, so I was able to watch Emby. Towards the end of the night Emby threw a server error on my TV. I checked the server and there were several BTRFS errors for device loop2 in my log. I tried to restart Emby but it won't come back; it throws a 403 error.

 

I'm lost at this point and I really need some advice.

Edited by AceRimmer
Link to comment
22 minutes ago, JorgeB said:

Forgot to mention, the cache filesystem is corrupt. For btrfs it's best to back up and reformat. I see that you're running Ryzen with overclocked RAM, which is a known cause of data corruption; see here and adjust RAM speeds accordingly.

 

OK, I'll clock the RAM back.

 

As for the data loss, is there any way to rebuild with the parity disk and remaining disks, or would the changes have already been written to parity?

 

And in regards to my cache do I format it and start fresh or do I restore my docker image backup? 

Link to comment
Just now, AceRimmer said:

As for the data loss, is there any way to rebuild with the parity disk and remaining disks, or would the changes have already been written to parity?

There's a chance of recovery, but only if you use a deleted file recovery utility, like UFS Explorer or similar.

 

1 minute ago, AceRimmer said:

And in regards to my cache do I format it and start fresh or do I restore my docker image backup? 

Docker image can easily be recreated.

Link to comment
9 minutes ago, AceRimmer said:

As for the data loss, is there any way to rebuild with the parity disk and remaining disks, or would the changes have already been written to parity?

The moment you format the drive then parity is updated to reflect this.

 

You might find that data recovery software (such as UFS Explorer on Windows) can recover most of the data.
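
To illustrate why parity can't bring the old files back once the emulated disk has been written to, here is a minimal sketch, assuming single parity computed as a bytewise XOR across the data disks. The 4-byte "disks" and their contents are made up for the example; only the principle carries over.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical 4-byte "disks"; real disks are just much longer.
disk1 = b"\x11\x22\x33\x44"
disk2 = b"\xaa\xbb\xcc\xdd"
disk4_old = b"DATA"

# Parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk4_old])

# If disk 4 drops out and nothing is written, the other disks plus
# parity still reproduce its old contents.
rebuilt_before = xor_blocks([disk1, disk2, parity])

# A write to the emulated disk 4 (a format, for example) updates parity
# so that it now describes the new contents instead.
disk4_new = b"\x00\x00\x00\x00"
parity = xor_blocks([disk1, disk2, disk4_new])

# A rebuild after that point can only reproduce the new contents.
rebuilt_after = xor_blocks([disk1, disk2, parity])

print(rebuilt_before)  # old contents, recoverable before the write
print(rebuilt_after)   # new contents, all that parity knows after the write
```

The rebuild itself worked as designed in this sketch: it faithfully reproduces whatever parity described at the time, which is why file recovery software working on the disk itself is the remaining option.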

Link to comment

So I've tried a few things. I can't boot into GUI (in safe mode or not), the GUI doesn't display on my screen. It may be trying to output through my new transcoding GPU. 

 

I've tried renaming all my vdisks to break the XML so that the VMs don't auto start, but that hasn't worked. I've tried removing the bonding setting from my NIC as well. That didn't help either. I've also unstubbed my second Ethernet NIC, just because why not.

 

I managed to save the syslog to my flash, so I'm attaching it below. This was done in safe mode without plugins, over the web interface. I start the array, the Dockers auto start, then I turn on the VM manager and I get locked out. The log gets flooded with disk0 read errors until I do a hard shutdown. The syslog was over 300MB, so I've chopped it down to size.
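
For anyone else needing to shrink a flooded syslog before attaching it, one possible approach is to collapse runs of the same message. This is only a sketch, assuming traditional syslog timestamps ("Mon DD HH:MM:SS") at the start of each line; the sample lines and the `collapse_repeats` helper are made up for illustration and are not taken from the attached log.

```python
import re
from itertools import groupby

# Leading timestamp in the traditional syslog style, e.g. "Jan  1 10:00:00 "
TS_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s")

def message_of(line):
    """Return the line without its leading timestamp."""
    return TS_RE.sub("", line, count=1)

def collapse_repeats(lines):
    """Keep the first line of each run of identical messages, plus a count."""
    out = []
    for _, group in groupby(lines, key=message_of):
        group = list(group)
        out.append(group[0])
        if len(group) > 1:
            out.append(f"    (previous message repeated {len(group) - 1} more times)")
    return out

# Made-up sample lines, only to show the shape of the input.
sample = [
    "Jan  1 10:00:00 tower kernel: example: disk0 read error",
    "Jan  1 10:00:01 tower kernel: example: disk0 read error",
    "Jan  1 10:00:02 tower kernel: example: disk0 read error",
    "Jan  1 10:00:03 tower kernel: example: something else",
]

collapsed = collapse_repeats(sample)
for line in collapsed:
    print(line)
```

Only consecutive repeats are collapsed, so the order of events and the first timestamp of each burst are preserved.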

 

UPDATE: I've swapped over to my motherboard's secondary Intel NIC and the VM Manager is booting up without crashing the system, so I'm continuing with the rebuild. I don't understand why my NIC drops out after adding a secondary Nvidia GPU, when prior to adding the new GPU I ran a secondary AMD GPU for months without any issue. If anyone can decipher the logs, please do let me know.

 

syslog_short.txt

Edited by AceRimmer
Adding an update
Link to comment
4 hours ago, AceRimmer said:

I don't understand why my NIC drops out after adding a secondary Nvidia GPU

If you are passing through any devices to the VM, or if they are bound, then depending on the way you have them bound, the hardware IDs can and likely will change when adding new hardware. For example, if you were passing through device 01:00.0 and that was a GPU before, it can be a different device after adding (or removing) some hardware, so check those.
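
One way to check for this is to compare a device listing taken before the hardware change with one taken after. The sketch below assumes two saved text listings in the style of `lspci -nn` output (address first, `[vendor:device]` ID in brackets); the sample listings, vendor names and IDs are invented for illustration, and `parse_listing` and `moved_devices` are hypothetical helper names.

```python
import re

# Matches lines shaped like:
# "01:00.0 VGA compatible controller [0300]: Vendor Name [1234:5678] (rev a1)"
LINE_RE = re.compile(
    r"^(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s.*"
    r"\[(?P<ids>[0-9a-f]{4}:[0-9a-f]{4})\]"
)

def parse_listing(text):
    """Map PCI address -> vendor:device ID for every line that matches."""
    devices = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            devices[m.group("addr")] = m.group("ids")
    return devices

def moved_devices(before, after):
    """Return {vendor:device ID: (old address, new address)} for moved devices.

    Note: two identical cards share one ID, so this simple version
    cannot tell them apart.
    """
    old_by_id = {ids: addr for addr, ids in before.items()}
    moved = {}
    for addr, ids in after.items():
        old = old_by_id.get(ids)
        if old is not None and old != addr:
            moved[ids] = (old, addr)
    return moved

# Invented listings: a new GPU takes 01:00.0 and pushes the others down.
BEFORE = """\
01:00.0 VGA compatible controller [0300]: Example GPU Vendor [aaaa:0001]
02:00.0 Ethernet controller [0200]: Example NIC Vendor [bbbb:0002] (rev 03)
"""
AFTER = """\
01:00.0 VGA compatible controller [0300]: Another GPU Vendor [cccc:0003] (rev a1)
02:00.0 VGA compatible controller [0300]: Example GPU Vendor [aaaa:0001]
03:00.0 Ethernet controller [0200]: Example NIC Vendor [bbbb:0002] (rev 03)
"""

before = parse_listing(BEFORE)
after = parse_listing(AFTER)
moved = moved_devices(before, after)
for ids, (old, new) in moved.items():
    print(f"{ids} moved from {old} to {new}")
```

In the invented example, anything still bound to 02:00.0 would now grab the old GPU rather than the NIC, which is the kind of mix-up described above.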

Link to comment
  • AceRimmer changed the title to [SOLVED] Data Loss - I really need help
