AceRimmer Posted November 25, 2021 (edited) Hi folks, I need help. I have about 8TB of data loss.

It started yesterday morning when I received a new Nvidia T400 GPU (for transcoding). I added it to the server and noticed I couldn't get to the web interface. I thought that maybe the new PCIe device had reordered my stubbed PCIe devices, so to cut a long story short I removed everything from the VFIO config on the flash. I still couldn't get to the web interface. I then thought a script (the unlock Nvidia script) could have been hanging the boot, so I disabled that on the flash as well. I still couldn't get in. My only way in was via safe mode with no plugins and the array off. When the array started it immediately locked me out again.

I shut down, pulled out the Nvidia GPU and rebooted into safe mode. I then noticed my disk 4 had an error (device disabled, contents emulated). I shut down, swapped the SATA cable and rebooted into safe mode, but it didn't change. I checked the forums and there were mentions of running a SMART check on the disk, so I did that and the disk came back in perfect health. The forums then mentioned the only way to rebuild it was to stop the array, remove the disk, start the array, stop the array, attach the disk, start the array and rebuild. So that's what I did, and I could see all the drives spin up for reads and disk 4 max out on writes. Cut forward about 18 hours and the process is complete, but the disk is only showing around 100GB of data on it.

Last night my Docker system was functioning, so I was able to watch Emby. Towards the end of the night Emby threw a server error on my TV. I checked the server and there were several BTRFS errors for device loop2 in my log. I tried to restart Emby but it won't come back; it throws a 403 error.

I'm lost at this point and I really need some advice. Edited December 17, 2021 by AceRimmer
ChatNoir Posted November 25, 2021 Please attach your diagnostics to your next post.
AceRimmer Posted November 25, 2021 Author 2 minutes ago, ChatNoir said: Please attach your diagnostics to your next post. voyager-diagnostics-20211125-1404.zip
JorgeB Posted November 25, 2021 You formatted disk4. Formatting is never part of a rebuild, and there's a warning about it. When you have an unmountable disk, the correct thing to do is to check the filesystem.
JorgeB Posted November 25, 2021 Forgot to mention, the cache filesystem is corrupt; with btrfs it's best to back up and reformat. I see that you're running Ryzen with overclocked RAM, which is a known cause of data corruption. See here and adjust RAM speeds accordingly.
AceRimmer Posted November 25, 2021 Author 22 minutes ago, JorgeB said: Forgot to mention, cache filesystem is corrupt, for btrfs is best to backup and reformat, I see that you're running Ryzen with overclocked RAM, that's a known cause of data corruption, see here and adjust RAM speeds accordingly. OK, I'll clock back the RAM. As for the data loss, is there any way to rebuild with the parity disk and remaining disks, or would the changes have already been written to parity? And in regards to my cache, do I format it and start fresh or do I restore my docker image backup?
JorgeB Posted November 25, 2021 Just now, AceRimmer said: As for the data loss, is there any way to rebuild with the parity disk and remaining disks, or would the changes have already been written to parity? There's a chance of recovery, but only if you use a deleted-file recovery utility such as UFS Explorer or similar. 1 minute ago, AceRimmer said: And in regards to my cache, do I format it and start fresh or do I restore my docker image backup? The Docker image can easily be recreated.
itimpi Posted November 25, 2021 9 minutes ago, AceRimmer said: As for the data loss, is there any way to rebuild with the parity disk and remaining disks, or would the changes have already been written to parity? The moment you format the drive, parity is updated to reflect this. You might find that data recovery software (such as UFS Explorer on Windows) can recover most of the data.
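To illustrate why parity cannot bring back the pre-format contents, here is a minimal sketch that treats single parity as a byte-wise XOR across the data disks. The tiny 4-byte "disks", the function names, and the use of zeros as a stand-in for a freshly formatted disk are all assumptions made for illustration; this is not Unraid's actual code, and a real format writes an empty filesystem rather than zeros.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(parity, surviving_blocks):
    """Reconstruct the single missing block from parity plus the survivors."""
    return xor_parity([parity, *surviving_blocks])


# Hypothetical 4-byte "disks" (illustrative values only).
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x01\x02\x03\x04"
disk4_before = b"DATA"                     # contents before the format
disk4_after_format = b"\x00\x00\x00\x00"   # stand-in for the formatted disk

# Parity is kept in sync with every write, including the format itself.
parity_before = xor_parity([disk1, disk2, disk4_before])
parity_after = xor_parity([disk1, disk2, disk4_after_format])

# A rebuild can only return whatever the current parity describes.
rebuilt_from_old = rebuild_missing(parity_before, [disk1, disk2])
rebuilt_from_new = rebuild_missing(parity_after, [disk1, disk2])
```

With the old parity the rebuild would return the original contents, but once the format has been written through to parity, the rebuild reproduces the formatted disk. That is why file recovery software run against the physical drive is the remaining option.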
AceRimmer Posted November 25, 2021 Author Cool, thanks for the advice guys. I'll pull the drive and see what I can find on it. I may leave this thread floating for the time being because I may need more advice during the rebuild.
AceRimmer Posted December 1, 2021 Author @JorgeB @itimpi Hey guys, it's a few days later and I'm rebuilding the machine. I've wiped the cache drives and set them up again, recreated the docker image. Dockers are running great. My original problem is back though. I've just turned on the VM manager and I've been locked out of the web GUI again.
AceRimmer Posted December 1, 2021 Author (edited) So I've tried a few things. I can't boot into GUI mode (in safe mode or not); the GUI doesn't display on my screen. It may be trying to output through my new transcoding GPU. I've tried renaming all my vdisks to break the XML so that the VMs don't auto start, but that hasn't worked. I've tried removing the bonding setting from my NIC as well. That didn't help either. I've also unstubbed my second Ethernet NIC just because why not.

I managed to save the syslog to my flash so I'm attaching it below. This was done in safe mode without plugins, over the web interface. I start the array, Dockers auto start, then I turn on the VM manager and I get locked out. The log gets flooded with disk0 read errors until I do a hard shutdown. The syslog was over 300 MB so I've chopped it down to size.

UPDATE: I've swapped over to my motherboard's secondary Intel NIC and the VM Manager is booting up without crashing the system, so I'm continuing with the rebuild. I don't understand why my NIC drops out after adding a secondary Nvidia GPU, when prior to adding the new GPU I ran a secondary AMD GPU for months without any issue. If anyone can decipher the logs please do let me know. syslog_short.txt Edited December 1, 2021 by AceRimmer Adding an update
JorgeB Posted December 1, 2021 4 hours ago, AceRimmer said: I don't understand why adding a secondary Nvidia GPU and my NIC drops out If you are passing through any devices to the VM, or if they are bound, then depending on the way you have them bound, the hardware IDs (PCI addresses) can and likely will change when adding new hardware. So if, for example, you were passing through device 01:00.0 and that was a GPU before, it can be a different device after adding (or removing) some hardware. Check those.
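One way to check this is to compare a device listing saved before the hardware change with one taken afterwards. The sketch below assumes both listings are `lspci -nn` output without a PCI domain prefix, where each line starts with the bus address and contains a `[vendor:device]` pair. The function names, device names, and IDs in the sample text are made up for illustration and do not come from the poster's diagnostics.

```python
import re

# Bus address at the start of the line, then the last [vvvv:dddd] pair on it.
LINE_RE = re.compile(
    r"^(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s.*"
    r"\[(?P<vid>[0-9a-f]{4}):(?P<did>[0-9a-f]{4})\]",
    re.IGNORECASE,
)


def parse_lspci(text):
    """Map each PCI address to its 'vendor:device' ID string."""
    devices = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            devices[match["addr"].lower()] = f'{match["vid"]}:{match["did"]}'.lower()
    return devices


def changed_addresses(before, after):
    """List (address, old_id, new_id) for addresses that no longer hold the same device."""
    changes = []
    for addr, old_id in sorted(before.items()):
        new_id = after.get(addr)  # None if nothing sits at that address now
        if new_id != old_id:
            changes.append((addr, old_id, new_id))
    return changes


# Hypothetical listings: a new GPU takes 01:00.0 and pushes the others down.
BEFORE = """\
01:00.0 VGA compatible controller [0300]: Example GPU A [1234:0001]
02:00.0 Ethernet controller [0200]: Example NIC [1234:0002] (rev 03)
"""
AFTER = """\
01:00.0 VGA compatible controller [0300]: Example GPU B [1234:0003]
02:00.0 VGA compatible controller [0300]: Example GPU A [1234:0001]
03:00.0 Ethernet controller [0200]: Example NIC [1234:0002] (rev 03)
"""

changes = changed_addresses(parse_lspci(BEFORE), parse_lspci(AFTER))
for addr, old_id, new_id in changes:
    print(f"{addr}: was {old_id}, now {new_id}")
```

In this made-up case the address that used to hold the NIC now holds a GPU, so anything bound or passed through by address would grab the wrong device. Any address that shows up in the output is worth checking against the VM and binding configuration.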