All data gone?


Gday all,

 

Before I start my story: at the time of writing I'm a little salty, so excuse the negative undertone.

And a disclaimer: in previous posts I have mentioned having Machine Check Events from ECC memory. These were caused by a defective memory module, which has since been replaced. The system ran fine for 2 to 3 weeks after replacing the module. While replacing the module I also swapped 3 Inter-Tech fans (1200 rpm PWM) for Noctua NF-F12 industrialPPC 3000 rpm fans.

 

The question I'm looking to get answered: where did I go wrong, and how can I prevent this in the future?

 

Last Monday I noticed that 2 of the drives in my 5-disk array with 2 parity had failed. All are WD Red 4TB drives, which I found a bit odd.

So the first thing I did was check whether parity had failed as well. It didn't seem to have. The disks were being emulated.

Phew, parity saved my ass....... or so I thought.

Data was showing on the network and, albeit a bit slowly, I was able to access it.

So I ordered 2 new drives and started troubleshooting.

 

Checked all cables and connections. All was fine.

VMs and Dockers were running from the cache disk, so at least that hadn't failed.

CCTV recordings in a 2nd, mirrored pool had no issues.

 

All WD disks were attached to the onboard SAS controller of my ASRock Rack EPYCD8-2T.

The Purple disks for my CCTV were also connected to the same controller, but on the 2nd port / breakout cable.

So we wait for the replacement disks: 2x 4TB Seagate drives. I figured I would take a different brand, making them easier to recognize.

 

The disks come in. I turn the server off, replace the supposedly failed disks and start the machine up.

The array started offline, as requested, so I could assign the 2 new disks to the array and start rebuilding.

Rebuilding starts; data is still available through emulation.

 

And then... errors galore. Every read is being reported as an error. Ohh shhhh....t, something is wrong.

I let it run for a bit longer, figuring maybe it's a parity thing, but the rebuild speed drops from 80 MB/s to mere KB/s.....

Alright, this isn't right. Something else must be wrong. At this moment I also noticed that the data that was on 1 disk (my collection of movies I gathered over the years) was no longer showing.

So I hastily make a backup of my most important files to a USB disk, just to be safe. So those are safe.

 

Turn off the machine and check all the connections once again. Nothing seemed out of place, but something must be wrong.

Since all affected disks were in an HDD tray caddy (for hot-swap purposes), I figured maybe something is wrong with that thing.

I remove the caddy, place all the drives in a regular HDD tray, connect them directly to the power and data cables and boot the machine back up.

 

I restart the parity build by swapping the new disks per slot and formatting them. After about 2 hours, I reach the 500GB mark where all the errors started appearing previously. No errors this time. The parity rebuild continues; all seems fine.

After about 5 hours, something happens. The one data disk that wasn't malfunctioning earlier now all of a sudden has errors and is being emulated.

 

The parity build process changes in speed, from 80 MB/s on average to 1.2 GB/s, 2.3 GB/s, 4.6 GB/s and even topping out around 8 GB/s.

Now we all know SATA drives can be fast. But no way spinning disks on SATA are THIS fast.
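
As a rough sanity check on that hunch, here is a minimal Python sketch of the arithmetic. It assumes the drives sit on SATA III links (6 Gbit/s line rate with 8b/10b encoding); the link type is not stated in the post, and the names below are made up for illustration.

```python
# Back-of-the-envelope payload ceiling for a single SATA III link.
# Assumption: 6 Gbit/s line rate with 8b/10b encoding (not stated in the post).
SATA3_LINE_RATE_BITS_PER_S = 6_000_000_000
LINE_BITS_PER_DATA_BYTE = 10  # 8b/10b: every 8 data bits cost 10 bits on the wire


def sata3_ceiling_mb_per_s() -> int:
    """Theoretical payload ceiling of one SATA III link in decimal MB/s."""
    return SATA3_LINE_RATE_BITS_PER_S // LINE_BITS_PER_DATA_BYTE // 1_000_000


def exceeds_link(reported_mb_per_s: float) -> bool:
    """True if a reported speed is higher than the link could physically carry."""
    return reported_mb_per_s > sata3_ceiling_mb_per_s()


# Speeds reported in the post, converted to MB/s.
reported = [80, 1200, 2300, 4600, 8000]
impossible = [speed for speed in reported if exceeds_link(speed)]
print(f"ceiling: {sata3_ceiling_mb_per_s()} MB/s, impossible readings: {impossible}")
```

Every reading after the first is above what such a link can carry, so whatever the counter was showing at that point, it cannot have been real disk I/O.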

But the rebuild completes shortly after that. Since I had a hunch, I checked the network drive... Even though it's accessible, only the shares on the cache are still present. All while the parity disks are up, as well as the 2 new disks, and then the disabled 3rd disk.....

 

So at this point, I basically figured I had lost all my data.

So . . . we reboot. Cause . . . I haven't tried turning it off and on again, right?

Now Unraid wants the same disk in positions 1 and 3? Daf...q?

When trying to force the correct disk, it gives me the 'wrong' message.

 

I turn the machine off and hook all drives up to my HBA, which so far only uses 1 breakout cable, for the SSDs that hold my VMs and aren't showing any issues. Then I turn it back on, hoping this would solve the issue. But the same thing happened.... crap....

 

Well, then there is only 1 thing left to do, right? New config...

So we do a new config, assign all drives where they belong, start the array, and start rebuilding....

And this is where I am right now. No disks are being emulated. Only the shares on my cache are showing, and Unraid is clearing disk 2 and showing little to no data on disks 1 and 3.

 

But I still have a little bit of hope....

According to the header, I'm still using about 34% of my array, which was the state before all of this happened.

 

So now we wait for disk 2 to clear and hopefully rebuild data from the parity, but I don't think I'll be that lucky.

 

So . . . will I get my data back, any data? Where did I go wrong in troubleshooting? What should I have done: maybe a different order, different troubleshooting steps? Cause if I don't learn now, I'll never be able to trust this setup again.

Link to comment
Just now, PeteAsking said:

As an aside, since each disk is independent, to try to get data back you can just plug a "failed disk" into another computer and see if any of the data on it is readable. That may help you get some data back.

I only have Windows machines though. Don't think they read XFS?

Link to comment
Just now, PeteAsking said:

I mean you should download a live CD, even just an Ubuntu live CD. If you boot off that, it's irrelevant what OS the other PC normally runs.

Alright, I'll give that a shot with the 2 failed drives I have lying about. Nothing should have happened to them other than what I described above.

Maybe I get lucky.
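
For the live-USB route, a minimal sketch of a read-only mount, written as a small Python helper that only builds and prints the command. The device path and mount point are placeholders (check `lsblk` for the real device), and the assumption that the XFS filesystem sits on the first partition is mine, not something stated in the thread.

```python
import shlex
from typing import List


def build_readonly_mount_command(device: str, mountpoint: str) -> List[str]:
    """Build a mount command that should never write to the disk.

    ro         -> mount read-only
    norecovery -> XFS mount option: skip log replay, so nothing is written
    """
    return ["mount", "-t", "xfs", "-o", "ro,norecovery", device, mountpoint]


# Hypothetical names: replace with the real partition and an existing empty directory.
command = build_readonly_mount_command("/dev/sdX1", "/mnt/recovery")

# Only print the command; run it by hand with sudo in the live session.
print(shlex.join(command))
```

With `norecovery`, a filesystem with a dirty log may look slightly out of date, but the disk itself stays untouched, which is the point when the disk is the only copy of the data.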

Link to comment
Just now, PeteAsking said:

Also, when you choose New Config I think it won't delete data on a disk, so if a disk had data on it, that data will still be available. So you are probably not in as bad a state as you might believe.

That is what I thought. But why is it clearing the disk now then?

Clearing the disk sounds to me like a format thing?

Link to comment
Just now, Caennanu said:

That is what I thought. But why is it clearing the disk now then?

Clearing the disk sounds to me like a format thing?

I must admit I forget the exact option when doing a new config. It could be there is a checkbox to tell it not to format, or perhaps it's just clearing the new disks; this I am less sure about. Hopefully someone more experienced chimes in, but I do think you shouldn't give up on your data just yet.

Link to comment

Also, people here are helpful, so if you take it slow, don't rush, and let people ask the questions they need to ask, then if the data is available somehow you will probably get it back. It sounds super scary, but staying calm gives you the best chance of success, and it sounds like there are certainly some things to try before giving up hope :)

Link to comment
4 minutes ago, PeteAsking said:

Also, people here are helpful, so if you take it slow, don't rush, and let people ask the questions they need to ask, then if the data is available somehow you will probably get it back. It sounds super scary, but staying calm gives you the best chance of success, and it sounds like there are certainly some things to try before giving up hope :)

Hope is all I've got.

I'll try not to respond too quickly ;0

Link to comment
1 hour ago, Caennanu said:

swapping the new disks per slot and formatting them

Formatting is NEVER part of a recovery. The emulation includes the file system, so when you format, you tell Unraid to empty the filesystem.
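
To illustrate why, here is a toy Python model of an emulated disk. It assumes plain single XOR parity and one tiny block per disk, which is a simplification of what Unraid actually does; the class and variable names are invented for the example.

```python
from functools import reduce
from typing import List


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class EmulatedDisk:
    """A missing disk whose contents exist only as parity XOR the surviving disks."""

    def __init__(self, parity: bytearray, survivors: List[bytes]) -> None:
        self.parity = parity
        self.survivors = survivors

    def read(self) -> bytes:
        # Nothing is stored for the missing disk; its content is recomputed on every read.
        return xor_blocks(bytes(self.parity), *self.survivors)

    def write(self, new_content: bytes) -> None:
        # A write to an emulated disk can only be recorded by changing parity.
        self.parity[:] = xor_blocks(new_content, *self.survivors)


# Hypothetical 8-byte "disks"; the first one is the disk that dropped out.
missing, disk1, disk2 = b"my files", b"movies!!", b"other..."
parity = bytearray(xor_blocks(missing, disk1, disk2))

emulated = EmulatedDisk(parity, [disk1, disk2])
before_format = emulated.read()   # the old contents, reconstructed from parity

emulated.write(bytes(8))          # "format": overwrite with an empty filesystem (zeros here)
after_format = emulated.read()    # parity now describes the empty filesystem
```

Once the format has been written through the emulation, parity describes the empty filesystem and the old contents can no longer be computed from it.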

 

Hopefully you can read all the individual disks and recover your data. It doesn't really sound like you had real disk failures, more a failure to read or write.

 

Posting diagnostics may shed some light on what's currently available, but since you've rebooted multiple times since this started, the original errors that started this are lost.

Link to comment
Just now, JonathanM said:

Formatting is NEVER part of a recovery. The emulation includes the file system, so when you format, you tell Unraid to empty the filesystem.

 

Hopefully you can read all the individual disks and recover your data. It doesn't really sound like you had real disk failures, more a failure to read or write.

 

Posting diagnostics may shed some light on what's currently available, but since you've rebooted multiple times since this started, the original errors that started this are lost.

Aye, that's what I thought. But the disks were unmountable (forgot to mention, sorry, it's been a hectic day), so I had to format them :(

Would there have been another option when they became unmountable because of a missing file system?

 

Yeah, in terms of recovery: I'm currently running recovery software on the original failed disks. It seems to be working, but the scan is still in progress (for the next 7 hours...).

And that is what I'm thinking currently, or at least hoping for: that it was simply an I/O issue, and that I can recover most of it.

 

Link to comment
8 hours ago, itimpi said:

Yes! It is covered here in the online documentation, accessible via the ‘Manual’ link at the bottom of the GUI.

Thank you.

If I had followed the advice of ChatNoir, I probably would have found that....

 

And good timing at that. The disk that had no issues so far and was clearing yesterday has just finished . . . and, as you guessed, it's unmountable.

Link to comment
11 hours ago, Caennanu said:

The parity build process changes in speed, from 80 MB/s on average to 1.2 GB/s, 2.3 GB/s, 4.6 GB/s and even topping out around 8 GB/s.

FYI, this will happen if Unraid is beyond the array's redundancy. E.g., if you have two parity drives and already two disabled disks, and then get errors on another disk, Unraid has no way of continuing the rebuild. So, to avoid corrupting the disk (if it's a previous disk that is being rebuilt on top of), it just skips ahead while there are multiple disk errors. Hence the speed: it's just skipping, not actually writing anything.
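
A toy Python sketch of that situation, assuming single XOR parity for brevity (a dual-parity array tolerates two unknowns per stripe instead of one, but the counting argument is the same). The function names are made up, and the skip-on-too-many-errors behaviour is modelled on the description above, not taken from Unraid's code.

```python
from functools import reduce
from typing import List, Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(data: List[Optional[bytes]], parity: bytes, target: int) -> Optional[bytes]:
    """Rebuild the block of disk `target` for one stripe.

    `data` holds one block per data disk; None marks a disk that is missing
    or returned a read error. Returns None when the stripe has more unknowns
    than single parity can solve, in which case nothing should be written.
    """
    unreadable = [index for index, block in enumerate(data) if block is None]
    if unreadable != [target]:
        # Another disk is unreadable too: skip instead of writing garbage.
        return None
    survivors = [block for block in data if block is not None]
    return xor_blocks(survivors + [parity])


# Hypothetical 4-byte blocks for three data disks.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])

rebuilt = rebuild_stripe([None, d1, d2], parity, target=0)    # one unknown: solvable
skipped = rebuild_stripe([None, None, d2], parity, target=0)  # two unknowns: skipped
```

A skipped stripe costs no disk I/O at all, which would be consistent with the position counter racing ahead at speeds no drive can deliver.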

Link to comment
7 minutes ago, JorgeB said:

FYI, this will happen if Unraid is beyond the array's redundancy. E.g., if you have two parity drives and already two disabled disks, and then get errors on another disk, Unraid has no way of continuing the rebuild. So, to avoid corrupting the disk (if it's a previous disk that is being rebuilt on top of), it just skips ahead while there are multiple disk errors. Hence the speed: it's just skipping, not actually writing anything.

Alright, that makes sense.

Diagnostics coming soon.

 

--- Diagnostics added

bigboii-diagnostics-20220623-0902.zip

Link to comment

Does anyone have any recommendations for recovery software I can use, either on Ubuntu live-booted from USB or on Windows, with the drives hooked up via an HDD dock?

 

So far, I've been able to recover a decent number of files. But there's no folder structure present at all; it's all individual files.

Also, the software seems to read inside my VM image files, instead of giving me the image file itself?

 

Since I had a spare key for EaseUS lying around, I'm using that, installed on a Windows laptop that I can run 24/7 for the time being. But I had the same 'issue' with Hetman Linux Recovery and Recoverit (I only did the scans with the latter 2, since . . . well, limitations of free software).

Link to comment
