Turning a 4.7 server back on after a year with a failed drive. [SOLVED]

GileraGFR · February 16, 2019

Just over a year ago a 2TB data drive failed in my 4.7 six-disk array (one parity drive, no cache). I powered the server off at the time until I had time to investigate and recover, which is where I am now.

My question is not exactly straightforward; I'm mostly after other people's opinions on the best way of turning things back on, so to speak. Should I first remove each drive and copy the data off, or just put in a new 2TB drive and allow it to rebuild?

I'm also worried that the 2TB drive I have purchased might be bigger than parity. I posted earlier about this and was advised that if it is bigger even by a byte then it won't work, as a data drive can't be bigger than the parity drive. Is it possible to confirm that before I power anything on? I've noticed some manufacturers print the number of sectors on the drive label, but these drives don't have that info, and I'm not sure whether it would be usable anyway.
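One way to sanity-check the size question on paper is to compare sector counts, where they can be found on a label or datasheet. The following is a minimal sketch, assuming 512-byte logical sectors and assuming both drives follow the IDEMA LBA1-03 capacity formula (many modern drives do, but that should be checked against the datasheet). The function names are hypothetical.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def idema_lba_count(advertised_gb: int) -> int:
    """Sector count for a drive of the given advertised capacity in GB,
    per the IDEMA LBA1-03 formula for 512-byte sectors."""
    return 97_696_368 + 1_953_504 * (advertised_gb - 50)


def fits_under_parity(data_sectors: int, parity_sectors: int) -> bool:
    """A data drive is usable only if it is no larger than the parity drive."""
    return data_sectors <= parity_sectors


expected = idema_lba_count(2000)
print(expected)                 # 3907029168 sectors
print(expected * SECTOR_BYTES)  # 2000398934016 bytes
print(fits_under_parity(expected, expected))      # True: same size is fine
print(fits_under_parity(expected + 1, expected))  # False: one sector too big
```

If both 2TB drives report the standard count, the new drive is the same size as parity and should fit. On a running Linux system the actual count can be read from `/sys/block/<device>/size`, which is reported in 512-byte units.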

There is some data on there that I care about. I did have a backup of it but have since lost most of that. I'm in limbo at the moment as I don't know what I could potentially lose. Schrödinger's cat springs to mind here: I'm almost happy to leave it powered off and for my data to both exist and not exist, hehe.

My other thought is whether I should upgrade before rebuilding, as newer Unraid is obviously much better from what I have read. From what I have learnt in my IT career, though, I'm reluctant to upgrade while in a failed state... I really want others' opinions on this.

I have a new server pretty much stood up and ready to go if I need somewhere to copy data, so I think I'm as ready as I'm ever going to be, but I am still super nervous about it, as I have been for a little while, lol.

I will be very grateful for any input from anyone.

Cheers

Edited February 17, 2019 by GileraGFR
Solved

JonathanM · February 16, 2019

Since you already have one failed drive, how confident are you in the health of the others? Are you sure it was a drive failure and not a communication failure? Unraid red balls a drive when a write fails, whether or not the drive itself is ok.

The success of recovery depends on all remaining drives performing flawlessly.

You definitely DON'T want to pull drives at this point if you want to recover what's on the failed drive.

If it were me, I'd prepare the new server to receive data, fire up the old server and copy everything critical first, then attempt a rebuild.
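The "copy everything critical first" step can be sketched as a copy followed by a checksum comparison, so that a bad read shows up before the old server is touched again. This is only an illustration: the function names are hypothetical, and hashing the source means each file is read twice, which adds load on drives that may already be marginal.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src_root: Path, dst_root: Path) -> list[Path]:
    """Copy every file under src_root to dst_root and return the source
    files whose copy does not match the source hash."""
    mismatches = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies file data and timestamps
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(src)
    return mismatches
```

It would be called with the old server's share as the source and the new server's share as the destination, for example `copy_and_verify(Path("/mnt/old/critical"), Path("/mnt/new/critical"))`, where both paths are placeholders. An empty result means every copied file matched its source.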

Don't disassemble anything until you have a better picture of what state all the old drives are in.

GileraGFR · February 16, 2019

Thanks for replying jonathanm.

Since posting I have had a look at the SMART output of the drives from the last log file and they seem to be OK but obviously I am still not confident since the amount of time has passed.

It does look to be a drive failure from the log files but this is only from my understanding.

I think you're right though, I should prepare to copy critical data and then fire up the old one. I have pulled the failed drive though. Do you think I should replace it, or would you need to see the log files to determine for yourself whether it was a failed drive or a comms failure?

JonathanM · February 16, 2019

As long as parity is valid and only one drive is missing, the server should act normally with respect to the data on all the drive slots. All your data should show up without rebuilding to the new drive, but you will be at risk if another drive fails. I wouldn't replace it just yet, as it's not clear what state things are in.
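The reason the missing drive's data still shows up can be illustrated with a simplified model of single parity, where the parity drive holds the XOR of all data drives. The sketch below uses made-up 4-byte "drives" and a hypothetical helper name; it is a toy model of the idea, not Unraid's actual code.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy 4-byte "drives"; the contents are made up for illustration.
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0x0F, 0xF0, 0xAA, 0x55])
disk3 = bytes([0x01, 0x02, 0x03, 0x04])

# Parity is the XOR of every data drive.
parity = xor_blocks([disk1, disk2, disk3])

# With disk2 missing, XOR of parity and the survivors reproduces it.
emulated_disk2 = xor_blocks([parity, disk1, disk3])
print(emulated_disk2 == disk2)  # True
```

Every read of the missing drive needs a correct read from parity and from each surviving drive, which is why a second failure while one drive is already being emulated puts data at risk.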

Do you have the logs from the period where the drive was red balled? If the server was rebooted before the logs were collected, they won't show the event.

GileraGFR · February 16, 2019

Thanks for confirming.

Yes, it hasn't been booted since, and it looks to have a complete log from when the drive started going bandy. The logs are on my PC at home; I'll attach them when I get home.

Just going to purchase a couple of drives for the new server and I'll update ASAP.

GileraGFR · February 17, 2019

New server powered up... old server powered up, with just the missing failed drive red balled. Everything else was green, so I started the array and have copied the critical data across.

Currently chewing through the rest of it; to say I'm happy is an understatement.

Thanks for your time jonathanm, it's much appreciated.

trurl · February 17, 2019

13 hours ago, GileraGFR said:

Since posting I have had a look at the SMART output of the drives from the last log file and they seem to be OK but obviously I am still not confident since the amount of time has passed.

It does look to be a drive failure from the log files but this is only from my understanding.

You should post these logs you have. I don't think the syslog normally has any of your SMART reports, so I'm not sure what you are basing your opinions on.

GileraGFR · February 17, 2019

Yeah sure, link below. You'll see the drive failing at the start and then the shutdown being issued. The SMART outputs are there, so maybe it was something done on v5.

https://pastebin.com/Cxv4FZcH

Two drives aren't perfect but they'll certainly last for my purposes... and either way it doesn't really matter now, since I have my critical data and I'm just copying across the non-critical data.

EDIT: The SMART reports haven't changed since that last log at power down either. Note that the date in the log is last year's, not this year's.

Edited February 17, 2019 by GileraGFR
Additional info

trurl · February 17, 2019

In future please just attach, zipped if necessary, instead of linking to an external site.
