{SOLVED} First ever failed disk in unRAID...now what?


rmp5s


Had a drive die today.  Has the red x next to it.  Not sure what the best path forward is...

What I'm thinking is this: I took it out of the array. Then I can let parity rebuild while I get a new drive (that one wasn't even very old!!), or fix that one, or whatever. Then I can put the new one in (or put this one back in) and let parity rebuild again.

 

One concern I have is that it's an array of four 8TB drives with an 8TB parity drive...rebuilding parity will probably take a looooooooooong time...

 

Thoughts?  What are "best practices" for when a drive kicks the bucket in unRAID?

 

Thanks!

3 minutes ago, rmp5s said:

Interesting.  

So, basically, if you have a drive die, you better have a spare on hand to swap in ASAP?

I have a spare drive for each of my servers sitting on a shelf ready to go when needed.  They have been precleared to ensure the drives were good.

 

Do you have a SMART test on the disk that failed?  Do you know why it failed?

 

I have had one disk failure in 9 years of running unRAID on two servers. That "failure" was actually due to a cabling problem that caused so many errors in a very short time that the drive was marked unwritable with a red X. The drive was actually fine, and I rebuilt it onto itself once I fixed the cabling/SATA port issue that led to the drive "failing."

 

You may actually have a bad drive, but if you have some SMART reports for it you might be able to determine what went wrong.

 

My parity/rebuild process takes about 16.5 hours with 8TB parity and data drives.
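For a ballpark on the "looooong time" question: in general a parity sync or rebuild has to sweep the whole parity disk, so its duration is driven mainly by the parity disk's size and the average throughput the drives sustain. Working backwards from 16.5 hours for 8TB (decimal terabytes, as drive makers count them) gives an average of roughly 135 MB/s. The small Python sketch below does that arithmetic in both directions; the function names are made up for illustration, and real speed varies across the platter and with other array activity, so treat the result as an estimate only.

```python
def average_throughput_mb_s(capacity_tb: float, hours: float) -> float:
    """Average MB/s needed to sweep `capacity_tb` decimal terabytes in `hours`."""
    total_bytes = capacity_tb * 1e12  # drive vendors use decimal TB
    seconds = hours * 3600
    return total_bytes / seconds / 1e6


def estimated_hours(capacity_tb: float, avg_mb_s: float) -> float:
    """Rough time to sweep a parity-sized disk at an assumed average speed."""
    total_bytes = capacity_tb * 1e12
    return total_bytes / (avg_mb_s * 1e6) / 3600


if __name__ == "__main__":
    # Figures from the post: 8TB parity/data drives, about 16.5 hours per pass.
    print(f"{average_throughput_mb_s(8, 16.5):.1f} MB/s")
    # The reverse direction, with an assumed (hypothetical) 135 MB/s average.
    print(f"{estimated_hours(8, 135):.1f} hours")
```

Because the sweep covers the parity disk end to end, adding more data drives of the same size should not, by itself, make the pass much longer; a bigger parity disk would.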

12 minutes ago, Hoopster said:

Do you have a SMART test on the disk that failed?  Do you know why it failed?

 

You may actually have a bad drive, but if you have some SMART reports for it you might be able to determine what went wrong.

I was trying to get some info out of the thing but haven't been able to yet.  I left the drive in there and just unassigned it so I can run some tests and stuff on it.  I'll post my findings.

 

12 minutes ago, jonathanm said:

Yes. If you only have 1 parity drive, you can only have 1 missing drive. Any further failures will result in complete data loss from all missing data drives, including the first failure.

Right, I knew I only had 1 drive that could die.  I just thought you could lose a drive, move the data onto the remaining drives, then add another drive later.

 

Dunno if the "NAS drives" are worth it over regular drives, but I've got a new 8TB IronWolf NAS drive on the way.  It was only 50 bucks more than the one it's replacing so hey...why not...lol

18 minutes ago, Hoopster said:

You may actually have a bad drive, but if you have some SMART reports for it you might be able to determine what went wrong.


I kinda think it's dead...lol  

I can't even get Unassigned Devices to mount it...I click mount and the screen just refreshes real quick.  That's it.  I'll take it out and plug it into my desktop and have a look around. If it's not dead dead, I'll format it and put it back in but set it as excluded for a while...see if it dies again.  I don't trust it!  lol

52 minutes ago, rmp5s said:

If the drive is dead, how are you supposed to move the data off of it?

When a drive fails, if parity is accurate at the point it drops, then the only indication will be the red x and the notifications. The data slot should still contain all the data as emulated by parity. Navigating to /mnt/diskX will still work, and all the files on the drive will be there and readable. The disk itself isn't being accessed, the data is being generated by all the other drives that are still working.
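To make the emulation idea concrete, here is a toy Python model, assuming a simple XOR (even) parity scheme for the single parity drive: each parity byte is the XOR of the bytes at the same position on every data disk. XOR-ing parity with all the surviving disks leaves exactly the missing disk's bytes, which is why /mnt/diskX stays readable. The same arithmetic shows why, with one parity drive, a second missing disk leaves nothing to solve for, and why the old disk's contents (formatted or not) never enter into a rebuild. The disk contents and function names are hypothetical; this is a sketch of the principle, not unRAID's actual implementation.

```python
from functools import reduce
from typing import List, Optional

Block = bytes


def xor_blocks(blocks: List[Block]) -> Block:
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_disks: List[Block]) -> Block:
    """Even (XOR) parity: each parity byte is the XOR of that byte on every data disk."""
    return xor_blocks(data_disks)


def emulate(data_disks: List[Optional[Block]], parity: Block) -> List[Block]:
    """Return every disk's contents, regenerating at most one missing (None) disk.

    The missing disk is never read: its bytes come from parity XOR all survivors,
    so wiping or formatting the failed disk makes no difference to the result.
    """
    missing = [i for i, d in enumerate(data_disks) if d is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} disks missing; single parity can rebuild only one")
    survivors = [d for d in data_disks if d is not None]
    # XOR of parity with every surviving disk cancels them out, leaving the missing disk.
    rebuilt = xor_blocks(survivors + [parity])
    return [d if d is not None else rebuilt for d in data_disks]


if __name__ == "__main__":
    # Four hypothetical data disks, 4 bytes each, plus one parity disk.
    disks = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd", b"\x00\xff\x00\xff"]
    parity = compute_parity(disks)

    # Disk 3 drops out (red X), but its contents can still be generated from the rest.
    degraded = [disks[0], disks[1], None, disks[3]]
    print(emulate(degraded, parity)[2] == disks[2])  # True

    # A second failure with only one parity disk: nothing can be regenerated.
    try:
        emulate([disks[0], None, None, disks[3]], parity)
    except ValueError as err:
        print(err)
```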

3 minutes ago, jonathanm said:

When a drive fails, if parity is accurate at the point it drops, then the only indication will be the red x and the notifications. The data slot should still contain all the data as emulated by parity. Navigating to /mnt/diskX will still work, and all the files on the drive will be there and readable. The disk itself isn't being accessed, the data is being generated by all the other drives that are still working.

 

Ah, I see what you mean.  That's pretty cool, if somewhat convoluted.  

 

2 minutes ago, jonathanm said:

Formatting while the drive is outside the array won't do anything; whatever is being emulated by the rest of the disks will be written back to the drive when you put it back in.

 

Right, and that would be fine.  There was nothing on it anyway.

How can I "reset" the thing then, so I could put it back in without the red x? I'd like to do that as a test, just to see if the thing is really toast.


I clicked both of the SMART self-test buttons for that drive while it was part of the array, and neither did anything. The log, history, and so on didn't show anything either, other than telling me to use a command, and that command...well...I didn't really feel like going down that rabbit hole at the time...


 

Are the logs stored somewhere? It runs these tests automatically every so often, doesn't it?
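If you do want to go down the command-line route, the usual tool for reading SMART data on Linux is smartctl from smartmontools (whether that is the exact command the unRAID page suggested isn't stated in this thread). A minimal Python sketch, assuming smartmontools 7.0 or later for JSON output (--json) and an ATA/SATA drive: it runs smartctl -a and pulls out the overall health verdict plus a few attributes that commonly point to trouble. The attribute selection and function names are illustrative choices, not an official failure rule, and SAS/NVMe drives report differently.

```python
import json
import subprocess
from typing import Any, Dict

# Attributes often watched when a disk drops out (illustrative selection).
WATCHED = {
    5: "Reallocated_Sector_Ct",    # sectors already remapped by the drive
    187: "Reported_Uncorrect",     # errors the drive could not correct
    197: "Current_Pending_Sector", # unstable sectors waiting to be remapped
    198: "Offline_Uncorrectable",  # sectors that failed offline scans
    199: "UDMA_CRC_Error_Count",   # interface errors, often a cable/port problem
}


def read_smart_json(device: str) -> Dict[str, Any]:
    """Run `smartctl -a --json <device>` and parse its output.

    smartctl uses a bit-mask exit status, so a non-zero code is not
    automatically an error; stdout is parsed regardless.
    """
    proc = subprocess.run(
        ["smartctl", "-a", "--json", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(proc.stdout)


def summarize(report: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the overall verdict and the watched attributes' raw values from an ATA report."""
    passed = report.get("smart_status", {}).get("passed")
    table = report.get("ata_smart_attributes", {}).get("table", [])
    raw = {row["id"]: row["raw"]["value"] for row in table if row.get("id") in WATCHED}
    return {
        "passed": passed,
        "attributes": {WATCHED[i]: raw[i] for i in WATCHED if i in raw},
    }
```

Usage would look like summarize(read_smart_json("/dev/sdX")), run as root, where /dev/sdX is a placeholder for the failed disk's actual device. Non-zero pending or reallocated sector counts usually mean the media itself is going, while a rising UDMA_CRC_Error_Count on an otherwise clean report can point toward a cable or port issue like the one described earlier in the thread.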
