Dying drive - Are my thoughts correct?


A few days ago I was confronted with what everyone who runs a server hopes will never happen but knows will inevitably happen some day - a slowly dying drive.

 

Because my notifications were not configured correctly, I noticed this rather late: until now I only got the array Pass/Fail notices and no warnings by email or push. While playing around with my server I noticed that the S.M.A.R.T. values of one drive were bad. It had some reallocated and pending sectors but, worst of all, it had 2 offline sectors. I must have seen these before and either not noticed them or deliberately ignored them because, eh, "it's probably fine-ish(?)". After all, every notification I received told me the array was fine because it had passed (apparently Unraid sees nothing wrong with it, but okay). Now that I saw them again I thought, right, let's just replace that drive, no biggie, right?
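For anyone who would rather check these counters from a script than spot them by eye, here is a minimal sketch. It assumes the report has the shape that `smartctl --json -A` produces on recent smartmontools versions (an `ata_smart_attributes.table` list whose entries carry `id` and `raw.value`). The function name `failing_attributes` and the sample numbers are made up for illustration and are not taken from my drive.

```python
# SMART attribute IDs commonly treated as early signs of a failing drive.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def failing_attributes(smart_report):
    """Return {attribute name: raw value} for watched attributes whose raw value is non-zero."""
    table = smart_report.get("ata_smart_attributes", {}).get("table", [])
    flagged = {}
    for attr in table:
        raw = attr.get("raw", {}).get("value", 0)
        if attr.get("id") in WATCHED and raw > 0:
            flagged[WATCHED[attr["id"]]] = raw
    return flagged


# Invented sample in the assumed JSON shape (already parsed into a dict).
sample = {
    "ata_smart_attributes": {
        "table": [
            {"id": 5, "raw": {"value": 8}},
            {"id": 9, "raw": {"value": 30000}},  # power-on hours, not watched
            {"id": 197, "raw": {"value": 4}},
            {"id": 198, "raw": {"value": 2}},
        ]
    }
}

print(failing_attributes(sample))
```

An empty result means none of the three counters has moved; anything else is a reason to look at the drive more closely.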

 

Well, I ordered a replacement, but due to the current situation it takes a while until orders arrive. I thought, well, as long as the server has no errors and the drive stays green, it should be okay. Then the count increased to 6 and I thought, uh, I still have no errors, so let's just hope it stays green, and if not, I have dual parity, so it should be fine, right?

 

So today was the day the server was due to run its monthly parity check (so far every one of those has been fine), and just as it started I received read errors from the disk (disk 2). That's when I panicked, because I think these checks are correcting ones. That would mean it reads all the disks, hits the errors on disk 2, ends up with data for disk 2 that differs from what the parity expects, and "corrects" the parity to match the faulty data from disk 2. My parity would then be compromised, and if I rebuilt disk 2 I would lose the data from these offline sectors. Is this correct? I stopped the parity check as soon as I could, and it gave me the message "parity check canceled (0 errors)". Then I shut down the server out of fear of corrupting anything. So my current plan is the following:

 

- Keep server offline until the new drive arrives (should be later today or tomorrow)

- Power on server and remove disk 2.

- Assign the new drive as disk 2 and let the rebuild run.

- Everything should be back to normal (right?)

 

Does this work? Would it be safe to run the server in its current condition? I know that as soon as Unraid detects a write error it takes the disk offline and emulates its contents, but so far it hasn't done that, so I think my parity should still be good (with the files in the bad sectors intact in parity). Last month's parity check ran without errors, and I haven't had any unexpected shutdowns or write errors, so it should be okay, right?

 

Another thing I noticed is that disk 3 also has some issues. It has some reallocated sectors but no pending ones and no offline ones. I guess I'll need to replace that disk soon, but at the moment it should be fine even for the rebuild of disk 2. And even if it should fail during the rebuild, it should still be okay because I have dual parity. Yes, the array would be unprotected then, but I should not lose anything. Will the rebuild of disk 2 still go through in this case, so that I can replace disk 3 once that is done and the array is protected again?
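The counting behind that reasoning, as a minimal sketch. It only assumes the general rule that an array with N parity disks can tolerate N simultaneously missing disks; `array_state` is a made-up helper name and not anything Unraid provides.

```python
def array_state(parity_disks, missing_disks):
    """Classify an array by how many more disk failures it could still absorb."""
    spare = parity_disks - missing_disks
    if spare < 0:
        return "data loss"
    if spare == 0:
        return "recoverable, unprotected"
    return f"recoverable, {spare} more failure(s) tolerated"


# Dual parity, disk 2 being rebuilt (counts as one missing disk).
print(array_state(parity_disks=2, missing_disks=1))
# Dual parity, disk 3 also fails while disk 2 is being rebuilt.
print(array_state(parity_disks=2, missing_disks=2))
```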

 

So is my thought process correct on this one? And could I let the server run in its current state? It would be nice if I could, but it wouldn't be a catastrophe if I couldn't... I would like to avoid losing data (only the irreplaceable data is backed up, and it would be a hassle to reacquire the rest).


You should be OK to run as you have dual parity.

 

What I would suggest, though, is:

  • Stop the array.
  • Unassign disk2. It is not necessary to physically remove it, although you can if you want. You may not want to disturb any of the server's innards until you have to plug in the replacement disk.
  • Start the array. Unraid will then be emulating disk2 and show that slot as having no drive assigned.

At this point, if everything is good, you should still be able to see the contents of disk2 via Unraid’s emulation of it. If not, tell us what you are seeing. The reason for checking is that it is the contents of the ‘emulated’ disk that will be written during the rebuild process. Keep disk2 intact in its current state until you have finished the replacement process, as it provides a fall-back for data recovery if anything should go wrong during that process.
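To illustrate what "emulating" a disk means, here is a minimal sketch using plain XOR parity over tiny byte blocks. It models a single XOR parity disk only; the second parity disk in a dual-parity array is computed differently and is not modeled here. The helper `xor_blocks` and the block contents are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three tiny "data disks" with invented contents.
disk1 = bytes([0x11, 0x22, 0x33])
disk2 = bytes([0xA0, 0xB0, 0xC0])
disk3 = bytes([0x0F, 0x1E, 0x2D])

# Parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# With disk2 unassigned, its contents are recomputed from everything else.
emulated_disk2 = xor_blocks([disk1, disk3, parity])

print(emulated_disk2 == disk2)
```

The emulated contents are only as good as the parity and the other disks, which is why it is worth checking them before the rebuild writes them to the new drive.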

 

Points to note:

  • It is recommended that the scheduled parity check is set to be non-correcting, as you do not want a drive that is potentially returning bad reads to end up corrupting parity.
  • Unraid never fails a drive based on its SMART reports but it does send you notifications if any of the values it monitors are changing so you do not want to ignore notifications about them.
  • You can get an “array is healthy” status report despite SMART values indicating a drive should probably be replaced, as the status report only indicates that at the moment Unraid does not think any disk is in an error state.
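The first point above can be sketched with plain XOR parity. This is a simplified model, not Unraid's actual implementation: it assumes a single parity disk and a drive that hands back wrong data without reporting a read error. `xor_blocks` and `parity_check` are made-up names.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(data_disks, parity, correct):
    """Return (mismatching byte count, parity as it stands after the check)."""
    expected = xor_blocks(data_disks)
    errors = sum(1 for e, p in zip(expected, parity) if e != p)
    return errors, (expected if correct else parity)


good = [bytes([1, 2, 3]), bytes([4, 5, 6]), bytes([7, 8, 9])]
parity = xor_blocks(good)

# The second disk returns one wrong byte during the check.
bad_read = [good[0], bytes([4, 0xFF, 6]), good[2]]

# Non-correcting: the mismatch is counted, parity is left alone.
errors_nc, parity_nc = parity_check(bad_read, parity, correct=False)
rebuilt_nc = xor_blocks([good[0], good[2], parity_nc])

# Correcting: parity is overwritten to match the bad read.
errors_c, parity_c = parity_check(bad_read, parity, correct=True)
rebuilt_c = xor_blocks([good[0], good[2], parity_c])

print(errors_nc, rebuilt_nc == good[1])  # rebuild returns the original data
print(errors_c, rebuilt_c == good[1])    # rebuild returns the bad data
```

In both runs the check reports one mismatch, but only the non-correcting run leaves parity in a state from which the original contents can still be rebuilt.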

Ah right, and even if another one were to fail during rebuild it should still rebuild it correctly due to the dual parity?

 

The timing is a little bad, because I had planned to upgrade the server to 16TB parity disks and a new mobo and CPU on Friday anyway, but now the dying disk is obviously more important. And doing a parity swap now? No thanks.

 

So the steps are:

- Emulate disk 2 until the replacement arrives

- Rebuild disk 2

- Upgrade mobo, CPU, RAM

- Replace one of the two parity disks with the bigger one

- Replace the other one

- If disk 3 gets worse, replace that as well

- All fine and dandy (hopefully?)
