Fix Common Problems utility


Hi. The Fix Common Problems utility has identified imminent drive failures on my older server: three drives in fact, one of which is the parity drive.

 

The server runs 6.1.9 and its drives are formatted as ReiserFS (RFS); my newer server uses XFS. I realize I need to rebuild the parity drive as the first part of the project.

 

Is there any value in changing the two data drives to XFS? It would not be an easy option, as there are no spare slots.

 

I am also taking the opportunity to increase the drive sizes, and I have the new drives ready to start the work.

 

Any pointers for this work would be greatly appreciated. The server has been shut down since the notification, as I don't want the data drives failing before I have rebuilt the parity drive.

 

  Peter

Link to comment

In such a situation, with two failing data drives and a failing parity drive, the parity drive is the least important of the three, by far. What you need to do as a matter of urgency is to copy as much of the data off the two data drives to somewhere safe (your other server, perhaps?) before they fail. If you try to replace your parity drive in this situation one or other of the two sick data drives may fail before the task is complete and you will lose data. Remember, single parity only protects against the failure of one drive - all the rest need to be in good condition to effect a rebuild.
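To illustrate why single parity can only cover one failed drive, here is a minimal sketch. It assumes parity is a plain bitwise XOR across the data drives, and the three four-byte "disks" are made-up toy values, not anything from a real array.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data drives", each holding four bytes (illustrative values only).
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0f\x0e\x0d\x0c"
disk3 = b"\xaa\xbb\xcc\xdd"

# The parity drive stores the XOR of all data drives.
parity = xor_blocks([disk1, disk2, disk3])

# One drive lost: XOR of parity with the surviving drives gives it back.
rebuilt_disk2 = xor_blocks([parity, disk1, disk3])
assert rebuilt_disk2 == disk2

# Two drives lost: parity and the survivor only yield disk1 XOR disk2,
# which is not enough to recover either drive on its own.
combined = xor_blocks([parity, disk3])
assert combined == xor_blocks([disk1, disk2])
```

The rebuild of one drive needs every other drive to read back correctly, which is why a second sick drive dying mid-rebuild means lost data.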

 

Post your diagnostics - it might be apparent from the SMART information that one data drive is in more imminent danger of failing than the other and should therefore be given priority.

 

Regarding changing disk formats and expanding the array, I wouldn't even think about doing either until I have got my data safe.

Link to comment

What John_M said.  Or similar anyway.

 

 

1. Purchase 3 replacement drives.
2. Note the serial numbers and unRAID drive slots of all your drives, then disconnect all of them, particularly the power (remove them from the drive slots if necessary). There is no need to put more power-on hours on your good drives while following my procedure.
3. Do a new config now that all drives are disconnected.
4. Add the three new drives to your server in the three slots you made available and preclear them to make sure you have no DOA drives.
5. Once that is done, add two of them as data drives, but don't add the third drive as parity yet.
6. Put the bad data drives in your server one at a time, in another empty slot, but don't add them to the array. Use Unassigned Devices to mount them and copy the data to your two new drives.
7. Once you have all of the data copied off the bad data drives, remove them.
8. Put your remaining drives that are still good back in the server and assign them to the slots they had before.
9. Add the precleared new parity drive to the array as the parity drive and build parity on it.
10. Check the parity once the build is complete.

You are now up and running, and the bad drives have only been on long enough to copy the data from them.
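The copy step in the procedure above could be scripted so that each file is checked after it lands on the new drive. This is only a rough sketch: the function names are my own, and the mount points in the usage note are hypothetical, so substitute whatever paths your system actually shows.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_hash(source, target, chunk_size=1024 * 1024):
    """Copy one file and hash the bytes as they are read (source is read once)."""
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(target, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            dst.write(chunk)
            digest.update(chunk)
    shutil.copystat(source, target)  # keep timestamps and permission bits
    return digest.hexdigest()


def copy_and_verify(source_root, dest_root):
    """Copy every file under source_root to dest_root; return the files that failed."""
    source_root, dest_root = Path(source_root), Path(dest_root)
    failed = []
    for source in sorted(source_root.rglob("*")):
        if not source.is_file():
            continue
        target = dest_root / source.relative_to(source_root)
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            copied_hash = copy_with_hash(source, target)
            if copied_hash != sha256_of(target):
                failed.append(source)
        except OSError:
            # A read error on a failing drive should not stop the rest of the copy.
            failed.append(source)
    return failed
```

Usage would be something like `copy_and_verify("/mnt/disks/old_disk", "/mnt/disk1")`, where both paths are placeholders. The source file is hashed while it is being copied rather than in a second pass, so the failing drive is only read once; empty directories are not recreated.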

 

 

Note: you will be completely unprotected from a drive failure while performing this operation.  But if your drives are in "imminent drive failure" state I wouldn't want to have them on any longer than I had to and being unprotected with the rest would be OK for me.

 

 

 

 

Edit: Forgot to mention: shut off your Dockers and VMs while performing this, if you have any. You may need to shut off your plugins too, depending on what they do. Your paths will be invalid until you add the other good old drives anyway.

Link to comment

Thanks for your replies. Bob, the process looked good but was too complicated for me. The current state of play is as follows: the parity drive showed the highest figures in terms of relative danger, so I replaced it with a new, larger disk and the system rebuilt it.

 

The next priority, again in terms of figures, was Disk 2. I replaced it with a new, larger drive and it was rebuilt. The problem I have at this point is that it has a red X against it and is disabled. I ran a file system check, thinking there might be some file corruption, but it found none.

 

I have not run a parity check of the machine for some time, as I wanted to preserve the hardware pending the replacement of the drives.

 

I have attached the tower diagnostics and syslog for expert eyes.

 

Thanks

 

Peter

tower-diagnostics-20160807-0742.zip

tower-syslog-20160807-0743.zip

Link to comment

Disk 2 was having ATA errors from the beginning of the rebuild; it eventually had a write error and the rebuild aborted. SMART looks fine, so you may want to check/replace both cables and try a new rebuild on the same disk. After checking the cables: stop the array, unassign disk2, start the array, stop the array, re-assign disk2 and start the array to begin the rebuild.

 

Was the disk precleared/tested? If it fails again it's probably a bad disk despite the healthy-looking SMART, but my first guess would be a bad/flaky cable.
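To see whether ATA errors like the ones mentioned above cluster on a single port (which would point at one cable or one disk), a rough sketch like this could tally them from a syslog. The regular expression and the keyword list are my assumptions about what typical Linux kernel ATA messages look like; they are not taken from the attached diagnostics and may need adjusting.

```python
import re
from collections import Counter

# Assumed shape of a kernel ATA message: "... ata3.00: <text>" or "... ata3: <text>".
ATA_LINE = re.compile(r"\b(ata\d+(?:\.\d+)?): (.*)")

# Assumed keywords that mark a line as a problem rather than routine link info.
KEYWORDS = ("exception", "error", "failed", "hard resetting link")


def count_ata_errors(lines):
    """Count suspicious ATA lines per port (e.g. 'ata3'), ignoring routine ones."""
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if match and any(word in match.group(2).lower() for word in KEYWORDS):
            port = match.group(1).split(".")[0]  # fold ata3.00 into ata3
            counts[port] += 1
    return counts
```

Given an extracted syslog, `count_ata_errors(open("syslog.txt"))` would return a per-port tally (the file name is a placeholder). If the count drops to zero on the next rebuild after a cable swap, the cable was the likely culprit.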

 

 

Link to comment
