6b12 Dying Drive - are these the right steps?



Hi-

 

I have a drive in my array that appears to be dying... there were 2 write errors (although the syslog showed nothing to recover). The drive was red-balled and is no longer included in the array. I don't believe the parity is currently valid, and given the state of the drive I am reluctant to have parity recalculated.

 

I have run short and long SMART scans, and there appears to be only a single sector that might be problematic; otherwise the drive is accessible and so is its content. The SMART scans passed, but a few attributes were flagged as warnings.
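For reference, I kicked the scans off from the console with smartctl, something like the following (/dev/sdb here is just a stand-in for the actual device node, which you can confirm on the Main tab):

    smartctl -t short /dev/sdb    # short self-test, a couple of minutes
    smartctl -t long /dev/sdb     # extended self-test, can take many hours

    # once a test finishes, review the results and attribute table
    smartctl -a /dev/sdb | less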

 

I've purchased a replacement drive, and it is currently running the preclear script. Logically, I should be able to format the new drive, copy the contents of the failing drive to it, and then completely remove the failing drive from my system (and hopefully get it replaced under warranty).

 

What specific steps should I be taking for adding/removing drives, copying files, reconfiguring the array, etc.?

 

Thank you

The drive was red-balled and is no longer included in the array

the drive is accessible and so is its content.

Are those two statements still simultaneously correct as of this moment? If so, and assuming the replacement drive is at least as large as the failed drive and no larger than your current parity drive, then you only have to stop the array, assign the new drive to the red-balled drive's slot, and restart the array after confirming the rebuild operation; unRAID will recreate what is currently shown on the emulated drive. Do NOT format it; the file system is recreated during the rebuild.

 

When a write to a drive in unRAID's protected array fails, that drive is immediately red-balled, and all activity to that slot is calculated and emulated using ALL the rest of the drives, including the parity drive. If you don't have valid parity at the moment the drive is red-balled, then the contents of the emulated drive are at the very least corrupt, and most likely lost. Since you say the drive is red-balled but you can still access the content, I believe your parity was intact and is currently being used to emulate the failed drive's slot. If another drive fails before you complete the rebuild, you will lose the data on both failed drives.


Thank you. So how would I tell whether the drive is being emulated vs. the parity being corrupt?

 

Presuming it is emulated correctly and I can just replace the disk, would I be able to format the new disk as XFS, given the failing drive is RFS?

 

I was doing some research on XFS conversion a few days ago, and basically what I found out is that there is no quick conversion from RFS to XFS. (You destroy the data when you change file systems.) Similarly, I recall reading that when you rebuild a disk using parity reconstruction, the file system captured in the parity data (RFS in your case) is what gets rebuilt, since the rebuild is a sector-level reconstruction rather than a file-level copy.

 

Now, what you could do is (after preclearing the new disk) add it to your array, have it formatted as XFS, and then copy the data from the failing drive to this new drive. You could then remove the failing disk, set up a new array configuration that includes the new disk but not the failing one, and let it rebuild parity.
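The copy itself could be done from the console with rsync; a minimal sketch, assuming the failing/emulated drive is disk2 and the new XFS drive is disk3 (adjust the slot numbers to your actual layout):

    # preserve permissions and timestamps; the trailing slashes matter --
    # they copy the *contents* of disk2 into disk3
    rsync -av /mnt/disk2/ /mnt/disk3/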


Completely off-topic post! I sure wish someone from unRAID (or one of the moderators) would start a 'sticky' post to compare and discuss the issues of RFS, XFS, and BTRFS, so that all of the information could be collected in one post rather than spread across multiple threads under various topical headings. While I have no doubt that it would soon grow so large that its usefulness might be considered questionable, a lot of what people want to know would be in a single thread, and a lot easier to locate than in a gazillion different threads! Perhaps at that point, someone might create a wiki entry to summarize everything.


Thank you for the insight. Is there a way to verify whether the drive is being emulated and backed by parity?

 

Second way: open Windows Explorer, click on 'Network' in the left pane, then double-click on your server in the right pane; you should now see your shares in the right pane. Do you see something like 'Disk 1', 'Disk 2', ... there as well?

 

If not, go back to the GUI on the 'Main' tab (NOT the Dashboard), double-click on 'Disk 2', then click on the 'SMB Security Settings' tab. Set 'Export' to 'Yes' and 'Security' to 'Public'. Click on 'Apply' and then 'Done'.

 

Back in Windows Explorer, refresh the window and you should see a 'Disk 2' folder. Browse it and see if you can see any files. Can you open (or copy) any of them?

 

(It may take a while as all of your disks will have to spin up...)

 

PS: If you see your files, you could (1) copy them off to a safe place and then rebuild the disk, or (2) simply rebuild the disk.
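A third way, if you prefer the console over Windows Explorer: the emulated disk is mounted just like a physical one, so (again assuming the red-balled drive is disk 2) you can browse and spot-check it directly:

    ls -la /mnt/disk2                  # list the emulated disk's contents
    du -sh /mnt/disk2/*                # per-folder sizes; forces actual reads
    md5sum /mnt/disk2/path/to/a/file   # placeholder path -- checksum any file to confirm it reads back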


I presume the same as in the screenshot I posted earlier, but I've got a new problem: emhttp appears to have tanked, and I'm still running preclear... has anything been added in 6b12 to allow a more graceful restart of the web interface? (I've read the 5.x guides, which appear to involve stopping the array, among other things.)


If the disk is showing as red-balled in the GUI and you can still access it via the user share, then it is being emulated. It sounds as if your parity might actually be good; it could be shown as orange because it is being recalculated, or something like that.

 

There is no way to restart the GUI except by rebooting the system.  This is the same in v5 and v6.  It was only possible to restart the GUI in v4.7 or earlier.

 

One thing that might be worth trying is to see if there appears to be more than one instance of the emhttp process running. There have been reports of people seeing more than one instance, and in those cases killing the newer one while leaving the original running has helped.
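Checking for that is straightforward from the console (the PID below is made up):

    ps aux | grep '[e]mhttp'    # the brackets keep grep from matching itself

    # if two instances appear, kill the one with the higher (newer) PID
    # and leave the original running, e.g.:
    kill 12345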


So, if the drive is being emulated, should I attempt just a straight swap with the new drive and let it get rebuilt? Or am I better off copying the emulated data via the command line to the new drive?

 

Would the following work?

  • Add the new drive to the array configuration and format it as XFS
  • Copy the data via the command line from the emulated drive to the new drive
  • Remove the failing drive from the array
  • Reset the configuration without the failing drive


I don't see why it would not work, although you might want to reverse the order of the last two steps. Don't forget that you are running in an unprotected mode while doing this recovery. At least you will probably have the 'red-balled' physical drive to fall back on to attempt recovery if anything goes wrong.

 

Note also that there are some additional steps required to get back to a fully protected array:

  • Calculate parity using the new drive set
  • Check parity to ensure it was written correctly
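If you do go the copy route, it is also worth verifying the copy itself before removing the failing drive. A checksum dry-run with rsync (disk2 as the source and disk3 as the destination are assumptions; adjust to your slots) reports any file that differs without changing anything:

    # -n = dry run, -c = compare by checksum instead of size/timestamp
    rsync -avnc /mnt/disk2/ /mnt/disk3/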


So, if the drive is being emulated, should I attempt just a straight swap with the new drive and let it get rebuilt? Or am I better off copying the emulated data via the command line to the new drive?

 

Would the following work?

  • Add the new drive to the array configuration and format it as XFS

I don't think you can add array members while the array is in a degraded state. I would swap the drive and let it rebuild; then, after you have done a non-correcting parity check and have a healthy array again, you can start the migration process. If the red-balled drive is actually OK (passes a preclear cycle with no pending sectors or other badness), you could add it back and format it with XFS to use as your first XFS migration destination.
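The preclear report will flag pending sectors, but you can also check directly with smartctl; attributes 197 (Current_Pending_Sector) and 5 (Reallocated_Sector_Ct) are the usual red flags (device node assumed):

    smartctl -A /dev/sdb | grep -Ei 'pending|realloc'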

 

On the other hand, if you have full backups of all the data currently on the array, feel free to blaze new trails and experiment. Your stated method is conceptually sound, and may work if you can add an array member to an array with a red ball. I proposed an extremely similar method for migrating to XFS using only existing drives; it would involve temporarily using the emulated drive to write the files to the physical drive using a different format. The downside is that you are running at risk during the entire operation, so another drive failure would pretty much guarantee the loss of the data on both the failed drive and the drive being migrated.

 

Keep in mind that any mistake while working on an array with a red ball will almost surely result in some data loss. I'd be more inclined to get the array healthy before I started playing around.


When you change out Disk 2, change the SATA data cable at the same time. The SMART report shows three 'UDMA CRC Error Count' errors. These are errors where the data was read successfully from the drive but was corrupted by the time it reached the motherboard. One source of this type of error is cheap SATA cables; another is cross-talk between cables that have been tied together (to make a neat appearance). You don't have a high number of errors, but since you will already be working in there, you might as well address it now.
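After swapping the cable, keep an eye on whether that attribute keeps climbing; the raw count never resets, so note the current value and re-check after some use (device node assumed):

    # attribute 199; the raw value should stay at 3 if the new cable fixed it
    smartctl -A /dev/sdb | grep -i 'CRC'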

