Repairing Failed Array issue


Hello All,

 

While I've managed a number of NAS servers over the years, I'm currently experimenting with an UnRaid server in preparation for deploying my first (to make sure I know how to do the common/important things I need to do). I'm now testing the final thing, one that I hope never to need but want to have done once before if I do: a failed disk. I'm using the actual MB I plan to use for the system, but with some old drives I had lying around, and all the data is just copied from another server, so if I lose it, oh well.

 

The setup. I installed a totally clean 6.8.3 OS, no plugins or scripts. I built an array from two 500 GB drives and a 750 GB drive I had lying around. They have high hours but clean SMART data and, to my knowledge, are working normally. (The 750 GB drive is parity, making a 1 TB data array.) Then I loaded it with 700 GB of data.

The simulated failure. I shut the system down, unplugged one of the 500 GB drives, and installed a 4 TB drive (bigger than the parity drive).

 

What I was expecting, and now realize UnRaid can't do (which is why I play before I deploy), was that it would rebuild to the new drive but only utilize the first 500 GB (maybe 750 GB) of it, and the other 3-point-something TB would be ignored.

 

I read about the 3 drive swap stuff and I'm trying it. The problem is that, for some reason, the 8 GB of RAM in the system fills up before the copy gets to 10% and the system crashes; I can't get to anything via the web UI or the terminal command line (I haven't tried the terminal GUI yet). I'm posting this before I head to bed, and I'll let it run overnight to see if the copy finishes despite there being no UI of any kind.

 

1) The system is crashing. Help! It feels like some sort of memory leak: I watched the system RAM usage creep up to 100%, then got a bunch of errors similar to this in the web interface. (Fatal error: Out of memory (allocated 2097152) (tried to allocate 53248 bytes) in /usr/local/emhttp/plugins/dynamix/include/Markdown.php on line 1995)

2) Is it possible to copy the parity data over to the new drive with all the other drives powered down? If this were to happen for real, I would rather the other drives sit there with no power while the copy is done, so they have little to no chance of also failing during a first step that is just copying the parity over to a new, larger drive.

 

FYI: This is simulating a real scenario that happened to me once. Replacing the failed drive with a same-size one was not possible, so I had to go bigger. That system did what I expected and ignored the additional space; only after I had slowly installed larger drives one at a time, rebuilding after each, did the extra space become available. I was hoping UnRaid would behave the same but expose the extra space once the parity disks were updated. But the first priority has to be getting back to a healthy (if underutilized) array ASAP, because ya never know when a 2nd/3rd failure might occur.

 

Edited by davep1553
Link to comment

In Unraid no data drive can be larger than the smallest parity drive.

 

The procedure you want to use in your scenario (upsizing the parity drive and then using the old parity drive as a replacement data drive) is what is known in Unraid as "Parity Swap". The steps to achieve it are described here in the online documentation, accessible via the 'Manual' link at the bottom of the Unraid GUI.
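
To make the size rule above concrete, here is a minimal Python sketch. The function names are hypothetical and the sizes are taken from the test setup in the first post (750 GB parity, 500 GB failed disk, 4 TB replacement). It also assumes the replacement must be at least as large as the failed disk. It only illustrates the decision, it is not how Unraid itself performs the check.

```python
def fits_as_data_disk(candidate_gb, parity_sizes_gb):
    """Rule from this thread: a data disk may not exceed the smallest parity disk."""
    return candidate_gb <= min(parity_sizes_gb)


def replacement_plan(failed_gb, candidate_gb, parity_sizes_gb):
    """Name the procedure that applies to a replacement disk (illustrative only)."""
    if candidate_gb < failed_gb:
        # Assumption: a replacement smaller than the failed disk cannot hold the rebuild.
        return "too small"
    if fits_as_data_disk(candidate_gb, parity_sizes_gb):
        return "direct rebuild"
    # Candidate is larger than parity: it has to become the new parity first.
    return "parity swap"


print(replacement_plan(500, 4000, [750]))  # parity swap
print(replacement_plan(500, 750, [750]))   # direct rebuild
```

With the drives described in this thread, the 4 TB disk is larger than the 750 GB parity disk, so the sketch lands on the parity swap branch.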

Link to comment

Thank you for the replies.

2 hours ago, trurl said:

Parity swap as in the link is better than what you expected it to do. 

 

That is a matter of opinion. I am of the belief that when the RAID/array is degraded, the #1 priority is to get it back to not-degraded ASAP. If you don't fully utilize a drive in that process, oh well.

 

Unfortunately I already have that link (it's what I'm attempting) and

1) It is resulting in a system crash

2) The data drives are online and spinning while I'm doing it, therefore they are wearing, therefore they could have a failure.

 

I'm OK with having to copy the parity to the new disk first, but 1) I need the system to not crash when doing it and 2) I would VASTLY prefer to be able to do it with all the data drives unpowered.

 

P.S. I just let it run after the crash and the parity data was not copied, so I need either 1) a new version of unRAID, 2) a different mechanism in unRAID to copy the parity over, or 3) software other than unRAID to copy the parity information to the new drive.

Edited by davep1553
Link to comment
23 minutes ago, davep1553 said:

1) It is resulting in a system crash

I think we are going to need more detail on this, as this is not normal. I, personally, have never had a system crash using the Parity Swap procedure and cannot remember any recent posts about this happening to other people unless they already had other hardware problems (e.g., RAM, power supply).

23 minutes ago, davep1553 said:

2) The data drives are online and spinning while I'm doing it, therefore they are wearing, therefore they could have a failure.

They are not being accessed while the parity is being copied, so they can be spun down. They do need to remain online, though, so they must remain powered even if not actually spinning. When the rebuild of the data drive starts then, yes, all drives are being read, as this is required for the rebuild of the failed drive to operate.
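
To illustrate why every remaining drive has to be read during the rebuild: with single parity, the parity disk holds the bitwise XOR of all the data disks, so the contents of a missing disk are the XOR of parity with every surviving data disk. The sketch below is a toy model in Python that uses short byte strings in place of disks. The helper names are made up for illustration, and this is not Unraid's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(parity, surviving_disks):
    """Recover the missing disk: XOR parity with every surviving data disk."""
    return xor_blocks([parity, *surviving_disks])


# Two toy "data disks" and the parity computed from them.
disk1 = b"\x01\x02\x03\x04"
disk2 = b"\x10\x20\x30\x40"
parity = xor_blocks([disk1, disk2])

# Pretend disk1 failed: it can be recovered only by reading parity AND disk2.
rebuilt = rebuild_missing(parity, [disk2])
print(rebuilt == disk1)  # True
```

The same model shows why the parity copy step does not touch the data disks: copying parity to a larger drive is a plain disk-to-disk copy, and only the rebuild needs the XOR across all drives.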

Link to comment
4 hours ago, itimpi said:

I think we are going to need more detail on this as this is not normal.

I posted what I had in the first post.

15 hours ago, davep1553 said:

1) The system is crashing. Help! It feels like some sort of memory leak: I watched the system RAM usage creep up to 100%, then got a bunch of errors similar to this in the web interface. (Fatal error: Out of memory (allocated 2097152) (tried to allocate 53248 bytes) in /usr/local/emhttp/plugins/dynamix/include/Markdown.php on line 1995)

The crash happens within a few minutes of starting the parity copy; the copy was under 10% when the crash occurred.

Once it crashes I can't do anything except reboot via power button, but if there is a log somewhere I can grab it.
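
One way to capture evidence before a hang like this is to sample memory usage to a file that survives the reboot. The Python sketch below rests on a few assumptions: that a Python interpreter is available on the box (stock Unraid may not ship one), and that `/boot/meminfo.log` on the flash drive is an acceptable place to write. The path and the function names are hypothetical. It reads the standard Linux `/proc/meminfo` file and appends one line per sample.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def used_percent(info):
    """Percent of RAM that is not available, from MemTotal and MemAvailable."""
    return 100.0 * (info["MemTotal"] - info["MemAvailable"]) / info["MemTotal"]


def log_samples(count, log_path="/boot/meminfo.log", interval_s=10):
    """Append `count` samples to log_path (hypothetical location), one per interval."""
    for _ in range(count):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        with open(log_path, "a") as log:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} used={used_percent(info):.1f}%\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before a possible hang
        time.sleep(interval_s)


# Small self-check on sample text, so nothing is written when the file is run.
sample = "MemTotal:        8000000 kB\nMemFree:          500000 kB\nMemAvailable:    1000000 kB\n"
print(round(used_percent(parse_meminfo(sample)), 1))  # 87.5
```

Calling something like `log_samples(360)` from a terminal before starting the parity copy would leave roughly an hour of samples behind. Frequent writes do wear a flash drive, so this is only meant for a short diagnostic run.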

 

I have now tried it with the system loaded into interface GUI mode. When the crash happened Firefox closed and all mouse functionality stopped so I couldn't check any of the other menus.

 

Are there any 3rd party tools that people have successfully used to clone the parity data to a new disk?

Link to comment
26 minutes ago, davep1553 said:

I'll give safe mode a try but I'm very doubtful since there are no VMs or docker images setup.

SAFE mode just starts without plugins so if you don't have plugins it shouldn't make any difference.

 

Possibly hardware related.

 

Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.

Link to comment

I'm off to bed but thought I'd write a quick update. 

 

I'd seen the failure 3 times. I rebooted into safe mode and the replacement drive wouldn't show up. I pulled the drive and tested it on a PC, and Windows wouldn't detect the drive either. I'm not sure what happened, because the drive is still spinning; something must have gone wrong with the PCB. I have another of the same model, so maybe I'll try a PCB swap later.

 

Anyway, I grabbed another 4 TB drive I have (this one has SMART failures but it still works for now). No safe mode, and the copy is running slower than all get out, but it is running. It's at 45% about 5.5 hours in, and the RAM usage is down around 13%. I really hope some developer sees this and deals with the very crappy way the software failed for a bad drive.

 

I would still like to know if anybody has had any luck using 3rd party software to copy the parity information to a new disk. I'm thinking I'll try it myself soon, because overall I was liking everything about unRAID until this last and final test left a sour taste for me.

Link to comment
