Parity Rebuild Slow/Locks up



I had a scare with my Unraid server where one of the drives showed as not present. I pulled the drive and inserted a new one to start a parity rebuild. At first, I could not even select the new drive. After a reboot, same issue: it would not show as an available device. I took the server apart and verified that the cables and RAID cards were all plugged in and seated correctly.

 

I am now able to get into the server, and I can even begin a parity rebuild. The rebuild starts out fine, but then it slows to 22Kb/sec. I am not sure what is going on.

 

This is my current setup:

NORCO-4220

SUPERMICRO|MBD-X7SBE

SUPERMICRO-SAT2-MV8  X 2

PSU 750W

PDC E5200 2.5G 2M R

Unraid 4.7

 

I have attached my log. I looked through it, but couldn't make sense of what might be causing the issue. Any help is appreciated.

 

Terminal_Saved_Output.txt

Link to comment

Can you tell me what would lead you to believe that? Is there something in the log?

 

This setup is in a Norco 4220, so the backplanes are redundantly supplied power via two Molex connectors. I have that wired from one PSU, but each backplane gets power from two different cables.

 

It would be nice if there was some reason behind your logic. Don't get me wrong, I appreciate the help, but I want to know the reasoning before I chase possible ghosts of power supplies.

 

I have an extra breakout cable, so I am about to swap it in and test.

 

Any input is appreciated, thanks again.

Link to comment

Memtest was run and passed with no issues.

 

There are actually a number of things left, in my opinion. It could be the SAT2-MV8 cards, the backplane, the breakout cables, or possibly a bad parity drive.

 

Am I wrong in thinking that if the parity drive is failing, a rebuild could lock up at the bad sectors? I would think that Unraid would recognize it, but I don't know.

 

Again though, what is the reasoning behind a bad power supply? The unit begins the parity rebuild at good speeds, then locks up at the same point every time.

 

Is there nothing in the log that can help? I am going to try to post another log tonight after I check all the power supply cables. Since I don't have a spare PSU, I will also try a new breakout cable.

 

Again, dgaschk, THANK YOU for replying. I do not want to come across as an A-hole, but I want to understand the logic behind suspecting the power supply. Either way, it gives me one more thing to consider that I hadn't previously, and I will check it.

Link to comment

Update: I went back and verified all the cabling. Power cables, SATA, everything looks good. I even removed the cache drive to free up some power, just in case the 750W PSU wasn't enough for all the drives, since a good number of them are 7200 RPM.

 

 

The system will boot up, be accessible from the web GUI, and show a login prompt on the local monitor. It starts a parity rebuild and then locks up at about 4.6%. The web GUI becomes completely inaccessible, and the local monitor shows a bunch of output with the login prompt gone. I don't think there is a way to export or access a log, as the machine does not respond to keystrokes. Is it possible that the system is panicking? The system also seems to lock up when I stop the parity rebuild. I was at the point where I was just going to stop the rebuild, cut my losses on that one drive, and try to bring the array online and copy all the data off.

 

I could go buy a new PSU but would rather not if it isn't PSU related.

 

 

Any help is appreciated, I am at a loss for ideas at this point. I have almost 20TB of data that I would like to retrieve.

Link to comment

Here is a SMART report for the drives. From what I can tell, they all passed.

 

FYI, I did actually go out and buy a new PSU. I got a 1200W power supply, so there is no question as to there being enough power.

 

The data rebuild got a little further this time, to about 26%. I can actually start the array without protection, and I guess I could copy the data off to a secondary server, but I would rather figure out what is going on.

 

dgaschk, thank you so much for the help.

 

Terminal_Saved_Output.txt

Link to comment


1200W is complete overkill... and does not necessarily mean there is enough power for the disks.  What specific power supply did you get?
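
To illustrate why total wattage alone does not settle the question, here is a rough sketch that estimates the 12V current drawn when all drives spin up at once and compares it to a rail rating. Every number below (2.0 A per drive at spin-up, 10 A for everything else, the rail ratings, the 80% headroom factor) is an assumption for illustration, not a measurement of this build or of any specific PSU; check the drive datasheets and the PSU label for real figures.

```python
def spinup_12v_amps(drive_count, amps_per_drive, other_12v_amps=0.0):
    """Estimate peak 12V current if every drive spins up at the same time."""
    return drive_count * amps_per_drive + other_12v_amps


def rail_is_sufficient(required_amps, rail_amps, headroom=0.8):
    """True if the load fits within a fraction (headroom) of the rail rating."""
    return required_amps <= rail_amps * headroom


if __name__ == "__main__":
    # Assumed figures: 20 bays populated, 2.0 A per drive at spin-up,
    # 10 A of other 12V load (CPU, fans, cards).
    needed = spinup_12v_amps(20, 2.0, other_12v_amps=10.0)
    print(f"Estimated 12V spin-up load: {needed:.1f} A")
    for rail in (60.0, 100.0):  # hypothetical 12V rail ratings
        verdict = "ok" if rail_is_sufficient(needed, rail) else "marginal"
        print(f"{rail:.0f} A rail: {verdict}")
```

The point is only that the 12V rail rating, not the headline wattage, is what to compare against the drives' spin-up draw.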

Link to comment


I understand that 1200 watts is complete overkill. Since PSU was the only thing brought into question, I figured, ahh why not.

 

It is a Corsair 1200W, single-rail, modular, Gold-rated PSU. It seems fair to say that the problem is not with the PSU, so that will be going back to Fry's.

 

At this point the parity rebuild will start and be screaming fast for a while, then slow to a crawl. If I stop the rebuild once it has started to slow, the machine goes unresponsive and seems to panic. I can, however, stop the rebuild early on (or just not start it) and keep the shares live. With the shares live I will try to copy data off.

 

I have a 4224 build that is currently preclearing 20 drives, 8 at a time. Since 8 are done, I will probably create a quick share and copy the data to them. Then, once the 4220 is "fixed", I will complete multi-cycle preclears on the drives in the 4224.

 

 

Also, can you tell me what part of the SMART report shows that there is a problem with the drives?
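
For context on that question: a drive can report an overall PASSED health status while individual attributes still carry nonzero raw counts. Below is a minimal sketch of one way to scan the attribute table printed by `smartctl -A` for the counters usually checked first. The choice of watched attributes is an assumption about what matters here, and the sample table is made up for illustration; it is not taken from the attached report.

```python
import re

# Attributes commonly checked first; this selection is an assumption.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def flag_attributes(smartctl_output):
    """Return {name: raw_value} for watched attributes with a nonzero raw value."""
    flagged = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        match = re.match(r"\d+", raw)
        if name in WATCHED and match and int(match.group()) > 0:
            flagged[name] = int(match.group())
    return flagged


# Made-up sample in the layout of `smartctl -A` output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       9876
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(flag_attributes(SAMPLE))
```

Nonzero reallocated or pending sector counts point at the disk surface, while CRC errors more often point at cabling or the backplane.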

Link to comment

OK. So I finally got home, and it looks like sdp is my parity drive. How should I go about taking care of this? Remember that I have one 1TB drive that the system tries to rebuild every time I start it. So I have one 2TB parity drive that is failing, one 1TB drive that is being rebuilt, and one 1TB drive that might be failing.

 

I do not think that I have a spare 2TB drive lying around to replace that failing parity drive.

 

 

What is everyone's advice?

Link to comment