Extremely slow preclear + Memory allocation problem + Failed Drive?


Hi guys, I have an interesting problem that I am facing.. It's going to be a long story.. lol

 

So, last week I got a great deal on RAM (2 x 8GB for $39.99) and a 3TB HDD ($90). I didn't have time to set them up last week, so I put them in the machine this week.

 

I replaced the RAM and added the HDD. Naturally, I went ahead and precleared the 3TB HDD. At this point there was no problem at all.

I was running CouchPotato, maraschino, SABnzbd, SickBeard, MySQL Server, Subsonic, Transmission. They were all active.

 

After the preclear was done, I should have checked parity first, but I just went ahead and replaced the parity drive, thinking everything would be okay, and it started a Parity-Sync. While the parity sync was running, I again thought everything would be okay and started preclearing the old parity drive (2TB) to use it as a data drive.

 

This is where everything started.

 

During the parity sync, one of my HDDs basically crapped out. It didn't get the red ball, but its errors column was showing over 10,000,000 errors and the parity sync kept going. It still had the green ball.

I had no idea what to do and thought maybe I should wait, so I waited until the parity sync finished, with a bunch of errors. Errors were shown only on the bad drive.

However, I noticed that preclearing the 2TB HDD was extremely slow. Writing zeros (step 2) was getting about 1.1MB/s.
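
For a sense of scale, here is a rough back-of-the-envelope sketch of how long the zeroing step alone would take at that rate. It assumes the 2TB drive holds 2 x 10^12 bytes and that 1.1MB/s means 1.1 x 10^6 bytes per second; the helper name `days_to_write` is made up for illustration.

```shell
# Hypothetical helper: whole days needed to write a given number of bytes
# at a constant rate in bytes per second (integer arithmetic, rounds down).
days_to_write() {
    total_bytes=$1
    bytes_per_sec=$2
    echo $(( $total_bytes / $bytes_per_sec / 86400 ))
}

# Assumed figures: 2 * 10^12 bytes written at 1.1 * 10^6 bytes per second.
days_to_write 2000000000000 1100000
```

Under those assumptions the write pass would need roughly three weeks, which is why the speed stood out.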

 

I wasn't sure what I had to do, so I just waited till the parity sync was done.

When the parity sync was done, the preclear speed started going up again, so I thought there had to be a problem with that failed drive, but it was still showing a green ball after the parity sync.

 

When I stopped the array, the failed disk was showing as "unassigned" and it said the HDD was missing. I know I didn't touch anything in the machine, so I thought the HDD must have gone bad. I turned off the machine (the preclear wasn't done), unplugged the bad HDD, turned the machine back on, and started the array without the failed disk. Luckily, the failed drive was an old low-capacity (500GB) HDD that didn't have any data on it yet, so I was okay with just removing it.

 

Since I had turned off the machine before the preclear was done, I restarted the preclear.

 

While the array was started without the disk, I was actively downloading something from SABnzbd, and the preclear was running.

 

This is where another problem occurred. I was not able to access the server at all. No webGUI, no telnet, nothing. The server didn't seem to be getting an IP address from the router. The machine was physically still on.

On the server itself, I checked ifconfig and it seemed to have the IP settings.

 

First, I thought it had to do with the failed HDD, so I turned off the machine again, completely removed the failed HDD, then turned the machine on again.

Since I had added new RAM as well, I thought it could be a RAM problem, so I ran memtest on the new RAM. I ran it for about 2 to 3 hours; it didn't show any problems and passed 1 test.

I exited memtest and rebooted, and then I was able to access the server again.

Then I started the preclear again (under screen) and started downloading some stuff with SABnzbd again.

 

However, this morning I lost access to the server again. ifconfig returned the correct address, but the router wasn't showing the machine as a connected device.

So I thought I should try renewing the IP address, and ran these commands:

 

ifconfig eth0 down
ifconfig eth0 up

 

When I ran ifconfig eth0 up, I got this error:

 

SIOCSIFFLAGS: cannot allocate memory

 

So I stopped the preclear and exited out of 'screen'. When I ran ifconfig eth0 up again, it set up the right IP address and I was able to access the server without a problem.

I saved the syslog when this happened, so the attached syslog is from this point. When I look at the syslog, it does seem like I have a lot of memory allocation errors.
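
To put a number on "a lot", a minimal sketch for counting those entries in a saved syslog could look like the following. The search phrase `page allocation failure` is an assumption about how the kernel worded the errors in this log, and both the function name and the path in the usage comment are placeholders.

```shell
# Hypothetical helper: count lines in a saved syslog that mention a kernel
# page allocation failure. The search phrase is an assumption; adjust it to
# match the wording that actually appears in the log.
count_alloc_failures() {
    # grep -c prints the number of matching lines; -i ignores case.
    # grep exits non-zero when nothing matches, so "|| true" masks that.
    grep -ci 'page allocation failure' "$1" || true
}

# Usage (the path is a placeholder for wherever the syslog was saved):
# count_alloc_failures /path/to/saved/syslog
```

Running it before and after a change gives a quick way to see whether the errors have actually stopped.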

 

I am guessing that there could have been a problem with the parity sync because of the failed drive, so I ran the initconfig command to reset the array and started rebuilding it.

I also thought there could be a problem with my plugins, so I disabled all plugins except for MySQL and Subsonic.

And I am running the preclear on the 2TB drive again to see if I get that extremely slow preclear speed.

 

So.. that's the story. lol

 

In case you don't want to read my story.. lol, here are my summed-up questions for you guys:

 

1. What would cause preclear to run at 1MB/s? If the drive that I'm preclearing is bad, would preclear stop in the middle, or would it keep going and use lots of memory?

2. Can a parity sync with a failed drive cause a problem like this? Does parity sync even work with a failed drive?

3. This was the first time that a drive failed on me. What happens when a drive fails while the array is started? Does it show a bunch of errors in the errors column like I saw, or just the red ball?

syslog_11292012.zip

Link to comment

I'm not an expert, as I'm very new, but it sounds like a loose cable.

 

If I'm correct, unRAID will only red ball a drive on a write failure. So if you were just calculating parity, then as long as the bad drive is not the parity drive, it wouldn't red ball it, as it's only reading from the data drives.

 

I see.. I will have to check cables when I get home.

 

The drive itself is a bad drive though. it was causing a problem in my dad's desktop (problem with reading there too) but i just took it out and put it in the unraid machine. probably not a good idea haha.

Link to comment

Okay.. came home and made sure all cables were secure.

I tested copying some files to my desktop from the unRAID server, and it started generating crazy errors in the syslog.

 

and then.. i found this.

 

http://lime-technology.com/forum/index.php?topic=3999.0

 

I set min_free_kbytes to 8192, then BAM no memory or page allocation errors...

 

I am not too familiar with this side of Linux.. what does min_free_kbytes do?

 

also, found this page:

 

http://dd.qc.ca/people.redhat.com/kernel/min_free_kbytes.html

 

Looks like it should be set to 16384, so I set min_free_kbytes to 16384 and the overall performance of the unRAID machine improved. Write speed is between 20 MB/s and 40 MB/s without a cache drive, and read speed is over 80 MB/s.

 

Before I changed min_free_kbytes, I checked the value. I don't remember exactly what it was, but it was somewhere around 3,000.
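
For anyone following along, here is a minimal sketch of checking and changing the value. On Linux the setting lives at `/proc/sys/vm/min_free_kbytes` (the same value `sysctl vm.min_free_kbytes` reports), and it tells the kernel how many kilobytes of memory to try to keep free as a reserve. The function names and the `MIN_FREE_FILE` override are made up for illustration; the override only exists so the sketch can be tried against a scratch file.

```shell
# Standard procfs location of the setting; the override variable is a
# hypothetical convenience for trying the functions on a scratch file.
MIN_FREE_FILE="${MIN_FREE_FILE:-/proc/sys/vm/min_free_kbytes}"

# Print the current value in kilobytes.
show_min_free() {
    cat "$MIN_FREE_FILE"
}

# Write a new value in kilobytes. Needs root when pointed at the real file.
set_min_free() {
    case "$1" in
        ''|*[!0-9]*)
            # Reject anything that is not a plain non-negative integer.
            echo "usage: set_min_free <kilobytes>" >&2
            return 1
            ;;
    esac
    echo "$1" > "$MIN_FREE_FILE"
}

# Usage, run as root on the server:
# show_min_free        # note the old value before changing it
# set_min_free 16384   # the value tried in this thread
```

A value written this way lasts only until the next reboot, so it has to be reapplied at startup to stick.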

 

So now I have new questions..

 

Why was min_free_kbytes set so low? And is it safe to set it to 16384?

Link to comment
The drive itself is a bad drive though. it was causing a problem in my dad's desktop (problem with reading there too) but i just took it out and put it in the unraid machine. probably not a good idea haha.

VERY bad idea. Unraid parity protection only works if there is only one bad drive, so you are putting all the rest of your data at serious risk by running a known bad drive.