
Monthly parity check causing a crash... (Beta 11)


Recommended Posts

Oh..  and here is the script that's called from the "go" file

 

 

#!/bin/sh
crontab -l >/tmp/crontab
grep -q "/root/mdcmd check" /tmp/crontab 1>/dev/null 2>&1
if [ "$?" = "1" ]
then
    echo "# check parity on the first of every month at midnight:" >>/tmp/crontab
    echo "0 0 5 * * /root/mdcmd check 1>/dev/null 2>&1" >>/tmp/crontab
    crontab /tmp/crontab
fi
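
To confirm the entry actually made it into root's crontab after the go file runs, a quick check along these lines should show it (just a sketch; it only greps for the text the script adds above):

crontab -l | grep "mdcmd check" && echo "parity check entry installed" || echo "entry missing"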

Link to comment

Oh..  and here is the script that's called from the "go" file

 

 

#!/bin/sh
crontab -l >/tmp/crontab
grep -q "/root/mdcmd check" /tmp/crontab 1>/dev/null 2>&1
if [ "$?" = "1" ]
then
    echo "# check parity on the first of every month at midnight:" >>/tmp/crontab
    echo "0 0 5 * * /root/mdcmd check 1>/dev/null 2>&1" >>/tmp/crontab
    crontab /tmp/crontab
fi

Personally, I would run it in NOCORRECT mode... that way, if there is an issue, YOU can decide which drive to trust: the parity drive, or a misbehaving data drive.

 

The better command to run would be:

crontab -l >/tmp/crontab
grep -q "/root/mdcmd check" /tmp/crontab 1>/dev/null 2>&1
if [ "$?" = "1" ]
then
    echo "# check parity on the first of every month at midnight:" >>/tmp/crontab
    echo "0 0 5 * * /root/mdcmd check NOCORRECT 1>/dev/null 2>&1" >>/tmp/crontab
    crontab /tmp/crontab
fi
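
If a crash wipes out whatever was on screen, one small tweak is to keep the output instead of throwing it away. A rough sketch of the same cron line, with /boot/logs used purely as an example location on the flash drive (create it first):

# same schedule, but keep any output on the flash drive instead of discarding it
0 0 5 * * /root/mdcmd check NOCORRECT >>/boot/logs/parity-check.log 2>&1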

 

The command to run a parity check has not changed in many years.  If you are crashing, either you have a bad disk and accessing it fills the syslog, using all memory until you crash, OR you have bad RAM in your server, or a power supply unable to supply all the disks, or a bad splitter, or a loose connection that vibrates free when all the disks are shaking the server case, or a bad disk controller...  It is most certainly some bad hardware in your server.
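
One way to test the "syslog eats all the memory" theory is to watch the log size and free RAM while a check is running. A minimal sketch, assuming the log lives in the RAM-backed /var/log as on a stock install:

while true
do
    ls -lh /var/log/syslog    # log growing fast points at a disk spewing errors
    free -m                   # watch free memory shrink as the log grows
    sleep 60
done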

 

You need to find it and fix it, since the process of re-constructing a failed drive is 99.9% identical to that used in a parity check.  If you cannot complete a parity check, you probably cannot re-construct a failed drive either.

 

Link to comment
If you cannot complete a parity check, you probably cannot re-construct a failed drive either.

 

Strangely, I had exactly this situation while my LSI2008-based controller was throwing errors (fixed by firmware upgrade).  The non-correcting parity checks would always fail, but several parity rebuilds never failed.

Link to comment

I don't think it's hardware...  It ran fine until I upgraded.  And it happens right away, like right at 2AM when it kicks off.

 

I can run a parity check manually with all drives spun down and it's fine (I believe)

 

I was hoping that it had something to do with the beta version!

 

Jim

Link to comment
  • 2 months later...

Well...  It happened again.  I think I'll disable this and have it e-mail me a reminder once a month to do it manually! :(

 

Maybe when I have some spare cycles I'll try it manually from the command line mimicking the cron script.
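
If you do get to it, something along these lines should mimic what the cron entry runs and let you watch progress from the console (just a sketch; the exact status fields mdcmd reports can differ between versions):

# same command the cron entry runs (add NOCORRECT per the earlier suggestion if preferred)
/root/mdcmd check
# then poll for resync/check progress
/root/mdcmd status | grep -i resync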

 

*pout*

 

 

 

Does it work if you do it manually? Have you tried the unmenu-installed version?

Link to comment
  • 1 month later...

Ok..  It's still doing this!

But today it also crashed while doing a regular parity check.  Sometime along the way, it crashed.  But the difference is I was also trying to do other things as well.

Today I noticed that UnRaid wasn't responding.  Hmm..  beginning of the month.  Probably the auto parity check!

So I power cycled and it started the parity check.  It seemed to be humming along, so I decided to rip a DVD to the server.  Sometime after that, UnRaid crashed again!

 

I have a 630W power supply (This one), with half my drives on one 12V rail and the others on another rail.

 

My issues started when I went to the 5.0 Beta, but I also added more drives at the same time!

 

So I'm scratching my head to figure this out!  There is a part of me that thinks my PSU just plain sucks!  But I don't want to drop >$100 on a new PSU to test that theory.

But there is also a part of me that thinks the Beta may be causing my issues!

 

Any thoughts on how to diagnose?  I can't go back to a non-beta due to my inclusion of 3TB drives.

 

Jim

 

P.S. I have 10 drives (8 data, 1 parity, 1 cache) almost all green drives.

 

 

Link to comment

It is very likely the PSU.  The one you're using only has 20 amps for the disks and MB.  The drives alone are using more than 20 amps.  This is something that must be resolved: a disk rebuild will fail at some point, with a second drive crashing in the process.
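
For a rough sense of the numbers: green drives are typically specified somewhere around 1.5 to 2A on the 12V line at spin-up (check your drives' datasheets, the figure below is just an assumption), so ten of them spinning up together can approach or exceed a single 20A rail before the MB draw is even counted. A quick back-of-the-envelope calculation:

DRIVES=10
AMPS_PER_DRIVE=1.8    # assumed 12V spin-up draw per green drive
echo "$DRIVES $AMPS_PER_DRIVE" | awk '{ printf "Estimated 12V spin-up load: %.1f A\n", $1 * $2 }'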

Link to comment

I would suggest that you open up the case and carefully inspect everything inside for dirt and dust clogging up the CPU cooler and the power supply.  Make sure that all case fans, filters, etc. are clean and nothing is blocking the air flow.

That's all good.  It went into a new case a couple months ago.  Everything is clean.

(problem occurred with old case as well - same PSU though)

Link to comment

Sorry, I somehow missed the balancing act that you mentioned.

 

Did you modify one of the cables? How did you determine that the cables are on different rails? The rail with the SATA connectors often also powers the MB, reducing the available amperage by 4 or 5 amps.

 

 

Edit ($24.99): http://www.newegg.com/Product/Product.aspx?Item=N82E16817139026

 

For non-green drives ($49.99): http://www.newegg.com/Product/Product.aspx?Item=N82E16817139027

Link to comment

I actually have the drives split across two of the 12V rails.  So I have 5 on one and 5 on the other.  So each set of 5 drives has (in theory) 20A available to it (there are 4 20A rails total).

I don't know about your specific power supply, but almost all 4-rail multi-rail supplies use one of the rails for the CPU power, one or two for PCIe video cards, one for all the disk connectors, and one for the motherboard.  To say you are using two rails might be accurate, and might not be.
Link to comment

It originally used 1 rail for the MB, 2 for PCIe, and 1 for drives.

I rewired (not mickey-moused.. soldered correctly) half of the drives to use one of the PCIe power rails (which I have no need of).

While I do not doubt your skills... my suggestion is to buy a new PSU that is capable of handling all of your drives.

Link to comment

I found a CORSAIR Gaming Series GS600 at work that I can borrow for a couple of days.  Right now it's $82.48 with a $5 MIR till tomorrow...

 

I'll give this a try tonight.

What a pain!  I'm not looking forward to swapping out the PSU!

 

But I guess I really shouldn't be cheap on the PSU (or any component) for a system that I need to have running 24/7 and that has things I don't really want to lose!

Link to comment

It originally used 1 rail for the MB, 2 for PCIe, and 1 for drives.

I rewired (not mickey-moused.. soldered correctly) half of the drives to use one of the PCIe power rails (which I have no need of).

That's how I would have done it...  clearly you are more technical than 95% of unRAID users, as re-wiring a power supply is definitely not something just anybody will do.  (I have a 3-rail supply I'm planning to re-wire in exactly the same way, but my original background is in hardware, and I can make almost anything I want from spare parts in the basement... I did not get involved in software until the early 70s.)

 

Most people with multi-rail supplies do not have your skills.  Your power supply is probably not the issue here unless you are running into a limit on how you re-wired it.

Link to comment

Archived

This topic is now archived and is closed to further replies.
