unRAID issues after bad power down



Hi

 

Recently my area suffered a power outage.  My unRAID box wasn't doing anything at the time (all drives spun down).  When I rebooted it, it would spend a long time mounting some of the disks and then disappear off the network altogether.  After a few goes, I left it overnight, and when I came back it was OK again.  A few parity check errors, but otherwise fine.  I then tried some normal restarts and we were back to where we started.

 

Unfortunately, this happened again this week, and I've just about got it back on again five days later.  First time it's booted past the mounting stage.  I'm tempted to think I've got a hardware fault somewhere.

 

Some reboots would lock the machine completely (I assume it had locked, I couldn't type on the keyboard or ping it).

 

I'm attaching a syslog in the hope that somebody can help me get to the root of this problem, since I'm currently always one bad shutdown away from losing access to my data for a week.

 

A few thoughts and stats...

 

- There are 10 WD Green Power drives in the array, a mixture of EACS, EADS and EARS

- 8 drives are plugged straight into the Gigabyte motherboard, 2 into a SuperMicro AOC-SAT2-MV8 PCI card

- Drive 9 is taking ages to mount at the moment, and sometimes shows errors in the errors column on the home page.  It is plugged into the PCI card, and I've just replaced its SATA cable

- Drive 3 occasionally has errors on the home page, but I've not yet found the root of that.  Don't think it's related to this.

- All are coming off a 500w power supply, with a few molex-Sata splitters

- Afraid I couldn't capture any log files from when it bombed out whilst mounting

- The log file attached was accessed during a parity check via http://unraid1/log/syslog

- It's normally a headless machine that I telnet into, but I took it out of the rack last night to see whether anything on screen was to blame.  I took it into work today and saw a few error messages flash by on boot when it locked, but no such errors on a clean boot

- I'm running 4.7.  I was previously running 4.6, but upgraded after the first round of errors

 

Any help very gratefully received, since I'm a little lost here.

 

Take care, Simon

syslog_11-Mar-2011.zip

Link to comment

/dev/sdj is suffering from many UNC media errors.  (UNC = un-correctable.)  These are unreadable sectors on the disk.  This is disk9 in your array.  The "errors" you see on the unRAID display are "read" errors.  When you read those sectors and the disk reports a UNC error, the unRAID array re-constructs the correct contents by reading parity in combination with all the other data disks.
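
To illustrate the reconstruction described above, here is a minimal sketch that assumes single parity calculated as a byte-wise XOR across the data disks. The function names and the tiny two-byte "disks" are made up for illustration and are not taken from unRAID itself.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Parity block = XOR of the same block on every data disk (assumed scheme)."""
    return xor_blocks(data_blocks)


def reconstruct(parity, surviving_blocks):
    """Rebuild the one unreadable block from parity plus all the other disks."""
    return xor_blocks([parity, *surviving_blocks])


# Hypothetical two-byte blocks standing in for the same sector on three disks.
disks = [b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
parity = compute_parity(disks)

# Pretend the middle disk returned a UNC error for this sector.
rebuilt = reconstruct(parity, [disks[0], disks[2]])
assert rebuilt == disks[1]
```

This only works when every other disk can be read at that location, which is why a second failing drive or a flaky power supply makes the situation much riskier.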

 

Please get a smart report on that drive by typing

smartctl -d ata -a /dev/sdj

 

Look for the attributes for Sectors that have been re-allocated, and those pending re-allocation.

Odds are you have a disk that is slowly failing (since you say these errors have been happening for some time now).

 

We'll know when you post the output of the smart report.
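
If the report is long, the two attributes mentioned above can be picked out with a small script. This is only a sketch: it assumes the usual smartctl attribute table layout (attribute name in the second column, raw value in the tenth) and the common attribute names Reallocated_Sector_Ct and Current_Pending_Sector. The helper names are hypothetical, and since Python may not be available on the unRAID box itself, the parser can also be run elsewhere on a saved copy of the report.

```python
import subprocess

# Attribute names as they commonly appear in smartctl output (assumption:
# this drive reports them under these names).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def sector_counts(report_text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    counts = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the raw value is the 10th.
        if len(fields) >= 10 and fields[1] in WATCHED:
            counts[fields[1]] = int(fields[9])
    return counts


def read_report(device="/dev/sdj"):
    """Run the same command as in the post and return its text output."""
    result = subprocess.run(
        ["smartctl", "-d", "ata", "-a", device],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `sector_counts(read_report())` on the server, or `sector_counts(open("smart.txt").read())` on a saved report, gives the two raw counts. Non-zero values that keep growing between runs are the thing to watch for.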

 

Joe L.

Link to comment

SMART report looks OK. Might be cables or port.

It indicates the sectors were successfully re-written to their original locations.  Probably nothing to do with the cables, but it could be almost anything that prevented the initial "write" from being written properly, including vibration, a poor power supply, or poor power connection.

 

Joe L.

Link to comment

Ahh, thanks guys.

 

So you're saying that the drive's OK, but my PSU or mains itself might be the culprit?

 

I was thinking of changing the power supply, since although it was new, it was fairly cheap.  I have loaded it with quite a few molex splitters too, which can't be great.

Link to comment

In my experience, I can't tell you how many problems can be caused by an inadequate or unreliable power supply.  If I were you, I'd invest in a good single rail power supply with a rating of 20% more power than you will need.  In the long run it will save you much trouble.

Link to comment

Thanks guys - I'll look to upgrade the PSU.  The one in there's nothing special - something that came with the case.  I'll check what it is shortly.

 

I was thinking of upgrading anyway, since I got it before 80plus PSUs hit the market.  Figured it would pay for itself anyway, since it's on 24/7.

Link to comment

With 10 drives on a generic supply that came with the case I can almost guarantee it is WAY overloaded.

 

Even if they are all green drives, at 2 Amps per disk they need 20 Amps of capacity on the 12 volt rail.  And that does not include the needs of the motherboard, disk controller cards, memory, or fans.  You probably need at least 35 Amps of capacity to have some safety margin.
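
The arithmetic above can be written out as a quick sketch. The drive count, the 2 Amps per disk and the 35 Amp suggestion come from this thread; the constant and function names are made up for illustration, and the 2 Amp figure is a rule of thumb rather than a measured value.

```python
DRIVES = 10                  # drives in the array
AMPS_PER_DRIVE = 2.0         # rule-of-thumb 12 V draw per drive
RAIL_VOLTS = 12.0
SUGGESTED_RAIL_AMPS = 35.0   # suggested 12 V capacity with safety margin


def drive_rail_amps(drives=DRIVES, amps_per_drive=AMPS_PER_DRIVE):
    """12 V current needed by the drives alone."""
    return drives * amps_per_drive


def rail_watts(amps, volts=RAIL_VOLTS):
    """Convert a 12 V rail current into watts."""
    return amps * volts


drives_only = drive_rail_amps()                    # 20.0 A for the disks
headroom = SUGGESTED_RAIL_AMPS - drives_only       # 15.0 A left for everything else

print(f"Drives alone: {drives_only:.0f} A ({rail_watts(drives_only):.0f} W) on the 12 V rail")
print(f"Suggested rail: {SUGGESTED_RAIL_AMPS:.0f} A ({rail_watts(SUGGESTED_RAIL_AMPS):.0f} W)")
print(f"Headroom for motherboard, controller, memory and fans: {headroom:.0f} A")
```

The point of the comparison is that the number to check on a power supply's label is the 12 volt rail rating, not just the total wattage.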

 

Joe L.

Link to comment

Hi guys - just a quick note to say thank you again for your support with this.  I've got my unRAID system up and running again for now, but I've just ordered a Corsair 650w PSU, as recommended in the Wiki.

 

The more I read about PSUs, the more I realised you just can't cut corners.  Sorry if this was a bit of a schoolboy error, but when I started the project I was struggling to pay for the components.  I was offered a 500w supply with the case, and I (wrongly) assumed that by adding up the sum component draw I'd be OK.

 

Since the Corsair's got 8 SATA leads on it, I can ditch most of my molex->sata convertors, which I've never been confident with.

 

Going to have a good look at the SATA leads too while I'm there, since two of them in my system are ones that came with the motherboard.

 

Thanks again

Link to comment

Hi

 

New PSU finally turned up yesterday, installed it in about half an hour and everything seems to be working fine.

 

It meant I could get rid of a couple of Molex->SATA splitters, and everything is much tidier now.  Taking the old one out, the difference in build quality is immense.  I'm sure it'll get re-used in a desktop somewhere.

 

Still getting a few errors on my Disk 3, which could either be a SATA or drive issue.  I'll look a little more into smartctl, and raise an issue elsewhere if required.

 

Just a note to say thank you for your time and patience.

Link to comment

Taking the old one out, the difference in build quality is immense.  I'm sure it'll get re-used in a desktop somewhere.

For what it's worth, in my experience old power supplies can be more trouble than they are worth.  Keep it as an emergency spare.

 

(Just my 2c.)

 

Les.

Link to comment

Yeah, good point.  Thanks.  I was thinking just an old machine that's used in the shed or somewhere.

 

The machine's been on for 24 hours now, and I've been throwing stuff at it.  Constantly parity checking whilst copying files to it and recreating the meta-data for the movies on there.  Hasn't missed a step or thrown an error (touch wood).

 

The old setup would never have coped with anything like this, throwing up "network location no longer available" type messages as it blipped in and out.  Golden rule: you can re-use drives and other components, but don't cut corners on the power supply.

Link to comment
