Cessquill Posted March 11, 2011
Hi

Recently my area suffered a power outage. My unRAID box wasn't doing anything at the time (all drives were spun down). When it came to rebooting, it would spend a long time mounting some of the disks and then disappear off the network altogether. After a few attempts I left it overnight, and when I came back it was OK again: a few parity check errors, but otherwise fine. I tried some normal restarts and we were back where we started.

Unfortunately, this happened again this week, and I've just about got it back on again five days later. This is the first time it's booted past the mounting stage. I'm tempted to think I've got a hardware fault somewhere. Some reboots would lock the machine completely (I assume it had locked, since I couldn't type on the keyboard or ping it).

I'm attaching a syslog in the hope that somebody can help me get to the root of this problem, since I'm currently always one bad shutdown away from a week of data loss.

A few thoughts and stats:
- There are 10 WD Green Power drives in the array, a mixture of EACS, EADS and EARS.
- 8 drives are plugged straight into the Gigabyte motherboard, and 2 into a SuperMicro AOC-SAT2-MV8 PCI card.
- Drive 9 is taking ages to mount at the moment, and sometimes shows errors in the errors column on the home page. I've just replaced its SATA cable. It is plugged into the PCI card.
- Drive 3 occasionally shows errors on the home page, but I've not yet found the root of that. I don't think it's related to this.
- All drives are powered from a 500W power supply, with a few Molex-to-SATA splitters.
- I'm afraid I couldn't capture any log files from when it bombed out whilst mounting.
- The attached log file was retrieved during a parity check via http://unraid1/log/syslog
- It's normally a headless machine that I telnet into, but I took it out of the rack last night to see whether anything on screen was to blame. I took it into work today and saw a few error messages flash by on boot when it locked, but no such errors on a clean boot.
- I'm running 4.7. I was previously running 4.6, but upgraded after the first round of errors.

Any help very gratefully received, since I'm a little lost here.

Take care,
Simon

syslog_11-Mar-2011.zip
Joe L. Posted March 11, 2011
/dev/sdj is suffering from many UNC media errors (UNC = uncorrectable). These are unreadable sectors on the disk. This is disk9 in your array.

The "errors" you see on the unRAID display are read errors. When you read those sectors and the disk reports a UNC error, unRAID reconstructs the correct contents by reading parity in combination with all the other data disks.

Please get a SMART report on that drive by typing:

smartctl -d ata -a /dev/sdj

Look for the attributes for sectors that have been reallocated and those pending reallocation. Odds are you have a disk that is slowly failing (since you say these errors have been happening for some time now). We'll know when you post the output of the SMART report.

Joe L.
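For readers wondering how parity can stand in for unreadable sectors, here is a minimal sketch. It assumes a single parity disk computed as the byte-wise XOR of all data disks at each offset (the usual scheme for one dedicated parity drive). Under that assumption, any one missing disk's bytes equal the XOR of parity with every surviving data disk. The function names and the three tiny "disks" are hypothetical illustrations, not unRAID code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings (one per disk)."""
    # zip(*blocks) walks the disks in lock-step, one byte offset at a time.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Parity block = XOR of the same block on every data disk (assumed scheme)."""
    return xor_blocks(data_blocks)


def reconstruct(missing_index, data_blocks, parity):
    """Rebuild one unreadable disk's block from parity plus all other data disks."""
    survivors = [b for i, b in enumerate(data_blocks) if i != missing_index]
    return xor_blocks(survivors + [parity])


# Hypothetical 2-byte blocks from three data disks.
disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(disks)
# Pretend disk index 1 returned a UNC error for this block.
rebuilt = reconstruct(1, disks, parity)
print(rebuilt == disks[1])  # True
```

With single parity only one unknown per byte offset can be solved, so a second disk that is unreadable at the same spot cannot be recovered this way.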
Cessquill Posted March 11, 2011
Hi, and thank you so much for your time. I've just run this on the drive (I didn't unmount it or stop the parity check; I hope that's OK). The results are attached.

smartctl_test.txt
dgaschk Posted March 11, 2011
SMART report looks OK. Might be cables or port.
Joe L. Posted March 11, 2011
Quoting dgaschk: "SMART report looks OK. Might be cables or port."

It indicates the sectors were successfully re-written to their original locations. It's probably nothing to do with the cables; it could be almost anything that prevented the original write from being recorded properly, including vibration, a poor power supply, or a poor power connection.

Joe L.
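Joe L.'s reading of the report can be expressed as a small check. This is a sketch under stated assumptions. It parses the attribute rows that smartctl prints (the `ID# ATTRIBUTE_NAME FLAG ... RAW_VALUE` table) and looks only at Reallocated_Sector_Ct (ID 5) and Current_Pending_Sector (ID 197). The sample table is made up for illustration; it is not the report attached in this thread. The `assess` labels are a hypothetical heuristic, not a diagnosis.

```python
# Illustrative attribute table only; NOT the report from this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""


def parse_smart_attributes(text):
    """Map attribute name -> raw value for rows of a smartctl attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows: numeric ID, name, hex FLAG, ..., RAW_VALUE in column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            raw = fields[9]
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


def assess(attrs):
    """Hypothetical heuristic over the two attributes Joe L. mentions."""
    reallocated = attrs.get("Reallocated_Sector_Ct", 0)
    pending = attrs.get("Current_Pending_Sector", 0)
    if pending > 0:
        return "pending"  # unreadable sectors still waiting to be rewritten or remapped
    if reallocated > 0:
        return "reallocated"  # some sectors have been remapped to spares
    # Nothing remapped or pending: consistent with earlier bad sectors
    # having been rewritten in place.
    return "clean"


print(assess(parse_smart_attributes(SAMPLE)))  # clean
```

Because non-matching lines are skipped, the same parser should also pick these rows out of the full `smartctl -d ata -a` output.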
Cessquill Posted March 11, 2011
Ahh, thanks guys. So you're saying that the drive's OK, but my PSU or the mains itself might be the culprit? I was thinking of changing the power supply, since although it was new, it was fairly cheap. I've also loaded it with quite a few Molex splitters, which can't be great.
jsdds Posted March 12, 2011
In my experience, an inadequate or unreliable power supply can cause more problems than I can tell you. If I were you, I'd invest in a good single-rail power supply rated for 20% more power than you will need. In the long run it will save you a lot of trouble.
PeterB Posted March 12, 2011
My guess would be a problem with your PSU. If ten drives spin up simultaneously, that can cause a problem for most split-rail power supplies, which are typically specified for less than 20 amps per rail. What PSU are you using?
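PeterB's point is simple arithmetic. Here is a sketch that assumes each drive draws a fixed spin-up current from the 12 V rail and ignores every other load on that rail. The 2 A per drive and the 18 A rail limit are illustrative assumptions chosen to sit under the "less than 20 amps per rail" figure. They are not measurements of this PSU.

```python
def worst_rail_headroom(drives_per_rail, spinup_amps_per_drive, rail_limit_amps):
    """Smallest spare current (A) across 12 V rails if every drive spins up at once.

    A negative result means at least one rail is asked for more than its rating.
    Other loads on each rail (fans, motherboard, cards) are deliberately ignored.
    """
    return min(rail_limit_amps - n * spinup_amps_per_drive for n in drives_per_rail)


# All ten drives hung off one rail (assumed 2 A each, assumed 18 A rail).
print(worst_rail_headroom([10], 2.0, 18.0))    # -2.0
# The same ten drives split five and five across two such rails.
print(worst_rail_headroom([5, 5], 2.0, 18.0))  # 8.0
```

A single-rail supply removes the per-rail split entirely, which is the idea behind jsdds's suggestion. It still needs enough total 12 V current for everything at once.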
Cessquill Posted March 13, 2011
Thanks guys. I'll look to upgrade the PSU. The one in there is nothing special, just something that came with the case. I'll check what it is shortly. I was thinking of upgrading anyway, since I got it before 80 PLUS PSUs hit the market. I figured a more efficient one would pay for itself, since the server is on 24/7.
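Whether a more efficient supply pays for itself on a 24/7 box comes down to the mains power it saves. Here is a sketch that assumes a constant DC load and a single efficiency figure for each supply, although real efficiency varies with load. The 60 W load and the 70% and 85% efficiencies are hypothetical numbers, not figures from this thread.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the server never sleeps


def wall_watts(dc_load_watts, efficiency):
    """Power drawn from the mains to deliver dc_load_watts at a given efficiency."""
    return dc_load_watts / efficiency


def annual_kwh_saved(dc_load_watts, old_efficiency, new_efficiency):
    """kWh per year saved by swapping supplies, assuming a constant 24/7 load."""
    saved_watts = (wall_watts(dc_load_watts, old_efficiency)
                   - wall_watts(dc_load_watts, new_efficiency))
    return saved_watts * HOURS_PER_YEAR / 1000


# Hypothetical: 60 W of DC load, old supply 70% efficient, new one 85%.
print(round(annual_kwh_saved(60, 0.70, 0.85), 1))  # 132.5
```

You can multiply the result by your own tariff per kWh and compare it with the price of the new PSU to get a rough payback period.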
Joe L. Posted March 13, 2011
Quoting Cessquill: "The one in there's nothing special - something that came with the case."

With 10 drives on a generic supply that came with the case, I can almost guarantee it is WAY overloaded. Even if they are all green drives, at 2 amps per disk that needs 20 amps of capacity on the 12 volt rail. And that does not include the needs of the motherboard, disk controller cards, memory, or fans. You probably need at least 35 amps of capacity to have some safety margin.

Joe L.
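Joe L.'s estimate can be laid out as a 12 V budget. This minimal sketch takes his 2 A per drive, adds a lump sum for everything else on the 12 V rail, and applies a percentage safety margin. The 8 A for motherboard, controller cards, memory and fans and the 25% margin are hypothetical inputs, not numbers from the thread. They are chosen only to show how a figure like 35 A can arise.

```python
def required_12v_amps(drive_count, amps_per_drive, other_12v_amps, margin):
    """12 V current to budget for: drives plus other loads, scaled by a safety margin."""
    return (drive_count * amps_per_drive + other_12v_amps) * (1 + margin)


def twelve_volt_watts(amps):
    """Convert a 12 V rail current into watts."""
    return amps * 12


drives_only = 10 * 2.0                          # Joe L.'s 20 A for ten drives
budget = required_12v_amps(10, 2.0, 8.0, 0.25)  # 8 A and 25% are assumptions
print(drives_only, budget, twelve_volt_watts(budget))  # 20.0 35.0 420.0
```

The watts figure shows why the total wattage on a PSU's label says little on its own. What matters here is how much current the supply can deliver on the 12 V rail, which is usually listed separately on the label.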
Cessquill Posted March 14, 2011
Hi guys, just a quick note to say thank you again for your support with this. I've got my unRAID system up and running again for now, but I've just ordered a Corsair 650W PSU, as recommended in the Wiki. The more I read about PSUs, the more I realised you just can't cut corners.

Sorry if this was a bit of a schoolboy error, but when I started the project I was struggling to pay for the components. I was offered a 500W supply with the case, and I (wrongly) assumed that if the components' total draw added up to less than that, I'd be OK.

Since the Corsair has 8 SATA power leads, I can ditch most of my Molex-to-SATA converters, which I've never been confident in. I'm going to have a good look at the SATA data cables too while I'm there, since two of the ones in my system came with the motherboard.

Thanks again
Cessquill Posted March 22, 2011
Hi

The new PSU finally turned up yesterday. I installed it in about half an hour and everything seems to be working fine. It meant I could get rid of a couple of Molex-to-SATA splitters, and everything is much tidier now. Taking the old one out, the difference in build quality is immense. I'm sure it'll get re-used in a desktop somewhere.

I'm still getting a few errors on Disk 3, which could be either a SATA or a drive issue. I'll look a little more into smartctl and raise an issue elsewhere if required.

Just a note to say thank you for your time and patience.
S80_UK Posted March 22, 2011
Quoting Cessquill: "Taking the old one out, the difference in build quality is immense. I'm sure it'll get re-used in a desktop somewhere."

For what it's worth, in my experience old power supplies can be more trouble than they are worth. Keep it as an emergency spare. (Just my 2c.)

Les.
Cessquill Posted March 23, 2011
Yeah, good point. Thanks. I was only thinking of an old machine that's used in the shed or somewhere.

The machine's been on for 24 hours now, and I've been throwing stuff at it: constant parity checks whilst copying files to it and recreating the metadata for the movies on there. It hasn't missed a step or thrown an error (touch wood). The old setup would never have coped with anything like this; it would throw up "network location no longer available" type messages as it blipped in and out.

Golden rule: you can re-use drives and other components, but don't cut corners on the power supply.