spin down problems (causing reboots) - methinks it could be my PSU? HELP!


directo


Just upgraded some components in my unRAID setup. High Level overview:

- C2SEE mobo

- 2 1430sa's

- Corsair TX 750 W (750 watt PSU)

- I've got 13 drives in total

- using a Lian Li case with 5-in-3 Icy Dock backplanes

 

When I click the "Spin Down" button to spin all the disks down, I can hear the drives starting to spin down, and then it just hangs.  Sometimes it resets the system, sometimes I need to do a physical reset from the box.

 

Here's the tail of my syslog when it does that (I stopped a parity check that was in progress):

Mar 21 12:14:10 DIRECTO_MEDIA kernel: mdcmd (10): nocheck

Mar 21 12:14:10 DIRECTO_MEDIA kernel: md: md_do_sync: got signal, exit...

Mar 21 12:14:10 DIRECTO_MEDIA kernel: md: recovery thread sync completion status: -4

Mar 21 12:14:25 DIRECTO_MEDIA emhttp: shcmd (19): sync

Mar 21 12:14:29 DIRECTO_MEDIA emhttp: shcmd (20): /usr/sbin/hdparm -y /dev/sdd >/dev/null

Mar 21 12:14:29 DIRECTO_MEDIA emhttp: shcmd (21): /usr/sbin/hdparm -y /dev/sdk >/dev/null

Mar 21 12:14:30 DIRECTO_MEDIA emhttp: shcmd (22): /usr/sbin/hdparm -y /dev/sdm >/dev/null

Mar 21 12:14:31 DIRECTO_MEDIA emhttp: shcmd (23): /usr/sbin/hdparm -y /dev/sdi >/dev/null

Mar 21 12:14:31 DIRECTO_MEDIA emhttp: shcmd (24): /usr/sbin/hdparm -y /dev/sdn >/dev/null

Mar 21 12:14:32 DIRECTO_MEDIA emhttp: shcmd (25): /usr/sbin/hdparm -y /dev/sdj >/dev/null

Mar 21 12:14:33 DIRECTO_MEDIA emhttp: shcmd (26): /usr/sbin/hdparm -y /dev/sdc >/dev/null

Mar 21 12:14:33 DIRECTO_MEDIA emhttp: shcmd (27): /usr/sbin/hdparm -y /dev/sdb >/dev/null

Mar 21 12:14:34 DIRECTO_MEDIA emhttp: shcmd (28): /usr/sbin/hdparm -y /dev/sda >/dev/null

Mar 21 12:14:34 DIRECTO_MEDIA emhttp: shcmd (29): /usr/sbin/hdparm -y /dev/sdl >/dev/null

Mar 21 12:14:35 DIRECTO_MEDIA emhttp: shcmd (30): /usr/sbin/hdparm -y /dev/sde >/dev/null

 

I've read a few posts on spin up/down - could this be related to my HD?

 

I've connected the backplanes with the molex connectors.  The PSU has 8 molex connectors, and I split a couple of those to power the back planes and for ease of cable routing (3 for each back plane).

 

Any thoughts?

Link to comment

admittedly, it hasn't been running for more than a day with the new configurations, but it's going through the parity check now and seems to be running fine (I've historically had problems with disks spinning back up causing errors).

 

I'll monitor the usage of the system (I currently have it set to no spin downs) but eventually want to get to a state where I have spin down enabled (to save both on power, and longevity of my drives).

 

Anybody else have any suggestions?

Link to comment

Usually a PSU would not be the problem on a drive spindown.

 

I would check to make sure that all of your drives are connected securely to the motherboard.  I am a huge fan of locking cables and use them wherever possible.  Also make sure that you are using high quality splitter for the molex connectors.

Link to comment

The drives that do not appear in the list of drives spun down are sdf, sdg, and sdh, which immediately follow the final drive, sde.  That suggests that those 4 sequentially enumerated drives (sde through sdh) are on the same controller, which points to a possible problem with that controller.  (I haven't looked at your syslog, no time yet.)
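
For anyone who wants to repeat that comparison, here is a rough Python sketch of the same check. It pulls the device names out of the `hdparm -y` lines in the syslog tail from the first post and lists the letters that are missing between `sda` and the highest one seen. The function name `missing_devices` is made up for illustration, and it assumes single-letter `sdX` names only.

```python
import re
import string

# Spin-down commands copied from the syslog tail in the first post.
SYSLOG_TAIL = """\
Mar 21 12:14:29 DIRECTO_MEDIA emhttp: shcmd (20): /usr/sbin/hdparm -y /dev/sdd >/dev/null
Mar 21 12:14:29 DIRECTO_MEDIA emhttp: shcmd (21): /usr/sbin/hdparm -y /dev/sdk >/dev/null
Mar 21 12:14:30 DIRECTO_MEDIA emhttp: shcmd (22): /usr/sbin/hdparm -y /dev/sdm >/dev/null
Mar 21 12:14:31 DIRECTO_MEDIA emhttp: shcmd (23): /usr/sbin/hdparm -y /dev/sdi >/dev/null
Mar 21 12:14:31 DIRECTO_MEDIA emhttp: shcmd (24): /usr/sbin/hdparm -y /dev/sdn >/dev/null
Mar 21 12:14:32 DIRECTO_MEDIA emhttp: shcmd (25): /usr/sbin/hdparm -y /dev/sdj >/dev/null
Mar 21 12:14:33 DIRECTO_MEDIA emhttp: shcmd (26): /usr/sbin/hdparm -y /dev/sdc >/dev/null
Mar 21 12:14:33 DIRECTO_MEDIA emhttp: shcmd (27): /usr/sbin/hdparm -y /dev/sdb >/dev/null
Mar 21 12:14:34 DIRECTO_MEDIA emhttp: shcmd (28): /usr/sbin/hdparm -y /dev/sda >/dev/null
Mar 21 12:14:34 DIRECTO_MEDIA emhttp: shcmd (29): /usr/sbin/hdparm -y /dev/sdl >/dev/null
Mar 21 12:14:35 DIRECTO_MEDIA emhttp: shcmd (30): /usr/sbin/hdparm -y /dev/sde >/dev/null
"""


def missing_devices(log_text):
    """List sdX names with no logged spin-down, from sda up to the highest seen."""
    seen = set(re.findall(r"hdparm -y /dev/(sd[a-z])\b", log_text))
    if not seen:
        return []
    last_letter = max(name[-1] for name in seen)
    expected = {"sd" + c for c in string.ascii_lowercase if c <= last_letter}
    return sorted(expected - seen)


print(missing_devices(SYSLOG_TAIL))  # ['sdf', 'sdg', 'sdh']
```

A missing letter only means that no spin-down command was logged for that device. It does not by itself say what the device is or which controller it sits on.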

Link to comment

Thanks Rob....I'll investigate this a little further.  Admittedly, I'm not an expert in unraid / linux...so if you do see something else that catches your eye, please let me know...

 

Like I said, it's running the parity check (seems to be going fine).  Once that's done, I'll check into this.

 

I'm also not sure if I've set up my controllers properly - I just glanced through my syslog (I don't really understand most of it :P) and noticed a lot of UDMA / 133.  Somehow that doesn't seem right to me?  (I could be wrong)....

Link to comment

Did some more testing...I don't think it's the controllers...I switched the controllers, and then took out the controller that was apparently causing the problem, but I still have issues with spin down (shuts the server down).

 

So after I took the suspected controller out, it still keeps crashing (now it's crashing during spin down of disks that previously would spin down fine).

 

Needless to say, I'm a bit confused.....

Link to comment

Now that I've looked at your syslog, forget what I said about those 4 drives appearing to be on the same controller, they aren't.  sdg was your flash drive, sde and sdf were connected to ports 3 and 4 of the second Adaptec (first 2 ports were empty), and sdh was connected by itself to the SiI3132-based addon card.  Now that you have moved things around, this may no longer apply.

 

Unfortunately, your syslog shows no problems at all.  UDMA/133 is the correct mode for drives, quite normal.  The only 'tweak' I can see is to change to using AHCI for your SATA drives connected to the onboard SATA ports, in your BIOS settings, but that should not make any difference at all.

 

I have never heard of any problems spinning drives down, except that a very few drives would not spin back up, and that would result in LOTS of errors in the syslog, but that does not seem to be your problem.  I would capture additional syslogs, look for errors at the bottom, and note whatever the very last syslog lines are, whenever it crashes.  In the tail piece you included above, I would suspect sde, as no further commands were issued after the spin down command to sde (Disk 10, Hitachi, serial ending in 51B).  You might watch to see if the same drive is the last one mentioned on the screen or in the syslog, or note which one *is* the last, and possibly which disk controller is involved.  I don't have any other ideas yet, because as I said, this is a first report of problems when spinning down.  Also, drive problems should not crash a machine, just cause errors, very visible errors.
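
If you end up with several saved syslogs to compare, a small Python sketch like the one below can pull out the last spin-down command from each. It is only an illustration: the helper name `last_spin_down` is hypothetical, and it assumes the log lines look like the emhttp `hdparm -y` lines in the tail posted above.

```python
import re

# Matches e.g. "Mar 21 12:14:35 HOST emhttp: shcmd (30): /usr/sbin/hdparm -y /dev/sde"
SPIN_DOWN = re.compile(r"^(\w{3} +\d+ [\d:]+) .*hdparm -y /dev/(sd[a-z]+)")


def last_spin_down(log_text):
    """Return (timestamp, device) for the final hdparm -y line, or None."""
    last = None
    for line in log_text.splitlines():
        match = SPIN_DOWN.match(line)
        if match:
            last = (match.group(1), match.group(2))
    return last


# Last three lines of the tail from the first post.
tail = """\
Mar 21 12:14:34 DIRECTO_MEDIA emhttp: shcmd (28): /usr/sbin/hdparm -y /dev/sda >/dev/null
Mar 21 12:14:34 DIRECTO_MEDIA emhttp: shcmd (29): /usr/sbin/hdparm -y /dev/sdl >/dev/null
Mar 21 12:14:35 DIRECTO_MEDIA emhttp: shcmd (30): /usr/sbin/hdparm -y /dev/sde >/dev/null
"""

print(last_spin_down(tail))  # ('Mar 21 12:14:35', 'sde')
```

If the same device keeps coming out last across crashes, that is worth noting. If it changes every time, the drive itself is probably not the culprit.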

 

One last long shot, try a memtest overnight.

Link to comment

Well, something is DEFINITELY not right.  Was trying to stream a movie, and the whole thing just conked about a minute into the movie.  My build is extremely unstable now I think (I hard restarted it, and it conked again).  I have a feeling it's a hardware issue, now it's just a matter of trying to isolate which piece of the puzzle it is...

 

I'm doing a memtest right now.  I've reseated all the cables, double checked all the power connections, etc.

 

As to your suggestion of the actual drive being the problem - along with the mobo, I added 3 new drives (one of them being sde that's referenced in your post).  I yanked all three and did the spin down test again and it still happened.

 

I'm using the C2SEE mobo - and I tried to find the setting to change it to AHCI, but I can't seem to find it.  It's in the manual, but that parameter seems to be missing from my menus, I'll need to see if I'm on the latest firmware.

 

I'm a little disappointed now, as I thought upgrading to this hardware would at least put me in a better position - hopefully I can get this working!  I'll try posting another syslog once the memtest is done.

 

Frustrated, but optimistic!

 

(note, I've been using the PSU, cables, 1 controller card just fine.  I've changed the mobo, added another 1430SA, and a masscool controller (to get an extra 2 more ports)).

 

Thanks again Rob for the insights, and the help...

Link to comment

ok - it just keeps getting weirder...

 

Did the memtest - ran it over night, and passed all the tests.

 

Now I notice that when I'm trying to copy files from the array, it craps out after a few seconds.  The lights on my Icy Dock back planes go red for some drives (this changes every time I try to run the copy test).  I've tried to tail my syslog as I do the copy, but nothing shows up in the syslog.

 

I have since totally disconnected the new 1430SA, the new drives I installed, and the third back plane I was using.  This leaves me with this setup:

- 2 ICY Dock Back planes powered

- 10 drives (6 using the onboard mobo SATA connections, 4 using my original 1430SA, which has been working fine with the old mobo)

- removed all molex splitters (as I only have 2 back planes powered right now)

 

with this configuration, the spin down test also fails (in addition to the copy file test)

 

I'm now going to go through the process of removing drives from the array and testing each cable out.  If anyone has suggestions on how to do this, I'm welcoming anything at this point.....what has me perplexed is that I've been using the above configuration (albeit with a different motherboard) for months now and everything had worked (I was running the server without spin downs previously).

 

could this be a bad mobo?

Link to comment

It could be the motherboard, but what I would do first is to remove any new stuff you have added.  It sounds like you have added one more IcyDock; am I correct?  If so the first thing I would do is remove that one backplane and work from there.  Remove the backplane, connect the drives like normal, and try to spin the server down.  If it borks go from there.

Link to comment

did some more 'fiddling', not sure if any of this will help...

 

- I took out all external SATA controllers (only testing with onboard SATA controller)

- I only connected 1 drive to the MOBO at a time

- I only powered 1 back plane at a time

 

These are some of my findings:

- With no drives plugged into the MOBO, the system WON'T boot.  It gets to the menu and goes through the bzimage and bzroot steps, and then quickly shuts down after that. (Is this normal?)

- If I connect a WD10EACS drive to the MOBO, the same thing happens (shuts down quickly after the bzimage and bzroot steps).  It does not matter which back plane or which SATA header the drive is plugged into.

- If any other drive is connected, unRAID will boot and I can get into the GUI, etc.

- If I spin down any single disk that's attached that successfully booted, the system immediately shuts down.  It doesn't matter which plane the drive is plugged into, or which SATA header on the MOBO it's plugged into.

- I also repeated these tests with different cables.

 

 

Can anyone explain this?  (should I try to flash the BIOS?)

Link to comment

I would suggest taking the backplanes out of the equation.  Hook drives directly to your motherboard.

Link to comment

Just tried that...with a NON WD10EACS (western digital 1TB) drive...

Boots, but when spun down, the server shuts down...

 

If you still have the old motherboard then I would consider trying that one.  If it does not do it, then the new motherboard is the likely culprit.  If so I would look to update the BIOS on the board and try again.

Link to comment

BIOS updated, and didn't seem to do any good...

 

I'm at a loss - I can't even copy files when there's 1 drive in the array (with or without backplane).

 

Any other tricks I should try?  Or does this sound like a bunk board...(I'll have to RMA)

 

Help me Obi-Wan Kenobi, you're my only hope...

Link to comment

Few questions ...

 

1.  When it "fails" does the whole thing power down?  If not, what does it do?

 

2.  When you try and copy a file, are you doing it over the network or just on that machine?

 

3.  Have you tried a different USB stick?

 

Link to comment

1. When I spin the disks down, the whole thing shuts down.  When I'm copying a file, the server freezes and I need to do a hard reset.  I've tried to tail the syslog, but nothing happens.

 

2. I'm doing a copy of the file over the network, using SAMBA.

 

3. I've tried a different USB stick.

 

- Do you guys think this is an RMA candidate?  I'd really like to avoid that...

UPDATE

I just tried using another controller to do a file copy, and it still hangs.....

 

Link to comment

Strange combination of symptoms.

 

Motherboard problem high on my list.  PSU problem seems unlikely but I wouldn't rule it out. 

 

Were you careful mounting the motherboard?  Is it possible something is shorting out via the case?

 

Have you completely removed the add-on controllers?  Are there any other add-on cards?

 

One other idea - go into the BIOS and select the option to set all settings to default.  Then make any changes you need to make.  When you upgrade the BIOS it sometimes shuffles how CMOS memory is used.  Not resetting can sometimes leave something weird (and undocumented) set.  Worth a try.  (This is a good thing to remember to do after upgrading the BIOS.)

Link to comment

This is going to sound weird, but exactly which connectors from the power supply did you connect to the mother-board?  (describe exactly where you plugged the power supply connectors relative to chips/sockets on the motherboard.)

 

Your title line for this thread says you suspect the power supply... have you tried a different one?

 

It almost sounds as if the current drawn by the disks is causing some kind of spike that affects the mother-board.

 

Did you install the mother-board with all the screws?  (perhaps it is a bad ground)

 

Joe L.

Link to comment

I've unmounted the board from the case, and it's still not working.  I think I'm going to have to revert to my old mobo for now, and see what's up.

 

Any other MOBO recommendations other than the C2SEE? (with two PCI-E slots)...

very frustrating...

 

The C2SEE motherboard is definitely compatible.

 

We have no idea of your experience putting computers like this together.  You seem to know what you're doing, but you still may have made a mistake with something relatively simple, like not plugging in all of the power cables.

 

I can tell you that in 25+ years of putting computers together (~12 computers), I have never had a bad MB out of the box.  I have had 2 of them go bad over time (both P4 MBs by the way).  PSUs - never a bad one OOTB, but about 5-7 have gone bad over time.

 

If you have another motherboard, try using it in a very simple configuration with one drive, mimicking what you had with the C2SEE.  If it works fine and the C2SEE crashes, you have your answer - it's a bad motherboard.  Not an incompatible one, but a defective one.

Link to comment

I know enough to get myself into trouble :P  (all kidding aside, I know how to build a computer - been doing so for 15 or so years)....

 

I plugged my old MOBO in, and copies seem to work fine.  I've also never had a MOBO go bad, let alone gotten a bad one on the first go....I guess I'll start filling out the RMA papers....
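
The swap test boils down to a simple elimination, which can be sketched in Python as below. The runs are a simplified summary of the tests reported in this thread, not a complete record, and the part labels and the `suspects` helper are made up for illustration. One assumption is baked in: the same TX 750 PSU was used in every run, including the one with the old board.

```python
# Each run: (parts in use, did it crash?). Simplified from the posts above.
# Assumption: the PSU stayed the same in every run, old board included.
RUNS = [
    ({"C2SEE", "TX750", "backplane", "onboard SATA"}, True),
    ({"C2SEE", "TX750", "onboard SATA"}, True),       # drive wired straight to the board
    ({"C2SEE", "TX750", "add-on controller"}, True),  # file copy via another controller
    ({"old mobo", "TX750"}, False),                   # old board, copies worked
]


def suspects(runs):
    """Parts present in every crashing run and absent from every clean run."""
    failing = [parts for parts, crashed in runs if crashed]
    passing = [parts for parts, crashed in runs if not crashed]
    if not failing:
        return set()
    common = set.intersection(*failing)
    cleared = set().union(*passing)
    return common - cleared


print(sorted(suspects(RUNS)))  # ['C2SEE']
```

This treats one clean run as clearing a part, which is a simplification. An intermittent fault could still slip through, so it narrows things down rather than proving anything.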

 

I'm wondering what would cause this kind of error....seems so random....

 

Link to comment
