System Rebooting During Drive Usage


I had a thread last week about my USB stick becoming corrupt. Sadly, I am still having system reboots, so I'm here to get some help figuring out what's causing them. The system will run forever if I leave it idle, and it never rebooted during preclear, but as soon as I start an application like Sabnzbd+Sickbeard that writes to the drives through a user share, the system hard reboots within an hour or two with no errors.

 

So without further ado, here is everything I know:

 

unRAID 4.7

2GB SanDisk Cruzer

 

ASRock 880GM-LE AM3 AMD 880G Micro ATX AMD Motherboard

AMD Sempron 140

1x2GB Kingston ValueRAM 1333MHz

Corsair Builder 500W PSU

3x2TB WD Caviar Green EARS drives in AHCI

 

Things I Know

Precleared all 3 drives with no issues.

Ran Memtest overnight: 14 passes with no errors.

Only happens when things are actively being written...

 

Attached is my syslog, although I'm not sure how much it will help. It won't have captured the crash itself, but here it is...

I'm running Sabnzbd+Sickbeard off a user share on my drives to keep them from reading and writing to my USB stick all the time. I changed all paths within Sabnzbd and Sickbeard to point at the drive, not the USB stick.

 

Any help on where to go from here would be GREATLY appreciated. If I'm going to have to return something, it has to be in the next week or two...

 

syslog.txt

Link to comment

Can you telnet in and leave a tail -f /var/log/syslog running? At least you'd have a chance of seeing a remnant from before it dies.

 

The usual effects of disk activity would be power draw, heat, and maybe interface problems. You said you precleared the 3 drives successfully. Were those running simultaneously or sequentially?

 

Have you been able to crash it at will, say with simultaneous copies from the command line?
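
One way to attempt a crash on demand is to start several large writes in parallel from the command line. The sketch below is only an illustration: the function name stress_write, the share path in the usage comment, and the file sizes are assumptions, not anything taken from this thread. It writes zero-filled files with dd and leaves them in place, so they need to be deleted afterwards.

```shell
# Hypothetical helper: start several parallel writers into one directory.
# Arguments: target directory, number of writers (default 3),
# size of each file in MiB (default 1024).
stress_write() {
    local target="$1"
    local writers="${2:-3}"
    local size_mb="${3:-1024}"
    local i
    for i in $(seq 1 "$writers"); do
        # Each writer streams zeros into its own file, in the background.
        dd if=/dev/zero of="$target/stress_$i.bin" bs=1M count="$size_mb" 2>/dev/null &
    done
    wait   # block until every background dd has finished
    sync   # flush cached writes out to the disks
}

# Example with an assumed share path:
#   stress_write /mnt/user/downloads 3 4096
```

If the server survives a few runs of this but still reboots under Sabnzbd, raw write load alone is probably not the trigger.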

Link to comment

I'll give that a try, but it's so random that if I have to be here to catch it, I don't think I can.

 

Two drives were precleared at the same time, and one afterwards. Also, LOL, it just rebooted again, but I couldn't see anything happen in the tail of the log...

 

Right now data is only being written to one of the drives. I will try changing my user share to write to the other drive and see if it still happens.

Link to comment

So long as the tail command stays running you will see something from the server before it goes down.  The machine you are doing the tail from though will have to stay on and not go to sleep/hibernate/etc.

Link to comment

But the system is hard rebooting. Unless I sit there and stare at the last line for two hours, hoping to catch one thing being written before the machine reboots the next millisecond, how does tail help me? Or am I missing something about how the command works? It's not gracefully shutting down: one second my monitor is on, the next it's off and my BIOS is booting.

 

Thanks in advance for all the help. I'm leaning towards a power supply issue, but I don't know how I can verify that...

Link to comment

You run the tail command from another PC that is always on. Telnet into your tower from that PC and then run the tail command. This way you will have a screen showing all the output up until the time of the reboot, whether it takes an hour or a couple of days to happen...

 

Shawn
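
If no second PC can stay on, a rough alternative is to have the server itself copy the end of its syslog to storage that survives a reboot. This is a minimal sketch, assuming the live syslog is lost on reboot (the original post notes that the attached log would not have captured the crash). The function name save_log_tail and the destination path in the usage comment are hypothetical.

```shell
# Hypothetical helper: save the last LINES lines of a log file to DEST.
# Arguments: source log, destination file, number of lines (default 200).
save_log_tail() {
    local src="$1"
    local dest="$2"
    local lines="${3:-200}"
    # Write to a temporary file first, then rename it, so a crash in the
    # middle of a copy does not leave a half-written DEST behind.
    tail -n "$lines" "$src" > "$dest.tmp" && mv "$dest.tmp" "$dest"
    sync   # push the copy out of the write cache
}

# Example loop with assumed paths, refreshing the copy every 30 seconds:
#   while true; do
#       save_log_tail /var/log/syslog /mnt/disk1/syslog-tail.txt
#       sleep 30
#   done
```

The copy can be up to one interval stale, and writing it keeps the destination disk spun up, so the telnet approach is still the better option when it is available.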

Link to comment

If you suspect the power supply, you could try pressing the 'spin up' button on the unRAID main web page.  That will cause all of your drives to spin up at once.  If the server immediately reboots, then a power supply issue is likely.  If not, then I would guess there's something else at play.

 

Have you tried reseating all the cables inside your server?  An improperly grounded cable can lead to random reboots.  Also, is the server plugged into a UPS or at least a good quality surge protector?

Link to comment

So it finally rebooted, with nothing showing in the console... the last thing in it was from like 30 minutes earlier, from Sickbeard.

 

It would be nice if we could still get that output.

 

If indeed there was nothing specific in the log tail then it is more than likely not an application issue.

Start by making sure the heat sink is securely attached. Then run memtest at least overnight, and then move on to removing one stick of RAM at a time to see if stability improves. If not, then move on to the PSU and motherboard.

Link to comment

Already way ahead of you

 

Already ran Memtest overnight and it worked fine for 14 passes.

It's only 1 stick of RAM, so there's nothing to remove...

 

I don't have the output for the log, but rest assured it had nothing to do with this rebooting. It was just info from updating Sickbeard and a few rmdir commands failing because the folder didn't exist.

I wish there was an easy way to test power supplies and motherboards :(  I could just return both, I suppose, and hope that fixes the issue...

 

Although I would hate to get a new PSU and mobo and still have this issue lol...

Link to comment

Well, I'm running out of ideas here... I swapped the PSU with the 430W and it still happens.

 

Since I have tested all the RAM, the HDs, and now the PSU, the only logical thing left is that the motherboard is faulty...

 

I'm gonna try flashing it to the latest firmware, double-check all my BIOS settings, and try one more time, then RMA it I guess, unless anybody has any other ideas...

Link to comment

More info: I tried swapping the DDR3 from my HTPC into the unRAID box, and it's still hard rebooting...

 

Kinda disheartened by this board, honestly. Any suggestions for a replacement AM3 DDR3 board? I got this one because it comes with 6 SATA ports onboard and seemed to get good reviews on this site. Maybe it's just bad luck and I should try a new one?

Link to comment

Have you run a reiserfsck on each of your data disks?  I've seen file-system corruption crash a server.

 

Joe L.

Link to comment

Hmmm, when I try to run reiserfsck on any of these drives it says:

reiserfs_open: the reiserfs superblock can not be found on /dev/sda

failed to open the filesystem

 

If the partition table has not been changed, and the partition is valid, and it really contains a reiserfs partition, then the superblock is corrupted and you need to run this utility with --rebuild-sb

 

I started running it with --rebuild-sb, but it asks a bunch of questions I don't know the answers to, about which reiserfs version, block size, etc...

 

Link to comment

That is because the file system is NOT on /dev/sda, but on its first partition (/dev/sda1), which is attached to /dev/mdX

(where mdX is md1 for disk1, md2 for disk2, etc.)

 

You may have caused more corruption if you started a rebuild on the wrong device.  Hope it did not run.

 

To keep parity in sync, you MUST have the array started and the disk unmounted, and run reiserfsck on the /dev/mdX device.

If you run reiserfsck on the raw disk partition, as you were attempting, parity will not be kept in sync, and you'll end up with a lot of parity errors.

 

 

Joe L.
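
The procedure Joe L. describes can be sketched as a dry run. The helper below only prints the commands for a given data disk number instead of running them. The name fsck_plan is made up for illustration, and the sketch assumes the array is already started, as the post requires. It uses reiserfsck --check, the mode that reports problems without repairing them.

```shell
# Hypothetical helper: print (not run) the check commands for data disk N.
fsck_plan() {
    local disk="$1"
    case "$disk" in
        # Reject empty input, anything non-numeric, and 0 (there is no md0).
        ''|*[!0-9]*|0)
            echo "usage: fsck_plan <data disk number, 1 or higher>" >&2
            return 1
            ;;
    esac
    local dev="/dev/md$disk"
    echo "umount $dev"                # the disk must be unmounted first
    echo "reiserfsck --check $dev"    # check via the md device, never /dev/sdX
}

# Example: show the commands for disk1
#   fsck_plan 1
```

Printing the commands first makes it easy to confirm that the target is an md device, not a raw /dev/sdX disk, before typing anything that touches the array.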

Link to comment

OK, thanks. I do need some further clarification though.

 

I have 3 drives

sda(parity)

sdb(data)

sdc(data)

 

Looking in /dev, I only have /dev/md1 and /dev/md2... I assume those are my two data drives, and it just doesn't make a /dev/md0 because that's my parity drive with no filesystem on it?

Just want to make sure I don't run it on my parity drive and that I'm not missing a /dev/md3, for example.

Link to comment

No, right now it's literally just plugged straight into the wall, as I have it in my office from setting it up.

Go figure though, after connecting with PuTTY it's been an hour and no reboot...

 

You really need to plug your server into a UPS or at very least a good surge protector.  What if you have flaky power that is causing this?

Link to comment

Looking in /dev, I only have /dev/md1 and /dev/md2... I assume those are my two data drives, and it just doesn't make a /dev/md0 because that's my parity drive with no filesystem on it?

Correct. The parity drive has no file system on it.

Just want to make sure I don't run it on my parity drive and that I'm not missing a /dev/md3, for example.

You are fine. You are not missing a device.
Link to comment

I did buy a UPS, as I've needed one, but it didn't make a difference.

 

So far I've replaced everything except the motherboard and drives. The drives check out according to their SMART reports and preclear... Is that enough to say with certainty that it's not my drives and has to be the motherboard?
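
For the SMART side of that question, the raw values of a few attributes are usually what people compare between drives. A small sketch, assuming smartmontools is available and that smartctl -A prints its usual attribute table with the raw value in the tenth column. The helper name smart_raw and the device in the usage comment are hypothetical.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# raw value of the named attribute.
smart_raw() {
    # Column 2 is the attribute name, column 10 is the raw value.
    awk -v name="$1" '$2 == name { print $10 }'
}

# Example with an assumed device name:
#   smartctl -A /dev/sdb | smart_raw Reallocated_Sector_Ct
#   smartctl -A /dev/sdb | smart_raw Current_Pending_Sector
```

Clean values here would only say the drives are not reporting media problems. They would not by themselves rule out a cabling or controller fault.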

Link to comment

Archived

This topic is now archived and is closed to further replies.
