April 17, 2011
When I try to copy large amounts of data to the unRAID server, it seems to crash. I've been trying to figure out why, with no luck. The copy just stops and the server becomes unavailable on the network. I can only get it back on the network with a hard reset; shutting it down through the web interface is not possible.

Below is the error output from a telnet session captured while copying. I will also attach my syslog. It reports an EIP error, and I don't understand what is failing here. My system specs are in my signature.

Tower login: root
Linux 2.6.32.9-unRAID.
root@Tower:~#
Message from syslogd@Tower at Sun Apr 17 04:44:21 2011 ...
Tower kernel: Oops: 0010 [#1] SMP
Message from syslogd@Tower at Sun Apr 17 04:44:21 2011 ...
Tower kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/host2/target2:0:0/2:0:0:0/block/sdb/stat
Message from syslogd@Tower at Sun Apr 17 04:44:21 2011 ...
Tower kernel: Stack:
Message from syslogd@Tower at Sun Apr 17 04:44:21 2011 ...
Tower kernel: Call Trace:
Message from syslogd@Tower at Sun Apr 17 04:44:21 2011 ...
Tower kernel: Process unraidd (pid: 1341, ti=c389c000 task=f767def0 task.ti=c389c000)
Message from syslogd@Tower at Sun Apr 17 04:44:21 2011 ...
Tower kernel: EIP: [<00000000>] 0x0 SS:ESP 0068:c389df08
Message from syslogd@Tower at Sun Apr 17 04:44:21 2011 ...
Tower kernel: CR2: 0000000000000000
Message from syslogd@Tower at Sun Apr 17 04:44:21 2011 ...
Tower kernel: Code: Bad EIP value.
root@Tower:~#

syslog.txt
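For anyone reading the output above, here is a rough Python sketch of how the key fields can be pulled out and interpreted. It assumes the oops came from a page fault (the CR2 line suggests so), in which case the number after "Oops:" is the x86 page-fault error code. The function names `decode_oops_code` and `summarize_oops` are made up for this illustration, and the sample text is copied from the telnet session above.

```python
import re

# x86 page-fault error-code bits, as printed in hex after "Oops:".
# Each entry: (bit mask, meaning if set, meaning if clear or None).
PAGE_FAULT_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code_hex):
    """Translate the hex error code into a list of plain-language flags."""
    code = int(code_hex, 16)
    meanings = []
    for mask, if_set, if_clear in PAGE_FAULT_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings


def summarize_oops(text):
    """Extract the error code, EIP, CR2 and process name from oops text."""
    oops = re.search(r"Oops:\s*([0-9a-fA-F]+)", text)
    eip = re.search(r"EIP:\s*\[<([0-9a-fA-F]+)>\]", text)
    cr2 = re.search(r"CR2:\s*([0-9a-fA-F]+)", text)
    proc = re.search(r"Process\s+(\S+)\s+\(pid:", text)
    return {
        "error_bits": decode_oops_code(oops.group(1)) if oops else [],
        "eip": int(eip.group(1), 16) if eip else None,
        "cr2": int(cr2.group(1), 16) if cr2 else None,
        "process": proc.group(1) if proc else None,
    }


# Lines copied from the telnet session in the post above.
SAMPLE = """\
Tower kernel: Oops: 0010 [#1] SMP
Tower kernel: Process unraidd (pid: 1341, ti=c389c000 task=f767def0 task.ti=c389c000)
Tower kernel: EIP: [<00000000>] 0x0 SS:ESP 0068:c389df08
Tower kernel: CR2: 0000000000000000
"""

summary = summarize_oops(SAMPLE)
print(summary)
```

Read this way, the kernel (in the unraidd process) tried to fetch an instruction from address 0, which is what a call through a null function pointer looks like. That alone does not say whether the cause is a driver bug or corrupted memory, so it is a starting point rather than a diagnosis.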
April 17, 2011 (Author)
Could this all be caused by a faulty or weak power supply? I don't know what else to think.
April 17, 2011 (Author)
"What type of PSU is it?"
It's an NZXT 400W. I'm waiting on my 650W single-rail unit in the mail.
April 18, 2011 (Author)
Well, I replaced the power supply and the unRAID server is still crashing when I try to copy large amounts of data to it at one time. I tried copying around 40 GB; it makes it about three quarters of the way through, then crashes and the server becomes unresponsive. The web interface is no longer accessible and the server is no longer reachable on the network. I then have to do a hard reset to bring it back online.

I have run a check disk command on the hard drives and it found no data corruption or errors. I have 2 TB WD drives and have not activated the parity drive yet; I'm waiting until all my data is copied over. I've run memtest overnight with no errors to speak of. All my hardware is listed in my sig. Could it be the brand of RAM I'm using, or is it more likely the motherboard? I'm not sure what to do next. I've attached my latest syslog.

syslog.txt
April 19, 2011
One thing I noticed: your NIC is going nuts. Lots of link lost events and DHCP renegotiations. And weirder still, it negotiates a 192.168.0.x IP, then gets a different one (DHCP will normally give you back the same address), and then on another occasion gets a 10.0.1.x IP?? Are you changing network addressing on your local LAN? One thing I would do is set a static IP on your server, which will hopefully clear up the DHCP mess going on. From there, see if it stabilizes a bit... Shawn
April 19, 2011 (Author)
The network IP range changed because I swapped out my router during this whole process: I replaced a D-Link router with an AirPort Extreme. I had the server set up with a static IP to begin with and I still encountered the same problems.

Anyway, I think I may have solved the problem. Even though my RAM checked out OK in memtest, I thought I'd try pulling a 2 GB stick out of the board and voila, it works flawlessly now. I'm not holding my breath, but I've managed to copy over 100 GB so far with no problems. Do you think it is a problem running two sticks of RAM instead of one? I can't see the RAM being bad, but I have no other way to test it. Looks like my server is going to be running one 2 GB stick now instead of two 2 GB sticks.
April 19, 2011
"Anyway I think I may have solved the problem. Even though my ram checked out in the memtest ok"
I am puzzled about what you meant in your earlier post when you said "I've run memtest overnight with no errors to speak of." Did it report errors, or didn't it?

"I thought I'd try pulling a 2gb stick out of the board and Voila it works flawlessly now."
Have you tried swapping to the other memory stick?

"Is it a problem running two sticks of ram instead of one you think?"
Of course it shouldn't be. I nearly always run with matched pairs of RAM in order to benefit from the small gain of having interleaved memory accesses.

Is your RAM on the manufacturer's QVL? Does the memory manufacturer recommend the RAM for use in that mobo? Obviously these lists are never exhaustive, but there are some known mobo/RAM incompatibilities. Which two sockets did you have the RAM installed in?
April 19, 2011
"Is it a problem running two sticks of ram instead of one you think? I can't see the ram being bad but I have no other way to test it. Looks like my server is going to be running 1 2gb stick now instead of 2 2gb sticks."
Not at all, you can run 1 x 2GB, 2 x 2GB, 1 x 4GB, whatever... You might see a very minor speed increase in dual channel mode, but nothing much to worry about. And yes, RAM, or a slot, can just go bad...

Couple of things: put the stick you removed back in the same slot and remove the other one. Does it still work? If it is OK, it may be a mobo problem. If it crashes, then it is most likely a RAM issue or a slot issue. Try the "working" 2GB stick in the other slot and test. That way you can narrow it down to either a bad stick, a bad slot, or the mobo not playing nice with 2 x xGB sticks for some reason.

And as PeterB asked, did you check the mobo compatibility chart against the memory you're using? This may also be an issue... Shawn
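The swap test described above amounts to a small elimination table. A minimal Python sketch of that reasoning follows; the function name `diagnose`, the stick labels "A"/"B" and the slot labels "slot1"/"slot2" are all hypothetical, and a "pass" here just means a large copy finished without a crash.

```python
def diagnose(single_tests, pair_ok):
    """Suggest where to look next from single-stick test results.

    single_tests maps (stick, slot) -> True if a large copy finished
    without a crash with only that stick installed in that slot.
    pair_ok is True if the copy also finished with both sticks installed.
    """
    sticks = sorted({stick for stick, _ in single_tests})
    slots = sorted({slot for _, slot in single_tests})

    # A stick is suspect if it never passed, whichever slot it was in.
    bad_sticks = [s for s in sticks
                  if not any(ok for (stick, _), ok in single_tests.items()
                             if stick == s)]
    # A slot is suspect if it never passed, whichever stick was in it.
    bad_slots = [s for s in slots
                 if not any(ok for (_, slot), ok in single_tests.items()
                            if slot == s)]

    if all(single_tests.values()):
        if pair_ok:
            return "no fault reproduced"
        return "two-stick configuration suspect (board, BIOS settings, or compatibility)"
    if bad_sticks and bad_slots:
        return "inconclusive: retest, the fault may not be memory-specific"
    if bad_sticks:
        return "stick suspect: " + ", ".join(bad_sticks)
    if bad_slots:
        return "slot suspect: " + ", ".join(bad_slots)
    return "inconclusive: only specific stick/slot combinations fail"


# Example: every stick works alone in every slot, but the pair crashes.
results = {
    ("A", "slot1"): True, ("A", "slot2"): True,
    ("B", "slot1"): True, ("B", "slot2"): True,
}
print(diagnose(results, pair_ok=False))
```

A single clean run per combination is weak evidence for an intermittent crash, so repeating each test a few times before trusting the table is probably wise.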
April 20, 2011 (Author)
Well, it looks like I only have problems with the server if I run the RAM in dual channel mode on the motherboard. I can run both sticks of memory successfully if I run them in single channel mode. I'm not sure why this would be, but I don't think dual channel versus single channel is going to make much of a performance difference anyway. Thanks for all the help, guys.
April 20, 2011
Have you updated the BIOS? This is the first thing one should do. Load the default values, change the HD mode to AHCI, and then disable anything you won't use: serial and parallel ports, audio, FireWire, the floppy drive, even the IDE controller if you are not going to use any older PATA HDs. If you are using fancy Mushkin memory (high speed, high voltage), make sure all the memory settings are set in Manual mode; you can even bump the voltage a bit. Then run a single pass of memtest to check that there are no problems with the new BIOS and RAM settings. Post your syslog again, then run a short SMART test on both drives and post the stats.