Frequent Crashes - unknown cause

mcai3db3 · February 10, 2011

So my server keeps deciding to reboot itself, and it's driving me mad. It has happened every few days and I really don't know why. Once it happened whilst I was streaming a video from the server, and on multiple occasions it has happened whilst I've been on the web management pages, messing around with settings etc. I ran a memtest and it came up with no issues, so I'm throwing it out to you all now. What information should I upload? I have included a syslog, but I doubt it's of any use.

http://www.2shared.com/document/dkdSeI5G/syslog-2011-02-09.html

The most recent crash was when I 'connected to server' from my Mac and it was spinning up one of my drives.

My server is a 3.06GHz i3 with 8GB of RAM on a Gigabyte GA-H57M-USB3. I have 7 drives in there: 1 500GB WD cache, my 2TB parity, 3 * 2TB WD20EARS and a random 1TB hard drive.

SSD · February 10, 2011

Thoughts-

1. Bad or underpowered PSU

2. Bad caps on motherboard

3. Overclocking not stable

mcai3db3 · February 10, 2011

It's a 3.06GHz i3, I don't think it's underpowered... it's also not overclocked.

Aaand I don't know what bad caps means

mcai3db3 · February 10, 2011

Ooohh, wait, capacitors I guess.

I've not looked. I'd be most annoyed if it was, the thing is about a month old. I guess it's possible.

Update: capacitors look fine.

SSD · February 10, 2011

PSU = power supply unit. What is your PSU?

mcai3db3 · February 10, 2011

Oh jeez, sorry, I misread the post. It's a PSU CORSAIR|CMPSU-550VX 550W RT.

SSD · February 10, 2011

Hard crashes like the ones you are reporting are usually caused by a hardware problem, often related to power. The fact that you observed a crash while a drive was spinning up tends to support that theory. You may have a defective PSU or motherboard, or perhaps a bad splitter or a short somewhere.

How many drives are you running?

mcai3db3 · February 10, 2011

7

4 WD20EARS Green Drives

1 500GB Cache Drive

1 random 1TB drive

1 2TB Seagate Barracuda parity drive (with updated firmware)

I have actually added 1 or 2 of those drives since I first encountered this issue.

I don't really know how to troubleshoot this. I don't have any spare components and really can't buy a random PSU/Mobo on the off-chance that the problem is being caused that way.

mcai3db3 · February 10, 2011

SMART report on my cache drive, the only drive to return any errors. I'm suspicious of this drive.

SMART status Info for /dev/sda

smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Blue Serial ATA family
Device Model:     WDC WD5000AAKS-60Z1A0
Serial Number:    WD-WCAWF4971192
Firmware Version: 06.01D06
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Feb 10 08:35:44 2011 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 241) Self-test routine in progress...
                                        10% of test remaining.
Total time to complete Offline
data collection:                 (8160) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:          ( 2) minutes.
Extended self-test routine
recommended polling time:         ( 97) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always   -           0
  3 Spin_Up_Time            0x0027   150   141   021    Pre-fail  Always   -           3458
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           380
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x002f   200   200   051    Pre-fail  Always   -           0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always   -           794
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always   -           0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           39
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always   -           0
187 Reported_Uncorrect      0x0032   100   099   000    Old_age   Always   -           1
188 Command_Timeout         0x0032   100   098   000    Old_age   Always   -           4295032834
190 Airflow_Temperature_Cel 0x0022   073   066   040    Old_age   Always   -           27 (Lifetime Min/Max 25/27)
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           16
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always   -           363
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline  -           0

SMART Error Log Version: 1
ATA Error Count: 1
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 354 hours (14 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 10 7f 04 04 e0  Error: IDNF at LBA = 0x0004047f = 263295

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 10 7f 04 04 e0 08   1d+17:40:52.426  WRITE DMA
  ca 00 08 5f d7 00 e0 08   1d+17:40:51.965  WRITE DMA
  ca 00 68 f7 d6 00 e0 08   1d+17:40:51.965  WRITE DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%              785  -
# 2  Short offline       Completed without error       00%              785  -
# 3  Extended offline    Aborted by host               90%              785  -
# 4  Extended offline    Aborted by host               90%              785  -
# 5  Extended offline    Aborted by host               90%              785  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
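When a drive is under suspicion, the handful of raw counters below usually says more than the overall PASSED verdict. As a minimal sketch, assuming the 10-column attribute layout shown in the report above (RAW_VALUE is the last column), these hypothetical helpers pull those counters out of saved smartctl output:

```shell
#!/bin/bash
# smart_raw ATTRIBUTE_NAME [FILE]
# Prints the RAW_VALUE (10th column) of one attribute from saved
# `smartctl -a` / `smartctl -A` output; reads stdin if FILE is omitted.
# Returns 1 if the attribute is not in the table.
smart_raw() {
    local attr="$1"
    shift
    # Match on the ATTRIBUTE_NAME column ($2), not on the whole line.
    awk -v name="$attr" '$2 == name { print $10; found = 1 } END { exit !found }' "$@"
}

# smart_summary FILE
# Prints the counters most often checked on a suspect drive,
# or "n/a" for any attribute the drive does not report.
smart_summary() {
    local file="$1" a v
    for a in Reallocated_Sector_Ct Current_Pending_Sector Offline_Uncorrectable \
             Reported_Uncorrect UDMA_CRC_Error_Count; do
        v=$(smart_raw "$a" "$file") || v="n/a"
        printf '%-24s %s\n' "$a" "$v"
    done
}
```

Usage would be along the lines of `smartctl -a /dev/sda > /boot/smart-sda.txt` followed by `smart_summary /boot/smart-sda.txt` (the output path is only an example). Against the report above it would list Reported_Uncorrect as 1 and the other four as 0, which lines up with the single IDNF error in the error log.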

SSD · February 10, 2011

Your PSU should be fine with that many drives.

I would recommend unplugging, blowing out, and replugging all PSU connections (to the MB and disks). If you are using power splitters, replace them if you have extras.

You should also post a syslog to see if there are any clues. A good clue might be added to the syslog just before it reboots. To catch it, run the following from a telnet session on your desktop:

tail -f /var/log/syslog

As things get written to the syslog they will automatically be shown here. After a spontaneous reboot, you can check this window for clues.
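The telnet window only helps if it is left open. A complementary approach, sketched here under the assumption that /var/log lives in RAM on this unRAID box (so the syslog is lost on reboot) and that /boot is the flash drive, is to copy the tail of the syslog to flash periodically. The function name, the /boot/logs path and the 500-line default are all hypothetical:

```shell
#!/bin/bash
# snapshot_syslog [SRC] [DEST]
# Copies the last LINES_TO_KEEP lines (default 500) of SRC to DEST so
# they survive a reboot. Defaults assume /var/log is in RAM and /boot
# is the flash drive; both are overridable.
# Meant to be run repeatedly (e.g. from cron), not left running.
snapshot_syslog() {
    local src="${1:-/var/log/syslog}"
    local dest="${2:-/boot/logs/syslog-last.txt}"
    local lines="${LINES_TO_KEEP:-500}"
    mkdir -p "$(dirname "$dest")" || return 1
    # Write to a temp file, then rename, so a crash mid-copy
    # leaves the previous snapshot intact.
    tail -n "$lines" "$src" > "$dest.tmp" && mv -f "$dest.tmp" "$dest"
}
```

Run every few minutes, the last snapshot taken before a spontaneous reboot stays on the flash drive. Each write wears the flash slightly, so a modest interval is a reasonable trade-off; the live `tail -f` window still catches the final seconds a snapshot could miss.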

Figuring these types of things out can be hard. JoeL had a power splitter problem that took him a long time to isolate. Try the above and let's see if we eliminate the problem or get a better clue.

mcai3db3 · February 10, 2011

Thanks for your help. Off to work right now, so I'll continue the testing tonight. I did post a syslog from unMENU in my OP, but I was confused as to how useful it would be, given that it begins at "Feb 9 18:27:50 SERVER syslogd 1.4.1: restart.".

SSD · February 10, 2011

Definitely some problems shown in your syslog. I'm not 100% following it; reading these things is not my specialty.

Looks like you booted around 6:30pm and problems started around 11:30pm.

Here are a few lines around the transition. Did you just start the array at 11:30?

JoeL or RobJ - can you download and have a look? Click the 2nd download button (that threw me first time I tried to dl it).

Here is a snippet:

Feb  9 18:27:56 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 18:27:56 SERVER kernel: REISERFS (device md5): replayed 7 transactions in 4 seconds
Feb  9 18:27:57 SERVER kernel: REISERFS (device md5): Using r5 hash to sort names
Feb  9 18:27:57 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 18:27:59 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 23:28:02 SERVER ntpd[1511]: synchronized to 69.50.231.130, stratum 2
Feb  9 23:28:02 SERVER ntpd[1511]: time reset +18000.007084 s
Feb  9 23:28:02 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 23:28:03 SERVER emhttp: shcmd (25): /usr/sbin/hdparm -y /dev/sda >/dev/null
Feb  9 23:28:26 SERVER emhttp: shcmd (26): /usr/sbin/hdparm -y /dev/sda >/dev/null
Feb  9 23:28:28 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 23:28:36 SERVER last message repeated 11 times
Feb  9 23:28:37 SERVER init: Re-reading inittab
Feb  9 23:29:14 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 23:29:15 SERVER last message repeated 2 times
Feb  9 23:29:20 SERVER kernel: REISERFS (device md2): replayed 60 transactions in 18088 seconds
Feb  9 23:29:20 SERVER kernel: REISERFS (device md2): Using r5 hash to sort names
Feb  9 23:29:25 SERVER kernel: REISERFS (device md3): replayed 71 transactions in 18093 seconds
Feb  9 23:29:25 SERVER kernel: REISERFS (device md3): Using r5 hash to sort names
Feb  9 23:30:29 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 23:30:31 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 23:31:03 SERVER kernel: REISERFS (device md1): replayed 260 transactions in 18191 seconds
Feb  9 23:31:03 SERVER kernel: REISERFS (device md1): Using r5 hash to sort names
Feb  9 23:31:25 SERVER kernel: REISERFS (device md4): replayed 324 transactions in 18213 seconds
Feb  9 23:31:25 SERVER kernel: REISERFS (device md4): Using r5 hash to sort names
Feb  9 23:31:25 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 23:31:25 SERVER emhttp: shcmd (31): rm /etc/samba/smb-shares.conf >/dev/null 2>&1
Feb  9 23:31:25 SERVER emhttp: _shcmd: shcmd (31): exit status: 1
Feb  9 23:31:25 SERVER emhttp: shcmd (32): cp /etc/exports- /etc/exports
Feb  9 23:31:25 SERVER emhttp: shcmd (33): killall -HUP smbd
Feb  9 23:31:25 SERVER emhttp: shcmd (34): /etc/rc.d/rc.nfsd restart | logger
Feb  9 23:32:02 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 23:32:04 SERVER kernel: mdcmd (18): nocheck 
Feb  9 23:32:04 SERVER kernel: md: md_do_sync: got signal, exit...
Feb  9 23:32:04 SERVER kernel: md: recovery thread sync completion status: -4
Feb  9 23:32:04 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 23:32:05 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 23:32:21 SERVER emhttp: shcmd (35): /etc/rc.d/rc.samba stop | logger
Feb  9 23:32:21 SERVER emhttp: shcmd (36): /etc/rc.d/rc.nfsd stop | logger
Feb  9 23:32:22 SERVER emhttp: Spinning up all drives...
Feb  9 23:32:22 SERVER emhttp: shcmd (37): /usr/sbin/hdparm -S0 /dev/sda >/dev/null
Feb  9 23:32:22 SERVER kernel: mdcmd (19): spinup 0
Feb  9 23:32:22 SERVER kernel: mdcmd (20): spinup 1
Feb  9 23:32:22 SERVER kernel: mdcmd (21): spinup 2
Feb  9 23:32:22 SERVER kernel: mdcmd (22): spinup 3
Feb  9 23:32:22 SERVER kernel: mdcmd (23): spinup 4
Feb  9 23:32:22 SERVER kernel: mdcmd (24): spinup 5
Feb  9 23:32:22 SERVER emhttp: shcmd (38): sync
Feb  9 23:32:22 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 23:32:23 SERVER emhttp: shcmd (39): umount /mnt/user >/dev/null 2>&1
Feb  9 23:32:23 SERVER emhttp: _shcmd: shcmd (39): exit status: 1
Feb  9 23:32:23 SERVER emhttp: shcmd (40): rmdir /mnt/user >/dev/null 2>&1
Feb  9 23:32:23 SERVER emhttp: _shcmd: shcmd (40): exit status: 1
Feb  9 23:32:23 SERVER emhttp: Retry unmounting user share(s)...
Feb  9 23:32:24 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 23:32:25 SERVER emhttp: disk_temperature: ATTR_Temperature_Celsius not found
Feb  9 23:32:28 SERVER emhttp: shcmd (41): umount /mnt/user >/dev/null 2>&1
Feb  9 23:32:28 SERVER emhttp: _shcmd: shcmd (41): exit status: 1
Feb  9 23:32:28 SERVER emhttp: shcmd (42): rmdir /mnt/user >/dev/null 2>&1
Feb  9 23:32:28 SERVER emhttp: _shcmd: shcmd (42): exit status: 1
Feb  9 23:32:28 SERVER emhttp: Retry unmounting user share(s)...
Feb  9 23:32:33 SERVER emhttp: shcmd (43): umount /mnt/user >/dev/null 2>&1

mcai3db3 · February 10, 2011

Towards the end, that's me telling the system to stop the array and the cache drive not wanting to unmount. I can't remember the exact steps, but I tried to shut down sab/sickbeard etc. and close everything I could think of. It didn't help, so I ended up just using powerdown in unMENU.

Also, the cache drive temperature does not display in unRAID, so that's one of the errors (it just shows 0C, but it works fine in unMENU).

sda is my cache btw.

SSD · February 10, 2011

There can be a problem with excessive logging caused by certain types of problem. This can cause the server to start killing processes (to free up memory), and ultimately the server becomes unresponsive. I don't remember if this causes the server to reboot, but it seems logical that it could. I know that the Web GUI becomes inaccessible.

Your syslog seems to be doing a lot of logging near the end, although it stops and the file is not that large. If the box continued logging at this rate after the syslog was captured, however, it could create this excessive logging situation. Or it could be that the errors are somehow triggering a reboot not related to running out of memory. Either way, I suggest you work through trying to address the issues in the syslog and see if the rebooting issue goes away.
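To check whether logging is actually running away, a sketch like the one below compares the syslog's size with a threshold and counts the most frequent messages, so that a single repeating error stands out (in the snippet above, the `disk_temperature: ATTR_Temperature_Celsius not found` line repeats often). The function names and the 10 MB threshold are assumptions for illustration, not unRAID settings:

```shell
#!/bin/bash
# syslog_too_big [FILE] [MAX_KB]
# Prints a warning and returns 0 if FILE is larger than MAX_KB kilobytes,
# returns 1 if it is not, and 2 if FILE cannot be read.
# Defaults (/var/log/syslog, 10240 KB) are arbitrary examples.
syslog_too_big() {
    local file="${1:-/var/log/syslog}"
    local max_kb="${2:-10240}"
    [ -r "$file" ] || return 2
    local bytes
    bytes=$(wc -c < "$file")
    bytes=$((bytes))   # strip any padding some wc versions add
    if [ "$bytes" -gt $((max_kb * 1024)) ]; then
        echo "WARNING: $file is $((bytes / 1024)) KB (limit ${max_kb} KB)"
        return 0
    fi
    return 1
}

# top_messages [FILE] [N]
# Shows the N most frequent messages, ignoring the "Mon DD HH:MM:SS HOST"
# prefix (the first four whitespace-separated fields).
top_messages() {
    local file="${1:-/var/log/syslog}"
    local n="${2:-10}"
    awk '{ $1 = $2 = $3 = $4 = ""; sub(/^ +/, ""); print }' "$file" |
        sort | uniq -c | sort -rn | head -n "$n"
}
```

For example, `top_messages /var/log/syslog 5` lists the five most repeated messages with their counts.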

If no one responds, you might want to send Joe L. or RobJ a PM with a link to this thread and ask one of them to look at your syslog.

Joe L. · February 10, 2011

What is going on with the time sync? It threw me for a few minutes when I saw the lines saying the reiserfs transactions were replayed in a bit over 18000 seconds (5+ hours).

Then I saw the lines:

Feb 9 23:28:02 SERVER ntpd[1511]: synchronized to 69.50.231.130, stratum 2
Feb 9 23:28:02 SERVER ntpd[1511]: time reset +18000.007084 s
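Joe L.'s reading can be checked with shell arithmetic. The replays for md1 to md4 were still running when ntpd stepped the clock forward by 18000 s, so each reported replay time includes that step; subtracting it gives roughly the real duration. The helper names are hypothetical, and the fractional .007084 s is dropped:

```shell
#!/bin/bash
# real_elapsed REPORTED_SECONDS STEP_SECONDS
# Removes a clock step from a duration that straddled it.
# Integer seconds only.
real_elapsed() {
    echo $(( $1 - $2 ))
}

# hms SECONDS
# Formats a number of seconds as H:MM:SS.
hms() {
    printf '%d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}
```

`hms 18000` prints `5:00:00`, and `real_elapsed 18088 18000` prints `88`, so md2's replay took about a minute and a half rather than five hours (md4: `real_elapsed 18213 18000` gives `213`). A step of exactly 5 hours suggests a clock that was off by a whole-hour offset rather than one that had drifted, though the log alone does not say why.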

Before you can troubleshoot your server crashes I think you need to stop the use of any add-ons that would hold disks busy.

Something is keeping /mnt/user busy and not allowing unRAID to un-mount it.

Furthermore, it appears that even after un-mounting, the mount point (/mnt/user) cannot be removed, either because it is not empty or because a process has it as its current directory. This can occur if something on your server creates a file or directory under it before the user-share file system has been started, or if a process changes directory to /mnt/user.

Feb 9 23:36:17 SERVER unmenu[1583]: umount: /mnt/user: not mounted
Feb 9 23:36:17 SERVER unmenu[1583]: rmdir: /mnt/user: Directory not empty

With /mnt/user un-mounted, there should be nothing in /mnt/user (it is a mount point, and should be just an empty directory).
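A quick way to see what has been left behind is to list the directory's contents, hidden entries included, once the umount has succeeded. The helper below is a hypothetical sketch; if /mnt/user is still mounted (check `mount` first), it will simply list the share contents instead:

```shell
#!/bin/bash
# stray_entries DIR
# Lists everything directly inside DIR, including dotfiles.
# Returns 0 if DIR is empty, 1 if it has contents, 2 if DIR is missing.
stray_entries() {
    local dir="$1"
    [ -d "$dir" ] || return 2
    local found
    found=$(find "$dir" -mindepth 1 -maxdepth 1 -print)
    [ -z "$found" ] && return 0
    printf '%s\n' "$found"
    return 1
}
```

Whatever `stray_entries /mnt/user` prints, along with the names and timestamps of those entries, may hint at which add-on is writing there before the user shares start.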

What add-ons are you installing? What EXACTLY does your config/go script contain?

Are your add-ons delaying their access of /mnt/user until after the user share file system is established?

To see what processes are holding file systems busy, type:

fuser -mv /mnt/disk* /mnt/user/*

You might want to run that command and post its output before you force the shutdown of the server.

Joe L.

mcai3db3 · February 10, 2011

Thanks for your help Joe. I'll take a look at this further this evening, but my first comments are:

RE: Time sync - I have no clue what any of that means

RE: Add-ons - off the top of my head I have sabnzbd, sickbeard and the add-on for printer drivers. I'm going to take my cache offline for now and thus remove sabnzbd and sickbeard. I will uninstall any other add-ons I have and see how it runs for a few days. The go script DOES have a 30-second wait before installing sab/sick.

When I get home I'll post my exact go script.

It's worth noting that the syslog was taken after my system had done one of its random reboots. At that point the system tries to run a parity check (I stopped it), and when you connect to the server most of the shares do not appear, so I tried to stop my array and start it again. It would not stop, so I powered down. This makes me think the syslog in question may have errors that would not normally appear in day-to-day use, and that may be a product of the shares not being available.

mcai3db3 · February 10, 2011

Ok, first of all, here is a recent error I just got:

Feb 10 17:24:19 SERVER kernel: irq 16: nobody cared (try booting with the "irqpoll" option) (Errors)
Feb 10 17:24:19 SERVER kernel: Pid: 0, comm: swapper Not tainted 2.6.32.9-unRAID #8 (Errors)
Feb 10 17:24:19 SERVER kernel: Call Trace: (Errors)
Feb 10 17:24:19 SERVER kernel: [<c10451cf>] __report_bad_irq+0x2e/0x6f (Errors)
Feb 10 17:24:19 SERVER kernel: [<c1045305>] note_interrupt+0xf5/0x13c (Errors)
Feb 10 17:24:19 SERVER kernel: [<c1045a14>] handle_fasteoi_irq+0x5f/0x9d (Errors)
Feb 10 17:24:19 SERVER kernel: [<c1004a82>] handle_irq+0x1a/0x24 (Errors)
Feb 10 17:24:19 SERVER kernel: [<c1004285>] do_IRQ+0x40/0x96 (Errors)
Feb 10 17:24:19 SERVER kernel: [<c1002f29>] common_interrupt+0x29/0x30 (Errors)
Feb 10 17:24:19 SERVER kernel: [<c10085f9>] ? mwait_idle+0x4c/0x52 (Errors)
Feb 10 17:24:19 SERVER kernel: [<c1001a14>] cpu_idle+0x3a/0x4e (Errors)
Feb 10 17:24:19 SERVER kernel: [<c129c662>] start_secondary+0x195/0x19a (Errors)
Feb 10 17:24:19 SERVER kernel: handlers:
Feb 10 17:24:19 SERVER kernel: [<c11ead61>] (usb_hcd_irq+0x0/0x5b) (Drive related)
Feb 10 17:24:19 SERVER kernel: [<f8247022>] (ahci_interrupt+0x0/0x3df [ahci]) (Drive related)
Feb 10 17:24:19 SERVER kernel: Disabling IRQ #16
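The `handlers:` lines show two drivers registered on IRQ 16, `usb_hcd_irq` (a USB host controller) and `ahci_interrupt` (the AHCI SATA driver), and the kernel then disabled that IRQ. /proc/interrupts lists which devices share each IRQ line, with per-CPU counts. A minimal sketch for pulling out one line, using a hypothetical function name:

```shell
#!/bin/bash
# irq_users IRQ [FILE]
# Prints the /proc/interrupts line for one IRQ number; the end of that
# line names the devices/drivers sharing it. FILE defaults to
# /proc/interrupts. Returns 1 if the IRQ is not listed.
irq_users() {
    local irq="$1"
    local file="${2:-/proc/interrupts}"
    # The first column looks like "16:", so match it exactly
    # rather than by prefix (which would also hit 160, 161, ...).
    awk -v key="${irq}:" '$1 == key { print; found = 1 } END { exit !found }' "$file"
}
```

Running `irq_users 16` on the live system shows which devices sit on that line; comparing the output before and after disabling unused onboard hardware in the BIOS (as suggested later in the thread) is one possible way to see whether the sharing changes.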

mcai3db3 · February 10, 2011

My entire Go script as requested:

#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
# Start unMENU
/boot/unmenu/uu
#sleep for 30 seconds
sleep 30
#Start SABnzbd
installpkg /mnt/cache/.custom/SABnzbdDependencies-2.1-i486-unRAID.tgz
python /mnt/cache/.custom/sabnzbd/SABnzbd.py -f /mnt/cache/.custom/sabnzbd.ini -d -s 192.168.1.2:1066
# Start Sick Beard
python /mnt/cache/.custom/sickbeard/SickBeard.py --daemon
# Run the unMENU package manager's auto-install scripts
cd /boot/packages && find . -name '*.auto_install' -type f -print | sort | xargs -n1 sh -c
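Joe L. asked whether the add-ons wait until the file systems they need are available; this script relies on a fixed `sleep 30`. A sketch of an alternative polls for a path with a timeout instead, assuming that the add-on's own directory appearing is a good enough readiness signal. `wait_for_path` and the 300-second timeout are hypothetical:

```shell
#!/bin/bash
# wait_for_path PATH [TIMEOUT_SECONDS]
# Polls once a second until PATH exists. Returns 0 when it appears,
# 1 if it has not appeared within TIMEOUT_SECONDS (default 300,
# an arbitrary choice).
wait_for_path() {
    local path="$1"
    local timeout="${2:-300}"
    local waited=0
    until [ -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Hypothetical use in the go script, replacing the fixed "sleep 30":
#   if wait_for_path /mnt/cache/.custom/sabnzbd 300; then
#       installpkg /mnt/cache/.custom/SABnzbdDependencies-2.1-i486-unRAID.tgz
#       python /mnt/cache/.custom/sabnzbd/SABnzbd.py -f /mnt/cache/.custom/sabnzbd.ini -d -s 192.168.1.2:1066
#       python /mnt/cache/.custom/sickbeard/SickBeard.py --daemon
#   else
#       logger "go: /mnt/cache not available, add-ons not started"
#   fi
```

If an add-on writes under /mnt/user rather than /mnt/cache, waiting on a path inside /mnt/user would be the closer match to Joe L.'s concern.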

mcai3db3 · February 10, 2011

I've attached a screenshot of my unMENU main screen. I was confused by the sda and sda1 entries that exist there for the cache drive. Is this normal?

Joe L. · February 11, 2011

Yes, it is normal.

The three-letter designation is the name of the device used to access the entire drive, including the master boot record.

The same three-letter name with a number at the end is the name of the device used to access that partition (partition 1 in your case, for sda1). unRAID creates one partition on each of its drives. Other systems might create more than one partition, in which case you would see /dev/sda1, /dev/sda2, etc.
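The naming rule Joe L. describes can be expressed with plain shell parameter expansion. This sketch assumes the classic /dev/sdXN naming used here (names with a `p` separator, such as /dev/mmcblk0p1, are not handled), and the function names are hypothetical:

```shell
#!/bin/bash
# disk_of PARTITION
# Strips the trailing partition number: /dev/sda1 -> /dev/sda
disk_of() {
    echo "${1%%[0-9]*}"
}

# part_num DEVICE
# Keeps only the trailing digits: /dev/sda1 -> 1
# (prints an empty line for a whole-disk name such as /dev/sda)
part_num() {
    echo "${1##*[!0-9]}"
}
```

`disk_of /dev/sda1` prints `/dev/sda` and `part_num /dev/sda1` prints `1`; `cat /proc/partitions` lists both the whole-disk and partition devices the kernel knows about.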

Joe L.

mcai3db3 · February 11, 2011

Ok thanks.

Well, for now I've removed the cache drive that was giving me grief. I've got sab/sick installed on a regular disk, which is far from ideal but will do for a week or so. I'm basically just going to wait it out and see if it crashes again. At some point I'll open her up, dust off the various connection cables and make sure they're all kosher.

Also, I have a spare 2TB drive, and one of my array disks is a 1TB 7200rpm drive, so I may just swap that array disk out for my 2TB and then make that disk my new cache, if my theory of a crappy cache drive is anywhere near correct.

Last note: I think the syslog I posted wasn't helpful because it just showed the problems caused by the random reboot rather than the cause of the reboot. That is why the user folder was not empty: because unRAID had issues after the reboot, it did not create all the shares, only one of them, and then it couldn't get rid of that one share when I tried to stop the array, so the folder wasn't empty.

bcbgboy13 · February 11, 2011

You have Gigabyte GA-H57M-USB3 with 8GB of memory (you did not state the brand)

H57 is a relatively new chipset, so it seems strange that Newegg has now deactivated this board. Also, the reviews there were/are generally not favorable.

This looks to me like a troubling sign.

So I would suggest decreasing the memory to 4GB only, making sure you have the latest BIOS, loading the optimal settings and then disabling any unused hardware (serial and parallel ports, FireWire, audio, IDE ports, etc.).

Also, do not leave any voltage setting in the BIOS at [auto]. If it is auto-detected at 1.2V, change it manually to 1.2V. Bump the memory voltage settings by 0.05V to 0.1V, run memtest again, and let's hope that this will keep you out of trouble.

mcai3db3 · February 11, 2011

My ram is Kingston HyperX 8GB (4 x 2GB) 240-Pin DDR3 SDRAM DDR3 1600 (PC3 12800) Desktop Memory Model KHX1600C9AD3K2/4G

I'll take a look at the BIOS tonight and see what my settings are. I believe I already updated the BIOS when I first received the components. I'm not sure how easy it is to change all of this stuff, so I'll have to take a look. I'm apprehensive about changing anything until it crashes again, though. Thanks for your help.

bcbgboy13 · February 11, 2011

Kingston is generally very good, but the KHX1600C9AD3K2/4G modules are rated at 1.65V, and I hope you are driving them with at least this voltage.

The DDR3 standard voltage is 1.5V.

I personally won't buy (or use for unRAID) any performance memory, but that's just me.

mcai3db3 · February 11, 2011

haha I think you're grossly overestimating my knowledge on this kinda thing! Right now the voltages will be whatever the default was. I'll give them a look tonight and let you know what they say.
