newbie_dude Posted February 1, 2016

Hi All,

My parity checks have become a source of pain since last month. Unraid will start a parity check and then shut down without any notice. I think it might be temperature related, but that should be a clean shutdown, right? I have clean powerdown installed. The next time I start the array, it detects an unclean shutdown. Is there any way for me to confirm?

To see if there were any logs, I ran "tail -F /var/log/syslog" over ssh to try and capture any output. Unfortunately, the only output I see after starting the parity check and before the shutdown is this:

Feb 1 18:05:13 Tower emhttp: shcmd (80): /usr/sbin/hdparm -y /dev/sdg &> /dev/null

sdg is my cache drive. Would the rest of the log help? I have what I captured from syslog over ssh. Is there anything I can do to track down the cause?

Thanks so much!
trurl Posted February 2, 2016

If you mean drive temp, those are displayed and you should get a warning before it hits critical. Do you have notifications set up? If you mean CPU or mobo overtemp shutdown, that is out of unRAID's control.
Squid Posted February 2, 2016

Most likely a CPU overtemp, or the power supply is giving out with the load on it.
newbie_dude Posted February 2, 2016 (Author)

Thanks so much! That actually gives me a lot to think about. The only reason I suspected HD overtemp was that the very first time this happened, I got an email with an HD temp warning, because I do have notifications set up. Then I noticed the server had shut down. Ever since then I do not get a warning email; the server just shuts down. I just assumed it was the same issue.

I will install the Dynamix System Temperature plugin and check to see what's happening. I have a cheap AMD Sempron 145 in the system because I didn't want anything fancy. But if it's overheating, maybe it's time for an upgrade.

This is what I get when I follow https://lime-technology.com/wiki/index.php/Setting_up_CPU_and_board_temperature_sensing:

k10temp-pci-00c3
Adapter: PCI adapter
temp1: +53.5°C (high = +70.0°C)

nct6776-isa-0290
Adapter: ISA adapter
Vcore: +1.06 V (min = +0.00 V, max = +1.74 V)
in1: +1.86 V (min = +0.00 V, max = +0.00 V) ALARM
AVCC: +3.38 V (min = +2.98 V, max = +3.63 V)
+3.3V: +3.38 V (min = +2.98 V, max = +3.63 V)
in4: +1.62 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +1.72 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +0.96 V (min = +0.00 V, max = +0.00 V) ALARM
3VSB: +3.46 V (min = +2.98 V, max = +3.63 V)
Vbat: +3.38 V (min = +2.70 V, max = +3.63 V)
fan1: 804 RPM (min = 0 RPM)
fan2: 3183 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +38.0°C (high = +0.0°C, hyst = +0.0°C) ALARM
    sensor = thermistor
CPUTIN: +54.0°C (high = +80.0°C, hyst = +75.0°C)
    sensor = thermistor
AUXTIN: +3.5°C (high = +80.0°C, hyst = +75.0°C)
    sensor = thermistor
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
PCH_MCH_TEMP: +0.0°C
intrusion0: ALARM
intrusion1: ALARM
beep_enable: disabled

Should I be keeping an eye on CPUTIN? 54 seems very high for idle, does it not? I haven't started the array yet since it will start a parity check.

I will run sensors over ssh every 30 seconds and see what happens to it and when it shuts down, with:

root@Tower:~# while true; do date; sensors; sleep 30; done

I hope it's not the power supply though. It's a 1000 W supply that is supposed to be good.

This is just a very odd issue. It will shut down a few times, and then complete the parity check without shutting down. No idea why.

I will be back. Thanks so much for the hints!
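The polling loop above only prints to the ssh session, so the last readings are lost if the connection drops when the box powers off. A minimal sketch of one way to keep them: pull just the temp1 value out of the sensors output and append it, timestamped, to a file, calling sync after each write. The function names and the log path in the usage note are hypothetical, and the parsing assumes the normal multi-line sensors output with a "temp1:" line as shown in this thread.

```shell
#!/bin/bash

# Read `sensors` output on stdin and print the first temp1 reading
# as a bare number (sign, degree symbol and unit stripped).
extract_temp1() {
    awk '$1 == "temp1:" { v = $2; gsub(/[^0-9.-]/, "", v); print v; exit }'
}

# Append one timestamped temp1 reading to a log file and flush it to disk.
# $1 = log file path (hypothetical; pick a location that survives a reboot)
log_temp_once() {
    local logfile=$1 temp
    temp=$(sensors | extract_temp1)
    printf '%s temp1=%s\n' "$(date '+%F %T')" "$temp" >> "$logfile"
    sync
}
```

Usage would be something like `while true; do log_temp_once /boot/logs/cputemp.log; sleep 30; done`, assuming the flash drive is mounted at /boot and a logs directory exists there. Writing to persistent storage rather than the RAM-backed /var/log is the point of the exercise: the last line in the file is then the temperature just before the shutdown.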
newbie_dude Posted February 2, 2016 (Author)

Well, it's definitely that...

When I started the parity check:

Mon Feb 1 22:07:02 EST 2016
k10temp-pci-00c3
Adapter: PCI adapter
temp1: +56.0°C (high = +70.0°C)

nct6776-isa-0290
Adapter: ISA adapter
Vcore: +1.23 V (min = +0.00 V, max = +1.74 V)
in1: +1.86 V (min = +0.00 V, max = +0.00 V) ALARM
AVCC: +3.38 V (min = +2.98 V, max = +3.63 V)
+3.3V: +3.36 V (min = +2.98 V, max = +3.63 V)
in4: +1.63 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +1.72 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +1.00 V (min = +0.00 V, max = +0.00 V) ALARM
3VSB: +3.46 V (min = +2.98 V, max = +3.63 V)
Vbat: +3.38 V (min = +2.70 V, max = +3.63 V)
fan1: 809 RPM (min = 0 RPM)
fan2: 3176 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +40.0°C (high = +0.0°C, hyst = +0.0°C) ALARM
    sensor = thermistor
CPUTIN: +56.0°C (high = +80.0°C, hyst = +75.0°C)
    sensor = thermistor
AUXTIN: +0.5°C (high = +80.0°C, hyst = +75.0°C)
    sensor = thermistor
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
PCH_MCH_TEMP: +0.0°C
intrusion0: ALARM
intrusion1: ALARM
beep_enable: disabled

When I decided to chicken out:

Mon Feb 1 22:10:33 EST 2016
k10temp-pci-00c3
Adapter: PCI adapter
temp1: +74.6°C (high = +70.0°C)

nct6776-isa-0290
Adapter: ISA adapter
Vcore: +1.06 V (min = +0.00 V, max = +1.74 V)
in1: +1.85 V (min = +0.00 V, max = +0.00 V) ALARM
AVCC: +3.38 V (min = +2.98 V, max = +3.63 V)
+3.3V: +3.36 V (min = +2.98 V, max = +3.63 V)
in4: +1.64 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +1.72 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +1.02 V (min = +0.00 V, max = +0.00 V) ALARM
3VSB: +3.46 V (min = +2.98 V, max = +3.63 V)
Vbat: +3.38 V (min = +2.70 V, max = +3.63 V)
fan1: 817 RPM (min = 0 RPM)
fan2: 3176 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +41.0°C (high = +0.0°C, hyst = +0.0°C) ALARM
    sensor = thermistor
CPUTIN: +71.0°C (high = +80.0°C, hyst = +75.0°C)
    sensor = thermistor
AUXTIN: -0.5°C (high = +80.0°C, hyst = +75.0°C)
    sensor = thermistor
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
PCH_MCH_TEMP: +0.0°C
intrusion0: ALARM
intrusion1: ALARM
beep_enable: disabled

Now:

Mon Feb 1 22:17:44 EST 2016
k10temp-pci-00c3
Adapter: PCI adapter
temp1: +62.0°C (high = +70.0°C)

nct6776-isa-0290
Adapter: ISA adapter
Vcore: +1.06 V (min = +0.00 V, max = +1.74 V)
in1: +1.86 V (min = +0.00 V, max = +0.00 V) ALARM
AVCC: +3.38 V (min = +2.98 V, max = +3.63 V)
+3.3V: +3.38 V (min = +2.98 V, max = +3.63 V)
in4: +1.65 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +1.72 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +1.05 V (min = +0.00 V, max = +0.00 V) ALARM
3VSB: +3.46 V (min = +2.98 V, max = +3.63 V)
Vbat: +3.38 V (min = +2.70 V, max = +3.63 V)
fan1: 817 RPM (min = 0 RPM)
fan2: 3183 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +42.0°C (high = +0.0°C, hyst = +0.0°C) ALARM
    sensor = thermistor
CPUTIN: +62.0°C (high = +80.0°C, hyst = +75.0°C)
    sensor = thermistor
AUXTIN: -3.0°C (high = +80.0°C, hyst = +75.0°C)
    sensor = thermistor
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
PCH_MCH_TEMP: +0.0°C
intrusion0: ALARM
intrusion1: ALARM
beep_enable: disabled

The CPU temp is going up, but what is going up even more is the PCI device, which is my SATA controller card (I think; I have a PCI-e SATA card and a PCI video card and that's it). It doesn't have a fan. But I guess it's time for me to put a fan on the side vent to cool it and see what happens!

Is there even a remote possibility that this might be an unRAID issue? I have had the same setup for about 8 months or so, and only recently upgraded to unRAID 6 when this started happening.

Thanks so much!!
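A note on reading these dumps: k10temp is the Linux kernel driver for the temperature sensor built into AMD CPUs, and it is listed as a "PCI adapter" only because the sensor is read through the processor's PCI configuration space. So temp1 here is an on-die CPU reading rather than an add-in card, and the 22:10:33 sample shows it above its own reported high limit. A minimal sketch for spotting that automatically follows; the function name is hypothetical, and it assumes the normal multi-line sensors output where each reading looks like "label: +value°C (high = +limit°C ...)". Limits reported as 0.0 (as for SYSTIN above) are treated as unset and skipped.

```shell
#!/bin/bash

# Read `sensors` output on stdin and print every temperature reading that
# exceeds its own nonzero "high" limit, one per line.
over_limit() {
    awk '
    /high =/ {
        cur = $2;  gsub(/[^0-9.-]/, "", cur)    # e.g. "+74.6°C"  -> "74.6"
        high = $5; gsub(/[^0-9.-]/, "", high)   # e.g. "+70.0°C)" -> "70.0"
        # A limit of 0 means the chip reports no usable threshold.
        if (high + 0 > 0 && cur + 0 > high + 0)
            printf "%s %s exceeds high %s\n", $1, cur, high
    }'
}
```

Run as `sensors | over_limit`, it prints nothing while every reading is within limits. Fed the 22:10:33 readings, it would print a single line for temp1, since CPUTIN at 71.0 is still under its 80.0 limit.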
Frank1940 Posted February 2, 2016

I would suggest opening up the case and inspecting the inside. Look at the CPU cooling fins. Are they filled up with dust? Start up the server and look at all of the fans. Are they running? Are the fan blades clogged with dust and dirt? Are the inlets on the case clogged with dust? It would not hurt to take the case outside (or some place where you aren't worried about it getting really, really dirty) and blow out all of the accumulated dust and dirt.

My Semprons run at about 38°C, and I have a speed controller on the case fans so they are not running at full speed. If, after you have cleaned everything, you still have a cooling problem, I would suggest researching strategies for proper case cooling for servers. The problems and solutions are a bit different than for gaming systems.