Server powering itself off after 6.3.0 upgrade


Ryland

My unRAID server has started occasionally powering itself off, I think when disk writes are occurring, and I can't find a reason for it.  One thing I did notice was that shortly after upgrading, my server stopped responding and top showed the shfs process using 100% CPU.  Searching on that led to questions about reiserfs-formatted disks, which I have two of, but SMART diagnostics show them both as GOOD drives.  The last time this happened, the parity check found over 2500 errors which needed fixing. I have attached the diagnostics logs.

 

Edit: I was just looking around my NAS and found that one of three files that would have been moved from the cache drive last night was 0 bytes in length.
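
For anyone chasing the same symptom, a minimal sketch for listing zero-length files under a share might look like the following. It only uses Python's standard library; the function name and the example path in the comment are placeholders of mine, not anything from the unRAID tooling.

```python
import os

def find_empty_files(root):
    """Return a sorted list of regular files under root that are 0 bytes long."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.isfile(path) and os.path.getsize(path) == 0:
                    empty.append(path)
            except OSError:
                # The file vanished or could not be read; skip it.
                continue
    return sorted(empty)

# Example call (hypothetical share path):
# for path in find_empty_files("/mnt/user/Media"):
#     print(path)
```

A 0-byte file is not always a sign of damage (some programs create empty marker files on purpose), so the list is a starting point for a manual check rather than proof of corruption.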

tower-diagnostics-20170212-0839.zip


I can shut it down and blow it out once the parity check is finished, but it is making it through parity checks.  I expect that an overheating issue is the cause, but I was thinking that the shfs process keeping the CPU pegged is causing the overheating rather than regular operations.


The reiserfs issue has nothing to do with good or bad disks; the problem is reiserfs itself. I recommend converting the remaining disks to xfs.

 

 


If the cooling was working as it should the server wouldn't overheat even with CPU stuck at 100% for days.


Got it.  First order of business once the parity check is done is to blow out all the fans.  Second order of business is to figure out how to switch my disks from reiserfs to xfs.

 

Edit:  What I'm finding requires adding another disk to my array to copy the contents to, which I really don't want to do unless I absolutely have to.  Is there a method to copy the contents of one of my drives to the others and then reformat it as xfs?

 

Edit2: This looks like the proper sequence https://lime-technology.com/forum/index.php?topic=50111.0
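
Before emptying one disk onto the others, it may be worth confirming that the contents will actually fit. The sketch below is only an illustration: the function names, the 5% reserve and the /mnt/diskN paths in the comment are my assumptions, and it is not part of the linked procedure.

```python
import shutil

def fits_on_targets(used_bytes, free_bytes_per_target, reserve=0.05):
    """Return True if used_bytes fits in the targets' combined free space,
    while leaving `reserve` (a fraction) of that free space untouched."""
    usable = sum(free_bytes_per_target) * (1.0 - reserve)
    return used_bytes <= usable

def check_mounts(source, targets):
    """Read real usage figures with shutil.disk_usage and apply the check."""
    used = shutil.disk_usage(source).used
    free = [shutil.disk_usage(t).free for t in targets]
    return fits_on_targets(used, free)

# Example call (mount points assumed, adjust to your array):
# print(check_mounts("/mnt/disk3", ["/mnt/disk1", "/mnt/disk2"]))
```

Enough combined free space does not guarantee that every individual file fits on some single target disk, so treat a True result as a rough go-ahead rather than a guarantee.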


Is it actually powering down, rather than hanging? It would be worth checking the power supply and running the Fix Common Problems plugin in troubleshooting mode to try and catch something in the log.

 

It was actually powering down in such a way that I had to switch the PSU off before I could turn it back on.  I'm in the process of moving files off one of the reiserfs disks so that I can convert it to xfs.


Really? Sure sounds like a dodgy PSU to me.


I may have a spare good PSU in my basement from a previous build.  If not, I can track one down since I don't really need a particularly powerful one.

 

You are really looking for one with a single 12V rail.  (It is cheaper to manufacture PSUs with two 12V rails...)  It is important to realize that if the power rating is exceeded for even a fraction of a second on any rail, the whole PSU will shut down!
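
To make the rail point concrete, here is a rough sketch of the arithmetic. All of the numbers (rail limits, 2.5 A per drive at spin-up, 6 A for the CPU) are invented for illustration; check the label on your own PSU and the drive datasheets for real figures.

```python
def overloaded_rails(rail_limits, loads):
    """rail_limits: {rail_name: max_amps}; loads: list of (rail_name, amps).
    Return the sorted names of rails whose summed load exceeds the limit."""
    totals = {rail: 0.0 for rail in rail_limits}
    for rail, amps in loads:
        totals[rail] += amps
    return sorted(r for r, total in totals.items() if total > rail_limits[r])

# Hypothetical dual-rail PSU: all eight drives happen to hang off rail 12V1.
dual = {"12V1": 18.0, "12V2": 18.0}
dual_loads = [("12V1", 2.5)] * 8 + [("12V2", 6.0)]

# Hypothetical single-rail PSU with the same total 12V capacity.
single = {"12V": 36.0}
single_loads = [("12V", 2.5)] * 8 + [("12V", 6.0)]

print(overloaded_rails(dual, dual_loads))      # rails over their limit, if any
print(overloaded_rails(single, single_loads))
```

With these made-up figures the drives pull 20 A on one 18 A rail of the dual-rail unit, while the same 26 A total sits comfortably inside the single 36 A rail.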


And here I was going to go looking to try to find PSU recommendations, thanks.

 

I will have to take a look to see what is currently in that machine, but I'm pretty sure it's an Antec gold PSU that is about 2 years old.  I converted one of my two reiserfs disks yesterday and was in the process of copying all the files off the last one when the NAS rebooted again around 1am.  It's currently checking parity, so I will start the rsync again when I get home from work tonight. That should only take another 4 hours or so, and then I can convert the disk and start moving files back over to it.

 

My best guess as to the shutdowns was the CPU running at 100% for too long and overheating, but that's a bit iffy.  It hasn't done that again, but I also haven't seen that process going crazy either.  I do have a diagnostics zip from the first time I saw the process at 100%, before I forced the machine to turn off.

 

I'm not sure what's causing my reboots, but last time I had this problem all fingers pointed at Plex, and switching it to a docker solved it.  It's still running in a docker, so I doubt that's it.


Do you have the Dynamix System Temperature plugin installed? That plugin can provide the actual temperature of your CPU.  Also make sure that your CPU fan is plugged into the proper fan header on your motherboard.  If the plugin doesn't work properly on your system, you could reboot your server after the CPU has been at 100% for fifteen to twenty minutes and see what temperature the BIOS reports.  You might also check whether a maximum CPU temperature can be set in the BIOS.
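
If the plugin is not an option, the Linux kernel often exposes temperature readings under /sys/class/thermal, in thousandths of a degree Celsius. The sketch below assumes those files exist on your board (they may not, and which sensor each zone maps to varies), so take it as a starting point only; the function names are mine.

```python
import glob

def parse_millidegrees(text):
    """Convert a sysfs reading such as '39000\n' to degrees Celsius."""
    return int(text.strip()) / 1000.0

def read_thermal_zones(pattern="/sys/class/thermal/thermal_zone*/temp"):
    """Return {path: degrees_celsius} for every readable thermal zone."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                readings[path] = parse_millidegrees(f.read())
        except (OSError, ValueError):
            # Sensor missing, unreadable, or reporting garbage; skip it.
            continue
    return readings

# Example call:
# for path, temp in read_thermal_zones().items():
#     print(f"{path}: {temp:.1f} C")
```

An empty result just means the kernel did not expose any thermal zones there, in which case the BIOS reading suggested above is the fallback.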

 

You have never told us what hardware you are running.  Overheating problems with unRAID systems are usually mechanical in nature and not the result of software overloading the system.  Where the ventilation of the case has been carefully thought out, the CPU should be able to run at 100% forever without any overheating issues.  (Ignoring the complaint of sluggish response...)

 

When the CPU is running at 100%, you might want to log into your server (via the console or PuTTY) and run the command  htop  which will show what processes are running and what percentage of the CPU each is using.
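
If you would rather capture that information to a file than watch htop live, something along these lines could work. It is a sketch that shells out to the standard ps command and sorts the result; the function names and the sample text are mine. Note that the %CPU column from ps is averaged over each process's lifetime, so it will not match htop's instantaneous figures exactly.

```python
import subprocess

def parse_ps(output):
    """Parse `ps -eo pid,pcpu,comm` text into (pid, cpu_percent, command)
    tuples, sorted with the highest CPU user first."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) == 3:
            pid, pcpu, comm = parts
            rows.append((int(pid), float(pcpu), comm.strip()))
    return sorted(rows, key=lambda r: r[1], reverse=True)

def top_processes(limit=5):
    """Run ps and return the `limit` processes with the highest %CPU."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)[:limit]

# Made-up sample of what ps might print, used here instead of a live system.
sample = "  PID %CPU COMMAND\n 1234 99.7 shfs\n    1  0.0 init\n 2200  3.1 smbd\n"
for pid, cpu, name in parse_ps(sample):
    print(pid, cpu, name)
```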


The hardware is part of the problem in that it's a Gigabyte motherboard which doesn't allow fan control.  I will get exact models when I get home and can look at the boxes, but for now: it's an i3 running on a Gigabyte motherboard in a small tower case.  I have front and rear fans, both constantly blowing, and the stock CPU fan.  I may have accidentally covered the side air holes on the case.  I do have the Dynamix plugin installed, but the machine has been powering off in the middle of the night and I only once caught it when it turned sluggish, and that was due to "shfs" taking 100% of the CPU.  At that point I could only ssh into the machine, and I did run diagnostics. I could post that log later on if useful.

 

From what I could find, shfs pegging the CPU was caused by having reiserfs disks, so I'm in the process of converting them to xfs, with one left to go.


For most unRAID servers, we want the air to flow in through the front of the case, across the hard drives first, and exhaust out the back of the case.  The case-side openings should either (a) be plugged or (b) have a fan installed blowing out.  (Most servers tend to have a number of drives tightly packed together, and high drive temperatures are usually the problem area.)

 

You will note that I have two i3 servers; the CPU temperatures are 39C and 34C.  (The 34C one is in an unheated portion of the basement where the temperature is probably about 60F, roughly 16C.)


I have a fan in the front pulling air across the drives and another fan to expel the hot air out the back.  My drives are normally around 34C when being used.

 

Many of us have configured the front fans so that they blow air into the case.  (The premise being that the hard drives are probably the most sensitive to degradation due to high temperatures.)  You have to look carefully at how the air is flowing through your case, and you have to allow sufficient air to get into the case. The CFM of these types of fans drops rapidly as the pressure difference between the back and the front of the fan increases!  If you are pulling air out of the case front-and-back, where is the air entering the case? You should be trying to maximize the air flow through the case...


The fan in the front is pulling air INTO the case across the drives and the fan at the back is pushing air OUT of the case.  I guess I didn't make that as clear as I thought I had.  The side panel of the case has holes in it that could be screwing up the airflow.
