donr

(Solved) Unraid 6.6.3 constant reboot


Hi everyone. I have been running my Unraid box for quite some years now, but recently I noticed a problem. The server would constantly reboot every 2-3 minutes. One of my disks is not recognized anymore. So I opened up the server, cleaned everything, made sure all SATA cables were plugged in properly, and rebooted. Same thing.

So again I opened my box and checked my Seasonic 750 watt PSU under load, and everything checks out. I plugged in one of those POST code debug cards, and everything checks out. I installed a Linux OS on a thumb drive so I could stress my PC with the stress command. Even with that thumb drive, the server would reboot 2 to 3 times before stabilizing. I stress tested the CPU, IO, and whatnot for a few hours, and all was good. I started memtest and let it run for 24 hours, and there was not one single error. I thought maybe the Unraid thumb drive was corrupt, but it rebooted when I tried it with another OS.

I am in the process of rebuilding my #2 disk after it was recognized again, and I hope I did not do anything wrong, because the sync shows a total size of 4 TB, which is the size of my disk and not of the array. Phew, I am running out of wind. Anyway, I downloaded the diagnostics from yesterday and the ones from today while rebuilding. I don't know how to post them or if they are even needed. Oh, BTW, all my dockers are gone and I don't know why. I don't have another PSU to swap, in case someone suggests it. Any help would be greatly appreciated.
Don

PS- I included the latest diagnostic. I hope this is the way to do it.

tower-diagnostics-20190805-0321.zip

24 minutes ago, donr said:

I thought maybe the Unraid thumb drive was corrupt, but it rebooted when I tried it with another OS.

What does the "it" in "it rebooted" stand for: the server hardware, the boot disk, etc.?

 

27 minutes ago, donr said:

I don't have another psu to swap, in case someone suggests it.

Do you have a friend who might have one? I have occasionally bought an item from a vendor with a liberal return policy, just to test whether the currently installed one might be defective...

 

32 minutes ago, donr said:

 Any help would be greatly appreciated.

1--   Have you tried the Safe Mode boot option?  

 

2-- Your UPS is not being detected?  Do you know why?  Could the UPS be flaky?

 

3-- Hook up a monitor and keyboard and see if anything appears on the screen when the reboot occurs.  (Reboots are more difficult than crashes, as information on the screen may only be there for a second or two. You may want to use a camera if this is the case.  Try to use something to steady it, as someone will have to try to read the resulting photo!)

14 minutes ago, Frank1940 said:

What does the underlined "it" stand for? ----- The Unraid server.

Do you have a friend who might have one? ----- No, I don't. I need at least 600 watts. If all else fails, I will buy a new one, and if that does not work, I will have a spare.

1-- Have you tried the Safe Mode boot option? ----- No, I have not. What will this accomplish?

2-- Your UPS is not being detected?  Do you know why? ----- Because the server is on my bench and not connected to the UPS.

3-- Hook up a monitor and keyboard and see if anything appears on the screen when the reboot occurs. ----- Already hooked up on my bench. This is where I am running the rebuild, and so far, after 5 hours, the server is still running with no reboots.  Fingers crossed!!

 

1 hour ago, donr said:

This is where I am running the rebuild and so far after 5 hours, the server is still running and no reboots.  Fingers crossed!!

Is this server now on the bench and opened up, with the side off?  Is it significantly cooler than the regular location?  Is the CPU fan running?  A few years ago, there would be a question here about the PS fan, but many PSs now use fan control...  (Rebooting every few minutes has all the signs of being heat related.)  That, or the UPS has a problem, as it is now not a part of the equation and the earlier problem of rebooting every two or three minutes has not resurfaced after several hours.

 

2 hours ago, donr said:

1--   Have you tried the Safe Mode boot option?  No I have not. What will this accomplish?

This is often done just to eliminate the possibility that one of the plugins, Dockers, or VMs is causing an issue.

30 minutes ago, Frank1940 said:

Is this server now on the bench now opened up -- side off? ----- Yes.

Is it significantly cooler than the regular location?  Is the CPU fan running? ----- Yes and yes.

Rebooting every few minutes has all signs of being heat related. ----- I agree, but after I removed the server from its location and after it had cooled down, the same problem existed. But now that I am thinking more about it, I did have alerts about the hard drives heating up (50 C). It is still rebuilding, but the hard drives had started giving me alerts, so I installed a 70 cm fan blowing right into the server. All disks are now at 27 to 32 C and the rebuild has not stopped. The server has been in a dedicated space with air circulation for years and has given me no problems before, although it has now grown, with more disks and also more dust. (It was very dirty when I took it out to bench test it.) I do not believe the UPS is to blame, given that it is only a few months old and the server did reboot on the bench without being plugged in to the UPS.

I do believe that you pointed me in the right direction and it is very possible that the server did in fact overheat. It is humming along very nicely now. When the rebuild is complete, I will close the case, stop the big fan and boot in non-GUI mode as instructed above and see what happens. I still don't know where all my dockers went but, I can reinstall those if all is good. Thank you and I will give feedback in a few days to see if everything is going well or not.

Don

 

 

 

Main.png
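
For anyone who wants to keep an eye on readings like the ones above, here is a minimal Python sketch that sorts drive temperatures into "ok", "warning", and "critical". The 45 C and 55 C thresholds, the disk names, and the function name are assumptions for illustration only; use whatever limits your drives and your own notification settings call for.

```python
def classify_temps(temps_c, warn_c=45, crit_c=55):
    """Label each disk temperature (in Celsius) as ok, warning, or critical.

    warn_c and crit_c are assumed example thresholds, not values
    taken from this thread or from any particular drive datasheet.
    """
    report = {}
    for disk, temp in temps_c.items():
        if temp >= crit_c:
            report[disk] = "critical"
        elif temp >= warn_c:
            report[disk] = "warning"
        else:
            report[disk] = "ok"
    return report


# Hypothetical disk names; the readings mirror the temperatures
# mentioned in the post (27 to 32 C with the fan, 50 C without).
readings = {"disk1": 27, "disk2": 32, "disk3": 50}
for disk, status in classify_temps(readings).items():
    print(disk, status)
```

With these assumed thresholds, a drive at 50 C lands in the "warning" band, which fits the alerts described in the post.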


Heat flow can be a problem with any server.   Best practice is to set up the fans and the case so that the incoming air flows first over the disks and exits out the rear of the case.  (Have a look at cooling setup of commercial servers.)  In my case, I have a fan opening in the right side panel of my server cases. (Apparently intended to blow air over the GPU card which many servers do not have.)  I have a fan in that opening BUT it blows out.  While this might seem counterintuitive, it is not. The combined suction of all the fans in the back half of the case (behind the HD's), helps to increase the total air flow available to cool both the HD's and the MB and LSI card.  You have to get rid of the hot air and replace it with cooler air!  And keep those air intakes clean...


Right now, I have four 80 mm fans in front of the hard drives, two in the back, and one on top, plus the CPU and PSU fans. I will verify the flow of all fans. Basically, the front ones should blow into the case and over the HDs, while the rear ones and the top one pull the air out. I will add more if needed, since I already have a bunch of these fans.

Don


It was too good to be true. The system rebooted after 80% of the sync was done. I have to start all over again (roughly 12 hrs). I booted in safe mode, with no plugins or GUI. I tried running the command as explained in the other thread, but it kept saying "no such file or directory" and that tail has no files. So after booting in safe mode, I cd to /var/log and am following the syslog from there.  I am attempting to rebuild again and will see what happens. Maybe the overheating damaged my motherboard, CPU, or disks.
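
The command from the other thread is not reproduced here. As a rough illustration of the general idea (keeping a copy of the syslog somewhere that survives a reboot), here is a minimal Python sketch. It assumes Python 3 is available on the machine; the function names and the destination path on the flash drive shown in the comment are assumptions, not something taken from this thread.

```python
import os
import time


def copy_new_lines(src, dst, offset=0):
    """Append everything in src past byte `offset` to dst.

    Returns the new offset. If src is now shorter than `offset`
    (for example, the log was rotated), copying restarts from 0.
    """
    if os.path.getsize(src) < offset:
        offset = 0
    with open(src, "rb") as fin:
        fin.seek(offset)
        data = fin.read()
    if data:
        with open(dst, "ab") as fout:
            fout.write(data)
            fout.flush()
            # Ask the OS to push the data to the device so the last
            # lines before a sudden reboot have a chance to survive.
            os.fsync(fout.fileno())
    return offset + len(data)


def follow(src, dst, interval=1.0):
    """Poll src forever, mirroring new log lines into dst."""
    offset = 0
    while True:
        offset = copy_new_lines(src, dst, offset)
        time.sleep(interval)


# Example call (paths are assumptions; stop it with Ctrl+C):
# follow("/var/log/syslog", "/boot/syslog-copy.txt")
```

The point of syncing after every write is that a spontaneous reboot gives no warning, so whatever was logged in the last second or two is the most interesting part.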

59 minutes ago, donr said:

I will add more if needed

This implies that you may have openings that are not sealed. ALL case openings should either flow incoming air over the drives or have fans actively pushing air out. Any extraneous openings should be taped over; clear packing tape on the inside does a good job if you care how it looks. If you leave a passive opening, air will flow through the path of least resistance, bypassing your drives.

 

Consumer cases just aren't built for server-grade 24/7 heat management, so you have to be extra vigilant when setting up a server in a consumer-grade case.

 

Also, be sure any disk controller cards have forced circulation. A server case is designed to push air over all the slots. Consumer cases often leave a stagnant area extending from the bottom of the graphics card slot to the bottom of the case, since the only card most consumers use that needs extra cooling is the video card. Mount an extra fan inside the case if needed.


It seems I posted my reply before yours came in, but it is duly noted. Thank you. BTW, the drives are at an average of 29 C with the fan blowing on them.


Make sure that you have cleaned the dust out of the cooling fins on the CPU cooler.  Someone a while back found that the CPU cooler mounting bracket had come loose and was not making a solid connection with the CPU cooling plate.


I finally got the rebuild done in safe mode, and the server has been on for over 13 hours. I will shut it down and take everything apart to really clean all the components. I will inspect the motherboard, try to stress test my PSU to see if that could be the problem, and replace whatever components might be defective. This server has been good to me over the years, and although it only has 12 TB of movie and music files, I would hate to lose it all. I am thinking of taking apart my main PC, installing the i5 CPU and motherboard along with the 16 GB of RAM in the server, and running a VM as my main PC. I will have to watch more of SpaceInvader One's videos on YouTube to get that set up. Thanks for all the help.

Don


I know it's late, but I wanted to give an update on what I did. The system is now working like a champ.

First, I dismantled part of the server. Two of the four fans that keep the hard drives cool were shot. No wonder I was getting alerts. So I replaced them and added another 120 mm fan to remove hot air, as suggested.

The system was still unstable, so I took my main PC apart and swapped in its motherboard, CPU, and RAM. That left me with what I believe to be a corrupted flash drive. So I took a backup of the previous Unraid OS and copied everything onto the new flash drive. I applied the new key from Lime Tech and rebooted. I had to rebuild my HD #2 again, as it seems that the previous rebuild was not retained. Now everything is working perfectly, and I even managed to install the components taken out of the server in this box. So special thanks to Frank1940 for his help. So now, how do I tag this as solved?

2 hours ago, donr said:

So now, how do I tag this as solved?

Edit the thread title in the first post. 

