cdnboy

[Closed] HP Proliant Microserver 10 plus crash


I'm new to Unraid and just got an HP Microserver 10 Plus.  Every couple of days (or, as happened today, after about 12 hours) it just hangs.  The web interface becomes unavailable and, most of the time, so do the Docker containers.  Today I was able to capture the syslog through an SSH session, but nothing in it tells me how to fix the problem.  I'm running the latest beta and wonder if I should roll back and start over, again.

unraid crash 09-22-2020.rtf

rubbersoul-diagnostics-20200922-2056.zip

Edited by cdnboy
Closed issue


Am I correct in assuming that this is a brand-new piece of hardware?  If so, I would suggest that you run a 24-hour memtest from the Unraid boot menu.

Also, in the future, please post any captured syslogs as plain text (*.txt) files.  They are much easier for the Gurus to read in that format.  There is also a Syslog Server included in Unraid.  For instructions on how to use it, see here:

      https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-781601
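
For anyone who would rather catch the log on a second machine on the LAN, the idea behind a remote syslog receiver can be sketched in a few lines of Python. This is only an illustration, not Unraid's built-in Syslog Server. The port number 5514, the output file name and all function names are assumptions made up for the example. It relies on the standard syslog convention that a message starts with a <PRI> field equal to facility * 8 + severity.

```python
import socket
from datetime import datetime


def split_priority(message):
    """Split a leading syslog <PRI> field into (facility, severity, rest).

    Returns (None, None, message) when there is no valid <PRI> prefix.
    """
    if message.startswith("<"):
        end = message.find(">")
        digits = message[1:end] if 1 < end <= 4 else ""
        if digits and digits.isascii() and digits.isdigit():
            pri = int(digits)
            return pri // 8, pri % 8, message[end + 1:]
    return None, None, message


def format_record(sender, message):
    """Build one line for the capture file (hypothetical layout)."""
    facility, severity, rest = split_priority(message)
    return f"{sender} facility={facility} severity={severity} {rest.rstrip()}"


def serve(host="0.0.0.0", port=5514, out_path="unraid-remote.log"):
    """Append every UDP datagram received to out_path (assumed port and file name)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(out_path, "a", encoding="utf-8") as out:
        sock.bind((host, port))
        while True:
            data, (sender, _) = sock.recvfrom(8192)
            text = data.decode("utf-8", errors="replace")
            stamp = datetime.now().isoformat(timespec="seconds")
            out.write(f"{stamp} {format_record(sender, text)}\n")
            out.flush()  # keep the file current in case the sender locks up
```

Calling serve() on the receiving machine blocks and keeps appending until it is interrupted. The server being debugged would then need to be pointed at that machine's address and port, as described in the FAQ linked above.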

 


Yes, it is a new piece of hardware.  I just downgraded from the beta and will see if it crashes again.  It is basically a fresh start, although I did move the USB from one system to another.  If it crashes again I will run the memtest and set up a syslog server.  I thought I was over the crashes since I removed some logging messages.


I was able to capture the crash again.  I had the console running as well.  I tried a ping, which failed, then ran ifconfig -a, which did nothing, and I couldn't Ctrl-C out of it.  I had to do a hard reboot.

I have run a memtest and the HP system test in the BIOS, and everything passed.

I am running the latest stable release.

crash1008.txt


Crashed again.  I have mover turned off to make sure it wasn't that.

On the server I had the GUI running with a monitor attached.  The GUI was frozen: the time showed 03:57 and the CPUs were all pegged at 100%.

I had to do a hard reset and have attached the syslog.

 

crash1009.txt


At this point I would suggest that you try Safe Mode.

May I ask why you are only sending a portion of the syslog?  (Often, there is some clue as to what is causing the problem before the server starts barfing.  Even if there isn't, that is also a clue.)


At this point, I believe that you probably have a hardware problem.  (New hardware makes this even more probable.)  I would start with a memory test.  However, you seem to have ECC memory in this server.  If that is the case, the standard memtest program that is part of the Unraid distribution does not properly test ECC memory.  You will have to download the proper package from the MemTest86 website.  Google to find it...

A second option might be to return this hardware to the seller as defective and request a replacement.


Ok.  I just ran the complete memtest with no errors.  It recognized the ECC memory in the test as well.  So there doesn't seem to be a hardware issue I can point out to HP for a replacement or service call.

Could there be something on the software side left over from moving from an older AMD system to this PC?  I think I started fresh; I'm just trying to think of options.  Or do I go with the latest beta, which has a newer kernel, in case something on my side isn't supported yet?  I really like what I have seen of Unraid and don't want to switch to Proxmox or anything else just to see if it helps.


I was hoping that someone else would jump in to provide another set of eyes (and thoughts) on your problem.  But alas...

 

AMD systems (Ryzen) do have some stability issues, and I believe all of them can be addressed with various system settings.  But I believe that your system is Intel-based, and I have not heard of any issues with those.

OK, I would do the upgrade and look at the latest beta series.  Get screenshots of your current setup: disk assignments, basic network setup and server ID information.  Use a Trial license to test.  If it crashes, I believe you have a hardware problem, as I have not heard of any issues with basic NAS functions.

EDIT:  One more thing to try is one of the Linux distributions that run from a bootable CD or flash drive, to see if that has a problem too.

Edited by Frank1940


I am starting again as a fresh system.  I installed the preclear plugin and am clearing all the disks now.  Not sure if it could be the M.2 drive I have in there on a StarTech card.  I have also been looking through the BIOS for anything relevant, and through the forums for HP-specific things I can try.  I debated trying FreeNAS or Xpenology on it, since it is a bit of a play system and was meant to hold the backups for my Synology.  I will try this fresh Unraid config again to see how it behaves.

I was hoping another guru might give us some other ideas as well.


I see lots of network errors in the last syslog you uploaded, along with some BTRFS warnings. But one thing at a time.

Sometimes HP ProLiant onboard network controllers are stupid. So let's try this: make a copy of your network.cfg file for safekeeping, then delete the original from the USB drive. This will reset the network configuration. Also disable the OpenVPN plugin. Boot normally and see if the problem goes away. If so, re-enable OpenVPN and reboot. If it then crashes, you either have a configuration issue with OpenVPN or the onboard controllers are being finicky even with correct settings, and there is nothing more to be done except to try the latest beta with its newer Linux kernel or go without OpenVPN.

If it crashes regardless of the above and you're only using one of your ports, disable the others in the Unraid network settings. If it still crashes, go into the BIOS and manually disable the ports you aren't using, delete the network.cfg file again, and reboot.
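
The "copy for safekeeping, then delete" step can be sketched as a small Python helper, for example with the USB stick plugged into another computer that has Python 3. This is a minimal sketch under stated assumptions: the function name and the timestamped backup name are made up for the example, and the path in the usage note is only the assumed location of the file on a running server.

```python
import shutil
import time
from pathlib import Path


def backup_and_remove(cfg_path, backup_dir):
    """Copy cfg_path into backup_dir under a timestamped name, then delete the original.

    Returns the path of the backup copy. Raises FileNotFoundError if cfg_path is missing.
    """
    cfg = Path(cfg_path)
    if not cfg.is_file():
        raise FileNotFoundError(cfg)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = dest_dir / f"{cfg.name}.{stamp}.bak"
    shutil.copyfile(cfg, backup)
    # Only delete the original once the copy has the same size.
    if backup.stat().st_size != cfg.stat().st_size:
        raise OSError(f"backup of {cfg} looks incomplete")
    cfg.unlink()
    return backup
```

A call might look like backup_and_remove("/boot/config/network.cfg", "/boot/config/backup"), where both paths are assumptions to adjust to wherever the flash drive is mounted. To undo the change, copy the .bak file back under its original name.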

 

 

 

 


Ok, I suspected something on the network side, so I had already set a static address and tried without the interfaces being bonded, etc.  I am redoing the USB and preclearing the disks.  I will make sure the file system is set to XFS as well.  I will skip OpenVPN with the new config, as I can install that elsewhere.  Just patiently waiting for the preclears to finish, and then I will start with just the one interface.

Should I use the M.2 cache device with this fresh config?

7 hours ago, cdnboy said:

Ok, I suspected something on the network side, so I had already set a static address and tried without the interfaces being bonded, etc.  I am redoing the USB and preclearing the disks.  I will make sure the file system is set to XFS as well.  I will skip OpenVPN with the new config, as I can install that elsewhere.  Just patiently waiting for the preclears to finish, and then I will start with just the one interface.

Should I use the M.2 cache device with this fresh config?

I would wait and see if you can get stability first. I don’t think it’s an issue but it’s one other thing that could be eliminated in a step by step process. If you are stable without it, you can always add/replace cache with it after.


Fingers crossed. Completely blank USB, drives precleared, one network interface, and the cache installed, all as XFS.  It seems decent now, including some mover events to clear the cache.  I'm running some Docker containers but no OpenVPN.  I did one reboot, so it has not been a complete two days, but it seems better, with fewer syslog messages, or at least nothing that seems worrying.  I will watch it for a couple of days and then mark this as solved.


Spoke too soon: it crashed overnight.  There is no syslog because the syslog daemon isn't running, which looks like an issue with the beta.  I deleted /boot/config/rsyslog.conf but still get the message: rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 66

There was a kernel panic on the console overnight; hopefully I will be able to capture it.  It doesn't look like this issue will go away.
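
Because the rsyslogd message says "on or before line 66", the offending directive may sit a few lines above the reported one. A small Python sketch for printing the neighbourhood of that line is shown here; the function names and the default radius are assumptions for illustration, and only the file path and line number come from the error message.

```python
def lines_around(text, line_number, radius=3):
    """Return (number, line) pairs within radius of line_number (1-based)."""
    lines = text.splitlines()
    first = max(1, line_number - radius)
    last = min(len(lines), line_number + radius)
    return [(n, lines[n - 1]) for n in range(first, last + 1)]


def show_context(path, line_number, radius=3):
    """Print the lines around line_number, marking the reported line with '>>'."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for n, line in lines_around(handle.read(), line_number, radius):
            marker = ">>" if n == line_number else "  "
            print(f"{marker} {n:4d}: {line}")
```

For the message above, show_context("/etc/rsyslog.conf", 66, radius=10) would print lines 56 through 76 of the file, assuming it has at least that many lines.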

 

52 minutes ago, cdnboy said:

Spoke too soon: it crashed overnight.  There is no syslog because the syslog daemon isn't running, which looks like an issue with the beta.  I deleted /boot/config/rsyslog.conf but still get the message: rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 66

There was a kernel panic on the console overnight; hopefully I will be able to capture it.  It doesn't look like this issue will go away.

 

See if you can capture what the call trace is for. That could help.

10 minutes ago, 1812 said:

See if you can capture what the call trace is for. That could help.

Can I do that without a working syslog?

 

23 hours ago, cdnboy said:

Can I do that without a working syslog?

 

Have you tried setting up a syslog server or writing a copy of the syslog to USB? Either will preserve everything up to the moment of lockup.
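
Once a copy of the log has been preserved, the part worth posting is usually the panic message and the call trace that follows it. A minimal Python sketch for pulling those sections out of a saved log is shown here. The marker strings are an assumed, incomplete list of common kernel messages, and the function name and the 30-line window are arbitrary choices for the example.

```python
# Assumed, incomplete list of substrings that often introduce a kernel crash report.
MARKERS = ("Kernel panic", "Call Trace:", "BUG:", "Oops")


def trace_sections(lines, markers=MARKERS, after=30):
    """Return each marker line plus up to `after` following lines.

    Lines are returned in their original order, and overlapping windows
    do not produce repeated lines.
    """
    keep = set()
    for index, line in enumerate(lines):
        if any(marker in line for marker in markers):
            keep.update(range(index, min(len(lines), index + after + 1)))
    return [lines[i] for i in sorted(keep)]
```

Reading a saved file with open(path).read().splitlines() and passing the result to trace_sections gives a short excerpt to look at first. The full log is still worth attaching, since the lead-up to the crash can matter as much as the trace itself.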


Syslog won't start, which is another issue.

I have reformatted the system again and am trying another OS on it.

How do I mark this thread as closed?

4 hours ago, cdnboy said:

How do I mark this thread as closed?

Edit your first post in the thread.

