cdnboy Posted September 23, 2020 (edited) New to Unraid and just got an HP MicroServer Gen10 Plus. Every couple of days, or today after only about 12 hours, it just hangs. The web interface is not available and most of the time the Docker containers are not available either. Today I was able to capture the syslog through an SSH session, but nothing in it tells me how to fix this. I am running the latest beta and wonder if I should roll back and start over again. Attachments: unraid crash 09-22-2020.rtf, rubbersoul-diagnostics-20200922-2056.zip. Edited October 16, 2020 by cdnboy (reason: closed issue).
Frank1940 Posted September 23, 2020 Am I correct in assuming that this is a brand-new piece of hardware? If so, I would suggest that you run a 24-hour memtest from the Unraid boot menu. Also, in the future, please post any captured syslogs as plain text (*.txt) files. They are much easier for the Gurus to read in that format. There is also a Syslog Server included in Unraid. For instructions on how to use it, see here: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-781601
cdnboy Posted September 24, 2020 Author Yes, it is a new piece of hardware. I just downgraded from the beta and will see if it crashes again. This is basically a fresh start, as I did move the USB from one system to another. If it crashes again I will do the memtest and set up a syslog server. I thought I was past the crashes since I had removed some logging messages.
cdnboy Posted October 8, 2020 Author I was able to capture the crash again. I had the console running as well, so I tried a ping, which failed. Then I tried ifconfig -a, which did nothing, and I couldn't Ctrl-C out of it. I had to do a hard reboot. I have run a memtest and the HP system test in the BIOS, and both passed. I am running the latest stable release. Attachment: crash1008.txt
cdnboy Posted October 9, 2020 Author Crashed again. I have the mover turned off to make sure it wasn't that. The server had the GUI running with a monitor attached. The GUI was frozen: the time showed 03:57 and the CPUs were all pegged at 100%. I had to do a hard reset and have attached the syslog. Attachment: crash1009.txt
Frank1940 Posted October 9, 2020 At this point I would suggest that you try Safe Mode. May I ask why you are only sending a portion of the syslog? (Often there is some clue as to what is causing the problem before the server starts barfing. Even if there isn't, that is also a clue.)
cdnboy Posted October 9, 2020 Author Thanks. I have attached the complete syslog now. Will reboot into safe mode in a bit. Attachment: syslog-192.168.50.15.log
cdnboy Posted October 10, 2020 Author Switched to safe mode and it still crashed overnight, so the plugins are not causing it. There were some kernel messages again before the crash, including "Fixing recursive fault but reboot is needed!" Attachment: syslog-192.168.50.15.log
Frank1940 Posted October 10, 2020 At this point, I believe that you probably have a hardware problem. (New hardware makes this even more probable.) I would start with a memory test. However, you seem to have ECC memory in this server. If that is the case, the standard memtest program that is part of the Unraid distribution does not properly test ECC memory. You will have to download the proper package from the MemTest86 website. Google to find it. A second option might be to return this hardware to the seller as defective and request a replacement.
cdnboy Posted October 10, 2020 Author OK. Just ran the complete memtest with no errors. It recognized the ECC memory in the test as well. So there is no hardware issue I can point out to HP for a replacement or service call. Could there be something on the software side, given that I moved from an older AMD system to this PC? I think I started fresh; I am just trying to think of options. Or do I go with the latest beta, which has a newer kernel, in case there is a support issue with something on my side? I really like what I have seen with Unraid and don't want to switch to Proxmox or anything else just to see if it helps.
Frank1940 Posted October 12, 2020 (edited) I was hoping that someone else would jump in to provide another set of eyes (and thoughts) on your problem. But alas... AMD systems (Ryzen) do have some stability issues, and I believe all of them can be addressed with various system settings. But I believe that your system is Intel based, and I have not heard of any issues with those. OK, I would do the upgrade and look at the latest beta series. Get screenshots of your current setup: disk assignments, basic network setup and server ID information. Use a Trial license to test. If it crashes, I believe you have a hardware problem, as I have not heard of any issues with basic NAS functions. EDIT: One more thing to try is one of the run-from-bootable-CD/flash-drive Linux distributions, to see if that has a problem too. Edited October 12, 2020 by Frank1940
cdnboy Posted October 12, 2020 Author I am starting over with a fresh system. I installed the preclear plugin and am clearing all the disks now. Not sure if the cause could be the M.2 drive I have in there on a StarTech card. I have also been looking in the BIOS for anything relevant, and at the forums for HP-specific things I can try. I debated trying FreeNAS or XPEnology on it, since it is a bit of a play system and was meant to hold the backups for my Synology. I will try this fresh Unraid config again to see how it behaves. I was hoping another guru might give us some other ideas as well.
1812 Posted October 12, 2020 I see lots of network errors in the last syslog you uploaded, along with some BTRFS warnings. But one thing at a time. Sometimes HP ProLiant onboard network controllers are stupid. So let's try this: Make a copy of your network.cfg file for safekeeping. Then delete the one you have off the USB drive. This will reset the network configuration. Also disable the OpenVPN plugin. Boot normally and see if the problem fixes itself. If so, then re-enable OpenVPN and reboot. If it crashes, then you have a configuration issue with OpenVPN OR the onboard controllers are being finicky even with correct settings, and there is nothing more to be done except to try the latest beta with the newer Linux kernel or go without OpenVPN. If it crashes regardless of the above and you're only using one of your ports, disable the others in the Unraid network settings. If it still crashes, then go into the BIOS and manually disable the ports you aren't using, delete the network.cfg file again, and reboot.
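The "copy network.cfg, then delete it" step above can be scripted so the backup is verified before the original is removed. This is only a sketch: the function name `reset_network_config` is made up for illustration, and the path `/boot/config/network.cfg` used in the usage note is an assumption about where the flash drive keeps the file, so adjust it to wherever the USB drive is mounted (for example on another PC that has Python installed).

```python
import shutil
from datetime import datetime
from pathlib import Path


def reset_network_config(cfg_path):
    """Back up a network config file next to itself, then delete the original.

    Hypothetical helper: returns the path of the backup copy.
    """
    cfg_path = Path(cfg_path)
    if not cfg_path.is_file():
        raise FileNotFoundError(f"No network config found at {cfg_path}")

    # A timestamped name means repeated runs never overwrite an older backup.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup_path = cfg_path.with_name(f"{cfg_path.name}.{stamp}.bak")
    shutil.copy2(cfg_path, backup_path)

    # Only remove the original once the copy is confirmed identical.
    if backup_path.read_bytes() != cfg_path.read_bytes():
        raise OSError(f"Backup at {backup_path} does not match the original")

    cfg_path.unlink()
    return backup_path
```

Calling `reset_network_config("/boot/config/network.cfg")` (assumed path) would leave a `.bak` copy beside the deleted file; restoring the old settings is then a matter of renaming that copy back to `network.cfg`.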
cdnboy Posted October 12, 2020 Author OK, I suspected something on the network side, so I had set a static address and tried running without the interfaces being bonded, etc. I am redoing the USB and preclearing the disks. I will make sure the file system is set to XFS as well. I will skip OpenVPN with the new config, as I can install that elsewhere. I am just patiently waiting for the preclears to finish and will start with just the one interface. Should I use the M.2 cache device with this fresh config?
1812 Posted October 13, 2020 7 hours ago, cdnboy said: "Ok, I suspected something on the network side and set a static address and tried without the interfaces being bonded etc. I am redoing the USB and preclearing the disks. Will make sure the file system is set for xfs as well. I will skip openVPN with the new config as I can install that elsewhere. Just patiently waiting to finish the preclears and will start with just the one interface. Should I use the cache m2 device with this fresh config?" I would wait and see if you can get stability first. I don't think it's an issue, but it's one more thing that can be eliminated in a step-by-step process. If you are stable without it, you can always add it as cache (or replace the cache with it) afterwards.
cdnboy Posted October 15, 2020 Author Fingers crossed. Completely blank USB, drives precleared, one network interface, and the cache installed, all as XFS. It seems decent now, including through some mover events to clear the cache. I am running some Docker containers but no OpenVPN. I did one reboot, so it has not been a complete two days, but it seems better, with fewer syslog messages and nothing that looks worrying. I will watch it for a couple of days and then mark this as solved.
cdnboy Posted October 15, 2020 Author Spoke too soon: it crashed overnight. There is no syslog because syslog is not running, which looks like an issue with the beta. I deleted /boot/config/rsyslog.conf but still get the message "rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 66". There was a kernel panic on the console overnight; hopefully I will capture it. It doesn't look like this issue will go away.
1812 Posted October 15, 2020 52 minutes ago, cdnboy said: "Spoke to soon - crashed over night - no syslog as its not running which looks like an issue with the beta. I deleted /boot/config/rsyslog.conf but still has a message - rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 66 Was a Kernel panic on the console over night hopefully will capture it - doesn't look like this issue will go away" See if you can capture what the call trace is for. That could help.
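Once a syslog has been saved as plain text, finding what the call trace is for mostly means locating the first kernel fault line and reading the lines around it. The sketch below does that in Python. The function name `find_kernel_faults` and the marker list are illustrative assumptions ("Fixing recursive fault" comes from the message quoted earlier in this thread; the other two are common kernel strings), so extend the list to match whatever the log actually contains.

```python
# Substrings that usually mark the start of a kernel fault report.
# This list is an assumption for illustration, not an exhaustive set.
DEFAULT_MARKERS = (
    "Call Trace",
    "Kernel panic",
    "Fixing recursive fault",
)


def find_kernel_faults(lines, markers=DEFAULT_MARKERS, before=20, after=40):
    """Return (line_number, excerpt) pairs for each fault marker found.

    line_number is 1-based. Each excerpt holds up to `before` lines of
    context ahead of the marker and `after` lines following it. Markers
    that fall inside an excerpt already returned are not reported again.
    """
    lines = list(lines)
    hits = []
    next_allowed = 0  # first index not covered by the previous excerpt
    for index, line in enumerate(lines):
        if index < next_allowed:
            continue
        if any(marker in line for marker in markers):
            start = max(0, index - before)
            end = min(len(lines), index + after + 1)
            hits.append((index + 1, lines[start:end]))
            next_allowed = end
    return hits
```

Reading a saved log with `open(path, errors="replace").read().splitlines()` and passing the result in gives excerpts that can be pasted into a post; the context before the first hit is often where the clue is.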
cdnboy Posted October 15, 2020 Author 10 minutes ago, 1812 said: "see if you can capture what the call trace is for. that could help." Can I do that without a working syslog?
1812 Posted October 16, 2020 23 hours ago, cdnboy said: "Can I do that without a working syslog?" Have you tried setting up a syslog server or writing a copy of the syslog to USB? It'll preserve everything up to the moment of lockup.
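If the server's own syslog service will not cooperate, another machine on the LAN can act as the receiving end. The following is a rough Python sketch of a UDP listener that appends every datagram to a file and flushes it to disk immediately, so whatever arrived before a lockup survives. The names `append_message`, `SyslogHandler`, `serve` and `LOG_PATH` are hypothetical, as is port 5514; syslog senders conventionally use UDP port 514, which usually needs elevated privileges to bind, so the sending side would have to be pointed at whichever port is chosen.

```python
import os
import socketserver
from datetime import datetime

LOG_PATH = "remote-syslog.log"  # hypothetical output file on the receiver


def append_message(log_path, sender_ip, payload):
    """Append one received datagram to log_path and return the written line."""
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    stamp = datetime.now().isoformat(timespec="seconds")
    record = f"{stamp} {sender_ip} {text}\n"
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(record)
        # Push the line to disk right away; the sender may hang at any moment.
        handle.flush()
        os.fsync(handle.fileno())
    return record


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        payload, _sock = self.request
        append_message(LOG_PATH, self.client_address[0], payload)


def serve(host="0.0.0.0", port=5514):
    """Listen for syslog datagrams until interrupted (Ctrl-C)."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Running `serve()` on the receiving machine and then checking the tail of `remote-syslog.log` after the next crash shows the last messages that made it out over the network. A dedicated syslog daemon is the more robust choice; this only illustrates the idea.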
cdnboy Posted October 16, 2020 Author Syslog won't start, which is another issue. I have reformatted the system again and am trying another OS on it. How do I mark this thread as closed?
JonathanM Posted October 16, 2020 4 hours ago, cdnboy said: "How do I mark this thread as closed?" Edit your first post in the thread.