Unraid server becomes unresponsive (SOLVED)


Hello all!

I'm new to the forums, so I apologize in advance if I forget anything.

I have an issue with my Unraid server: it randomly freezes and becomes completely unresponsive. I have a monitor hooked up to the server because I'm still actively setting it up, and the backlight stays on, but no keyboard input is accepted, the web interface is offline, and all VMs and Docker containers are offline as well.

I have been unable to find the real source of this problem, and it's starting to worry me because it's happening more often. Last night it happened after starting the parity check. Some time before that, it happened while I was watching a movie on Plex, and before that it happened a few times while I wasn't doing anything in particular on the server.

 

I have attached the diagnostics output to this post, and just this morning I received a notification regarding a hardware issue? (I never had this notification before so that one is new!) 


I already replaced the HBA. I was running a Supermicro AOC-SASLP-MV8, but that one uses a Marvell controller and I thought that might have been causing the issues, so I replaced it with a Dell H200 flashed to IT mode. As I said, though, the server froze again about an hour or two after starting the parity check: the disk activity LEDs just stopped blinking and the server was unresponsive. So that didn't help, unfortunately.

 

The specs of my server are as follows:

- Motherboard: Asus ROG Rampage V Extreme

- RAM: Corsair Vengeance LPX 4x 16 GB 2133 MHz

- GPU: MSI GeForce GTX 960

- CPU: Intel Xeon E5-2690 V4

- HBA: Dell PERC H200

- SAS Expander: HPE 468406-B21

- PCI Ethernet card: Intel Corporation 82571EB/82571GB Gigabit Ethernet Controller

- Power Supply: Corsair V1200

- Unraid USB: SanDisk Cruzer Fit

 

I really hope someone can point me in the right direction because I have no idea what might be going on. I have searched the forums for similar symptoms and looked into all the suggested fixes, but so far nothing has worked for me.

obscurity-diagnostics-20210204-1040.zip

Edited by jeffrey.el
Link to comment

 

12 hours ago, JorgeB said:

Try this and then post that log after a crash.

Alright, I set up an external syslog server on a different PC on the network. It froze again, and all that was in the syslog was this:

 

Feb  5 02:57:32 Obscurity rsyslogd: [origin software="rsyslogd" swVersion="8.1908.0" x-pid="16104" x-info="https://www.rsyslog.com"] start
Feb  5 02:58:30 Obscurity kernel: mdcmd (69): nocheck Cancel
Feb  5 02:58:30 Obscurity kernel: md: recovery thread: exit status: -4
Feb  5 02:58:37 Obscurity kernel: eth0: renamed from vethb4ba995
Feb  5 02:59:01 Obscurity sSMTP[19696]: Creating SSL connection to host
Feb  5 02:59:01 Obscurity sSMTP[19696]: SSL connection using ECDHE-RSA-AES256-GCM-SHA384
Feb  5 02:59:06 Obscurity sSMTP[19696]: Sent mail for [email protected] (221 2.0.0 Service closing transmission channel) uid=0 username=root outbytes=744
Feb  5 03:00:16 Obscurity crond[2934]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null

Does this help any?.... 

I can't find anything in it. I did cancel the parity check, which was running because of the previous crash. (I know I shouldn't do this, but I'm still trying to figure out what is wrong.)
 

Link to comment
7 hours ago, jeffrey.el said:

Does this help any?.... 

Not really, there's nothing about why it crashed. It could be a hardware problem. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

Link to comment
6 hours ago, JorgeB said:

Not really, there's nothing about why it crashed. It could be a hardware problem. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

Well, the thing is... it seems to happen when the server needs more power?

The last two times were when I started streaming a movie on Plex that had to be transcoded, and before that it happened during the installation of some virtual machines.

 

It also happened randomly after running fine for about 5 days. That's when I read that Docker containers should be restarted daily, so I wrote a script to restart them every day.

 

Is there a way, or are there proper tools, to test this? Or are there any known issues with the hardware I'm using?

Link to comment
  • jeffrey.el changed the title to Unraid server becomes unresponsive (SOLVED)
  • 5 months later...
On 2/5/2021 at 11:27 PM, jeffrey.el said:

Alright, after some testing I found the problem.

Apparently there were still some overclocking settings applied to the CPU, which remained active after resetting the BIOS.

I updated the BIOS and things seem to be running stable for now!

 

Thanks for the help! 

Could you elaborate on what you changed here? I'm having a similar issue and trying to track down the cause. Weird thing is, I've had this system running for a couple years and at one time had 180+ days uptime. This just started happening in the last couple months.

Link to comment
22 hours ago, PISTOL_CUPCAKES said:

Could you elaborate on what you changed here? I'm having a similar issue and trying to track down the cause. Weird thing is, I've had this system running for a couple years and at one time had 180+ days uptime. This just started happening in the last couple months.

My issue was that I was using a motherboard from Asus that has some form of overclocking applied by default.
So I changed a setting in the BIOS to something like "Intel Approved settings" or similar. It disabled all the minor tweaks Asus applies by default, and that stopped the crashes for me.

If you want, I can check the BIOS of that motherboard. Which one are you currently using?

Link to comment
17 hours ago, jeffrey.el said:

My issue was that I was using a motherboard from Asus that has some form of overclocking applied by default.
So I changed a setting in the BIOS to something like "Intel Approved settings" or similar. It disabled all the minor tweaks Asus applies by default, and that stopped the crashes for me.

If you want, I can check the BIOS of that motherboard. Which one are you currently using?

Okay, no, that helps. I'll check, but I don't think there are any overclock settings enabled on mine. Thanks!

Link to comment
  • 1 month later...

I didn't want to open another thread since I fit the category. My unRaid server froze during the night: I couldn't log in and couldn't SSH. I did a restart, a parity check started, and everything is back to normal. I had some memory module issues before, and now, approximately 1 year later, the server froze again. My motherboard is ASUSTeK COMPUTER INC. Z10PA-D8 Series, Version Rev 1.xx. Can someone please check my diags file and provide some info on what could be wrong? Thank you.

unrsrv-diagnostics-20220118-1640.zip

Link to comment
  • 2 months later...

I just had this happen last night on UnRaid 6.9.2. I rebooted the server this morning, and it came back up fine and started a parity check. I've enabled the syslog, so hopefully if it happens again, I'll have a clue as to the cause. I haven't changed anything on the server in quite some time. The only recent change was to enable a plugin for the Logitech Media Server docker for "shairTunes" a few days ago.

Link to comment
  • 4 weeks later...

I've just had this happen again on my server after about 180 days uptime so I have no idea what is causing it. Has anyone had any luck? My server became unresponsive May 2 approx 8:59 and I rebooted at 13:33.

 

I did enable mirroring syslog to flash so I have this but it doesn't appear useful to me:
May  2 05:31:40 NASsy kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth8a7d9f3: link becomes ready
May  2 05:31:40 NASsy kernel: docker0: port 7(veth8a7d9f3) entered blocking state
May  2 05:31:40 NASsy kernel: docker0: port 7(veth8a7d9f3) entered forwarding state
May  2 05:32:00 NASsy CA Backup/Restore: #######################
May  2 05:32:00 NASsy CA Backup/Restore: appData Backup complete
May  2 05:32:00 NASsy CA Backup/Restore: #######################
May  2 05:32:00 NASsy CA Backup/Restore: Deleting /mnt/user/backups/AppdataBackup/[email protected]
May  2 05:32:00 NASsy CA Backup/Restore: Backup / Restore Completed
May  2 05:32:12 NASsy nginx: 2022/05/02 05:32:12 [error] 17190#17190: *14284161 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream, client: 127.0.0.1, server: , request: "GET /admin/api.php?version HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "127.0.0.1"
May  2 05:32:12 NASsy nginx: 2022/05/02 05:32:12 [error] 17190#17190: *14284163 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream, client: 127.0.0.1, server: , request: "GET /admin/api.php?version HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "localhost"
May  2 07:42:45 NASsy apcupsd[14578]: Power failure.
May  2 07:42:45 NASsy apcupsd[14578]: Power is back. UPS running on mains.
May  2 07:42:46 NASsy apcupsd[14578]: Power failure.
May  2 07:42:46 NASsy apcupsd[14578]: Power is back. UPS running on mains.
May  2 07:43:02 NASsy kernel: TCP: request_sock_TCP: Possible SYN flooding on port 32033. Sending cookies.  Check SNMP counters.
May  2 08:33:01 NASsy kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
May  2 08:33:01 NASsy kernel: caller _nv000712rm+0x1af/0x200 [nvidia] mapping multiple BARs
May  2 13:33:33 NASsy kernel: Linux version 5.10.21-Unraid (root@Develop) (gcc (GCC) 9.3.0, GNU ld version 2.33.1-slack15) #1 SMP Sun Mar 7 13:39:02 PST 2021
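
One way to get a little more out of a mirrored syslog like the excerpt above is to measure how long the log was silent before the reboot. The following is a minimal Python sketch, not an Unraid tool: `parse_stamp`, `silent_window`, and the `year` argument are hypothetical names chosen for illustration. It assumes the classic syslog timestamp format shown above (which carries no year), English month abbreviations, and that a line containing `kernel: Linux version` marks a boot.

```python
from datetime import datetime

# Assumption: the first kernel line after a boot contains this text,
# as in the "Linux version 5.10.21-Unraid" line of the excerpt.
BOOT_MARKER = "kernel: Linux version"


def parse_stamp(line, year):
    """Parse the 15-character syslog prefix, e.g. 'May  2 08:33:01'.

    Classic syslog timestamps have no year, so the caller supplies one.
    """
    return datetime.strptime("{} {}".format(year, line[:15]), "%Y %b %d %H:%M:%S")


def silent_window(lines, year):
    """Return (last_stamp_before_boot, boot_stamp) for the last boot marker.

    Returns None when no boot marker is preceded by another log line.
    """
    previous = None
    window = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the mirrored file
        if BOOT_MARKER in line and previous is not None:
            window = (parse_stamp(previous, year), parse_stamp(line, year))
        previous = line
    return window
```

Applied to the excerpt above, this would pair the 08:33:01 line with the 13:33:33 boot line. The last syslog line is only a lower bound on when the freeze happened, since the server may have kept running quietly for a while afterwards.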
 

 

Additionally, I thought it might be an out-of-memory issue, so I wrote a script to log memory usage (free -m) to a file every minute. There seems to be plenty of memory available, but this has the added benefit of showing when the server became unresponsive:
date        time      total   used   free  shared  buff/cache  available
2022-05-02  08:57:52  32095  10222    417    1661       21455      19770
2022-05-02  08:58:52  32095  10235    398    1661       21461      19757
2022-05-02  08:59:52  32095  10245    379    1661       21470      19747
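
For anyone who wants to reproduce this kind of per-minute memory log, here is a minimal Python sketch. It is not the script used above (that one was not posted); it reads `/proc/meminfo` directly instead of calling `free -m`, and it logs only total, free, and available memory in MiB. The function names and the default log path are hypothetical, so pick a location that survives a hard reboot on your system.

```python
import os
import time
from datetime import datetime


def parse_meminfo(text):
    """Return /proc/meminfo fields in MiB (the kernel reports them in kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0]) // 1024
    return fields


def format_sample(now, fields):
    """Format one timestamped log line from the parsed fields."""
    return "{}  total={} free={} available={}".format(
        now.strftime("%Y-%m-%d %H:%M:%S"),
        fields.get("MemTotal", 0),
        fields.get("MemFree", 0),
        fields.get("MemAvailable", 0),
    )


def log_forever(path="/tmp/memlog.txt", interval=60):
    """Append one sample every `interval` seconds (path is a placeholder)."""
    while True:
        with open("/proc/meminfo") as src:
            line = format_sample(datetime.now(), parse_meminfo(src.read()))
        with open(path, "a") as out:
            out.write(line + "\n")
            out.flush()
            os.fsync(out.fileno())  # push the line to disk before a possible freeze
        time.sleep(interval)
```

Calling `log_forever()` starts the loop. The `fsync` after each write is there so the last sample is more likely to be on disk when the machine locks up, which is what makes the final timestamp useful as a "last known alive" marker.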

Link to comment

I have had this same issue (three times over two weeks), so I followed the recommendation in this thread to set up remote logging. Last night the server apparently crashed and I had to hard reboot. I have attached the log file and sincerely hope someone can help identify the cause of the issue. Thank you in advance for any assistance. I have added my diagnostics file as suggested. Thanks!

syslog-192.168.1.140_after_crash.log

mongo-diagnostics-20220507-0805.zip

Edited by hfuhruhurr
Link to comment
5 minutes ago, hfuhruhurr said:

I have had this same issue (three times over two weeks), so I followed the recommendation in this thread to set up remote logging. Last night the server apparently crashed and I had to hard reboot. I have attached the log file and sincerely hope someone can help identify the cause of the issue. Thank you in advance for any assistance.

syslog-192.168.1.140_after_crash.log 291.6 kB · 1 download

You are likely to get better informed feedback if you post your system’s diagnostics zip file as well.

Link to comment
18 hours ago, hfuhruhurr said:

I have had this same issue (three times over two weeks)

May  4 09:45:00 Mongo kernel: macvlan_broadcast+0x116/0x144 [macvlan]
May  4 09:45:00 Mongo kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]

 

Macvlan call traces are usually the result of having Docker containers with a custom IP address. Upgrading to v6.10 and switching to ipvlan might fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right), or see below for more info.

https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/

See also here:

https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/
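
To check whether your own syslog contains the same kind of macvlan frames as the two lines quoted above, a quick scan can help. This is a minimal Python sketch under stated assumptions: the regular expression is modeled only on the two quoted lines (a kernel message with a `macvlan_<function>+0x.../0x...` frame), and `macvlan_frames` is a hypothetical helper name, not part of Unraid.

```python
import re

# Matches kernel call-trace frames such as:
#   kernel: macvlan_broadcast+0x116/0x144 [macvlan]
MACVLAN_FRAME = re.compile(r"kernel: .*\bmacvlan_\w+\+0x[0-9a-f]+/0x[0-9a-f]+")


def macvlan_frames(lines):
    """Return the log lines that contain a macvlan call-trace frame."""
    return [line.rstrip("\n") for line in lines if MACVLAN_FRAME.search(line)]
```

For example, `macvlan_frames(open("syslog"))` would list the matching lines of a file named `syslog` (a placeholder name). An empty result does not rule out other causes of a freeze; it only means this particular pattern was not logged.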

Link to comment

Thank you for the suggestion. I have stopped Docker and set the 'Docker Custom Network Type' to 'ipvlan'. At present I have made no other changes. The guide says that 'your router and switch must support vlans', and mine do, as I am using a Unifi Dream Machine. However, it seems that I only need to create a vlan if I want to assign a static IP address to my containers (I will at some point), so I have not done this yet. Most of my containers are set to a custom network (merely so I can refer to them by container name and ensure they can communicate with each other).

Are there any other implications to changing the docker network type to ipvlan? Would I create a vlan in Unifi, then select it in my docker container configuration? I am a little fuzzy on the steps necessary between Unifi and unRaid. Thank you for any clarification you can provide.

Link to comment

I found an issue with the ipvlan setting. I had previously configured a VPN Tunnel and successfully tested it before changing the 'Docker Custom Network Type' to 'ipvlan'. The tunnel no longer works with this setting. To verify, I stopped Docker and changed the 'Docker Custom Network Type' back to 'macvlan', and the tunnel works again. Does the ipvlan setting preclude the use of the VPN Tunnel?

Link to comment
