Defeated & Looking for Help

To begin, I am an IT technician with 10 years of experience under my belt. Asking for help (in general) is very hard for me, so admittedly I've been holding off on posting to these forums. After about a year of troubleshooting I'm truly out of ideas and can't think of where to go next.

 

I have a Dell R710 running the last supported BIOS, Lifecycle controller, and iDRAC package. 

 

It is equipped with a PERC H200 flashed to IT/JBOD mode.

 

Periodically the server crashes and I lose everything. There is no rhyme or reason as to why, and the only indicator I get (when I'm able to catch it) is that my VMs and Dockers begin to crash, slow down, or act weird. For instance, I am running Home Assistant in a VM converted from the image on their official website, and my ZWave plugin begins to lose connectivity with my ZWave network.

 

I have read several posts regarding similar issues and found that some people have had luck changing their docker network interface from macvlan to ipvlan (which I have done) and running Memtest86 to verify RAM functionality. I ran Memtest for about 13 hours and was unable to get it to fail.

 

I have BTRFS, TRIM, and mover schedules set so those are all being taken care of. No SMART errors either. 

 

At this point I am at a loss and cannot for the life of me figure out how to mitigate these crashes. Any input or guidance would be appreciated. All I want is for this thing to work consistently, and with the struggles of life and taking on a mortgage, I'm running out of time and energy to fix this.

 

Attached are my diagnostics. If someone could please go through them and give me some pointers, I would truly appreciate it. I feel like I'm just an idiot and missed a setting somewhere.

 

Thanks again in advance, I can provide more information if required!

zserve-diagnostics-20230207-2144.zip

  • Author
11 hours ago, trurl said:


Syslog has already been set up! Sorry, I wrote this while tired last night. Before this last crash I set up syslog and mirrored it all to flash, so I have logs of it while crashing. However, after skimming through them I couldn't see anything out of the ordinary. Would you be interested in taking a look?

  • Author

Absolutely, here you go.

 

 

syslog.zip

Looks like NIC problems, try with a different NIC if available.

  • Author
23 minutes ago, JorgeB said:

Looks like NIC problems, try with a different NIC if available.

 

Just out of curiosity, where in the logs were you able to determine that? Right now I'm actually using all 4 NICs in a LAG to my switch (the switch has been configured for it as well).

 

I also had these exact issues when I used just one NIC, but I hadn't tried removing the LAG and trying another one of the 4. 

 

Is the driver acting up or does it appear to be hardware related?

Look at the syslog, anything that mentions bnx2 is about the NICs, e.g.:

 

Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: <--- start TBDC dump --->
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: TBDC free cnt: 32
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: LINE     CID  BIDX   CMD  VALIDS
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 00    001300  c560   00    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 01    001300  c560   00    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 02    001080  a2f8   00    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 03    001080  a300   00    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 04    000800  1f38   00    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 05    001300  1110   00    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 06    001300  1120   00    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 07    001000  6308   00    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 08    001000  62e8   00    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 09    059f80  ff50   fc    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 0a    1ffd80  f660   f4    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 0b    1eaa80  f6f8   af    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 0c    0fff80  3cc8   3d    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 0d    129f80  df90   a5    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 0e    1fd780  7fd0   ff    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 0f    0fff80  57f8   6e    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 10    1bef00  7fd8   d1    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 11    1dfe80  7ee8   ef    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 12    17ec80  dbf0   ff    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 13    01ac80  ddf0   7b    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 14    00ff00  dff8   46    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 15    1b1d80  76e8   ff    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 16    1fff80  3328   7f    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 17    1d3b80  fdb0   c3    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 18    0d7f00  af48   5d    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 19    1bdf80  9de0   f4    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 1a    0ff300  ebd0   eb    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 1b    1f7d80  b9f8   ff    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 1c    0fe980  fdf0   fd    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 1d    1ff980  6770   ff    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 1e    197f00  f7f8   fc    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: 1f    137980  fff8   ed    [0]
Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: <--- end TBDC dump --->
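
For anyone who wants to check their own syslog the same way, here is a minimal Python sketch that counts kernel messages per day and interface for a given driver. It assumes the line layout shown in the excerpt above (month, day, time, host, `kernel:`, driver name, PCI address, interface). The function name `count_driver_lines` and the regular expression are only illustrative and are not part of Unraid or the bnx2 driver.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# "Jan 19 05:52:07 zServe kernel: bnx2 0000:01:00.0 eth0: TBDC free cnt: 32"
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"(?P<driver>\S+)\s+(?P<pci>[0-9a-f:.]+)\s+(?P<iface>\w+):"
)


def count_driver_lines(lines, driver="bnx2"):
    """Count kernel messages from `driver`, keyed by (month, day, interface)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("driver") == driver:
            key = (match.group("month"), int(match.group("day")), match.group("iface"))
            counts[key] += 1
    return counts
```

Feeding the lines of the mirrored syslog file into `count_driver_lines` gives a quick view of which days and which ports produced driver messages, which may help when deciding which port to try next.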

 

  • Author
1 hour ago, JorgeB said:

Look at the syslog, anything that mentions bnx2 is about the NICs [...]

Gotcha, are you seeing anything suspicious after that date? I changed some of my logs to just show the last 2 days and haven't seen anything related to that since. Not sure if that was due to me moving the connection or what. 

 

Regardless, I'll try disabling the LAG and moving the primary port over to eth1.

 

Will report back on what I find!

17 minutes ago, echooffzack said:

are you seeing anything suspicious after that date?

Not really, and didn't even notice the date, I do also see this:

 

Feb  2 07:45:54 zServe kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]

 

Macvlan call traces are usually the result of having dockers with a custom IP address, and they will end up crashing the server. Upgrading to v6.10 or later and switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right).
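
As a rough way to see when such traces occurred, the sketch below pulls out call-trace frames that belong to a given kernel module. It assumes frames are logged in the form shown above (`symbol+0xOFFSET/0xSIZE [module]`); the helper name `find_module_frames` is made up for illustration, and real call traces can contain other line shapes that this pattern will simply skip.

```python
import re

# Assumed frame layout, taken from the line quoted above:
# "Feb  2 07:45:54 zServe kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]"
FRAME_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?:\?\s+)?(?P<symbol>\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[(?P<module>\w+)\]"
)


def find_module_frames(lines, module="macvlan"):
    """Return (timestamp, symbol) pairs for call-trace frames from `module`."""
    frames = []
    for line in lines:
        match = FRAME_RE.match(line)
        if match and match.group("module") == module:
            frames.append((match.group("stamp"), match.group("symbol")))
    return frames
```

If the list still has entries dated after the switch to ipvlan, that would suggest the setting did not take effect for every container; an empty list after that date would point away from macvlan as the cause of the later crashes.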

  • Author
25 minutes ago, JorgeB said:

Not really, and didn't even notice the date, I do also see this:

 

Feb  2 07:45:54 zServe kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]

 

Macvlan call traces are usually the result of having dockers with a custom IP address, and they will end up crashing the server. Upgrading to v6.10 or later and switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right).

 

I should note the last time the system crashed was after I switched from macvlan to ipvlan. I initially did some research on 2+ day Unraid crashing and found a couple of posts where you had mentioned making that switch, and I am using VLANs for my dockers/VMs/cameras, so I made the change.

 

I believe the last crash was within the last 2 days; hopefully that helps with combing through the logs. I believe it was between 12 and 1 AM on Feb 7th. I remember waking up to my house freezing because Home Assistant hadn't recognized I was home and didn't make the adjustment to my thermostat. I tried changing it in HA but the UI wouldn't load. I woke up that morning at 7 AM and told Unraid to start the array again.
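
In case it helps with narrowing things down, here is a small Python sketch that keeps only the syslog lines inside a time window such as the one above. Syslog timestamps in this format carry no year, so the year is passed in as a parameter; the function name `lines_in_window` is just an illustration, and the month parsing assumes English month abbreviations.

```python
import re
from datetime import datetime

# Leading timestamp as it appears in the excerpts, e.g. "Feb  2 07:45:54"
STAMP_RE = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s")


def lines_in_window(lines, start, end, year):
    """Return the lines whose timestamp falls in [start, end)."""
    selected = []
    for line in lines:
        match = STAMP_RE.match(line)
        if not match:
            continue
        # Collapse the double space syslog uses for single-digit days.
        stamp = " ".join(match.group(1).split())
        try:
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamp after all
        if start <= when < end:
            selected.append(line)
    return selected
```

Calling it with `start=datetime(2023, 2, 7, 0, 0)` and `end=datetime(2023, 2, 7, 1, 0)` would cover the window described above, assuming the year matches the date in the diagnostics file name.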

 

Again I appreciate your help with this so far. 

 

My only other guess is that my Home Assistant VM could be contributing, but I couldn't imagine a VM would have that much power over the entire kernel. HA has been pretty straightforward and isn't very problematic. I had to make some modifications to the actual image before I got it working: I downloaded the KVM file from them directly and imported it into the VM manager.

 

Is this a common point of failure on a lot of Unraid installs? After going through some previous posts it seems a lot of people run into issues with the networking side of Unraid. I'm just worried I've overcomplicated a lot of this with a modded HBA and all of these VLANs.

8 minutes ago, echooffzack said:

Is this a common point of failure on a lot of Unraid installs?

Macvlan yes, for a considerable number of users, network in general no.
