unraid goes offline - starting to happen more frequently

dgtlman · May 31, 2023

Recently, my Unraid box has started going offline. I am unsure whether it is a hardware or software issue, and I don't know whether the whole system goes down or just the networking. It happens randomly, and I haven't been able to determine whether anything specific causes it. My concern is that when this happens, the only solution is to power cycle it, which I know isn't good for the health of the data array.

Today I hooked up a monitor to better determine what is causing this. I also figured I would post my diagnostics here in case they show something going on behind the scenes.

Any insight would be really helpful.

unraid-diagnostics-20230531-1133.zip

Edited May 31, 2023 by dgtlman

JorgeB · May 31, 2023

Enable the syslog server and post that after it happens again.

dgtlman · May 31, 2023

I just enabled it. When it happens again, I will post it here.

Thanks!

Edited May 31, 2023 by dgtlman

Olympus_Media · May 31, 2023

2 hours ago, JorgeB said:

Enable the syslog server and post that after it happens again.

Hi All,

My server has randomly started to do the same today after completing a data rebuild after upgrading a hdd.

The server seems to remain powered on but shows as offline and the GUI is unreachable; only a hard reboot solves it.

I enabled the syslog server after the second time it happened, and an hour later it happened again.

I honestly have no idea what to look for or where to start. Any help would be much appreciated.

syslog

olympus-diagnostics-20230531-1927.zip

Edited May 31, 2023 by Olympus_Media

Olympus_Media · May 31, 2023

UPDATE

After the second time it went offline and I force-rebooted it, the server stayed online for an hour, then went offline again but came back after a couple of minutes. My internet did not go offline. The server threw an unclean shutdown notification even though it didn't lose power, and it started a parity check.

It is currently almost 4 hours into the parity check and has not gone offline yet. All I have done in the meantime is upgrade from 6.11.1 to 6.11.5.

JorgeB · June 1, 2023

12 hours ago, Olympus_Media said:

My server has randomly started to do the same today after completing a data rebuild after upgrading a hdd.

Please start your own thread or it can get confusing, since the OP issue is still ongoing.

Olympus_Media · June 1, 2023

4 hours ago, JorgeB said:

Please start your own thread or it can get confusing, since the OP issue is still ongoing.

Apologies.

dgtlman · June 3, 2023

Here is the syslog. I also noticed on the monitor I had hooked up that the system went down with a kernel panic. Hopefully this log shows what is causing that and the easiest way to resolve it.

The kernel panic happened on 6/1. I was unable to post this until now, so the log has captured more since then. Sorry if this adds to the complexity of figuring things out.

Thanks

syslog-192.168.1.50.log

Edited June 3, 2023 by dgtlman
clarification

JorgeB · June 4, 2023

Jun  2 01:41:10 iron kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Jun  2 01:41:10 iron kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

Macvlan call traces are usually the result of having Docker containers with a custom IP address, and they will end up crashing the server. Switching to ipvlan should fix it: Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right).
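
For anyone who wants to check their own syslog for the same entries, here is a minimal Python sketch. It assumes the log is a plain-text file in the same format as the two lines quoted above; the function name `find_macvlan_frames` is made up for illustration and is not part of any Unraid tooling.

```python
from typing import Iterable, List


def find_macvlan_frames(lines: Iterable[str]) -> List[str]:
    """Return kernel log lines whose stack frame belongs to the macvlan module."""
    hits = []
    for line in lines:
        # Kernel stack frames name the owning module in brackets at the
        # end of the line, e.g. "... macvlan_broadcast+0x10a/0x150 [macvlan]".
        # Requiring " kernel: " skips non-kernel messages that merely mention it.
        if " kernel: " in line and "[macvlan]" in line:
            hits.append(line.rstrip("\n"))
    return hits
```

A possible way to use it would be `find_macvlan_frames(open("syslog-192.168.1.50.log", errors="replace"))`. An empty result only means no macvlan frames were logged; it says nothing about other causes.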

dgtlman · June 4, 2023

Thanks for sharing this information with me. To make sure I understand, all I need to do is switch it from macvlan to ipvlan? Are there any other configuration changes that need to be made?

Thanks for your assistance.

JorgeB · June 5, 2023

10 hours ago, dgtlman said:

all I need to do is switch it from macvlan to ipvlan?

Usually that's it.

dgtlman · June 5, 2023

Weird. I made that change, and the system had another kernel panic last night. Here is the new syslog.

syslog-192.168.1.50.log

JorgeB · June 5, 2023

Last logged call trace is from June 2nd:

Jun  2 01:41:10 iron kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Jun  2 01:41:10 iron kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

dgtlman · June 5, 2023

The problem happened on June 4th though.

dgtlman · June 6, 2023

11 hours ago, JorgeB said:
Last logged call trace is from June 2nd:
Jun  2 01:41:10 iron kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Jun  2 01:41:10 iron kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

How could it have been macvlan causing it on 6/4 if the log you are referencing is from 6/2?

JorgeB · June 6, 2023

I meant that there aren't any call traces after that date, which usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
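
The check described here (was any call trace logged on or after the day of the crash?) could be sketched in Python roughly as below. This is only an illustration under some assumptions: the log uses the classic syslog timestamp seen in the quoted lines (`Jun  2 01:41:10`, which has no year, so the year has to be supplied), and a "trace line" is taken to be any line containing `Call Trace:` or `[macvlan]`. The function names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def last_trace_time(
    lines: Iterable[str],
    year: int,
    markers: Tuple[str, ...] = ("Call Trace:", "[macvlan]"),  # assumed markers
) -> Optional[datetime]:
    """Return the timestamp of the most recent line containing any marker."""
    latest = None
    for line in lines:
        if not any(marker in line for marker in markers):
            continue
        try:
            # The first 15 characters hold the timestamp, e.g. "Jun  2 01:41:10".
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a line in the expected syslog format
        if latest is None or stamp > latest:
            latest = stamp
    return latest


def trace_logged_since(lines: Iterable[str], crash_day: datetime, year: int) -> bool:
    """True if at least one trace line is dated on or after crash_day."""
    latest = last_trace_time(lines, year)
    return latest is not None and latest >= crash_day
```

With the two lines quoted earlier in the thread and a crash on June 4th, `trace_logged_since(lines, datetime(2023, 6, 4), 2023)` would come back false, which matches the reasoning above: the crash left nothing in the log.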

Tristankin · June 6, 2023

Try blacklisting the GPU and removing Intel GPU TOP. You may be experiencing the following issue.
