[Crash again] Cannot connect to UNRAID, no video out (no GPU passthru), no keyboard response, macvlan



Hi UNRAID community,

 

Setup

I don't use GPU passthrough. I run a Plex docker with the standard Intel iGPU hardware transcoding setup (like modprobe i915).

Once the array and the Plex docker are running, for at least a couple of hours there is video out for the UNRAID command line and it responds to keyboard input.

 

I installed a Mellanox MCX311a in the UNRAID server and made it the primary interface in a bond, with the onboard 1G NIC as secondary in an active-backup config.

The current switch is a Netgear MX510TX 10G/multigig switch (I returned the Mikrotik CRS305).

These have been stable, and I got full speed to 2.5G with 0 Retr most of the time.

I have measured 7-8 Gbps and 9.2 Gbps between MCX311a <-> Netgear MX510TX <-> HP 533FLR 10GbE.

 

Problem

I had this happen to my UNRAID server recently.

The first time, UNRAID became unreachable in the early morning: no reply to ping, no video out, no response to the keyboard.

The power indicator was lit and fans were running.

The HDD busy indicator was NOT lit and did not blink at all.

I had to do a hard shutdown. It booted fine and only needed a parity check.

 

The second morning, exactly the same story: UNRAID was unresponsive and unreachable.

Hard shutdown.

I ran 4 passes of a full memtest from my memtest DOS flash drive, with no errors. For some reason, the memtest entry in the UNRAID boot menu doesn't work and just reboots.

After memtest, it booted fine and ran a parity check.

 

Third morning, UNRAID is not reachable.

The power indicator was lit and fans were running.

HDD busy indicator blinked occasionally. 

 

The ping response is very weird. Notice that the third request took 3.7 s.

Pinging 192.168.x.x with 32 bytes of data:
Reply from 192.168.x.x: bytes=32 time=14ms TTL=64
Request timed out.
Reply from 192.168.x.x: bytes=32 time=3704ms TTL=64
Reply from 192.168.x.x: bytes=32 time=62ms TTL=64
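
A minimal Python sketch, assuming Windows-style ping output like the lines above, for counting timeouts and flagging slow replies instead of eyeballing them. The function name `summarize_ping` and the 1000 ms threshold are assumptions chosen for illustration only.

```python
import re

# Windows-style ping lines copied from the output above.
SAMPLE = """\
Reply from 192.168.x.x: bytes=32 time=14ms TTL=64
Request timed out.
Reply from 192.168.x.x: bytes=32 time=3704ms TTL=64
Reply from 192.168.x.x: bytes=32 time=62ms TTL=64
"""

# Matches "time=14ms" as well as "time<1ms".
REPLY_RE = re.compile(r"time[=<](\d+)ms")


def summarize_ping(text, slow_ms=1000):
    """Count replies and timeouts, and list replies at or above slow_ms.

    slow_ms is a hypothetical threshold, not a value from any tool.
    """
    times = []
    timeouts = 0
    for line in text.splitlines():
        match = REPLY_RE.search(line)
        if match:
            times.append(int(match.group(1)))
        elif "timed out" in line.lower():
            timeouts += 1
    return {
        "replies": len(times),
        "timeouts": timeouts,
        "slow": [t for t in times if t >= slow_ms],
        "max_ms": max(times) if times else None,
    }


summary = summarize_ping(SAMPLE)
print(summary)
```

One timeout plus a multi-second reply on a local network suggests the host is still alive but badly stalled, which is consistent with the frozen console described below.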

If I unplug the SFP+ cable and leave only the 1G NIC plugged into the switch or directly into the router, the ping reply is: Destination host unreachable.

This time, the monitor has video out, but the display is frozen and the server does not respond to keyboard input.

Last night the keyboard was working and its RGB lighting was on.

This morning, the Num Lock LED doesn't light up when Num Lock is pressed, and the RGB lighting doesn't work either.

I unplugged it and plugged it into a different USB port: no response, no lights.

 

At this point, how can I diagnose this?

Do I have to do a hard shutdown and another parity check?

 

 

 

8 hours ago, JorgeB said:

Enable syslog mirror to flash then post that log after a crash, together with the complete diagnostics: Tools -> Diagnostics

I did that after the 2nd crash. I saw the syslog under /boot/logs/syslog once, and it contained logs from Jul 12.

 

This time, before the hard shutdown, I pulled my data HDDs out while they were spun down (I felt them, no spin vibration) in an attempt to prevent another parity check.

I then did the hard shutdown, put the HDDs back in, and booted it up again.

Now I get an error saying that my flash drive is not read/write.

I guess it might be temporarily locked up due to the hard shutdown or the huge file copy from /var/log/syslog.

The UNRAID web UI and SMB service all seem to be working, and I canceled the parity check.

I can see the files in /boot, but the logs directory shows up like this in ls -l and is not accessible.

d????????? ? ?    ?            ?            ? logs/

 

/var/log/syslog is huge, at 120-ish MB, and I manually copied it to a share.

I attached part of "syslog_0717_1014" (the rest is all the same error, "FAT-fs (sda1): error, corrupted directory (invalid entries)") and the diagnostics zip from this time (10:16am).

I looked into /var/log/syslog, and the entries are all from Jul 17 (today).
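
When a syslog grows to 120-ish MB from one repeated error, it can help to count distinct messages rather than scroll through it. Below is a minimal Python sketch, assuming the classic "Mon DD HH:MM:SS host message" syslog line layout. The function names are hypothetical, and the timestamps and the third sample line are made up for illustration; only the FAT-fs message text comes from the log above.

```python
import re
from collections import Counter

# Captures everything after the "Mon DD HH:MM:SS host " prefix.
PREFIX_RE = re.compile(r"^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+(.*)$")

# Illustrative lines only: timestamps and the last message are invented.
SAMPLE_LINES = [
    "Jul 17 10:00:01 Neo kernel: FAT-fs (sda1): error, corrupted directory (invalid entries)",
    "Jul 17 10:00:02 Neo kernel: FAT-fs (sda1): error, corrupted directory (invalid entries)",
    "Jul 17 10:00:03 Neo example: some unrelated message",
]


def top_messages(lines, n=5):
    """Return the n most common messages with the timestamp/host stripped."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = PREFIX_RE.match(line)
        counts[match.group(1) if match else line] += 1
    return counts.most_common(n)


def top_messages_in_file(path, n=5):
    """Same as top_messages, reading a copied syslog from a path you supply."""
    with open(path, errors="replace") as handle:
        return top_messages(handle, n)


for message, count in top_messages(SAMPLE_LINES):
    print(count, message)
```

Running `top_messages_in_file` on the copy of the syslog saved to a share would show at a glance whether anything other than the FAT-fs error was logged.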

 

I did a normal shutdown via the web UI, unplugged the flash drive, and tried to read the logs off it.

There is a logs folder, but nothing in it.

I plugged the flash drive back in and UNRAID booted up fine. I attached another syslog from 10:44am and the diagnostics zip from this time (10:59am).

 

I don't know if I can recover the syslog that contains Jul 12-Jul 16.

 

How can I permanently cancel the parity check that is triggered by a hard shutdown?

I will do a full parity check after all the diagnosis is done.

neo-diagnostics-20210717-1016.zip syslog_0717_1014_part.txt syslog_0717_1044.txt neo-diagnostics-20210717-1059.zip

8 hours ago, JorgeB said:

This only shows flash drive issues. Is this after the crash?

 

Yes, after the crash.

The syslog from before the crash (the original problem) couldn't be found after the third crash, even though I had already enabled syslog mirroring before that (after the first crash).

14 minutes ago, JorgeB said:

Thank you very much.

I also just attached the syslog in my last reply.

Could you help take a look?

 

1 minute ago, JorgeB said:

Yes, syslog also points to that issue:

 


Jul 18 14:37:14 Neo kernel: macvlan_broadcast+0x10e/0x13c [macvlan]
Jul 18 14:37:14 Neo kernel: macvlan_process_broadcast+0xf8/0x143 [macvlan]
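
To check a saved syslog for call-trace lines like the two above, a minimal Python sketch follows. It simply looks for the substring "macvlan", which is an assumption about what is worth flagging and not an official detection method. The function names are hypothetical, and the first sample line is invented for contrast.

```python
# The second and third lines are the trace lines quoted above;
# the first is an invented unrelated line for contrast.
SAMPLE_LINES = [
    "Jul 18 14:37:13 Neo example: some unrelated message",
    "Jul 18 14:37:14 Neo kernel: macvlan_broadcast+0x10e/0x13c [macvlan]",
    "Jul 18 14:37:14 Neo kernel: macvlan_process_broadcast+0xf8/0x143 [macvlan]",
]


def find_macvlan_lines(lines):
    """Return (line_number, text) pairs for lines that mention macvlan."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if "macvlan" in line:
            hits.append((number, line.rstrip("\n")))
    return hits


def find_macvlan_lines_in_file(path):
    """Same check against a syslog file at a path you supply."""
    with open(path, errors="replace") as handle:
        return find_macvlan_lines(handle)


for number, text in find_macvlan_lines(SAMPLE_LINES):
    print(number, text)
```

An empty result after changing the docker network settings would not prove the crash is fixed, but a non-empty one shows the macvlan code path is still being hit.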

 

Great.

A relief for me finally.

Thank you so much. 

On 7/17/2021 at 12:00 PM, jena said:

I pulled my data HDDs out while they were spun down (I felt them, no spin vibration) in an attempt to prevent another parity check.

That is more likely to cause a disk to be disabled than to actually help in any way. There is no way this could prevent the unclean-shutdown parity check, since that is based on whether the array-stopped status gets written to flash.

  • 2 weeks later...

Crashed again today.

 

On July 27, I followed the third option: "Keep docker containers in host or bridge mode which use the server IP address and ports as needed"

I changed all dockers to host or bridge mode.

I turned off "AdGuard" and disabled its auto start, because it cannot run in host or bridge mode.

My UNRAID ran for a week and crashed again (same symptoms, no response) last night.

I attached the syslog. Please start from July 27, the day that I rebooted and made the change. 

 

 

syslog_20210803.txt
