March 18, 2020
Hi All, For the past week or so on 6.8.x I've been getting hard crashes and a lot of call trace errors that seem to be related to my Mellanox NIC. I've gone through various existing forum posts with similar issues that point to VLANs (not used on this server) and to Docker containers on a custom bridge with static IPs (my UniFi controller and AdGuard DNS are on custom br0 with static IPs). Interestingly, with a monitor and keyboard hooked up, I can still log in even after a full crash; I can type commands, but nothing runs. The syslog contains only call traces, with no obvious lead-up to a crash. Call trace here. My next step is to move those 2 containers to another dedicated NIC and see how it goes; after that I will have to try removing the Mellanox card.
March 19, 2020
1 hour ago, Faceman said: "My next step is to move those 2 containers to another dedicated NIC and see how it goes, then I will have to try removing the Mellanox card." It is usually macvlan/broadcast call traces that are associated with Docker containers on br0 with custom IP addresses, and those are not present in the call trace you posted. Moving the containers to a different NIC or VLAN may help, but that does not appear to be the cause of your issue. Your call trace definitely looks like it is related to the Mellanox card in some way. Sometimes putting it in a different PCIe slot can make a difference. I am well-versed in the macvlan call traces; not so much in the kind you are experiencing.
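The distinction drawn above (macvlan/broadcast traces versus traces that point at the NIC driver) can be checked roughly by scanning the syslog. Below is a minimal Python sketch, assuming that each trace starts after a kernel `Call Trace:` line and ends at a `---[ end trace` line, and that the substrings `macvlan`, `mlx4`, and `mlx5` (hypothetical keyword choices; the ConnectX-2 uses the `mlx4` driver) are enough to tag a trace. `SAMPLE_LOG` is made-up illustrative text, not the trace posted in this thread.

```python
import sys
from collections import Counter

# Hypothetical keyword sets: substrings assumed to identify each kind of trace.
CATEGORIES = {
    "macvlan": ("macvlan",),
    "mellanox": ("mlx4", "mlx5"),
}

# Made-up, illustrative syslog excerpt (not the trace posted in this thread).
SAMPLE_LOG = """\
kernel: Call Trace:
kernel:  macvlan_process_broadcast+0x1a/0x100 [macvlan]
kernel: ---[ end trace 1 ]---
kernel: Call Trace:
kernel:  mlx4_en_xmit+0x10/0x200 [mlx4_en]
kernel: ---[ end trace 2 ]---
"""


def split_call_traces(log_text):
    """Collect the lines after each 'Call Trace:' marker, one list per trace.

    A trace is assumed to end at a '---[ end trace' line or at the next
    'Call Trace:' marker. Lines printed before the marker, such as
    'Modules linked in:' (which lists every loaded module), are normally
    not collected, so they do not trigger false matches.
    """
    traces = []
    current = None
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            if current is not None:
                traces.append(current)
            current = []
        elif current is not None:
            if "---[ end trace" in line:
                traces.append(current)
                current = None
            else:
                current.append(line)
    if current is not None:
        traces.append(current)
    return traces


def classify_trace(trace_lines):
    """Tag a trace with every category whose keywords appear in it."""
    text = "\n".join(trace_lines).lower()
    tags = [name for name, keys in CATEGORIES.items() if any(k in text for k in keys)]
    return tags or ["other"]


def summarize(log_text):
    """Count traces per category across a whole log."""
    counts = Counter()
    for trace in split_call_traces(log_text):
        counts.update(classify_trace(trace))
    return counts


if __name__ == "__main__":
    # Pass a syslog path to scan it; with no argument, the sample is used.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as f:
            log = f.read()
    else:
        log = SAMPLE_LOG
    for tag, n in sorted(summarize(log).items()):
        print(f"{tag}: {n}")
```

Usage, with a hypothetical script name: `python3 classify_traces.py /var/log/syslog`. Substring matching is crude; it only sorts traces into buckets for a closer read and does not identify a root cause.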
March 19, 2020 (Author)
I moved the card to a different slot (one connected to the other CPU) and the problem still occurred, so I have now put in another CX-2 card I had. I haven't seen a hard crash yet, but I did catch that call trace. I guess the next step will be to switch to the integrated Intel NICs for a while.
March 20, 2020 (Author)
I've moved my custom-IP containers back to the default bridge and now I am getting no errors. I did see some macvlan call trace errors pop up, so I decided to get rid of the custom br0; I guess there's an issue with it somewhere. Next I'll be looking at how to use a separate NIC for some containers. Really, only the SMB server itself and Plex need the 10G card; everything else could be on a single 1Gb connection.
April 16, 2020 (Author)
A bit of an update: I switched back to not assigning any custom IPs, and everything has been 100% stable for a month, but the exact cause is still up in the air.