(SOLVED) Help with recurring server crash


asnt
Go to solution Solved by JorgeB,

I've been dealing with my server crashing several times last month. At first, I didn't pay much attention since my server is usually stable, and I never had any problems with it. But since it started crashing again and again, I started looking into what could be the reason.

 

Today I had another crash and decided to look at the logs. I saw a lot of errors that, unfortunately, I don't understand. Can someone help me? The crash today resulted in an unclean shutdown, and I am currently in the middle of a parity check. I tested all my drives and the SMART data came back OK.

 

I have attached the diagnostics files. The errors I mentioned are shown around the 18:29:18 timestamp of the syslog file. Thank you

m93p-diagnostics-20230327-1842.zip

Edited by asnt
Marking as solved
On 3/28/2023 at 1:07 AM, JorgeB said:

Enable the syslog server and post that after a crash.

 

Thank you! I just enabled it (unfortunately not fast enough, since I had another crash before reading your reply).

 

On 3/29/2023 at 7:09 AM, hunter69 said:

If this is any help, I have the Plex docker. I have been having weird reboot crashes. I traced it to the Plex docker. I have the linuxserver.io version. I am in the process of figuring out what to do next. I believe it was an update to the docker that broke it.

 

Thanks, that's good to know. I have the official Plex version. I'll keep it running until I have another crash, to see if I can get it logged as per JorgeB's suggestion. I might stop Plex after that and see if the server is stable again. I am getting weird messages from Unraid that my docker size is over 70%, and then it returns to normal. I used to get these messages when I was updating a docker, but never when the server was idle.

Well, I know it is the Plex docker. I have done a lot of uninstalling and installing other Plex dockers. I am not making any progress resolving the crash. If the Plex docker is enabled, it crashes during the default maintenance window, 3-5am. I installed Plex on another computer and had it scan the same movie file, and it did not crash. So I think that eliminates corrupt files. I am stumped on the next step. I enabled the syslog server but I do not see any logs in the share I told it to use. Have you made any progress?

There's nothing relevant logged, which usually points to a hardware problem. One thing you can try is to boot the server in safe mode with all docker/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.

On 3/31/2023 at 7:25 AM, hunter69 said:

Well, I know it is the Plex docker. I have done a lot of uninstalling and installing other Plex dockers. I am not making any progress resolving the crash. If the Plex docker is enabled, it crashes during the default maintenance window, 3-5am. I installed Plex on another computer and had it scan the same movie file, and it did not crash. So I think that eliminates corrupt files. I am stumped on the next step. I enabled the syslog server but I do not see any logs in the share I told it to use. Have you made any progress?

 

I didn't have a crash after I turned on the syslog server. I did update the plex docker two days ago, so if it was a problem in this docker, it is fixed.

 

I'll update here if something changes. I hope you can identify the problem with your server.

 

 

 


If I start the array in maintenance mode, it does not crash. I will say that ping rates fluctuate. I have two M.2 cache drives. What would be the best/safest way to eliminate the M.2 drives and yet still be able to re-enable them as cache in the future if they prove not to be an issue? One cache has my domains and appdata shares.

So I replaced the M.2.  The server continues to crash every 3 minutes.  I am down to motherboard or processor.  I am looking for a replacement motherboard.  Got any ideas to determine processor versus motherboard?  This is a nice versatile setup.  It is a Xeon 1290p.  Sad I am having these hardware problems after only 3 years of use. 

  • Solution
Mar 30 20:42:02 M93p kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Mar 30 20:42:02 M93p kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

Macvlan call traces are usually the result of having dockers with a custom IP address and will end up crashing the server. Switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right).
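
For anyone who wants to check their own syslog for the same symptom, here is a minimal Python sketch. It assumes the kernel trace lines look like the two quoted above; the function name `find_macvlan_frames` and the sample text are only illustrative and are not part of Unraid.

```python
import re
from collections import Counter

# Matches kernel stack-frame lines that reference the macvlan module, e.g.
# "Mar 30 20:42:02 M93p kernel: macvlan_broadcast+0x10a/0x150 [macvlan]"
# The optional "? " covers frames the kernel marks as unreliable.
MACVLAN_FRAME = re.compile(
    r"kernel:\s+(?:\?\s+)?(macvlan_\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[macvlan\]"
)


def find_macvlan_frames(syslog_text):
    """Count macvlan function names that appear in kernel trace lines."""
    hits = Counter()
    for line in syslog_text.splitlines():
        match = MACVLAN_FRAME.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


# Sample input: the two lines from the syslog quoted in this post.
SAMPLE = """\
Mar 30 20:42:02 M93p kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Mar 30 20:42:02 M93p kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
"""

if __name__ == "__main__":
    for name, count in find_macvlan_frames(SAMPLE).items():
        print(f"{name}: {count}")
```

To run it against a real log, read the syslog file into a string and pass it to the function instead of `SAMPLE`. Any hit suggests the macvlan issue described here, but an empty result does not rule out other causes.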

1 hour ago, JorgeB said:
Mar 30 20:42:02 M93p kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Mar 30 20:42:02 M93p kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

Macvlan call traces are usually the result of having dockers with a custom IP address and will end up crashing the server. Switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right).

 

Thank you! I switched to ipvlan and hopefully this will fix the crashes.

To update, I finally figured it out. To be short, it was my fault. I have an LSI 9220-8i. I was using an expander from long ago when I had over 10 drives. Things worked well back then. Today, Unraid did not like this expander. In fact, I figured it out when, after I can't count how many hours of research, I saw a picture of the same expander with the caption "do not use with Unraid". So I reconfigured my drives and everything returned to normal. By the way, for anyone who is doing research because of strange crashes, there was nothing in the logs to indicate this was the issue.

I was wrong; the issue has not been resolved. My Unraid server continues to crash. Here is what I observed:

 

Scenario 1- I have an LSI SAS9220-8i. I eliminated this by moving all the drives to the onboard motherboard SATA controller. What I observed here is the server would crash every 3 minutes. After researching, I started thinking this issue could be caused by using all the onboard SATA ports plus having 2 M.2 drives. I have read that using the M.2 slots could affect some of the SATA ports. Am I correct or incorrect in my thinking?

 

Scenario 2- So I moved and reconfigured my drives from the onboard controller to the LSI SAS9220-8i. It stopped crashing, or so I thought. When I reinstalled the Plex docker, the server began crashing again. Note this is where it all started. Now I am wondering if this could be the issue. I have the LSI SAS9220-8i in a PCI X1 slot. Note I have had this card in the same slot for 3 years with nothing but stability. I have owned the card since Unraid version 4. Could the LSI card being in a PCI X1 slot be the root cause?

 

An interesting fact is what I observed when the server crashed in each scenario:

Scenario 1- The server would crash, as in I could not access the GUI and could not ping the server. I had a monitor on the server. I could see the screen but could not type on it.

Scenario 2-  When the server crashes it powers off.

 

I am down to the following possible root causes:

The LSI board

The Motherboard

The processor

 

I have changed RAM, power supply and eliminated all unnecessary drives.

 

Any and all ideas are welcome

7 minutes ago, hunter69 said:

I have read that using the M.2 slots could affect some of the SATA ports. Am I correct or incorrect in my thinking?

Using SATA M.2 devices will usually disable SATA ports. This is not a problem with NVMe, nor is it a stability concern: the ports either work or they don't, nothing more.

 

9 minutes ago, hunter69 said:

Could the LSI card being in a PCI X1 slot be the root cause?

An x1 slot will limit bandwidth, but it should not cause any stability issues; again, it either works or it doesn't.

 

It could just be some hardware going bad. Unfortunately, it's not usually easy to tell which without starting to swap some parts around.

 

 


I'm down to the big items: the motherboard and processor. I have a replacement motherboard on the way, but if it's not that, I could buy a different/replacement LSI card. Other than that, it might begin to get really expensive. What bothers me is that when I find a server shut off, I know something is up with the power supply. When I did the swap, I had an older 650 PSU from my old Unraid server. It lacks one 4-pin 12-volt connector. I had read that, depending on the CPU, the motherboard might not need that connector. Well, the server booted as normal and began crashing just the same. Do you think I can be pretty confident that it isn't the power supply?

Edited by hunter69
  • asnt changed the title to (SOLVED) Help with recurring server crash
