
Kills network - unclean shutdown


Solved by flurec.


Posted (edited)

tower-diagnostics-20240703-0910.zip

 

I'm starting a new topic because my network also goes down, and I didn't see that mentioned in the other thread.

 

I have been getting unclean shutdowns daily, sometimes twice a day, at completely random times. At the same moment, my entire home network goes down. I run Home Assistant and the Ubiquiti UniFi controller in Docker, and I can see that the network is down for around 10 minutes. The longest it has gone without an unclean shutdown is 3 days, but normally it happens daily.

 

Memtest is fine, and VMs are turned off. The Docker network type is macvlan, but I followed the alternative setup instructions for it. One thing I have done is reserve the IP addresses for my Docker containers in OPNsense by MAC address, just to keep track of IPs on the network. I don't know if that is best practice.

 

Server is plugged into a UPS.

Edited by flurec
spelling

This might be a dumb question: is there a way for me to know whether the server is actually rebooting? Whatever is happening is happening on its own and at random.

 

I guess my question is: if I get an unclean shutdown without taking any prior action, does that mean the server rebooted on its own, or can an unclean shutdown be flagged without it actually rebooting?

 

What would be the order for replacing equipment? Memtest checked out, so: PSU, Ethernet card, CPU, then motherboard? Would that be a good approach? I built this server, and the CPU, HBA controller, and motherboard were purchased used.


Thanks. I looked at the uptime, and it dates from this morning, when I had the last unclean shutdown. I will replace some hardware.
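Comparing the boot time against the time of the unclean shutdown is one way to tell whether the server really restarted. A minimal Python sketch, assuming a Linux host (such as Unraid) where the first field of `/proc/uptime` is the number of seconds since boot; `boot_time` and `read_uptime_seconds` are hypothetical helper names, not part of any Unraid tool:

```python
from datetime import datetime, timedelta
from pathlib import Path


def boot_time(uptime_seconds: float, now: datetime) -> datetime:
    """Return the moment the system booted, given its uptime and the current time."""
    return now - timedelta(seconds=uptime_seconds)


def read_uptime_seconds(path: str = "/proc/uptime") -> float:
    """Read seconds since boot from the first field of /proc/uptime (Linux only)."""
    return float(Path(path).read_text().split()[0])


# Usage on the server itself:
#   print(boot_time(read_uptime_seconds(), datetime.now()))
```

If the computed boot time lines up with the unclean shutdown, the machine actually lost power or reset rather than just logging the event.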

 

Is there anything I can look for in the logs that might point me toward a hardware problem?


Usually nothing is logged with this issue, but I do see btrfs detecting data corruption:

 

Jul  3 06:22:12 Tower kernel: BTRFS info (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 1467, gen 0

 

This is usually RAM-related, and memtest is only definitive if it finds errors. If you have multiple sticks, try using the server with just one; if the problem persists, try a different one. That will basically rule out bad RAM.
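To scan a saved syslog for these btrfs counters, a small parser can help. This is a minimal Python sketch that assumes the kernel message format shown in the log line above; `parse_btrfs_errs` is a hypothetical helper, not an Unraid or btrfs-progs function. A nonzero `corrupt` count is what indicates data corruption here.

```python
import re

# Matches each "name value" pair, e.g. "corrupt 1467", after the "errs:" marker.
_COUNTER = re.compile(r"(\w+) (\d+)")


def parse_btrfs_errs(line: str) -> dict[str, int]:
    """Extract per-device error counters from a btrfs 'errs:' kernel log line."""
    _, sep, counters = line.partition("errs:")
    if not sep:
        # Not a btrfs error-counter line.
        return {}
    return {name: int(value) for name, value in _COUNTER.findall(counters)}
```

Applied to the log line above, this yields `corrupt` as 1467 and every other counter as 0.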

  • 2 weeks later...
  • Solution

So I feel really dumb. I have the Unraid server and my Ubiquiti AP plugged into the same UPS. The UPS does not indicate that the battery needs replacing, and it appears to be functioning normally.

 

Well, it looks like this UPS was resetting itself almost daily, cutting power to the server and the AP and causing those unclean shutdowns.

 

Moral of the story: bypass your UPS first. Thanks for your help, Jorge.
