Unraid docker (i think) issues with shutdown / stability

Zxurian · November 14, 2020

So I've been having some issues lately that I can't pin down, and I'm hoping someone can help. From searching the forums, it looks as though it's a problem with Docker hanging. Symptoms include:

* In the WebUI, the Dashboard and Docker tabs do not load and just spin, but other tabs work fine.

* trying to shut down, WebUI doesn't load, but I can still ssh in (running `diagnostics` and logs attached.)

* The entire system stops responding: network shares and Docker URLs are unreachable, and the terminal does not respond either. I have to hard shut down, but it comes back up fine.

I have read that unclean shutdowns aren't good, but I've tried everything else, including just hitting the power button once for a graceful shutdown. 20m later, it's still on, so something is hanging it. Only option is a hard shutdown, then bringing it back up.

Can anyone give me pointers to look at what might be causing it?

media-1-diagnostics-20201114-1502.zip

Edited November 14, 2020 by Zxurian
didn't attach log

Squid · November 14, 2020

3 minutes ago, Zxurian said:

(running `diagnostics` and logs attached.)

Nothing is attached here

Zxurian · November 14, 2020

25 minutes ago, Squid said:

Nothing is attached here

'cause I'm an idiot. Just attached the log.

JorgeB · November 16, 2020

Not a network guy but there's something misconfigured:

Nov  7 20:02:53 media-1 kernel: br0: received packet on bond0 with own address as source address (addr:d4:ae:52:7d:0f:65, vlan:0)

Also a network related crash after that.
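To see how often the bridge warning quoted above shows up, and on which bridge and port, a small script can count it in a saved syslog. This is only a sketch: the regular expression is inferred from that single log line, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the one kernel line quoted in this thread (an assumption,
# not an exhaustive match for every kernel version's wording).
PATTERN = re.compile(
    r"(?P<bridge>\S+): received packet on (?P<port>\S+) "
    r"with own address as source address"
)


def count_own_address_warnings(lines):
    """Return a Counter keyed by (bridge, port) for each matching syslog line."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("bridge"), match.group("port"))] += 1
    return counts
```

Feeding it the lines of the syslog from the diagnostics zip, for example `count_own_address_warnings(open(path))` with `path` pointing at the extracted file, gives a quick before/after comparison when changing the bond configuration.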

Zxurian · November 16, 2020

So the "received packet on bond0 with own address" error I'm researching separately, and pretty sure it has to do with the fact that I'm using both NICs on the r710 in a balance-rr configuration. While it's NIC related, I've switched it to active-backup(1) for the time being, which shouldn't flood the syslog with packet entries.
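To confirm the mode change actually took effect, the Linux bonding driver reports its state in `/proc/net/bonding/bond0`. The following is a rough sketch that pulls out the mode and member interfaces from that text; the function name is hypothetical, and it assumes the usual `Key: value` layout of that file.

```python
def parse_bonding_status(text):
    """Extract the bonding mode, active slave, and slave list from the
    contents of /proc/net/bonding/<bond> (assumed 'Key: value' layout)."""
    info = {"mode": None, "active_slave": None, "slaves": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a key/value pair
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Currently Active Slave":
            info["active_slave"] = value
        elif key == "Slave Interface":
            info["slaves"].append(value)
    return info
```

Calling `parse_bonding_status(open("/proc/net/bonding/bond0").read())` over ssh should show a mode string containing `active-backup` after the switch, and `round-robin` before it.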

Another issue I just ran into is that my cache drive (btrfs) stopped responding overnight. The logs had entries like the following:

BTRFS: error (device sdg1) in btrfs_replay_log:2351: errno=-5 IO failure (Failed to recover log tree)

After some research today, I was able to recover all of the files off of it using your excellent FAQ. My instinct is that btrfs failed because of the unclean shutdowns caused by the issues above. Hardware testing doesn't show any issues with the SSDs I'm using, so I reformatted it and am copying the files back to it now.

Would the network-related crash you saw result in the symptoms described above? I had thought not, but I would welcome being wrong.

Zxurian · November 17, 2020

So my Unraid has been running fine for the past 24 hours (except for the btrfs cache issue mentioned above).

Given that the timeout / crash only happens sporadically, is the best bet to wait until it stops responding again, _then_ see if I can still ssh in and get the log and post?

If I can't ssh in (as I couldn't the last time) and am forced to hard shut down, what would you recommend as the next steps to get the most complete picture of why it might have timed out?

JorgeB · November 17, 2020

3 hours ago, Zxurian said:

Given that the timeout / crash only happens sporadically, is the best bet to wait until it stops responding again

I would recommend to backup and re-format cache now.

Zxurian · November 19, 2020

On 11/17/2020 at 3:03 AM, JorgeB said:

I would recommend to backup and re-format cache now.

Thanks, did that Monday. Everything seems to be okay with it; I haven't had issues yet at least. _If_ it gets to the point where I have to hard power off because Unraid isn't responding again, what is the first thing I should do upon power on to get proper logs that might show why Unraid stopped responding?

Zxurian · November 24, 2020

So the issue just happened again last night. The Unraid system became completely unresponsive, with no activity on any of the drive lights (I sat in front of the server for a full 5 minutes). No WebGUI, and I was unable to telnet/ssh in either. I had to hard power it down this morning.

What should I do at this point to get the information required to figure out why it's hard locking?

JorgeB · November 24, 2020

You can try this:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=781601

Zxurian · November 29, 2020

On 11/24/2020 at 2:03 PM, JorgeB said:

You can try this:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=781601

Thank you. I set up a remote syslog server that is not hosted on Unraid, so if I lose access again, I will reference those logs.
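For anyone who wants something similar without installing a full syslog daemon, here is a minimal sketch of a receiver that appends forwarded messages to a file on another machine. It assumes the Unraid box is configured to forward syslog over UDP; the port `5514` and both function names are illustrative choices, not anything Unraid prescribes.

```python
import socket


def format_record(source_ip, payload):
    """Turn one raw syslog datagram into a single text line tagged with its sender."""
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    return f"{source_ip} {text}"


def serve(path, host="0.0.0.0", port=5514):
    """Append every received datagram to `path` until interrupted (Ctrl+C).

    Port 5514 is an arbitrary unprivileged port chosen for this sketch;
    the sender must be pointed at the same port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(path, "a", encoding="utf-8") as out:
        sock.bind((host, port))
        while True:
            payload, (source_ip, _port) = sock.recvfrom(8192)
            out.write(format_record(source_ip, payload) + "\n")
            out.flush()  # keep the file current so the last lines before a lockup are on disk
```

Running `serve("unraid-remote.log")` on the other machine leaves the last messages the server sent before a hard lock in that file, which is the part a reboot would otherwise wipe.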
