Unraid Docker (I think) issues with shutdown / stability


Zxurian

So I've been having some issues lately that I can't pin down, and I'm hoping someone can help. From searching the forums, it looks like the problem is Docker hanging. Symptoms include:

* Going to the WebUI, the Dashboard and Docker tabs do not load and just spin, but other tabs work fine.

* When trying to shut down, the WebUI doesn't load, but I can still ssh in (I ran `diagnostics`; logs attached).

* The entire system stops responding: I can't access network shares or Docker URLs, and the terminal does not respond either. I have to hard shut down, but it comes back up fine.

 

I have read that unclean shutdowns aren't good, but I've tried everything else, including just hitting the power button once for a graceful shutdown. Twenty minutes later it was still on, so something is hanging it. The only option is a hard shutdown, then bringing it back up.

 

Can anyone give me pointers to look at what might be causing it?

media-1-diagnostics-20201114-1502.zip

Edited by Zxurian
didn't attach log

I'm researching the "received packet on bond0 with own address" error separately, and I'm pretty sure it has to do with the fact that I'm using both NICs on the R710 in a balance-rr configuration. Since it's NIC related, I've switched the bond to active-backup (mode 1) for the time being, which should stop the syslog from being flooded with those packet entries.
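
To confirm that the mode change actually took effect, the kernel's bonding status file can be read. This is only a minimal sketch: it assumes Python 3 is available on the server (it is not part of a stock Unraid install) and that the bond is named bond0, as in the log message. `bonding_mode` is a hypothetical helper name, not an Unraid tool.

```python
from pathlib import Path


def bonding_mode(status_text):
    """Return the value of the 'Bonding Mode:' line, or None if it is absent."""
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Bonding Mode":
            return value.strip()
    return None


if __name__ == "__main__":
    # Assumption: the bond interface is called bond0.
    status_file = Path("/proc/net/bonding/bond0")
    if status_file.exists():
        print(bonding_mode(status_file.read_text()))
    else:
        print("no bonding status file found at", status_file)
```

With active-backup in effect, the kernel should report the mode as `fault-tolerance (active-backup)`; balance-rr shows up as `load balancing (round-robin)`.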

 

Another issue I just ran into is that my cache drive (btrfs) stopped responding overnight. The logs had entries like the following:

BTRFS: error (device sdg1) in btrfs_replay_log:2351: errno=-5 IO failure (Failed to recover log tree)

After some research today, I was able to recover all of the files off of it using your excellent FAQ. My instinct is that btrfs failed because of the unclean shutdowns caused by my issues above. Hardware testing doesn't show any issues with the SSDs I'm using, so I reformatted the cache and am copying the files back to it now.
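
To get a rough idea of how often each of the two messages quoted in this thread appears, a saved syslog can be scanned for them. This is a sketch, not a diagnostic tool: it assumes Python 3 on whatever machine holds the log, the pattern names are my own, and the only patterns are the two messages quoted above.

```python
import re
from collections import Counter

# Patterns taken from the two log messages quoted in this thread.
PATTERNS = {
    "bond_own_address": re.compile(r"received packet on bond0 with own address"),
    "btrfs_error": re.compile(r"BTRFS: error \(device \w+\)"),
}


def summarize(lines):
    """Count how many lines match each known pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

For example, `summarize(open("syslog.txt", errors="replace"))` would count matches in a syslog file extracted from the diagnostics zip; the file name here is an assumption.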

 

Would the network-related crash you saw result in the symptoms described above? I had thought not, but I would welcome being wrong.


So my Unraid has been running fine for the past 24 hours (except for the btrfs cache issue mentioned above).

Given that the timeout / crash only happens sporadically, is the best bet to wait until it stops responding again, _then_ see if I can still ssh in and get the log and post?

If I can't ssh in (as I couldn't the last time) and am forced to hard shut down, what would you recommend as the next steps to get the most complete picture of why it might have timed out?

On 11/17/2020 at 3:03 AM, JorgeB said:

I would recommend to backup and re-format cache now.

Thanks, I did that Monday. Everything seems to be okay with it; I haven't had issues yet, at least. _If_ it gets to the point where I have to hard power off because Unraid isn't responding again, what is the first thing I should do after powering back on to get proper logs that might show why Unraid stopped responding?


So the issue just happened again last night. The Unraid system became completely unresponsive, with no activity on any of the drive lights (I sat in front of the server for a full 5 minutes). No WebGUI, and I was unable to telnet/ssh in either. I had to hard power it down this morning.

 

What should I do at this point to get the information required to figure out why it's hard locking?
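
Since Unraid keeps `/var/log/syslog` in RAM, whatever was logged just before a hard lock is gone after the power cycle. One possible approach is to keep copying new syslog data to storage that survives a reboot while the server is still up. The sketch below is only an illustration: it assumes Python 3 is available on the server, the function names and the destination path are hypothetical, and Unraid's own syslog server settings may offer an equivalent without any script.

```python
import os
import time


def copy_new_lines(src_path, dst_path, offset=0):
    """Append everything in src_path past `offset` to dst_path and return the new offset."""
    if os.path.getsize(src_path) < offset:
        offset = 0  # source was rotated or truncated, so start from the beginning
    with open(src_path, "rb") as src:
        src.seek(offset)
        chunk = src.read()
    if chunk:
        with open(dst_path, "ab") as dst:
            dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())  # push the data to disk before the next lock-up
    return offset + len(chunk)


def follow(src_path, dst_path, interval=5.0):
    """Poll src_path forever, copying any new data to dst_path."""
    offset = 0
    while True:
        offset = copy_new_lines(src_path, dst_path, offset)
        time.sleep(interval)
```

A call such as `follow("/var/log/syslog", "/boot/logs/syslog-persist.log")` would write to the flash drive (the file name is an assumption). Frequent writes add wear to the flash, so this is something to run only while chasing the lock-up.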

