Zxurian Posted November 14, 2020 (edited)

So I've been having some issues lately that I can't pin down, and I'm hoping someone can help. From searching the forums, it looks like it's a problem with Docker hanging. Symptoms include:

* Going to the WebUI, the Dashboard and Docker tabs do not load and just spin, but other tabs work fine.
* Trying to shut down, the WebUI doesn't load, but I can still ssh in (running `diagnostics` and logs attached.)
* The entire system stops responding: I can't access network shares or Docker URLs, the terminal also does not respond, and I have to hard shut down, but it comes back up fine.

I have read that unclean shutdowns aren't good, but I've tried everything else, including just hitting the power button once for a graceful shutdown. 20 minutes later, it's still on, so something is hanging it. The only option is a hard shutdown, then bringing it back up. Can anyone give me pointers on where to look for what might be causing it?

Attachment: media-1-diagnostics-20201114-1502.zip

Edited November 14, 2020 by Zxurian: didn't attach log
Squid Posted November 14, 2020

3 minutes ago, Zxurian said:
"(running `diagnostics` and logs attached.)"

Nothing is attached here
Zxurian Posted November 14, 2020

25 minutes ago, Squid said:
"Nothing is attached here"

'Cause I'm an idiot. Just attached the log.
JorgeB Posted November 16, 2020

Not a network guy but there's something misconfigured:

Nov 7 20:02:53 media-1 kernel: br0: received packet on bond0 with own address as source address (addr:d4:ae:52:7d:0f:65, vlan:0 )

Also a network related crash after that.
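To see how often that warning shows up in a saved syslog, a small script along these lines might help. It is only a sketch: the regular expression is modeled on the single log line quoted above, and the names `OWN_ADDRESS_RE` and `find_own_address_warnings` are made up for illustration.

```python
import re

# Pattern modeled on the one kernel line quoted in this thread (an assumption:
# other kernel versions may word the message differently).
OWN_ADDRESS_RE = re.compile(
    r"(?P<bridge>\S+): received packet on (?P<port>\S+) "
    r"with own address as source address "
    r"\(addr:(?P<mac>[0-9a-fA-F:]{17}), vlan:(?P<vlan>\d+)\s*\)"
)


def find_own_address_warnings(lines):
    """Return one dict (bridge, port, mac, vlan) per matching syslog line."""
    hits = []
    for line in lines:
        match = OWN_ADDRESS_RE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits
```

It can be fed an open file, for example `find_own_address_warnings(open("syslog.txt"))`, where `syslog.txt` stands in for wherever the log was saved. Counting the hits per `port` value shows which interface the warnings come from.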
Zxurian Posted November 16, 2020

So the "received packet on bond0 with own address" error I'm researching separately, and I'm pretty sure it has to do with the fact that I'm using both NICs on the r710 in a balance-rr configuration. While it's NIC related, I've switched it to active-backup (mode 1) for the time being, which shouldn't flood the syslog with packet entries.

Another issue I just ran into is that my cache drive (btrfs) stopped responding overnight. The logs had entries like the following:

BTRFS: error (device sdg1) in btrfs_replay_log:2351: errno=-5 IO failure (Failed to recover log tree)

After some research today, I was able to recover all of the files off of it using your excellent FAQ. My instinct is that btrfs failed because of the unclean shutdowns caused by my issues above. Hardware testing doesn't show any issues with the SSDs I'm using, so I reformatted it and am copying the files back to it now.

Would the network related crash you saw result in the symptoms described above? I had thought no, but I would welcome being wrong.
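One way to confirm that the bond really is in active-backup after the switch is to read the status file the Linux bonding driver exposes under `/proc/net/bonding/`. The sketch below assumes the bond is named `bond0`, as in the log line earlier in the thread; the function names are hypothetical.

```python
def parse_bonding_status(text):
    """Extract the bonding mode and slave interfaces from bonding status text."""
    mode = None
    slaves = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value" form
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            slaves.append(value)
    return {"mode": mode, "slaves": slaves}


def read_bonding_status(bond_name="bond0"):
    """Read and parse the status file for one bond (name is an assumption)."""
    with open(f"/proc/net/bonding/{bond_name}", encoding="utf-8") as status:
        return parse_bonding_status(status.read())
```

Running `read_bonding_status()` on the server over ssh should report a mode string containing `active-backup` if the change took effect, and one containing `round-robin` if the bond is still in balance-rr.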
Zxurian Posted November 17, 2020

So my Unraid has been running fine for the past 24 hours (except for the btrfs cache issue mentioned above). Given that the timeout / crash only happens sporadically, is the best bet to wait until it stops responding again, _then_ see if I can still ssh in and get the log and post?

If I can't ssh in (as I couldn't the last time) and am forced to hard shut down, what would you recommend as the next steps to get the most complete picture of why it might have timed out?
JorgeB Posted November 17, 2020

3 hours ago, Zxurian said:
"Given that the timeout / crash only happens sporadically, is the best bet to wait until it stops responding again"

I would recommend to backup and re-format cache now.
Zxurian Posted November 19, 2020

On 11/17/2020 at 3:03 AM, JorgeB said:
"I would recommend to backup and re-format cache now."

Thanks, I did that Monday. Everything seems to be okay with it; I haven't had issues yet, at least.

_If_ it gets to the point where I have to hard power off because Unraid isn't responding again, what is the first thing I should do after powering back on to get proper logs that might show why Unraid stopped responding?
Zxurian Posted November 24, 2020

So the issue just happened again last night. The Unraid system became completely unresponsive, with no activity on any of the drive lights (I sat in front of the server for a full 5 minutes). No WebGUI, and I was unable to telnet/ssh in either. I had to hard power it down this morning.

What should I do at this point to get the information required to figure out why it's hard locking?
JorgeB Posted November 24, 2020

You can try this:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=781601
Zxurian Posted November 29, 2020

On 11/24/2020 at 2:03 PM, JorgeB said:
"You can try this: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=781601"

Thank you. I set up a remote syslog server that is not hosted on Unraid, so if I lose access again, I will reference those logs.
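For anyone without a syslog server handy, the idea of keeping logs on a second machine can be sketched in a few lines of Python. This is not the setup used in this thread, just a bare-bones illustration. It assumes the server is configured to forward syslog over UDP to the machine running the script, and the port `5514`, the file name `unraid-remote.log`, and the function names are all hypothetical.

```python
import socket
from datetime import datetime, timezone


def format_record(data, sender_ip, received_at):
    """Turn one received datagram into a single timestamped log line."""
    text = data.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    return f"{received_at.isoformat()} {sender_ip} {text}\n"


def serve(host="0.0.0.0", port=5514, log_path="unraid-remote.log"):
    """Append every UDP datagram received on host:port to log_path, forever."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        with open(log_path, "a", encoding="utf-8") as log:
            while True:
                data, (sender_ip, _sender_port) = sock.recvfrom(8192)
                log.write(format_record(data, sender_ip, datetime.now(timezone.utc)))
                # Flush after each message so the last lines before a hang survive.
                log.flush()
```

Calling `serve()` starts the receiver. The standard syslog UDP port is 514, which normally needs root to bind, so the sketch uses a high port instead. Because every message is written out as soon as it arrives, whatever the server logged just before locking up should still be in the file after a hard power-off.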