[SOLVED] Non-responsive GUI at about 5am every day requiring reboot

February 17, 2016

This, once again, is not really a support request; it is more of a support note for anyone who runs into the same issue. Over the past week I have had issues with my setup ranging from:

- Issues with br0: received packet on eth0 with own address as source address (resolution available here: http://lime-technology.com/forum/index.php?topic=46516.0)

- Now to the GUI being unresponsive at about 5am every day and requiring a reboot to get it going again.

I solved the first issue as indicated in the link above. The second issue - UNRESPONSIVE GUI - was weird. It gets reported on the forums A LOT, so I didn't want to just add another "Unresponsive GUI" post, and I knew my logs showed nothing, as I am adept enough to decipher them myself. It seemed like some good old isolation testing and debugging was the order of the day.

I post this now in the hope that those with unresponsive GUI issues (and I know this can happen for a variety of reasons) can resolve their issue using my experience.

So here is the story .....

I couldn't see anything in the logs that would cause it, BUT I found when I woke every morning that the GUI was unresponsive. Indeed, the Docker services on the same machine were unresponsive too. The SMB and NFS shares were intermittently available, but in the main they too were unavailable. I could Telnet into the box after a "few tries", but the connection would often drop, or if I did a "cat" on the syslog it would take an age to output the log to the terminal (and sometimes even hang or crash the telnet connection). Still no errors in the log. VMs, however, seemed unaffected, and I was able to make an RDP connection and use them with no issue at all.

I found if I tried the GUI multiple times sometimes it would display a page that was not formatted correctly (e.g. one column or no images or everything but drive info or some other combination). Sometimes the buttons on the GUI would work - sometimes not.

I started taking a systematic approach to this, ruling things out every day. I first disabled Docker (day 1), then VMs (day 2), then both (day 3), then started making my way through the plugins (days 4, 5 and 6). On day 6 the same thing happened and I was getting frustrated, so I disabled all plugins and took my server down to just basic unRAID functionality. Still nothing in the logs, and the issue STILL happened when I got up. My 10th reboot of the week. Note I was still able to reset cleanly by using the following from the command line:

powerdown -r -t 1

For whatever reason, on the 7th day I started my troubleshooting from within my Windows 10 VM (as I was first of all doing something else completely unrelated), so I tried the unRAID GUI from there. Boom - it came up. FAST. Shocked, I skipped back to the main machine (iMac) and got the same issue: GUI unresponsive. I then tried the unRAID GUI from the Ubuntu VM. Boom - it came up. FAST. Tried it on my iPhone. GUI unresponsive. Tried it from the iPad. GUI unresponsive. Tried it from the MacBook Air. GUI unresponsive. Safari? Surely not. Tried it from the Windows 10 VM (on the Backup Server). GUI unresponsive. Tried telnet from the Backup Server and did a "cat" of the syslog - connection dropped. WTF!??

The iMac and the Backup Server are on the same Gigabit switch (connected by CAT6e to the router) as the Main Server (which is having the issues) and all other clients are wireless direct to the router. My router runs dd-wrt so I decided to upgrade the firmware. The issue persisted.

I then thought about the other variables in the equation. Clearly the Server was running fine, as it could be accessed by VMs running on the same machine - but Dockers were affected. Was something just blocking connections to the Main Server? Logs on the router and on the Main and Backup Servers were clean. No packets were reported as being lost. According to everything I was reading in the logs, communication was fine.
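Since the logs stayed clean, one way to put numbers on "it works from here but not from there" is to run the same small TCP probe from each client and compare the results. The following is only a minimal sketch, not something I used at the time: the function names are made up for illustration, and the address and ports mentioned afterwards are assumptions about a typical setup.

```python
import socket
import statistics
import time
from typing import List, Optional


def probe(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the attempt fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return None


def summarize(samples: List[Optional[float]]) -> dict:
    """Count failed attempts and take the median time of the successful ones."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(ok),
        "median_s": statistics.median(ok) if ok else None,
    }


def probe_many(host: str, port: int, attempts: int = 10,
               pause: float = 0.5, timeout: float = 2.0) -> dict:
    """Probe one port repeatedly so intermittent drops show up as failures."""
    samples = []
    for _ in range(attempts):
        samples.append(probe(host, port, timeout))
        time.sleep(pause)
    return summarize(samples)
```

Running something like `probe_many("192.168.1.10", 80)` (a hypothetical server address and the GUI port) from a VM on the server and again from a machine behind the switch would show the difference as a failure count and a median connect time, instead of a feeling that "the GUI is slow".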

Then I had/needed a coffee (it was too early for wine). Communication to the Main Server HAS to be the issue here. Light bulb. The switch? No, the VMs work and the Dockers don't. But then I thought perhaps the communication between the VMs and the unRAID GUI was somehow staying local and not going via the switch, and perhaps the Dockers were (for whatever reason) relaying out and then back via the switch.
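That line of reasoning can be written down as simple set arithmetic: anything every failing client passes through, minus anything a working client also passes through, is a suspect. This is just an illustrative sketch. The component names ("host_bridge", "server_nic" and so on) are my own labels for the pieces of my network, and the paths are my assumption of how traffic flows, not something read from a log.

```python
from typing import Dict, FrozenSet, Set, Tuple

# For each client: (components its traffic passes through, did the GUI respond?)
Observation = Tuple[FrozenSet[str], bool]


def suspects(observations: Dict[str, Observation]) -> Set[str]:
    """Components shared by every failing path and absent from all working paths."""
    failing = [set(path) for path, ok in observations.values() if not ok]
    working = [set(path) for path, ok in observations.values() if ok]
    if not failing:
        return set()
    common = set.intersection(*failing)   # on every failing path
    cleared = set().union(*working)       # proven good by a working path
    return common - cleared


# Assumed paths, based on the clients tried and how they are wired up.
wired = frozenset({"switch", "server_nic", "host_bridge"})
wireless = frozenset({"wifi", "router"}) | wired

observations: Dict[str, Observation] = {
    "Windows 10 VM (Main Server)": (frozenset({"host_bridge"}), True),
    "Ubuntu VM (Main Server)": (frozenset({"host_bridge"}), True),
    "iMac": (wired, False),
    "Backup Server VM": (wired, False),
    "iPhone": (wireless, False),
}

print(sorted(suspects(observations)))  # ['server_nic', 'switch']
```

Under these assumptions the router and Wi-Fi drop out because the wired clients fail without them, and the server itself drops out because the local VMs work. That leaves the physical segment between the server and everything else, which only a swap test can narrow down further.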

So I swapped out the switch and put in a new one (that I had boxed up, ready and waiting, as a spare). BOOM. Everything was suddenly normal. Everything was working. THE SWITCH!!!! I wanted to be sure that this issue was NOT related to the other bonding issue I had. It turns out it wasn't, as I was able to replicate that issue with the new switch.

So there it is. Faulty switch. No errors. No logs. Just silent issues caused by a faulty switch - which I could only diagnose by swapping it out for another one (as it is an unmanaged switch with no logs etc). Everything is now fine and dandy. I am so glad I was patient and worked through things methodically. I was thinking of being rash and doing things like buying a new router, BUT what a waste of money that would have been.

I hope this helps someone in the future and solves an issue they might be having.
