(SOLVED) unRAID webgui not responding, everything else okay


robobub


The unRAID web GUI stopped responding and now returns "500 Internal Server Error". The syslog is full of "upstream timed out (110: Connection timed out)" and "auth request unexpected status: 504 while sending to client" messages.
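For anyone wanting to see how often these messages show up, here is a rough Python sketch that counts syslog lines containing the two phrases quoted above. The function names and the default path `/var/log/syslog` are assumptions for illustration; point it at wherever your syslog actually lives.

```python
from collections import Counter

# Phrases quoted in the post above, matched as plain substrings.
PATTERNS = (
    "upstream timed out",
    "auth request unexpected status: 504",
)


def count_gui_errors(lines):
    """Count how many lines contain each phrase in PATTERNS."""
    counts = Counter()
    for line in lines:
        for pattern in PATTERNS:
            if pattern in line:
                counts[pattern] += 1
    return counts


def scan_syslog(path="/var/log/syslog"):
    """Scan a syslog file; the default path is an assumption, adjust as needed."""
    with open(path, errors="replace") as handle:
        return count_gui_errors(handle)
```

Calling `scan_syslog()` from a Python prompt on the server returns a `Counter` keyed by phrase. Comparing the counts a few minutes apart shows whether the timeouts are still piling up.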

 

All of my docker GUIs, shares, SMB, SSH, etc. still seem to work fine. I ran diagnostics from the command line, but it is taking a very long time to finish. I am running a fair amount of stuff (duplicati doing a large backup, downloads, preclears, and badblocks on unassigned drives, which has already discovered issues), but it had all been working fine for hours. I'll post my syslog for now, and the diagnostics once that process finishes.

 

Along with finding a cause, is there a way to recover the GUI without rebooting?

 

Issues started around Jan 14 13:30.

This is a recent setup: I started with 6.8.0 a few weeks ago and upgraded to 6.8.1 a few days ago.

 

 

 

17 minutes ago, Squid said:

You should post the entire diagnostics.zip file untouched.  In this particular case, I'm not in the mood to install an app on my work computer to be able to open up a gz

I uploaded a zip file just now. Diagnostics still hasn't completed and isn't using much CPU, so its hang may be a separate but related issue; that's why I grabbed just the syslog at first. Let me know what other information I can provide to help debug both the GUI issue and the diagnostics collection. I'm comfortable modifying the PHP code in /usr/local/sbin/diagnostics.


Thanks for the suggestions.  I tried closing all browser sessions.  I even sent SIGSTOP to preclear and badblocks, paused all docker containers, and the issue remains. 

 

The system is very responsive from the terminal, and so are the docker container GUIs; that was true even before I did any of the above. Preclear and badblocks had been running for days before this issue started (I'm doing a lot of passes, as these are old drives and I'm paranoid). I do have a swapfile on my SSD on this machine until more memory arrives, but nothing is being swapped in or out (according to dstat).

 

I've captured top before and after pausing everything (36% idle and 56% iowait before; 72% idle and 25% iowait after). Let me know if there's anything else I can provide.
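Since iowait stays noticeable even with everything paused, one thing worth checking is whether any process is stuck in uninterruptible sleep (state D), which usually means it is blocked waiting on I/O. Below is a minimal Python sketch, assuming a standard Linux `/proc` layout; the function names are made up for illustration.

```python
import glob
import os


def parse_stat_line(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the first "(" and the last ")".
    """
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx])
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for stat_path in glob.glob(os.path.join(proc_root, "[0-9]*", "stat")):
        try:
            with open(stat_path, errors="replace") as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError):
            # The process exited mid-scan or the file was unreadable; skip it.
            continue
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)
```

Running `blocked_processes()` a few times in a row helps separate processes that only pass through state D briefly from ones that stay there. A process that shows up on every pass is a good candidate for whatever is waiting on a stuck device.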

 

Is restarting the nginx service safe to do and a potential way of recovering? Though perhaps it's worth keeping in this state to figure out what happened.

 

top-2020-01-14.zip

 

 

 

 

 

 

 

 


So the cause is one of the failing drives I'm running tests on: it has become unresponsive and doesn't answer smartctl. Interestingly, badblocks is still making a bit of progress on it, and the hung smartctl query even has a timeout parameter, but that doesn't help. The GUI queries that drive as well, so it hangs too. There is nothing in dmesg about the drive.
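To find which drive is the unresponsive one without getting stuck yourself, a rough Python sketch like the one below probes each device with a hard deadline. It assumes `smartctl` is on the PATH and uses its `-i` (info) option; the `probe` and `find_hung_drives` helpers are hypothetical names of my own. The probe deliberately does not wait for the child after killing it, because a process blocked in uninterruptible I/O may not exit on SIGKILL until the device answers, which might also be why a timeout on the hung query does not help here.

```python
import subprocess


def probe(cmd, timeout=10.0):
    """Run cmd and return "ok", "nonzero", or "hung".

    Never blocks for longer than roughly `timeout` seconds.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do NOT wait for the exit: a process stuck in
        # uninterruptible I/O may linger until the device responds.
        proc.kill()
        return "hung"
    return "ok" if returncode == 0 else "nonzero"


def find_hung_drives(devices, timeout=10.0):
    """Return the devices whose smartctl info query did not finish in time."""
    return [
        dev for dev in devices
        if probe(["smartctl", "-i", dev], timeout) == "hung"
    ]
```

For example, `find_hung_drives(["/dev/sdb", "/dev/sdc"])` (device names are placeholders) returns only the drives whose query timed out. A "nonzero" result just means smartctl exited with a non-zero status, which it can do for reasons other than a dead drive.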

 

Removing that drive restores everything without rebooting.
