Unable to load web interface or Hit Shares [SOLVED]


Recommended Posts

The script finished on disk 5 and didn't get hung up anywhere.

 

Any other ideas?

 

Thanks for running that test; it seems like disk 5 is OK. I have a suspicion of a deadlock occurring in shfs/FUSE (in rare situations) but I haven't been able to reproduce it here. It's been about a week; have you had any lockups since?

 

I think if Docker is stopped before the mover starts and started back up after the mover finishes, we could narrow down (or maybe eliminate) the issue. We can help you script the Docker stop/start into the mover script if you want to try this.

Link to comment


 

Thanks for following up. I have turned off my mover script for now and only run it if my cache drive starts to fill up. So technically, no lockups.

 

I am actually in the process of migrating all of my disks to XFS because I am willing to try anything at this point.  Someone mentioned earlier in the thread that this could help.  I am happy to try the docker stop/start during mover idea.  Please let me know what I should modify in my mover script to facilitate this.

 

Thanks!

Link to comment


 

When you're at a good point to experiment (once the XFS conversion is finished), you can edit /usr/local/sbin/mover (using nano or another editor). Look for the line

echo "mover started"

and add this line right after it:

/usr/local/emhttp/plugins/dynamix.docker.manager/event/stopping_svcs

 

Next, add this line at the end of the mover script:

/usr/local/emhttp/plugins/dynamix.docker.manager/event/started

 

Save the changes to the mover script and turn the mover schedule back on from the webGUI. Let's see if that solves the lockups when the mover script is running...
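
For reference, the relevant part of the edited mover script would end up looking roughly like the sketch below. Only the echo line and the two dynamix.docker.manager event calls come from the instructions above; the shebang and the placeholder comment stand in for the rest of the real /usr/local/sbin/mover, which is not reproduced here.

#!/bin/bash
echo "mover started"
/usr/local/emhttp/plugins/dynamix.docker.manager/event/stopping_svcs   # added: stop Docker services before files start moving

# ... the rest of the existing mover logic, unchanged ...

/usr/local/emhttp/plugins/dynamix.docker.manager/event/started   # added: bring Docker services back up once the mover has finished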

Link to comment
  • 2 weeks later...

Thanks for posting your experience. I was pulling my hair out on this one.

 

I recently upgraded from v5.0.5 to v6.1.3, with all reiserfs disks. I'm not running any Dockers or extra plugins. As was stated throughout this thread, the behavior is that the mover starts and the server dies a slow death. This happens every 1-4 days. When trying to shut down from the command line, 2 disks come back as busy. When I do a ps aux, I see a very large number of smbd processes, and the system load is huge (like 456.2).
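
For anyone who wants to check whether their server is in the same state, a couple of generic commands along these lines will show it (illustrative only, not taken from the post above):

uptime                           # prints the 1, 5 and 15 minute load averages
ps aux | grep '[s]mbd' | wc -l   # counts running smbd processes; a handful is normal, hundreds is not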

 

I am unable to regain control of the machine to initiate a proper shutdown, so I have to press the power button and then run a parity check when it comes back.

 

Like @pickthenimp, I'm all reiserfs. It's good to hear that moving to XFS has fixed the issue. I guess I'll begin researching that. In the meantime, I've disabled the automatic mover and will run it manually to hopefully contain the issue until I can move to XFS.

 

Link to comment

EDIT:  This might be the wrong thread for the issue I am having.  I am really not sure so, whatever.....  :-)

 

To be honest, I still don't have a clear indication of what is causing the problem. There is a core issue, and then there is a residual issue caused by the core issue, I think, but I am stable now and not really testing anymore.

 

I can tell you that stopping the drive spin-downs and turning off the cache drive (and thus the mover) did not solve the problem for me, so I don't think the mover causes the issue. I think it is affected by the residual issue.

 

However, I also haven't had a lockup for two weeks, and I think that was due to one change I made that doesn't involve unraid. I stopped Sonarr (which runs on another server, not unraid) from being able to rename and move files to unraid. I had turned this on a while back so that I could stop using a different media sorter. I went back to my old media sorter and it's been stable. This is hardly conclusive, I know, but here is what I found:

 

The Core Issue

Something writing over the network (in my case it was Sonarr) triggers the core issue. This causes the duplicate samba processes, as its write has failed and it retries over and over again. Once the array is in this state it must be hard rebooted; all attempts to stop the array manually or kill processes fail. The hard reboot CAUSES THE RESIDUAL ISSUE. While the core issue is triggered, samba stops responding to network requests, but it also locks files on the array, so in this state the mover will not run either: it starts and hangs due to the general state of the array and its processes. When you find the machine in this state it SEEMS the mover has caused the issue, but it happens to me with the cache drive disabled completely.

 

The Residual Issue

After the hard reboot, the filesystem on some or all of the drives is in a bad state. Transactions need to be replayed and sorted out, and the FS needs to be checked. Unraid, not knowing what happened before, does not know to do this for you. After every single crash since I have been troubleshooting, unraid simply reboots and tries to do a parity check; it does not check for issues with the transaction logs on the disk filesystems. Yet if I stop the array, put it into maintenance mode and check the FS manually, I get a ton of "replaying journal" messages, and they are ALWAYS on disks that were hung before the reboot (I can check that by looking at lsof output before rebooting).
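
As an illustration of the kind of manual check described above (the device and mount names are examples only; check which /dev/mdX corresponds to the suspect disk on your system):

lsof /mnt/disk1                  # before rebooting: list open files on that disk's mount to see which disks are hung
reiserfsck --check /dev/md1      # read-only reiserfs check with the array started in maintenance mode; after a hard reboot this is where the "replaying journal" messages show up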

 

So, if you DON'T do the filesystem checks after a reboot, then the mover can hang AGAIN because of FS issues and it SEEMS like it is the same issue. I verified this by shutting down all the apps on my network that write to the array and running the mover immediately after a reboot WITHOUT running FS checks. But this lockup is not the same, because samba is not misbehaving.

 

So, how did I test?

 

  • I turned off disk spindown and the cache drive altogether at Tom's suggestion. Those have been off since he asked, so all of my testing since then has been with them in that state, eliminating the mover or disk spindown as the cause of the issue.
     
  • I shut down both applications on my network that write to the array, at different times, to find the culprit. Sonarr and Emby are the two programs, and neither runs on the array itself. Sonarr moves files to the array and Emby writes some metadata from time to time.
     
  • I created a different user for every machine on my network so I could trace the one causing the samba issues.
     

 

What I found was:

  • The one causing/triggering the issues was always the server running both Emby and Sonarr.
  • Reads never seem to cause the problems. Writes are what trigger the CORE issue, and ONLY writes over the network.
  • With both Emby/Sonarr shutdown the array was stable.
  • After every hard reboot, FS checks needed to be run first, or even writing data to the array using Windows Explorer would cause lockups and failures.

 

Eventually I ran only Emby for a while and it was still stable. When I turned Sonarr back on, the problem came back. So I reconfigured Sonarr to stop writing to the array at all and went back to using MetaBrowser, which I used to use, to sort and write my TV programs to the array.

 

In this configuration my array has been up for an hour shy of two full weeks.

 

The only conclusions I can draw are:

 

Writes over the network are causing the issue, but not all writes.

 

-----------------------------------------------------------------------------------------------------

I have no idea if it's Sonarr, or the WAY Sonarr is doing its writes, or if it's Samba that is the problem. (I think we can assume that it's not Sonarr exclusively that is the issue, because not everyone having problems is running it.)

 

I have no idea if Reiserfs is a factor in the CORE issue. (Suggestions in this thread seem to point that way.)

 

I don't know that this post does anything but add more confusion to the issue.  :-/

 

 

Link to comment


 

Thanks for the info. I do have Sonarr managing my TV so I'll keep this in mind.

Link to comment
I do have Sonarr managing my TV so I'll keep this in mind.

 

Cool. 

 

To note, though: if Sonarr is running as a plugin then all its writes would be to local shares, not network shares, and I don't think that would trigger the issue I was/am having. I think it's specific to Sonarr writing to unraid over the network. I THINK.

Link to comment


 

Understood. My Sonarr is running on a different host. Thx

Link to comment
  • 2 weeks later...

...

 

I can tell you that stopping the drive spin-downs and turning off the cache drive (and thus the mover) did not solve the problem for me, so I don't think the mover causes the issue. I think it is affected by the residual issue.

 

...

 

I'm continuing to troubleshoot this and, like you, I have disabled my cache drive. After disabling it, I experienced my first hang yesterday afternoon... I could not browse my shares from my Mac or my Ubuntu Plex server, so I telnetted in and the system load was at 12 and climbing. The exact same issue as what I was experiencing while the mover had been running. If I let it sit, the load continues into the 500+ range (I never let it go longer than that, but I imagine it would keep climbing). So I can confirm your comments that it does not appear to be directly related to the mover process.

 

Questions for you:

 

What combination of RFS/XFS/BTRFS disks do you have in your array? (Edit: I saw another post of yours where you mention you have all RFS.) @pickthenimp mentioned no longer having this problem after migrating to XFS, and I'm strongly considering that. I don't really want to go back to v5... I'd rather get v6 working as it's supposed to.
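
If anyone reading along wants to check what they have, something like the following shows the filesystem type of each mounted array disk (illustrative only):

df -T /mnt/disk*   # the Type column shows reiserfs, xfs or btrfs for each mounted data disk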

 

Did you experience this issue in v5 as well? I had v5.0.5 set up on the exact same hardware with the same supporting apps on other hosts (e.g. Sonarr) and never encountered this.

 

 

Thanks

Link to comment
  • 2 months later...

I am super elated to report I've had 9 days of uptime with no lockups.  Thanks to all for the help and suggestions.

 

Ultimately, converting all of my drives to xfs solved the issue.

 

pickthenimp, have you had any other issues since your last post? I switched over to v6 around New Year's (currently on 6.1.7, but going to upgrade as soon as I reboot the server), and I've had nothing but lockups since. Most recently I went a full two weeks without a lockup, but it finally bit it again last night. My lockups coincided exactly with my switch to v6, so I'm certain that's the problem. If moving to XFS fixed your problem for good, then that's the route I would like to take as well, but I don't want to go through the hassle without knowing for sure first.

 

My server does not have a cache drive, so I'm certain my issue is not related to the mover script as you suspected yours was.

Link to comment

 


 

Correct. No issues since upgrading to XFS. It's a PITA but worth it.

Link to comment

 


 

Thanks for the quick response! I guess I know what I'm doing this weekend now. At least I have a 4 TB drive (the same size as my largest) that's precleared and ready to go...

Link to comment

I'm not sure how many disks you have or how long this will take you, but you may want to (temporarily?) install a large-ish XFS cache drive and disable your mover, or set it to not run for a while. While I still had all RFS data disks, I found that I did not encounter the issue while writing solely to the XFS cache. It was only when writing to the RFS array, either directly or via the mover, that I had the issue. I confirmed this by using a 500GB drive as my cache with the mover disabled for about a month, during which I had 0 system hangs. After setting the mover back to run daily, the system hung within two days.

 

I can confirm that moving from all RFS (each disk ~90% full) to XFS fixed my issue as well. It took me from Jan 10th to Jan 27th to finish all my disks, but since then I have not had a single occurrence of the problem. I am back to the stability I had with v5.0.5 and loving it!

 

I also extend my thanks to @pickthenimp  :)

Link to comment
  • 2 months later...

 


 

Just to update for anyone else having this issue: after migrating each disk to XFS (a process that took about a week and a half for 8 data disks, the smallest of which was 2TB), I no longer have this issue. My server has been running uninterrupted for over a month. The only reason I took it down between the changeover and a month ago was to swap out a drive for a larger one. I've done several parity checks in the interim as well, with no errors to report.

 

As far as I'm concerned, the issue was with RFS. I'm not sure if it was due to me having a few drives in my array that were >90% full, but moving to XFS has fixed my problem.
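
For anyone about to do the same conversion, the per-disk copy step generally looks something like the sketch below, run from the server's console. The disk numbers are examples only, and the source disk is only reformatted to XFS after the copy has been verified:

rsync -avPX /mnt/disk1/ /mnt/disk2/   # copy everything from the reiserfs disk (disk1) onto the already-formatted XFS disk (disk2)
diff -r /mnt/disk1 /mnt/disk2         # optional sanity check that nothing was missed before wiping disk1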

Link to comment
  • 1 month later...

While problems often seem alike, they rarely are. Please try not to hijack other users' support threads. I have split 2 topics away from here, where it will be easier for you to receive individual help, and it's less confusing for us(!) -

  Re: Unable to load web interface or Hit Shares - CyberMew

  Re: Unable to load web interface or Hit Shares - mnever33

 

As a reminder, please see the guidelines here -> Need help? Read me first!

Link to comment
