Unable to load web interface or Hit Shares [SOLVED]



According to the screenshot, something is causing you to run out of memory. Since you have tried different hardware, and most others are not seeing this issue with their software configuration, it is most likely related to the plugins or dockers you are running.

 

To be clear, this particular memory issue cropped up when I moved to different hardware (including different RAM). My previous 20+ crashes were from the mover hanging, and one theory was insufficient resources. Moving hardware was my attempt to test whether hardware was related to the mover crashing. In all previous crashes I could still get in via telnet, whereas this time it locked up completely.

Maybe when you had less memory it crashed quicker, such as when mover rsync started needing memory. Now it takes longer, maybe something is leaking memory but it just takes longer to use it up. Different processes can get killed due to out-of-memory, maybe emhttp, maybe smb, maybe telnetd, so the symptoms could vary.

 

OOM killing emhttp used to be a problem on 32bit v5. There were some workarounds for it back then. I think with 64bit v6 and enough RAM it should be pretty hard to use up memory unless some app is broken.
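One way to check whether the kernel's OOM killer has been at work is to search the system log for its messages. This is only a minimal sketch: it assumes the log lives at /var/log/syslog (an assumption, adjust for your setup) and that the kernel uses its usual "Out of memory" and "invoked oom-killer" wording. `oom_events` is a hypothetical helper name.

```shell
# Hypothetical helper: print kernel OOM-killer lines found in a log file.
# Assumption: the log path defaults to /var/log/syslog.
oom_events() {
    log="${1:-/var/log/syslog}"
    grep -i -E 'out of memory|oom-killer' "$log"
}
```

Running `oom_events` with no argument scans the default log. The matching lines normally name the process that was killed, which would show whether emhttp, smb or something else was the victim.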

 

Interesting.  I feel like I am using pretty popular dockers/plugins that everyone else seems to be running smoothly.  See below.

 

pubZ88p.png

 

QIXIpsC.png


I think I witnessed a crash as it was happening last night. My shares became unresponsive from my Windows box, but a movie was still streaming from my Mac mini running Plex. I quickly grabbed the diagnostics file from the web GUI. A few of my docker web interfaces stopped responding. Then my web interface for unRAID stopped responding. And then it completely locked up. Diagnostics attached.

 

Anything else I can do to troubleshoot?  I've gone close to 2 months now without keeping this thing up for more than a few days.

nas-diagnostics-20151010-2232.zip


Looks like there are some S.M.A.R.T. command timeout errors for disk 5 (ST3000DM001-9YN166_Z1F12JLY). Also, your prior post revealed that the mover never finished and that disk 5 was never spun down with the rest of the disks. The disk might be going bad, or there may be some reiserfs corruption somewhere (even though reiserfsck wasn't able to detect anything).
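To keep an eye on that counter between crashes, the raw value can be pulled out of saved `smartctl -A` output. This is a rough sketch only, assuming the usual ten-column attribute table and an attribute named `Command_Timeout`. `command_timeout_raw` is a hypothetical helper, and /dev/sdX is a placeholder for whichever device disk 5 maps to.

```shell
# Hypothetical helper: print the raw value of the Command_Timeout attribute
# from a file holding `smartctl -A` output (raw value = column 10 onward).
command_timeout_raw() {
    awk '$2 == "Command_Timeout" {
        out = $10
        for (i = 11; i <= NF; i++) out = out " " $i
        print out
    }' "$1"
}

# Example usage (placeholder device name):
#   smartctl -A /dev/sdX > /tmp/smart.txt
#   command_timeout_raw /tmp/smart.txt
```

If the value keeps climbing between runs, that would point toward the disk or its cabling rather than the filesystem.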

 

Try running this script from your console. It will iterate through all files on disk5 and attempt to fully read each file's contents. If there is reiserfs corruption, you'll likely be able to see which file it stopped on (this will take a while to run):

find /mnt/disk5 -type f -exec rsync --progress {} /dev/null \;


Thanks for digging in eschultz!  I am running this script now.


I let it run overnight and it stopped at some point claiming no space left on disk...  Even though disk5 has 650+GB free?

 

 155,818 100%  117.35MB/s    0:00:00 (xfr#1, to-chk=0/1)
rsync: write failed on "/dev/null": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]
backdrop1.jpg
         76,927 100%   42.11MB/s    0:00:00 (xfr#1, to-chk=0/1)
rsync: write failed on "/dev/null": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]
backdrop2.jpg
        304,063 100%  258.73MB/s    0:00:00 (xfr#1, to-chk=0/1)
rsync: write failed on "/dev/null": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]
backdrop3.jpg
        392,449 100%  171.51MB/s    0:00:00 (xfr#1, to-chk=0/1)
rsync: write failed on "/dev/null": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]
folder.jpg
        551,868 100%  495.05MB/s    0:00:00 (xfr#1, to-chk=0/1)
rsync: write failed on "/dev/null": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]

 

what does that mean?


Log in to the server.

 

What's the output of

ls -l /dev/null

 

/dev/null is a character device that should be impossible to fill up unless somehow an app managed to accidentally create a file called /dev/null.
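The same check can be scripted so it is easy to repeat after each experiment. A small sketch using the shell's standard `-c` (character device) test; `null_status` is a hypothetical helper name, not part of unRAID:

```shell
# Hypothetical helper: report whether a path is a character device.
null_status() {
    dev="${1:-/dev/null}"
    if [ -c "$dev" ]; then
        echo "ok: $dev is a character device"
    elif [ -e "$dev" ]; then
        echo "broken: $dev exists but is not a character device"
    else
        echo "missing: $dev does not exist"
    fi
}
```

Calling `null_status` with no argument checks /dev/null. A "broken" result corresponds to an `ls -l` line that starts with `-` instead of `c`.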


output of ls is: -rw-rw-rw- 1 root root 0 Oct 12 19:16 /dev/null

Something running on your system has inadvertently deleted /dev/null (it should be crw-rw-rw-). The next time a command sent output to /dev/null (a very common occurrence), it was recreated as a regular file.

 

If it's just a fluke, then a reset will fix it. If it's a recurring issue, then you've got to determine where the problem lies. Did you accidentally do it somehow with a mistyped command? Is it a plugin or script that you're running? If the latter, you're going to have to uninstall everything, then add the plugins back one at a time, run each for a bit, and keep checking the output of that command.

 

 

Or, alternatively you can enter this:

 

rm /dev/null
mknod -m 666 /dev/null c 1 3

If the mknod tells you that the file exists, it means that in between the rm command and the mknod command the file got recreated.  You'll have to keep repeating the series of commands until it works.

 

After success, the ls output should be similar to this:

crw-rw-rw- 1 root root 1, 3 Oct 12 17:15 /dev/null
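The "keep repeating until it works" step can be sketched as a small loop. This is an illustration under assumptions rather than a tested unRAID tool: `fix_null` is a hypothetical name, it must be run as root, and it uses the same major/minor numbers (1, 3) as the mknod command above. It deliberately avoids redirecting anything to /dev/null while the node is being repaired.

```shell
# Hypothetical helper: recreate a null device node, retrying if another
# process recreates the path as a regular file between rm and mknod.
# Usage: fix_null [path] [max_attempts]   (run as root)
fix_null() {
    dev="${1:-/dev/null}"
    tries="${2:-5}"
    while [ "$tries" -gt 0 ]; do
        # Nothing to do if the path is already a character device.
        [ -c "$dev" ] && return 0
        rm -f "$dev"
        mknod -m 666 "$dev" c 1 3
        tries=$((tries - 1))
    done
    # Final status: success only if the node is now a character device.
    [ -c "$dev" ]
}
```

If the function returns non-zero after all attempts, something is still recreating the file, and the offending plugin or script needs to be stopped first.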

 


 

 

Did you accidentally do it somehow with a mistyped command?  Is it a plugin or script that you're running? 

 

No sir, not that I know of.  I run pretty normal/stock plugins and just a few dockers.  I can't think of any reason that file would have been modified.

 

I have recreated that file and ls -l is now outputting:

crw-rw-rw- 1 root root 1, 3 Oct 12 20:41 /dev/null

 

I will keep an eye on it.

 

Could this have been causing all of my issues with locking up and crashing, or just something you noticed as not being right?


Not sure... but it was definitely not correct. The only way to get those last errors that you posted is for the device to have been deleted and then automatically recreated as a file. You may have filled up your memory with it, and it *may* have caused the problems. Time will tell.

Won't hurt.  Actually, run it - let it go for a bit then stop it and run the check on the ls -l /dev/null again.  I know squat about rsync and am curious if Eric made a typo there that caused the issue.

 

Killed that find after about 30 minutes, re-ran ls -l on /dev/null:

-rw-rw-rw- 1 root root 984 Oct 12 21:51 /dev/null

 

Is something not right with that find? I will remove /dev/null again, recreate it like before, and check on it again.


Let it sit overnight and ls -l /dev/null still looks good... must have been something wonky with that find script.

 

Any other troubleshooting steps you can recommend?

 

Again much appreciation in advance.

 

Sorry about that, the rsync command was changing /dev/null unexpectedly.  To be fair, I just checked and my /dev/null was messed up too :)

 

Here's an alternative script to try that won't mess up /dev/null (you just won't see pretty progress bars):

find /mnt/disk5 -type f -print -exec cp {} /dev/null \;
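The same read test can also be written with `cat` and a shell redirect, which only opens the existing /dev/null for writing and so should not replace the device node the way the rsync run did. A sketch under that assumption; `read_check` is a hypothetical helper name. It prints each file name before reading it, so the last name shown is the file being read if the run hangs.

```shell
# Hypothetical helper: read every regular file under a directory in full.
# Prints each path before reading it; read failures are reported on stderr.
read_check() {
    find "$1" -type f -exec sh -c '
        for f do
            printf "%s\n" "$f"
            cat -- "$f" >/dev/null || printf "READ FAILED: %s\n" "$f" >&2
        done
    ' sh {} +
}
```

For example, `read_check /mnt/disk5` walks the whole disk. It is worth confirming with `ls -l /dev/null` before starting that the device node is intact, since the redirect would otherwise write into a regular file.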

 

Squid: Thanks for stepping in to help him restore his /dev/null device


Thanks, running now.
