pickthenimp Posted October 5, 2015 Author
Something is causing you to run out of memory according to the screenshot. Since you have tried different hardware, and most others are not experiencing this issue with their software configuration, seems like it must be related to the plugins or dockers you are running.
To be clear this particular memory loss issue crept up when I moved to different hardware (including different RAM). My previous 20+ crashes were from the mover hanging and one theory was not enough resources. Moving hardware is my attempt to prove hardware was related to the mover crashing. All previous crashes I could still get in via telnet whereas this time it was locked up completely.
Maybe when you had less memory it crashed quicker, such as when mover rsync started needing memory. Now it takes longer, maybe something is leaking memory but it just takes longer to use it up. Different processes can get killed due to out-of-memory, maybe emhttp, maybe smb, maybe telnetd, so the symptoms could vary. OOM killing emhttp used to be a problem on 32bit v5. There were some workarounds for it back then. I think with 64bit v6 and enough RAM it should be pretty hard to use up memory unless some app is broken.
Interesting. I feel like I am using pretty popular dockers/plugins that everyone else seems to be running smoothly. See below.
trurl Posted October 5, 2015
What does the Dashboard say about memory usage on the new hardware? Have you noticed it getting used up?
pickthenimp Posted October 6, 2015 Author
It's been holding steady at 7% memory usage for a day now. Will keep an eye on it.
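For keeping an eye on memory between Dashboard checks, a minimal sketch follows. It assumes a Linux kernel that exposes MemAvailable in /proc/meminfo; the function name mem_used_percent and the log path in the comment are hypothetical, and the figure may not match the Dashboard's own calculation exactly.

```shell
#!/bin/sh
# mem_used_percent [FILE]: print used memory as a whole-number percentage.
# Reads MemTotal and MemAvailable from FILE (default: /proc/meminfo).
# Assumption: the kernel reports MemAvailable (Linux 3.14 and later).
mem_used_percent() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' \
        "${1:-/proc/meminfo}"
}

# Hypothetical usage: append one timestamped sample per minute to a log file
# that survives a lockup (the path below is an assumption, adjust as needed):
#   while true; do
#       echo "$(date) $(mem_used_percent)%" >> /boot/logs/memory.log
#       sleep 60
#   done
```

If the percentage climbs steadily in the log before a hang, that would point toward a leak; if it stays flat, memory exhaustion becomes a less likely cause.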
pickthenimp Posted October 7, 2015 Author
Another crash/hang without the mover finishing. This time on new, beefier hardware (16 GB of RAM, 2x Xeon CPUs). I can't load the web UI, and running the diagnostics command from telnet just hangs. I managed to grab the syslog, which is attached. Htop screen capture here: http://i.imgur.com/FmaGmrn.png
Any other advice?
Attachment: syslog.txt.zip
pickthenimp Posted October 11, 2015 Author
I think I witnessed a crash as it was happening last night. My shares became unresponsive from my Windows box, but I was still streaming a movie from my Mac mini running Plex. I quickly grabbed the diagnostics file from the web GUI. A few of my docker web interfaces stopped responding. Then my web interface for unRAID stopped responding. And then it completely locked up. Diagnostics attached.
Anything else I can do to troubleshoot? I've gone close to 2 months now without keeping this thing up for more than a few days.
Attachment: nas-diagnostics-20151010-2232.zip
eschultz Posted October 12, 2015
Looks like there are some S.M.A.R.T. command timeout errors for disk 5 (ST3000DM001-9YN166_Z1F12JLY). Also, your prior post revealed the mover never finished but disk 5 was never spun down with the rest of the disks. The disk might be going bad, or there is possibly some reiserfs corruption somewhere (even though reiserfsck wasn't able to detect anything).
Try running this script from your console. It'll iterate through all files on disk5 and attempt to fully read each file's contents. If there is reiserfs corruption, you'll likely be able to see which file it stopped on (this will take a while to run):
    find /mnt/disk5 -type f -exec rsync --progress {} /dev/null \;
pickthenimp Posted October 12, 2015 Author
Thanks for digging in, eschultz! I am running this script now.
pickthenimp Posted October 12, 2015 Author
I let it run overnight and it stopped at some point claiming no space left on disk... even though disk5 has 650+GB free?
    155,818 100% 117.35MB/s 0:00:00 (xfr#1, to-chk=0/1)
    rsync: write failed on "/dev/null": No space left on device (28)
    rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]
    backdrop1.jpg
    76,927 100% 42.11MB/s 0:00:00 (xfr#1, to-chk=0/1)
    rsync: write failed on "/dev/null": No space left on device (28)
    rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]
    backdrop2.jpg
    304,063 100% 258.73MB/s 0:00:00 (xfr#1, to-chk=0/1)
    rsync: write failed on "/dev/null": No space left on device (28)
    rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]
    backdrop3.jpg
    392,449 100% 171.51MB/s 0:00:00 (xfr#1, to-chk=0/1)
    rsync: write failed on "/dev/null": No space left on device (28)
    rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]
    folder.jpg
    551,868 100% 495.05MB/s 0:00:00 (xfr#1, to-chk=0/1)
    rsync: write failed on "/dev/null": No space left on device (28)
    rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]
What does that mean?
pickthenimp Posted October 12, 2015 Author
Not sure if this is relevant, but I also got this notification from my unRAID server last night via email:
    /bin/sh: line 1: 3132 Bus error /usr/local/emhttp/plugins/dynamix/scripts/monitor &>/dev/null
Squid Posted October 12, 2015
Log in to the server. What's the output of
    ls -l /dev/null
/dev/null is a character device that should be impossible to fill up, unless an app somehow managed to accidentally create a regular file called /dev/null.
pickthenimp Posted October 13, 2015 Author
Output of ls is:
    -rw-rw-rw- 1 root root 0 Oct 12 19:16 /dev/null
Squid Posted October 13, 2015
Something running on your system has inadvertently deleted /dev/null (it should be crw-rw-rw-). The next time a bash command redirected its output to /dev/null (a very common occurrence), it was recreated as a regular file.
If it's just a fluke, then a reset will fix it. If it's a recurring issue, then you've got to determine where the problem lies. Did you accidentally do it somehow with a mistyped command? Is it a plugin or script that you're running? If the latter, you're going to have to uninstall everything and then add them back one at a time, run each for a bit, and keep checking the output of that command.
Or, alternatively, you can enter this:
    rm /dev/null
    mknod -m 666 /dev/null c 1 3
If mknod tells you that the file exists, it means that the file got recreated in between the rm command and the mknod command. You'll have to keep repeating the series of commands until it works. After success, the ls output should be similar to this:
    crw-rw-rw- 1 root root 1, 3 Oct 12 17:15 /dev/null
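The check and repair described in this post can be wrapped in two small functions. This is a sketch rather than a tested unRAID tool: the function names check_null_device and repair_null_device are hypothetical, the repair has to be run as root, and the device numbers 1, 3 are the ones given in the post.

```shell
#!/bin/sh
# check_null_device [PATH]: succeed only if PATH (default: /dev/null)
# is a character device, which is what "c" in "crw-rw-rw-" indicates.
check_null_device() {
    path="${1:-/dev/null}"
    if [ -c "$path" ]; then
        echo "OK: $path is a character device"
    else
        echo "PROBLEM: $path is not a character device"
        return 1
    fi
}

# repair_null_device: remove whatever sits at /dev/null and recreate it as
# character device 1,3 with mode 666. Must be run as root. Chaining the two
# commands with && keeps the gap between them short, but mknod can still
# fail with "File exists" if another process recreates the file in between;
# in that case, run the function again.
repair_null_device() {
    rm -f /dev/null && mknod -m 666 /dev/null c 1 3
}
```

Running check_null_device before and after any suspect command gives a quick way to narrow down what is replacing the device.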
pickthenimp Posted October 13, 2015 Author
Squid wrote: "Did you accidentally do it somehow with a mistyped command? Is it a plugin or script that you're running?"
No sir, not that I know of. I run pretty normal/stock plugins and just a few dockers. I can't think of any reason that file would have been modified. I have recreated that file and ls -l is now outputting:
    crw-rw-rw- 1 root root 1, 3 Oct 12 20:41 /dev/null
I will keep an eye on it. Could this have been causing all of my issues with locking up and crashing, or just something you noticed as not being right?
Squid Posted October 13, 2015
Not sure... but it was definitely not correct. The only way to get those last errors you posted is for the device to have been deleted and then automatically recreated as a file. You may have filled up your memory with it, and it *may* have caused the problems. Time will tell.
pickthenimp Posted October 13, 2015 Author
Should I re-run this script again to see if I have issues with disk 5?
    find /mnt/disk5 -type f -exec rsync --progress {} /dev/null \;
Squid Posted October 13, 2015
Won't hurt. Actually, run it, let it go for a bit, then stop it and run the ls -l /dev/null check again. I know squat about rsync and am curious if Eric made a typo there that caused the issue.
pickthenimp Posted October 13, 2015 Author
Killed that find after about 30 minutes and re-ran ls -l on /dev/null:
    -rw-rw-rw- 1 root root 984 Oct 12 21:51 /dev/null
Something not right with that find? I will remove /dev/null again, re-create it like before, and check on it again.
Squid Posted October 13, 2015
Or reset to let it fix itself. And let it sit for a while before running the rsync (checking /dev/null before you do).
pickthenimp Posted October 13, 2015 Author
Sorry, by reset do you mean reboot?
pickthenimp Posted October 13, 2015 Author
Let it sit overnight and ls -l /dev/null still looks good... must have been something wonky with that find script. Any other troubleshooting steps you can recommend? Again, much appreciation in advance.
Squid Posted October 13, 2015
PM Eric and get him back involved...
eschultz Posted October 13, 2015
Sorry about that, the rsync command was changing /dev/null unexpectedly. To be fair, I just checked and my /dev/null was messed up too.
Here's an alternative script to try that won't mess up /dev/null (you just won't see pretty progress bars):
    find /mnt/disk5 -type f -print -exec cp {} /dev/null \;
Squid: Thanks for stepping in to help him restore his /dev/null device.
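One plausible explanation for the earlier breakage, offered as an assumption and not something confirmed in this thread: by default rsync writes to a temporary file next to the destination and then renames it into place, which, when run as root, would replace the /dev/null device node with a regular file. Reading files with cat and a shell redirect avoids that, because the redirect only opens the existing device for writing. The sketch below is a variation on the read test above; the function name read_test is hypothetical.

```shell
#!/bin/sh
# read_test DIR: read every regular file under DIR from start to finish.
# Each file name is printed before it is read, so if the read hangs, the
# last name printed is the file to look at. Files that return a read error
# get an extra "READ FAILED" line.
read_test() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            printf "%s\n" "$f"
            cat -- "$f" > /dev/null || printf "READ FAILED: %s\n" "$f"
        done
    ' sh {} +
}

# Hypothetical usage, keeping a copy of the output for later review:
#   read_test /mnt/disk5 | tee /tmp/disk5-read-test.log
```

Checking ls -l /dev/null after a short run is still a reasonable precaution before letting it run against the whole disk.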
pickthenimp Posted October 13, 2015 Author
Thanks, running now.
pickthenimp Posted October 13, 2015 Author
The script finished on disk 5 and didn't get hung up anywhere. Any other ideas?