Gordon Shumway Posted August 17, 2017

I am running unRAID v6.3.5 (permanent license). The PC running unRAID was built about 45 days ago and made it through a 30-day trial license, and all the way until today, with no trouble.

A few minutes ago I was using VLC on my Win2k10 desktop PC to watch a video (.avi) from a mounted unRAID CIFS share when the video started pixelating, then hung. VLC crashed trying to read the file, and the desktop computer lost access to all unRAID shares. There are about a dozen unRAID shares mounted on the desktop, some CIFS, some both CIFS and NFS. The NFS shares are mounted on several different VMware instances of Ubuntu 16.x on an HP DL380 G6 server.

At the same time this happened I was gunzipping a large file (300GB+) on an Ubuntu 16.04 VM running on the G6. The .gz file was on an unRAID share mounted to the VM (R/W) via NFS. It hung. At the same time, on a different Ubuntu VM on the G6 server, SABNZBD hung while downloading to an unRAID NFS share. Basically, all unRAID shares hung on all my machines.

There were no dockers running, and there are no VMs defined on the unRAID server. I was able to ping the unRAID server but could not reach it over HTTP or SSH. Unplugging the network cable had no effect, so I had to power cycle the server. After power cycling I was able to mount shares and everything came back online.

While trawling through /var/log/syslog I noticed that it only started at the time of the crash/reboot. There didn't appear to be any previous versions of syslog. I did run "diagnostics" and attached the result to this post.

Can anyone please advise where I can find older syslogs, or some indication of what happened here? Thanks!

unraid1-diagnostics-20170816-1951.zip

itimpi Posted August 17, 2017

Since unRAID runs purely from RAM in normal operation, you do not normally have any historical syslogs. If this persists, install the Fix Common Problems plugin and put it into troubleshooting mode. That will keep writing syslogs to the flash drive to help pin down the cause.

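As a manual stopgap alongside the plugin, the in-RAM syslog can be copied to persistent storage before a reboot. The sketch below is not what Fix Common Problems does internally; the function name `snapshot_syslog` is made up for illustration, and the paths (/var/log/syslog as the source, /boot/logs on the flash drive as the destination) are assumptions based on the locations mentioned in this thread.

```shell
#!/bin/bash
# Manual sketch: copy the in-RAM syslog to persistent storage with a timestamp.
# Assumed paths: /var/log/syslog (source) and /boot/logs (flash drive).

snapshot_syslog() {
    local src="$1" destdir="$2"
    local stamp dest
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$destdir/syslog-$stamp.txt"
    mkdir -p "$destdir" || return 1   # create the target directory if missing
    cp "$src" "$dest" || return 1     # copy the current log contents
    echo "$dest"                      # print where the copy went
}

# Example (run as root on the server):
#   snapshot_syslog /var/log/syslog /boot/logs
```

This only captures the log up to the moment it is run, so it helps when the server is still reachable over SSH or the console.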
Gordon Shumway Posted August 19, 2017

Partial crash. The system stayed up but the shares went offline. I have attached the latest diagnostics file (I ran it before doing a clean reboot from the GUI). There seems to be a lot of chatter about running out of memory/swap. The only traffic the box should be seeing is CIFS/NFS. Any advice or observations would be appreciated. Please advise if any other files are needed. There are quite a few diagnostics files in /boot/logs. Thanks.

unraid1-diagnostics-20170819-1538.zip

JorgeB Posted August 19, 2017

The kernel used in v6.3.5 is very prone to OOM errors, even when the server is only doing transfers. Try lowering the amount of RAM used for cache; this works for me and for many others with similar issues:

    sysctl vm.dirty_ratio=2
    sysctl vm.dirty_background_ratio=1

These won't survive a reboot, so if they help you need to make them permanent using your go file or the Tips and Tweaks plugin.

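The two steps above (apply now, then make permanent) can be sketched as a small script. The values 2 and 1 come from the post; the function names are made up for illustration, and the go-file path /boot/config/go is an assumption that should be checked on your own flash drive before appending anything to it.

```shell
#!/bin/bash
# Sketch only: the values 2 and 1 come from the post above.

# Apply at runtime (needs root; lost on reboot).
apply_dirty_limits() {
    sysctl -w vm.dirty_ratio="$1"
    sysctl -w vm.dirty_background_ratio="$2"
}

# Print the lines one might append to the go file so they run at every boot.
persist_lines() {
    printf 'sysctl -w vm.dirty_ratio=%s\n' "$1"
    printf 'sysctl -w vm.dirty_background_ratio=%s\n' "$2"
}

# Show the current values without changing anything.
show_dirty_limits() {
    sysctl vm.dirty_ratio vm.dirty_background_ratio
}

# Example (as root):
#   apply_dirty_limits 2 1
#   persist_lines 2 1 >> /boot/config/go   # assumed go-file path
persist_lines 2 1
```

Running `show_dirty_limits` after a reboot is a quick way to confirm whether the settings were actually reapplied.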
Gordon Shumway Posted August 19, 2017

It seems odd that there is no system swap file to begin with. Can I create one on an SSD drive? Would this "fix" the problem?

JorgeB Posted August 19, 2017

I believe there's a system swap plugin, but I never used it since those settings fixed my problems on v6.3.5. v6.4-rc uses a newer kernel, and I have never had OOM errors on it using the default settings.

Gordon Shumway Posted August 19, 2017

Not to go too far off topic, but how soon before 6.4 is GA? Also, how much of a pain is it to switch to a release candidate, then switch back to GA when it is available, or back to 6.3.5?

JorgeB Posted August 19, 2017

Just now, Gordon Shumway said:
    Not to go too far off topic, but how soon before 6.4 is GA?

Soon™

1 minute ago, Gordon Shumway said:
    Also, how much of a pain is it to switch to a release candidate, then switch back to GA when it is available, or back to 6.3.5?

Very easy. Most of that can now be done using the webGUI, but you can also just overwrite 3 or 5 files on the flash drive, depending on the release installed.

Gordon Shumway Posted August 19, 2017

I won't hold my breath waiting for the new release, but I did enter the commands you provided. According to the Dashboard, memory is at 7%. I'll keep an eye on it, and if it starts climbing I'll consider installing the swap plugin, unless the new version is available first. Thanks again!

JorgeB Posted August 19, 2017

The problem is RAM used for cache. It won't appear on the dashboard, but with those settings it won't use more than 2% of the free RAM.

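Since the dashboard does not show it, the write-cache figures can be read directly from /proc/meminfo. The sketch below prints the current Dirty value and a rough ceiling for a given ratio. It is an approximation under stated assumptions: the kernel computes its threshold from its own notion of "dirtyable" memory, and MemAvailable is used here only as a stand-in for that; the helper names are made up for illustration.

```shell
#!/bin/bash
# Sketch: estimate the dirty-page ceiling for a given vm.dirty_ratio.
# Assumption: MemAvailable approximates the memory the kernel applies the ratio to.

dirty_limit_kb() {
    # $1 = available memory in kB, $2 = ratio in percent
    echo $(( $1 * $2 / 100 ))
}

meminfo_kb() {
    # $1 = field name in /proc/meminfo, e.g. MemAvailable or Dirty
    awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

if [ -r /proc/meminfo ]; then
    avail=$(meminfo_kb MemAvailable)
    echo "Dirty now: $(meminfo_kb Dirty) kB"
    echo "Approx. ceiling at 2%: $(dirty_limit_kb "${avail:-0}" 2) kB"
fi
```

For example, with about 16,000,000 kB available, a ratio of 2 gives a ceiling of roughly 320,000 kB of dirty pages before writers are throttled.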
Gordon Shumway Posted August 23, 2017

Your recommendation seems to have helped with the RAM issue, but this morning the server crashed: all CPUs were at 100% and the system started killing processes. I was able to ssh in and run a quick top, and saw that the load averages were 17, 10, and 4, but nothing was running (that I know of). I also noticed that some of the top processes were "docker" and "php", which was odd because, as I mentioned before, I don't have any VMs defined, and I have 2 dockers defined but neither is running. I have to delete the Plex docker (I'm running it on another machine), and I leave Krusader off until I need it.

When I logged in I tried to launch diagnostics but the system killed it. I was able to run "poweroff", which took about 5 minutes, and after I powered the machine on I ran a new "diagnostics" (attached). Any assistance would be appreciated. Thanks. GS

unraid1-diagnostics-20170823-0723.zip

Zonediver Posted August 23, 2017

I have the same problem mentioned here, but no one has responded so far. I hope this is related or similar to your problem. It has been happening since I upgraded my RAM from 12 to 16GB, and only when I move files bigger than 2GB.

Gordon Shumway Posted August 23, 2017

Just for fun I deleted the two dockers (Krusader and Plex) and ran the CA to clean up the docker remains. Hopefully the new version will be coming out soon or someone can give us a clue what's killing our machines. Sorry to say folks, but I didn't (and still don't) have these problems with my old Synology...

SSD Posted August 23, 2017

50 minutes ago, Gordon Shumway said:
    Just for fun I deleted the two dockers (Krusader and Plex) and ran the CA to clean up the docker remains. Hopefully the new version will be coming out soon or someone can give us a clue what's killing our machines. Sorry to say folks, but I didn't (and still don't) have these problems with my old Synology...

Synology is totally in control of the hardware platform. Yours is not a common problem, which probably means either broken hardware or a slight incompatibility of some kind. It's not fair to compare a very closed platform like Synology with an open platform like unRAID. You perhaps get better stability (although closed systems can have problems too), but you pay a lot more and get only a small amount of the function. Wonderful thing: you get to choose.

I'd take the system down to the basics. No dockers. Just NAS. See if it runs stable for several days. Then introduce one docker. Same thing. Continue to add function until the problem occurs. An approach like this may provide some useful data.

Gordon Shumway Posted August 23, 2017

Just teasing about Synology vs. unRAID. If Synology was that great I wouldn't be here now... The hardware is brand-new out-of-the-box, so I tend to doubt it is failing this early in its life cycle, but anything is possible. I did remove the two dockers this morning and cleaned them up with the Community App (I don't recall the name and am not connected to my home at the moment). I'll bounce the box tonight to make sure everything is clean, then keep it under observation. Thanks.

JonathanM Posted August 23, 2017

1 hour ago, Gordon Shumway said:
    The hardware is brand-new out-of-the-box so I tend to doubt it is failing this early in its life cycle, but anything is possible.

Google "bathtub curve".

Gordon Shumway Posted August 30, 2017

I ran the sysctl commands that Johnnie.black recommended, and also removed the Krusader and Plex dockers that were not in use, and have yet to have a problem. A week ago I had to shut down EVERYTHING, as the local power company was running new lines and had to cut power to my entire block. I have not seen a CPU or memory spike since then, and I also have not re-entered the "sysctl" commands, so I will have to conclude that removing the unused dockers seems to have made the trouble go away. For now. Thanks everyone.

Gordon Shumway Posted October 19, 2017

I'm baaaaack... About 15 minutes ago my Windows desktop lost its connection to an unRAID share. I logged into my unRAID server and found that it was "out of memory". I stopped the array and tried to restart it, but got errors telling me to run "fix common problems". When I launched that app it complained that my server was out of memory (duh), and that call traces had been found. I have attached the lines of the syslog file from just before the server ran out of memory. I have not yet actually rebooted the machine and will probably hold off until tomorrow morning in case anyone wants to see anything that might disappear after a reboot. GS

2017-10-18_unraid1_syslog.txt

JorgeB Posted October 19, 2017

It's unclear from your post: are you still using the recommended lower RAM cache settings?

Gordon Shumway Posted October 19, 2017

For some reason that I cannot remember, I rebooted the box 4 days ago and did not re-enter the settings. I have just rebooted, entered them, then started the array. Was there anything in the log that indicated why I was low on memory to begin with?

JorgeB Posted October 19, 2017

You can use the Tips and Tweaks plugin to apply the settings at boot time.

21 minutes ago, Gordon Shumway said:
    Was there anything in the log that indicated why I was low on memory to begin with?

Besides the out of memory errors?

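To see what the out-of-memory errors in a saved syslog actually say, the OOM-killer lines can be pulled out with grep. This is a minimal sketch: the patterns below match typical kernel messages, but the exact wording varies by kernel version, and the function names are made up for illustration.

```shell
#!/bin/bash
# Sketch: list kernel OOM-killer lines in a syslog file.
# The patterns are typical kernel messages; exact wording varies by kernel version.

oom_lines() {
    # Print every line that looks like an OOM-killer event.
    grep -iE 'out of memory|oom-killer|oom_reaper' "$1"
}

oom_count() {
    # Count matching lines (prints 0 when nothing matches).
    grep -icE 'out of memory|oom-killer|oom_reaper' "$1"
}

# Example:
#   oom_lines /var/log/syslog | tail -n 20
```

The lines around each match usually include the kernel's memory summary and the name of the process that triggered the kill, which is a reasonable starting point for working out where the memory went.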
Gordon Shumway Posted October 19, 2017

Yeah, I'm curious what is causing the memory loss. I understand that shutting down the filesystem is the effect, but what is the cause?

JorgeB Posted October 19, 2017

My understanding is that it's mostly a kernel problem, and it should be fixed once you upgrade to v6.4. You can try the latest v6.4-rc; this specific issue should not happen anymore.

Gordon Shumway Posted October 19, 2017

Thanks. I'll read up on the how-to tonight.

Gordon Shumway Posted October 20, 2017

Is there a guide to downloading and installing the non-stable versions? Also, where are the RCs kept? I can't find anything in the downloads section. Thanks.

Archived
This topic is now archived and is closed to further replies.