jcreynoldsii Posted April 27, 2020 (edited April 27, 2020)
I have been experiencing hard server crashes for quite a while now. I have worked through various suggestions: increasing RAM, running RAM tests, writing log files to a share, etc. Nothing has really helped pinpoint the cause. I originally thought it was related to Plex transcoding, but I am no longer sure that is the problem. I had another hard crash over the weekend. Attached is the log. Any help is appreciated. The server became unreachable at around 2 PM on the 24th. The last entry was:
Apr 24 14:00:13 HomeServer kernel: mdcmd (97): spindown 5
Thanks in advance,
Jay
Edit: added diagnostic dump.
Attachments: syslog-192.168.1.2.log, homeserver-diagnostics-20200427-1109.zip
jcreynoldsii (Author) Posted May 22, 2020
I have experienced a couple more crashes, and each time the log doesn't show anything. Attached is the one from today. I came home to a non-responsive server. I have to find a solution, or at least a way to shut down gracefully; a quick press of the power button does not work. I thought I'd be slick and pull the power on my UPS to initiate a controlled shutdown (that didn't work either). My system is hooked up to a TV via HDMI; after a period of time the screen goes black and will not wake up, which is weird. Why wouldn't the console stay up? This is annoying as hell. Any help is greatly appreciated. I found these 4 lines interesting on the reboot of the server:
/var/tmp/go: line 4: /boot/unmenu/uu: Permission denied
sh: ./apcupsd-3.14.10-x86_64-1_rlw.txz.auto_install: Permission denied
sh: ./powerdown-2.06-noarch-unRAID.tgz.auto_install: Permission denied
sh: ./screen-4.0.3-x86_64-4.txz.auto_install: Permission denied
Each of these items relates directly to a few of the problems I described above.
Jay
Attachment: syslog-192.168.1.2.log
trurl Posted May 22, 2020
Why have you allocated 50G to the docker image? Have you had problems filling it? 20G should be more than enough, and when I see someone with a larger docker image I suspect they have something misconfigured with their dockers.
jcreynoldsii (Author) Posted May 22, 2020 (edited May 22, 2020)
2 hours ago, trurl said: "Why have you allocated 50G to docker image? Have you had problems filling it? 20G should be more than enough and when I see someone with a docker image larger I suspect they have something misconfigured with their dockers."
I use quite a few dockers, and in the past I hit the default maximum docker image file size. When I needed to increase it, I set it high enough that I would not have to keep doing it over and over again.
itimpi Posted May 22, 2020
2 hours ago, jcreynoldsii said: "[...] I found these 4 lines interesting on the reboot of the server:
/var/tmp/go: line 4: /boot/unmenu/uu: Permission denied
sh: ./apcupsd-3.14.10-x86_64-1_rlw.txz.auto_install: Permission denied
sh: ./powerdown-2.06-noarch-unRAID.tgz.auto_install: Permission denied
sh: ./screen-4.0.3-x86_64-4.txz.auto_install: Permission denied
Each of these items directly relates to a few problems I spoke of above."
Why do you have any reference to unmenu in the config/go file on your flash drive? Unmenu was a v5 feature that is not compatible with the current v6. Any reference to it should be removed, as trying to use it will just cause problems.
jcreynoldsii (Author) Posted May 22, 2020
Good question; it is nothing I ever did. I used v5 before v6, so perhaps it is stuff that didn't get cleaned up?
itimpi Posted May 22, 2020
4 minutes ago, jcreynoldsii said: "Good question, nothing i ever did. I used v5 before v6, perhaps stuff that didnt cleaned up?"
You might want to check your ‘go’ file in case there are any other references to incompatible or obsolete features. Also, do you have an ‘extras’ folder on the flash drive? If so, that should probably be removed, as it may be installing incompatible packages. In v6 any extra packages are normally installed via the Nerdpack or DevPack plugins.
jcreynoldsii (Author) Posted May 22, 2020 (edited May 22, 2020)
As for my docker image, I am currently using ~23 GB.
23 minutes ago, itimpi said: "You might want to check your ‘go’ file in case there are any other references to incompatible/obsolete features? also do you have an ‘extras’ folder on the flash drive. If so that should probably be removed as it may be installing incompatible packages. In v6 any extra packages are normally installed via the Nerdpack or DevPack plugins."
I don't have an extras folder on the flash drive. I just got the Nerdpack tonight and I have yet to install anything from it. Content of the go file:
#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
/boot/unmenu/uu
# resize log partition
mount -o remount,size=384m /var/log
cd /boot/packages && find . -name '*.auto_install' -type f -print | sort | xargs -n1 sh -c
Those packages are inside the /boot/packages directory.
itimpi Posted May 22, 2020
That /boot/packages line is not part of a standard go file. It is probably what causes the other error messages you mentioned, as a tightening of Unraid security a few releases ago means that files on the flash drive can no longer have ‘execute’ permission set on them.
jcreynoldsii (Author) Posted May 22, 2020
So should I delete that line?
jcreynoldsii (Author) Posted May 22, 2020
I checked my log and there are several peculiar entries from dockers:
May 21 19:36:21 HomeServer kernel: docker0: port 9(vethfafdac9) entered blocking state
May 21 19:36:21 HomeServer kernel: docker0: port 9(vethfafdac9) entered forwarding state
May 21 19:36:21 HomeServer kernel: docker0: port 9(vethfafdac9) entered disabled state
May 21 19:36:21 HomeServer kernel: eth0: renamed from veth7d7cd71
May 21 19:36:21 HomeServer kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethfafdac9: link becomes ready
May 21 19:36:21 HomeServer kernel: docker0: port 9(vethfafdac9) entered blocking state
May 21 19:36:21 HomeServer kernel: docker0: port 9(vethfafdac9) entered forwarding state
May 21 19:38:07 HomeServer kernel: vethafb7e9d: renamed from eth0
May 21 19:38:07 HomeServer kernel: docker0: port 10(veth037a102) entered disabled state
May 21 19:38:07 HomeServer kernel: docker0: port 10(veth037a102) entered disabled state
May 21 19:38:07 HomeServer kernel: device veth037a102 left promiscuous mode
May 21 19:38:07 HomeServer kernel: docker0: port 10(veth037a102) entered disabled state
May 21 19:38:09 HomeServer kernel: docker0: port 10(vethbe1888a) entered blocking state
May 21 19:38:09 HomeServer kernel: docker0: port 10(vethbe1888a) entered disabled state
May 21 19:38:09 HomeServer kernel: device vethbe1888a entered promiscuous mode
May 21 19:38:09 HomeServer kernel: IPv6: ADDRCONF(NETDEV_UP): vethbe1888a: link is not ready
May 21 19:38:09 HomeServer kernel: docker0: port 10(vethbe1888a) entered blocking state
May 21 19:38:09 HomeServer kernel: docker0: port 10(vethbe1888a) entered forwarding state
May 21 19:38:09 HomeServer kernel: docker0: port 10(vethbe1888a) entered disabled state
May 21 19:38:10 HomeServer kernel: eth0: renamed from veth0d0dee2
May 21 19:38:10 HomeServer kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethbe1888a: link becomes ready
May 21 19:38:10 HomeServer kernel: docker0: port 10(vethbe1888a) entered blocking state
May 21 19:38:10 HomeServer kernel: docker0: port 10(vethbe1888a) entered forwarding state
May 21 19:38:20 HomeServer kernel: vethadbaa22: renamed from eth0
May 21 19:38:20 HomeServer kernel: docker0: port 11(vethd09af7a) entered disabled state
May 21 19:38:20 HomeServer kernel: docker0: port 11(vethd09af7a) entered disabled state
May 21 19:38:20 HomeServer kernel: device vethd09af7a left promiscuous mode
May 21 19:38:20 HomeServer kernel: docker0: port 11(vethd09af7a) entered disabled state
May 21 19:38:22 HomeServer kernel: docker0: port 11(veth0dcbe21) entered blocking state
May 21 19:38:22 HomeServer kernel: docker0: port 11(veth0dcbe21) entered disabled state
May 21 19:38:22 HomeServer kernel: device veth0dcbe21 entered promiscuous mode
May 21 19:38:22 HomeServer kernel: IPv6: ADDRCONF(NETDEV_UP): veth0dcbe21: link is not ready
May 21 19:38:22 HomeServer kernel: eth0: renamed from veth9b5b4d4
May 21 19:38:22 HomeServer kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth0dcbe21: link becomes ready
May 21 19:38:22 HomeServer kernel: docker0: port 11(veth0dcbe21) entered blocking state
May 21 19:38:22 HomeServer kernel: docker0: port 11(veth0dcbe21) entered forwarding state
How do I correlate the port to a specific docker?
trurl Posted May 22, 2020
7 hours ago, itimpi said: "That /boot/packages line is not part of a standard go file."
Also that /boot/unmenu/uu line should be removed.
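For anyone checking their own go file for the leftovers discussed above, here is a minimal sketch. The function name `find_legacy_go_lines` and the pattern list are assumptions drawn only from this thread (unmenu, `*.auto_install`, `/boot/packages`); this is not an official Unraid list of obsolete entries. The default path assumes the flash drive is mounted at /boot, as in the error messages above.

```shell
#!/bin/bash
# Sketch only: print the lines (with line numbers) of a go file that
# reference the v5-era leftovers mentioned in this thread.
# The patterns are an assumption based on this thread, not a complete list.
find_legacy_go_lines() {
    # Default to the go file on the flash drive; pass another path to override.
    local go_file="${1:-/boot/config/go}"
    # -n prefixes each match with its line number, -E enables the | alternation.
    grep -nE 'unmenu|auto_install|/boot/packages' "$go_file"
}
```

Calling `find_legacy_go_lines` with no argument reads /boot/config/go; any line it prints is a candidate for removal, to be reviewed by hand rather than deleted automatically.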
jcreynoldsii (Author) Posted May 22, 2020
3 hours ago, trurl said: "Also that /boot/unmenu/uu line should be removed."
Does this look better?
#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
# resize log partition
mount -o remount,size=384m /var/log
How can I troubleshoot the crashes? The log has nothing tangible in it. I would like to be able to shut down Unraid gracefully, but the console keyboard doesn't work, the web GUI and web interfaces are gone, a quick press of the power button doesn't work, and neither does pulling the UPS from mains power. It seems like Unraid is completely hung.
jcreynoldsii (Author) Posted May 23, 2020
Woke up to another hard crash this morning. Had to do an unclean reboot at 8:38 AM.
Attachments: syslog-192.168.1.2.log, homeserver-diagnostics-20200523-0840.zip
jcreynoldsii (Author) Posted May 24, 2020 (edited May 24, 2020)
Yet another one this morning. The last message before I rebooted it was:
May 24 02:30:07 HomeServer crond[1722]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Edit: Mover is set up to run at 2:30 AM, but Mover logging was not enabled; I have since enabled it. I shut down all non-essential dockers this morning so the system can finish a parity check. It has now crashed two days in a row while running the parity check that was triggered by the previous day's unclean reboot.
Attachments: syslog-192.168.1.2.log, homeserver-diagnostics-20200524-1048.zip
trurl Posted May 24, 2020
Not related to the crashes, but something to be aware of so you can clean it up. Your syslog is being spammed with these:
May 24 10:47:22 HomeServer root: error: /plugins/unassigned.devices/UnassignedDevices.php: wrong csrf_token
May 24 10:47:24 HomeServer root: error: /plugins/unassigned.devices/UnassignedDevices.php: wrong csrf_token
Here is the FAQ on that: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=545988
Also probably unrelated to your crashes: your docker/VM related shares are not configured ideally, and your docker image is much larger than should be needed. 20G is my usual recommendation and should be more than enough unless you have something misconfigured. Have you had problems filling the docker image?
On 4/27/2020 at 12:08 PM, jcreynoldsii said: "I was originally thinking it was Plex transcoding related"
How is that configured? Misconfigured dockers can fill the docker image, or they can fill RAM. Any mapped host path that is not an actual disk or user share is in RAM, just like the rest of the OS, and anything that writes to the container path of that mapping is writing to RAM. Any application path that does not correspond exactly to a container path, including upper/lower case and the leading '/', isn't a mapped path, so if the application writes to that path it is writing into the docker image.
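The path-matching rule described above can be illustrated with a small shell check. This is only a sketch: the function name `is_under_mapping` and the example paths (/config, /transcode) are hypothetical and not taken from anyone's actual container settings. It treats an application path as mapped only if it equals a container path or sits beneath it, compared case-sensitively and including the leading '/'.

```shell
#!/bin/bash
# Sketch only: is_under_mapping APP_PATH CONTAINER_PATH...
# Returns 0 if APP_PATH is exactly one of the container paths or lies
# beneath one of them; returns 1 otherwise. The comparison is
# case-sensitive and the leading '/' must be present, mirroring the rule
# described in the post above.
is_under_mapping() {
    local app_path="$1"
    shift
    local mapped
    for mapped in "$@"; do
        # Drop a trailing slash so "/transcode/" and "/transcode" behave alike.
        mapped="${mapped%/}"
        case "$app_path" in
            "$mapped"|"$mapped"/*) return 0 ;;
        esac
    done
    return 1
}
```

For example, with container paths /config and /transcode, the application path /transcode/session1 counts as mapped, while /Transcode/session1 (wrong case) and transcode/session1 (no leading '/') do not, so writes to those would land inside the docker image.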
jcreynoldsii (Author) Posted May 24, 2020 (edited May 24, 2020)
My docker image size is set to 50GB, and I am using about 23GB of that. At one point in the past I ran out of docker image space, so I increased it. All of my dockers reside in the cache/appdata directory, and I believe that all of my mapped directories reside on shares. Plex docker configuration:
I am wondering whether the mover is what got me, considering that on the last two days I woke up to a hard-crashed system.
trurl Posted May 24, 2020
2 hours ago, jcreynoldsii said: "My docker is image size is set to 50GB, I am using about 23GB of that. At one point in the past I ran out of docker image size so I increased it."
To me this suggests a problem with one or more path settings within one or more of the applications you are running as docker containers. I have 15 dockers installed and they are only using 44% of 20G, and they are stable at that amount of usage. That Plex screenshot isn't nearly as useful as the docker run command that results from it. See this very first link in the Docker FAQ for instructions on how to get that docker run command and post it:
What other dockers do you run?
jcreynoldsii (Author) Posted May 24, 2020
I have a total of 22 containers installed, and out of those 22 I run 18 full time. Attached is a list of those containers along with their run commands.
Attachment: unRaid Dockers - Sheet1.pdf
trurl Posted May 24, 2020
I only use a few of those, and I'm not likely to do the research required to figure out how to use the others and what, if anything, might be misconfigured in them. Maybe someone else will contribute. Plex is one I do use, and it doesn't write as much as, for example, a downloading application. Other than its library (appdata), the only things it is likely to write are DVR recordings and transcodes. Do you use Plex DVR? You don't have a mapping for transcodes. What does the application itself use for the transcode directory? If you go to Settings - Docker and disable docker altogether, do you still get crashes?
jcreynoldsii (Author) Posted May 24, 2020 (edited May 24, 2020)
No, I don't use Plex DVR, and I don't have anything mapped for transcoding. Should I make a directory on a share for it? Or should I transcode to RAM? I will have to give shutting down docker a shot; I plan on doing that tonight before turning in. I don't want to run with it off during the day, as I use PiHole for DNS, and without a secondary PiHole running that would pretty much shut down the internet in the house. The reason I initially suspected Plex was the culprit is that I used to experience a lot of hard crashes while casting Plex to the Chromecasts in the house. Lately that hasn't been the case. The last two nights have had hard crashes for no apparent reason.
trurl Posted May 25, 2020
5 hours ago, jcreynoldsii said: "transcode to RAM?"
That's what I do. Here is a link in that thread to a post that is a good starting point for getting this set up like I have it. Just read from there to the end of the page and all the things you need to do are explained:
jcreynoldsii (Author) Posted May 25, 2020 (edited May 25, 2020)
I used User Scripts and made the following script:
#!/bin/bash
mkdir /tmp/PlexRam
mount -t tmpfs -o size=4g tmpfs /tmp/PlexRam
I also updated the docker command and the transcode setting in Plex.
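If a script like the one above is run more than once (for example, on a schedule or by hand after boot), `mkdir` will complain that the directory exists and a second tmpfs may be mounted on top of the first. A minimal sketch of a variant that is safe to re-run follows. The function name `setup_plex_ram` and the `RAMDIR`, `SIZE`, and `DRY_RUN` variables are assumptions introduced for illustration; only the /tmp/PlexRam path and the 4g size come from the post. It relies on the standard `mountpoint` utility from util-linux.

```shell
#!/bin/bash
# Sketch only: create and mount the RAM-backed transcode directory once.
# RAMDIR and SIZE default to the values from the post; DRY_RUN=1 is a
# hypothetical switch that prints the mount command instead of running it.
setup_plex_ram() {
    local ramdir="${RAMDIR:-/tmp/PlexRam}"
    local size="${SIZE:-4g}"
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi

    # -p avoids an error when the directory already exists.
    mkdir -p "$ramdir" || return 1

    # Skip the mount if something is already mounted there.
    if mountpoint -q "$ramdir"; then
        echo "$ramdir is already a mount point; nothing to do"
        return 0
    fi

    # $run is intentionally unquoted: empty means "run mount directly".
    $run mount -t tmpfs -o size="$size" tmpfs "$ramdir"
}
```

Running `DRY_RUN=1 setup_plex_ram` shows the mount command that would be issued; running `setup_plex_ram` as root performs the mount.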
jcreynoldsii (Author) Posted May 25, 2020
No crashes overnight. I did, however, change the mover schedule so that it would not run. I did this simply because I wanted the parity check to finish, as it had already found and corrected errors, presumably because the server hard crashed yesterday during the parity check that followed the previous hard crash.
Parity check result: Date: 2020-05-25, 08:25:18; Duration: 21 hr, 39 min, 34 sec; Speed: 102.6 MB/s; Status: OK; Errors: 129
I ran only the bare minimum of dockers last night: PiHole, Unifi-Controller and Plex. I will slowly reintroduce dockers over the next few days to see if these crashes are docker related. If that doesn't turn anything up, I think I may change the mover frequency back to every night. Also, I have a drive reporting the following, and it seems to be going in and out of good/bad. Any insight on this?
SMART attribute 199, UDMA CRC error count: flag 0x000a, value 200, worst 200, threshold 000, type Old age, updated Always, failed Never, raw value 1
It does, however, pass the overall SMART health check. Attached is the SMART report.
Attachment: Hitachi_HDS723030ALA640_MK0311YHK1SMBA-20200525-1019.txt
trurl Posted May 25, 2020
CRC errors are typically connection issues, not drive problems. You can click on that SMART warning on the Dashboard and acknowledge it, and it won't warn you again unless the count increases. A small number of parity errors is typical after unclean shutdowns. But the only acceptable number of parity errors is exactly zero, so they must be corrected. If there are still parity errors after a correcting parity check, then you still have problems to diagnose, so after parity errors get corrected we usually recommend following up with a non-correcting parity check to verify that parity now has zero errors.