SPOautos Posted February 2, 2021 My syslog keeps filling up and locking up the server. It usually sits at about 1%, then all of a sudden it fills up. I think it might coincide with a parity check running. I have a 12TB parity drive, and every time this has happened I noticed that a parity check had finished within the previous day. I set up a "Syslog server" after the last time so I wouldn't lose the file (normally the server freezes, and when I reboot the syslog is erased). I'm trying to upload the file, but it is 13.5GB, and when I try to attach it here I keep getting an error -200 and it won't let me attach it, I'm guessing because of the size. Any ideas on what to do? Thanks!
ChatNoir Posted February 2, 2021 If your log is filling up, something is almost certainly spamming it, either errors or mover logging. You should identify what it is and, if it is errors, attach an extract of the log that shows them. The larger the extract the better, ideally starting where the errors begin. Did you try zipping the syslog? If the same error repeats again and again, you might get a good enough compression ratio to attach the file. Edited February 2, 2021 by ChatNoir
SPOautos Posted February 6, 2021 On 2/2/2021 at 12:39 AM, ChatNoir said: "If your log is getting full, there is most certainly some stuff spamming the log. Either errors or mover logging. You should identify it and if it is errors, attach an extract of the log with that. The larger the better, ideally when the error starts. Did you try to zip the syslog? If the same error pops again and again, you might have a pretty good compression ratio and be able to attach the file?" I'm sorry for the delay, but I haven't been able to get back to the forum until now. I zipped the file and it is 112MB, but it still won't upload. It gives me an error -200, which I guess is for being too big, but I don't really know. Without compression the file is 13.4 GB, so getting it down to 112MB seems pretty good. Any other ideas? Should I try a specific zip tool that might make it smaller? With the file being 13.4 GB I can't even open it in Notepad, because it says the file is too large. Is there some other app I can use that will open large log files? Also, I do have my mover scheduled to run often (hourly), but my log sits at 1% all the time and then, within one day, right after a parity check, it maxes out at 100%. I have my parity check running monthly, and this always seems to happen right after the parity check finishes. Edited February 6, 2021 by SPOautos
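A file too large for Notepad can still be summarized from the command line without opening it. The following is a minimal sketch, not a tested procedure for this server: `top_messages` is a hypothetical helper name, and it assumes the standard syslog layout seen later in this thread (month, day, time, hostname, then the message). It strips the timestamp and hostname so identical messages group together, then prints the most frequent ones.

```shell
#!/bin/bash
# top_messages FILE [N]
# Hypothetical helper: print the N most frequent messages in a syslog-style
# file, with a count in front of each. Assumes every line starts with
# "Mon DD HH:MM:SS hostname " (an assumption about the log layout).
top_messages() {
    local file="$1" count="${2:-20}"
    awk '{
        # Remove the first four space-separated fields (month, day, time, host)
        sub(/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +/, "")
        seen[$0]++
    }
    END {
        for (msg in seen) print seen[msg], msg
    }' "$file" | sort -rn | head -n "$count"
}

# Example call (the path is a placeholder for wherever the saved log lives):
# top_messages /path/to/saved/syslog 10
```

If one message accounts for nearly all of the lines, that message is the likely cause, and a short extract containing it is far easier to attach than the whole file.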
John_M Posted February 7, 2021 If your syslog is growing out of control but your server hasn't yet crashed, you could type the following at the command line to display the last, say, 20 lines on the screen:
    tail -n 20 /var/log/syslog
Or, to extract the last 500 lines to a separate file on your flash device:
    tail -n 500 /var/log/syslog > /boot/syslog-tail
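`tail` shows the end of the log, but the lines around the point where the flood begins are usually the most informative. Here is a rough sketch of one way to pull that window out of a saved log: `extract_around` is a hypothetical helper, and the default window sizes are arbitrary choices, not recommendations. It finds the first line containing a fixed string and prints a number of lines before and after it.

```shell
#!/bin/bash
# extract_around FILE PATTERN [BEFORE] [AFTER]
# Hypothetical helper: print the lines surrounding the FIRST occurrence of
# PATTERN (matched as a fixed string) in FILE.
extract_around() {
    local file="$1" pattern="$2" before="${3:-200}" after="${4:-2000}"
    local first start end

    # Line number of the first match; -m 1 stops reading after it is found
    first=$(grep -n -m 1 -F -- "$pattern" "$file" | cut -d: -f1)
    [ -n "$first" ] || return 1   # pattern not found

    start=$(( first > before ? first - before : 1 ))
    end=$(( first + after ))

    # Print lines start..end, then quit so the rest of the file is not read
    sed -n "${start},${end}p;${end}q" "$file"
}

# Example call (paths and search string are placeholders):
# extract_around /path/to/saved/syslog "Network is unreachable" | gzip > /boot/syslog-extract.gz
```

Because `sed` quits at the end of the window, this stays fast even on a multi-gigabyte file when the flood starts early in the log.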
SPOautos Posted February 8, 2021 On 2/6/2021 at 9:29 PM, John_M said: "If your syslog is growing out of control but your server hasn't yet crashed you could type the following at the command line to display the last, say 20, lines on the screen: tail -n 20 /var/log/syslog Or, to extract the last 500 lines to a separate file on your flash device: tail -n 500 /var/log/syslog > /boot/syslog-tail" Thank you, I'll probably do that. It typically isn't growing out of control, but about once a month it suddenly fills up very fast and the server crashes. It must happen in a matter of hours, and I've never been able to catch it in progress; I just see the server crash. I also have a parity check run once a month, and the crash always seems to happen in conjunction with that.
trurl Posted February 8, 2021 You should post diagnostics before this happens, just to give us a baseline of how your system is configured when it is working normally.
SPOautos Posted February 16, 2021 On 2/8/2021 at 9:19 AM, trurl said: "You should post diagnostics before this happens just to give us a baseline of how your system is configured with it working normally." Here is what it looks like right now with the server running well. I updated my containers today and everything is running great. The log is showing 1% on the Dashboard, which is what it normally shows except when it goes nuts every so often and maxes out. Do you see anything in here that looks unusual? tower-diagnostics-20210215-2145.zip
ChatNoir Posted February 16, 2021 You have errors in your log:
Feb 9 07:46:56 Tower kernel: pcieport 0000:00:03.0: AER: Multiple Corrected error received: 0000:00:03.0
Feb 9 07:46:56 Tower kernel: pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Feb 9 07:46:56 Tower kernel: pcieport 0000:00:03.0: device [8086:2f08] error status/mask=00000080/00002000
Feb 9 07:46:56 Tower kernel: pcieport 0000:00:03.0: [ 7] BadDLLP
but they do not look frequent enough to fill the log. I also see errors of this type:
Feb 4 19:13:45 Tower kernel: igb 0000:0c:00.0 eth1: igb: eth1 NIC Link is Down
Feb 4 19:13:45 Tower kernel: e1000e: eth0 NIC Link is Down
Feb 4 19:13:45 Tower kernel: bond0: link status definitely down for interface eth0, disabling it
Feb 4 19:13:45 Tower kernel: bond0: link status definitely down for interface eth1, disabling it
Feb 4 19:13:45 Tower kernel: bond0: now running without any active interface!
Feb 4 19:13:45 Tower kernel: br0: port 1(bond0) entered disabled state
Feb 4 19:13:46 Tower dhcpcd[2809]: br0: carrier lost
Feb 4 19:13:46 Tower dhcpcd[2809]: br0: deleting route to 192.168.1.0/24
Feb 4 19:13:46 Tower dhcpcd[2809]: br0: deleting default route via 192.168.1.1
Feb 4 19:13:46 Tower rsyslogd: omfwd/udp: socket 6: sendto() error: Network is unreachable [v8.1908.0 try https://www.rsyslog.com/e/2354 ]
Feb 4 19:13:46 Tower rsyslogd: omfwd: socket 6: error 101 sending via udp: Network is unreachable [v8.1908.0 try https://www.rsyslog.com/e/2354 ]
Feb 4 19:13:46 Tower rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.1908.0 try https://www.rsyslog.com/e/2007 ]
Feb 4 19:13:46 Tower rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.1908.0 try https://www.rsyslog.com/e/2359 ]
Feb 4 19:13:46 Tower rsyslogd: omfwd/udp: socket 2: sendto() error: Network is unreachable [v8.1908.0 try https://www.rsyslog.com/e/2354 ]
Feb 4 19:13:46 Tower rsyslogd: omfwd: socket 2: error 101 sending via udp: Network is unreachable [v8.1908.0 try https://www.rsyslog.com/e/2354 ]
Feb 4 19:13:46 Tower rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.1908.0 try https://www.rsyslog.com/e/2007 ]
Feb 4 19:13:46 Tower rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.1908.0 try https://www.rsyslog.com/e/2359 ]
Feb 4 19:13:46 Tower rsyslogd: omfwd/udp: socket 2: sendto() error: Network is unreachable [v8.1908.0 try https://www.rsyslog.com/e/2354 ]
Feb 4 19:13:46 Tower rsyslogd: omfwd: socket 2: error 101 sending via udp: Network is unreachable [v8.1908.0 try https://www.rsyslog.com/e/2354 ]
Feb 4 19:13:46 Tower rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.1908.0 try https://www.rsyslog.com/e/2007 ]
I guess that if the network stayed down long enough, these messages could spam the log to death. You should look into both issues.
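Whether a message like "Network is unreachable" is a one-off or a recurring flood can be checked by counting how often it appears on each day. This is a small sketch under the same assumption about the syslog layout (month and day as the first two fields); `count_per_day` is a hypothetical helper name, not an Unraid or rsyslog tool.

```shell
#!/bin/bash
# count_per_day FILE PATTERN
# Hypothetical helper: for each calendar day in FILE, print how many lines
# contain PATTERN (matched as a fixed string). Output: "Mon DD count",
# in the order the days first appear in the file.
count_per_day() {
    local file="$1" pattern="$2"
    grep -F -- "$pattern" "$file" | awk '{
        day = $1 " " $2
        if (!(day in hits)) order[++n] = day   # remember first-seen order
        hits[day]++
    }
    END {
        for (i = 1; i <= n; i++) print order[i], hits[order[i]]
    }'
}

# Example calls (the path is a placeholder):
# count_per_day /path/to/saved/syslog "Network is unreachable"
# count_per_day /path/to/saved/syslog "PCIe Bus Error"
```

A handful of hits on a single day would be consistent with a brief outage, while counts in the millions on the day of a crash would point to the message that filled the log.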
SPOautos Posted February 16, 2021 6 hours ago, ChatNoir said: "You have errors in your log [...] I guess that if the network stayed down long enough it could spam the log to death? You should look at those issues." Regarding the errors on Feb 4th, was it just on Feb 4th or is it an ongoing error? We were having some internet issues not too long ago and I rebooted the router. If Feb 4th was an isolated incident, it could be from that, right? I don't have any idea about the pcieport error on Feb 9th. I only have one PCIe device, which is a video card. Does this mean something odd is going on with the video card? I'm sorry, I'm not much of a computer person, so I'm not positive what these are telling me. Edited February 16, 2021 by SPOautos