Boldly_Goes Posted May 15, 2020
I really like the new Netdata Cloud feature. I signed in, claimed my node, and ran the script they provide for Docker. But whenever the container gets an automatic update, the claimed node becomes unreachable and I have to go through the claiming process over, and over, and over. How can I configure Netdata so its name persists, so I can get updates but also keep it synced with their cloud?
melmurp Posted May 18, 2020
I started getting fork errors and tons of odd behavior. It turned out something was creating a massive number of processes, and that something is netdata. I've run this for months with no issue, but since the switch things aren't working the same. Note that I'm using the default config and haven't touched any settings.

User 201 is netdata, and I waited 5s between the two commands below. If I leave this going, it just keeps creating processes until my machine starts to throw errors after a few days.

    ps --no-headers auxwwwm | cut -f1 -d' ' | sort | uniq -c | sort -n
          2 100
          2 daemon
          2 message+
          3 ntp
          4 102
          4 avahi
          4 rpc
          5 103
         27 472
         33 101
        168 nobody
        194 sshd
        260 201
       1677 root

    ps --no-headers auxwwwm | cut -f1 -d' ' | sort | uniq -c | sort -n
          2 100
          2 daemon
          2 message+
          3 ntp
          4 102
          4 avahi
          4 rpc
          5 103
         27 472
         33 101
        168 nobody
        194 sshd
        352 201
       1672 root

I checked what the processes are and I see hundreds of these:

    201 12273 0.0 0.0 0 0 ? ZNs 21:08 0:00 [timeout] <defunct>
    201 12470 0.0 0.0 0 0 ? ZNs 21:08 0:00 [timeout] <defunct>
    201 12711 0.0 0.0 0 0 ? ZNs 21:08 0:00 [timeout] <defunct>
    201 12895 0.0 0.0 0 0 ? ZNs 21:08 0:00 [timeout] <defunct>
    201 13054 0.0 0.0 0 0 ? ZNs 21:08 0:00 [timeout] <defunct>
    201 13235 0.0 0.0 0 0 ? ZNs 21:08 0:00 [timeout] <defunct>
    201 13415 0.0 0.0 0 0 ? ZNs 21:08 0:00 [timeout] <defunct>

I'm not sure where the netdata logs are, so I don't know what it's trying to do that keeps spinning. Any thoughts?
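For anyone who wants to see which parent is leaving the defunct processes behind, here is a minimal sketch that groups zombie processes by parent PID. It assumes a procps-style `ps` that accepts `-eo ppid=,stat=` (the BusyBox `ps` inside an Alpine container may not). The function name `count_zombies_by_parent` is made up for illustration and is not part of Netdata.

```shell
#!/bin/sh
# Read "PPID STAT" pairs on stdin and print "<zombie count> <parent PID>",
# largest count first. A STAT value starting with Z marks a defunct process.
count_zombies_by_parent() {
  awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# Typical use on the host (assumes procps ps):
#   ps -eo ppid=,stat= | count_zombies_by_parent
```

The parent PID at the top of the output can then be looked up with `ps -p <pid> -o args=` to see which plugin or script is not reaping its children.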
muslimsteel Posted May 20, 2020
I have come across the same issue as above. I originally posted the issue in the Dynamix forum because of the errors that I saw, they looked at my diagnostics and saw the process issue that you are seeing. This is what I originally posted here:

> Hello, I hope this is the right place to post this. I have searched and have been unable to find a solution. In the last few days I added a second cache drive, identical to the existing one. I added this to create a cache pool. Since then I noticed occasional weird messages in my email that don't seem to make sense:
>
> Subject is: cron for user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
> and the body consists of: /bin/sh: fork: retry: Resource temporarily unavailable
>
> Typically I get several in a row and then they stop for 12-24 hours. If I leave them it seems to only get worse leading to the server being unresponsive twice now in the last few days. I was able to reboot it from the GUI once, but the second time I had to do a hard boot. I tried uninstalling and reinstalling the SSD Trim plugin but did not seem to make a difference. It came back up without issue and the errors seemed to be cleared, but then about 24 hours later they started happening again. Everything seems to be working ok otherwise, I am not sure what is causing this. One thought I had is that one of the cache drives is on an HBA and the other is connected directly to the motherboard, not sure if that would make a difference. I have attached the diagnostic. Let me know what you guys think, the server has been running great otherwise and I have really been enjoying UNRAID. Thanks for the support!

And then one of the guys there replied:

> Hmm. your diagnostics show that you have a netdata container that is not properly reaping the finished processes
>
>     201 15538 0.2 0.0 33416 22352 ? SNl 03:01 2:23 | | \_ /usr/bin/python /usr/libexec/netdata/plugins.d/python.d.plugin 1
>     201 300 0.0 0.0 0 0 ? ZNs 05:34 0:00 | | \_ [timeout] <defunct>
>     201 301 0.0 0.0 0 0 ? ZNs 06:28 0:00 | | \_ [timeout] <defunct>
>     201 302 0.0 0.0 0 0 ? ZNs 04:32 0:00 | | \_ [timeout] <defunct>
>     ... snip ...
>     201 32766 0.0 0.0 0 0 ? ZNs 09:32 0:00 | | \_ [timeout] <defunct>
>     201 32767 0.0 0.0 0 0 ? ZNs 08:54 0:00 | | \_ [timeout] <defunct>
>
> So your server is running out of process ids to run new processes. You should check with the support thread for the netdata container you are running

I have attached my diagnostics if you want to take a look. Going to turn off the netdata container for now and see if I am seeing any more of these issues. Thanks in advance for the support!

hulk-diagnostics-20200519-2143.zip
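The remark about running out of process IDs can be checked roughly from the host. This is only a sketch, assuming a Linux host with /proc mounted; `pid_usage` is a hypothetical helper name. It counts processes only, and threads also consume IDs, so treat the figure as a lower bound. The PIDs in the listing reach 32767, which would be consistent with a `pid_max` of 32768, a common default.

```shell
#!/bin/sh
# Print the number of live process entries in /proc next to the kernel's
# PID limit. Zombie processes still have a /proc entry, so they are counted.
pid_usage() {
  max="$(cat /proc/sys/kernel/pid_max)"
  used=0
  for d in /proc/[0-9]*; do
    used=$((used + 1))
  done
  echo "processes=${used} pid_max=${max}"
}

# Usage: pid_usage
```

If the first number keeps climbing toward the second while the netdata container is running, that matches the behaviour described in the diagnostics.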
primeval_god Posted May 20, 2020
@melmurp @muslimsteel You might want to consider raising an issue over at https://github.com/netdata/netdata/ where the developers of Netdata and the Docker container reside.
muslimsteel Posted May 20, 2020
@primeval_god Thanks, looks like they might have an issue open for this same thing on GitHub: https://github.com/netdata/netdata/issues/9084
melmurp Posted May 20, 2020
9 hours ago, primeval_god said:
> @melmurp @muslimsteel You might want to consider raising an issue over at https://github.com/netdata/netdata/ where the developers of Netdata and the Docker container reside.

9 hours ago, muslimsteel said:
> @primeval_god Thanks, looks like they might have an issue open for this same thing on GitHub: https://github.com/netdata/netdata/issues/9084

Thanks guys, seems they fixed it, so we just need to wait for the next release: https://github.com/netdata/netdata/pull/9107
TexasDave Posted May 23, 2020
Trying to get notifications to work. Following the instructions here:

https://hub.docker.com/r/titpetric/netdata
https://learn.netdata.cloud/docs/agent/step-by-step/step-05

I have added my target email using "./edit-config health_alarm_notify.conf" for (1), and added the following parameters (with my emails) for (2):

    -e [email protected] -e SMTP_USER=user -e SMTP_PASS=password

I have generated an app password for the sending Gmail account and am using that above. I get:

    # SENDING TEST CLEAR ALARM TO ROLE: sysadmin
    2020-05-23 11:24:44: alarm-notify.sh: WARNING: Cannot find file '/etc/netdata/health_alarm_notify.conf'.
    sendmail: can't connect to remote host (127.0.0.1): Connection refused

I am sure I am doing something silly. Thanks!
OdinEidolon Posted May 24, 2020
Hi and thanks for this docker! Would you be able to suggest how to make Netdata automatically recognise HDDtemp's data coming from Atribe's HDDtemp docker image? (support here: )
BeerNut Posted June 2, 2020
Does anyone have UPS communication working on this new version?
Zack Posted June 3, 2020
On 6/2/2020 at 9:19 PM, BeerNut said:
> Does anyone have UPS communication working on this new version?

According to the docs, Netdata needs certain UPS tools installed. Can you check if something like `upsc -l` works on the system? If it does but no UPS charts are created, please submit a bug about it on GitHub with the relevant details.
BeerNut Posted June 4, 2020
I'm just having trouble with the configuration. I'll mess with it some more when I have time to see if I can get it working.
Zack Posted June 5, 2020
Hope you all are finding Netdata easy to use, despite some of the changes to the official Docker image. If you are running into problems, you will get the fastest response if you submit an issue (or feature request) on our GitHub repo and provide the relevant logs and info. We try to respond as needed, but other channels take longer! Feedback and pull requests are also much appreciated. So let me know if you have any questions about Netdata. My turn to ask questions will be a few months from now, once I've had enough with my dual-boot setup and have some time to tinker with hardware.
- Zack, dev advocate with Netdata
ONI Assassin Posted July 1, 2020
On 5/15/2020 at 2:25 PM, Boldly_Goes said:
> I really like the new Netdata Cloud feature. I signed in, claimed my node, and ran the script they provide for Docker. But whenever the container gets an automatic update, the claimed node becomes unreachable and I have to go through the claiming process over, and over, and over. How can I configure Netdata so its name persists, so I can get updates but also keep it synced with their cloud?

Did you manage to resolve this?
TexasUnraid Posted August 14, 2020
I have noticed that netdata is not saving all the logs/graphs. It saves the last few hours, but if I keep zooming out there are large gaps where all the graphs are blank. Data from days earlier may then pop back into existence, then disappear again, and so on. The server has been on the whole time and the docker has been running non-stop with no issues I am aware of. Any ideas on how to get it to keep all the graphs/logs?
TexasUnraid Posted September 12, 2020
Did more research and it seems that the docker is set up to use RAM for the logs, and for some reason the dbengine option is disabled? I tried to enable it, but it just errors and says it is not supported on this platform. Any way to enable it? I would like to keep a few days of logs if possible.
primeval_god Posted September 12, 2020
1 hour ago, TexasUnraid said:
> Did more research and it seems that the docker is set up to use RAM for the logs, and for some reason the dbengine option is disabled? I tried to enable it, but it just errors and says it is not supported on this platform. Any way to enable it? I would like to keep a few days of logs if possible.

It works in the netdata/netdata image. You also have to bind mount /var/cache/netdata/dbengine/ to a folder in your appdata to ensure it persists across container upgrades.
TexasUnraid Posted September 12, 2020
1 minute ago, primeval_god said:
> It works in the netdata/netdata image. You also have to bind mount /var/cache/netdata/dbengine/ to a folder in your appdata to ensure it persists across container upgrades.

That is the container I am using. This is the error I get after changing the netdata.conf file to enable dbengine:

    2020-09-12 16:46:41: netdata FATAL : MAIN : RRD_MEMORY_MODE_DBENGINE is not supported in this platform. # : Invalid argument
    2020-09-12 16:46:41: netdata INFO : MAIN : EXIT: netdata prepares to exit with code 1...
    2020-09-12 16:46:41: netdata INFO : MAIN : EXIT: cleaning up the database...
    2020-09-12 16:46:41: netdata INFO : MAIN : Cleaning up database [0 hosts(s)]...

I mapped the dbengine folder to appdata along with the etc/netdata folder (after copying it to appdata first) to make updating netdata.conf easier. I never have been able to figure out the text editor in Alpine.
primeval_god Posted September 12, 2020
Do you have all the dbengine configuration options uncommented?

    memory mode = dbengine
    page cache size = 200
    dbengine disk space = 2048
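A quick way to answer that question without opening an editor is to grep the config file for the three options. This is a rough sketch, not an official Netdata tool: `check_dbengine_conf` is a hypothetical helper name, and the path in the usage line assumes the config lives at /etc/netdata/netdata.conf inside the container (or wherever that folder is mapped in appdata).

```shell
#!/bin/sh
# For each of the three dbengine-related options, report whether an
# uncommented "key = value" line exists in the given netdata.conf.
# Lines starting with "#" do not match, so commented options show as missing.
check_dbengine_conf() {
  conf="$1"
  for key in "memory mode" "page cache size" "dbengine disk space"; do
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
      echo "set: ${key}"
    else
      echo "missing or commented: ${key}"
    fi
  done
}

# Usage (path is an assumption):
#   check_dbengine_conf /etc/netdata/netdata.conf
```

This only checks that the lines are present. It does not verify the values, and it cannot tell whether the image was built with dbengine support.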
TexasUnraid Posted September 12, 2020
Yes, although in my case the netdata.conf file was empty by default and asked me to wget it. I did that and it was then filled in, but the default memory mode was save and the file did not include the page cache or disk space options. I manually added them, but still got the same error.
TexasUnraid Posted September 13, 2020
Ok, I just wiped the entire netdata docker and did a clean install, and now it seems to be using the dbengine by default? I can't explain it; I had not messed with anything on the old container prior to this. I then made the same changes I did before, mapping the config and dbengine folders to appdata, and bingo, it seems to be working. I will have to wait and see if it holds onto the data for longer this time, but database files are showing up in the folder.

I did notice that the dbengine compression savings ratio shows 0. From reading, it seems it should be at least 50% or higher. Any idea why it is not being compressed? According to the calculator, without compression it is going to eat a LOT of space.
TexasUnraid Posted September 13, 2020
Ok, after it ran for a while the dbengine compression level suddenly jumped up to 77%. So it looks like it is working as expected!
TexasUnraid Posted September 15, 2020
The dbengine is working well now, saving data as expected. However, I am now getting a new issue: my docker stats stopped showing up. I used to be able to see all the stats for my dockers, but now they are simply missing. I have restarted netdata a few times but they are still gone. Any ideas?
mmz06 Posted September 25, 2020
On 6/2/2020 at 11:19 PM, BeerNut said:
> Does anyone have UPS communication working on this new version?

I have the same problem, so I dug into it 😉 It seems that the "apcupsd" package is missing, and the netdata GitHub team seems to be struggling with lots of issues... While waiting for their solution, I built a custom script you can use to add this missing package to the netdata container automatically:

    #!/bin/bash
    #description=This script adds APC UPS support to Netdata Container
    # If apcaccess cannot be run inside the container, install apcupsd and restart.
    if ! docker exec netdata apcaccess >/dev/null 2>&1; then
      docker exec netdata apk add apcupsd
      docker restart netdata
    fi

Quite simple as you can see. Keep in mind that you need to rerun it every time the netdata container is updated.
mmz06 Posted September 25, 2020
On 9/15/2020 at 4:58 PM, TexasUnraid said:
> The dbengine is working well now, saving data as expected. However, I am now getting a new issue: my docker stats stopped showing up. I used to be able to see all the stats for my dockers, but now they are simply missing. I have restarted netdata a few times but they are still gone. Any ideas?

I had the same issue. From inspecting the logs, it seems to be related to some db files still being locked when the netdata container is restarted. I suspect my drive is not fast enough, even though it's a nice Samsung SSD... The workaround I found is to stop the netdata container, wait a minute or two, and then start it again. If the container stats don't show at that point, just refresh the netdata page after a few seconds. I'm not a Netdata expert, so I may be wrong and you may be facing another issue.
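The stop, wait, start workaround can be scripted so the pause is not skipped by accident. A minimal sketch, assuming the container is named `netdata` as in the earlier script. The 90 second default and the `restart_with_pause` name are arbitrary choices for illustration, and the `DOCKER` variable exists only so the commands can be dry-run with `DOCKER=echo`.

```shell
#!/bin/sh
# Command used to talk to Docker; set DOCKER=echo to print instead of execute.
DOCKER="${DOCKER:-docker}"

# Stop the container, pause, then start it again, instead of using
# "docker restart", which does not leave a gap between stop and start.
restart_with_pause() {
  name="$1"
  pause="${2:-90}"   # seconds to wait between stop and start
  "$DOCKER" stop "$name"
  sleep "$pause"
  "$DOCKER" start "$name"
}

# Usage: restart_with_pause netdata 120
```

Whether a longer pause actually releases the locked files is the poster's hypothesis, so adjust the delay to whatever works on your system.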
BeerNut Posted September 26, 2020
On 9/25/2020 at 10:49 AM, mmz06 said:
> I have the same problem, so I dug into it 😉 It seems that the "apcupsd" package is missing, and the netdata GitHub team seems to be struggling with lots of issues... While waiting for their solution, I built a custom script you can use to add this missing package to the netdata container automatically:

Thank you so much for this!