ich777 Posted April 28, 2023 (Author)

10 hours ago, Masterwishx said:
@ich777 can you maybe add a NUT (UPS) Prometheus exporter if you have time for it?

The main issue with NUT is that I don't have a UPS to test this, there are plenty of exporters out there, and I really don't know which one is best. A few users have already requested this, but it is really hard for me to do without testing. Maybe I'm going to buy one soon...
Masterwishx Posted April 28, 2023

OK, thanks, I understand; I only just bought a UPS for the first time myself. Anyway, I found a NUT exporter for InfluxDB in the CA App for now...
Masterwishx Posted April 30, 2023

I'm using the Node Exporter with the "Node Exporter Full" dashboard (ID: 1860). Is there any way to update the dashboard when a new revision is available, or do I have to create a new one?
ich777 Posted April 30, 2023 (Author)

2 hours ago, Masterwishx said:
I'm using the Node Exporter with the "Node Exporter Full" dashboard (ID: 1860). Is there any way to update the dashboard when a new revision is available, or do I have to create a new one?

I don't understand; the Node Exporter is the latest version, to be precise v1.5.0 from here. If you need to update the dashboard, I'm sure you'll find a new dashboard here. The dashboards that I've listed on the first page are only examples.
Masterwishx Posted April 30, 2023

42 minutes ago, ich777 said:
I'm sure you'll find a new dashboard here

I meant the revision of dashboard 1860 for the Node Exporter...
ich777 Posted April 30, 2023 (Author)

16 minutes ago, Masterwishx said:
I meant the revision of dashboard 1860 for the Node Exporter...

I think you always have to update the full dashboard, if I'm not mistaken.
Masterwishx Posted April 30, 2023

1 minute ago, ich777 said:
I think you always have to update the full dashboard, if I'm not mistaken.

That's what I did every time, but then any changes I made to the dashboard are gone in the new one, so I was thinking maybe I can somehow move those changes over to the new one...
ich777 Posted May 2, 2023 (Author)

On 4/30/2023 at 12:20 PM, Masterwishx said:
so I was thinking maybe I can somehow move those changes over to the new one

I don't think that's possible, but I could be wrong about that...
ezek1el3000 Posted May 14, 2023

I got everything working except the CPU, utilization and energy metrics. Any idea why they don't show up? Fritzbox 7590 7.50
metrics.txt
moarSmokes Posted June 18, 2023 (edited)

Got a problem with the Nvidia exporter. Prometheus throws an error because my GPU doesn't seem to provide the value for one of the exported keys.

Got this from the exporter /metrics site:

nvidiasmi_power_state_int{id="00000000:03:00.0",uuid="UUID",name="Quadro P2000"}

And that's the error message from Prometheus:

expected value after metric, got "\n" ("INVALID") while parsing: "nvidiasmi_power_state_int{id=\"00000000:03:00.0\",uuid=\"GPU-UUID\",name=\"Quadro P2000\"} \n"

Edited June 18, 2023 by moarSmokes
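For reference: in the Prometheus text exposition format, every sample line must end with a value, which is why the empty value after the closing brace above is rejected as INVALID. A valid sample would look like this (the value shown here is purely illustrative):

```
nvidiasmi_power_state_int{id="00000000:03:00.0",uuid="UUID",name="Quadro P2000"} 0
```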
ich777 Posted June 19, 2023 (Author)

11 hours ago, moarSmokes said:
Got this from the exporter /metrics site

I will look into that, please give me a few days.
CoolHam Posted June 29, 2023

Just wanted to jump in and confirm I am also having the same issue as moarSmokes. It seems like the most recent Unraid upgrade has broken it for some reason. Thank you for all the free community work you do, ich777! I'm sure you will have this fixed when you have the time to spare!
ich777 Posted June 29, 2023 (Author)

2 hours ago, CoolHam said:
Just wanted to jump in and confirm I am also having the same issue as moarSmokes. It seems like the most recent Unraid upgrade has broken it for some reason. Thank you for all the free community work you do, ich777! I'm sure you will have this fixed when you have the time to spare!

Currently I haven't got much time to look into this. BTW, it's not Unraid that broke this; the latest Nvidia driver is the cause of the issue, because I think they changed the naming of the power draw field. @CoolHam, @moarSmokes, @bailey & @Oasistem please also consider reporting this over here, since the Prometheus exporter is based on this repository: Click
CoolHam Posted July 1, 2023

Thanks for the additional information. I have opened an issue. https://github.com/e7d/docker-prometheus-nvidiasmi/issues/1
HHUBS Posted July 9, 2023 (edited)

On 7/4/2021 at 1:18 AM, ich777 said:
Prometheus AdGuard Exporter
Note: You can connect to any AdGuard Home on your local network, whether it runs on Unraid in a Docker container or in a VM.

1. Download and install the Prometheus AdGuard Exporter plugin from the CA App.
2. Go to the plugin settings by clicking "Settings -> AdGuard Exporter" (at the bottom of the Settings page).
3. Enter the IP of your AdGuard instance, the port, the admin username & password, and click "Confirm & Start". (Please note that if you run AdGuard in a Docker container on a custom network like br0, you have to enable the option "Enable host access" in your Docker settings, otherwise the plugin can't connect to your AdGuard instance.)
4. After that you should see in the top right corner that the exporter is running, along with details about it.
5. Open up the prometheus.yml (steps 4 + 5 from the first post), add a line with '- targets: ["YOURSERVERIP:9617"]' (please change "YOURSERVERIP" to your server IP), then save and close the file.
6. Go to the Docker page and restart Prometheus.
7. Open up the Grafana WebUI and click "+ -> Import".
8. Now we are going to import a preconfigured dashboard for the AdGuard Exporter from Grafana.com (Source); to do this, simply enter the ID (13330) of the dashboard and click "Load".
9. In the next screen rename the dashboard to your liking, select "Prometheus" as the datasource and click "Import".
10. Now you should be greeted with something like this (please keep in mind that the dashboard can display N/A for some values, especially the gauges, since there is not enough data available yet; wait a few minutes and the values will be filled in).
11. You will now notice the warning "Panel plugin not found: grafana-piechart-panel" on the dashboard; to fix this, follow the next steps.
12. Go to your Docker page, click on Grafana and select "Console".
13. In the console window enter 'grafana-cli plugins install grafana-piechart-panel' and press RETURN.
14. Close the console window and restart the Grafana Docker container.
15. Go back to your AdGuard dashboard within Grafana; it should now be fully loaded.

ATTENTION
Please note that if you restart your AdGuard container, the exporter will stop and you have to start it manually from the plugin configuration page with the "START" button. This also applies if you have CA Backup installed and the container is being backed up. To avoid having to restart it manually after each CA Backup, do the following steps:
1. Go to Settings and click "Backup/Restore Appdata" at the bottom.
2. Confirm the warning that pops up, scroll all the way down to the bottom and click "Show Advanced Settings".
3. For AdGuard, make sure you click the switch so that it shows "Don't Stop".
4. Scroll down to the bottom and click "Apply".

@ich777 will this work if it's set to ipvlan instead of macvlan? Because mine was set to ipvlan due to the recent issue about macvlan, and the Prometheus AdGuard exporter is not starting.

Edited July 9, 2023 by HHUBS
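The prometheus.yml change described in the quoted guide would look roughly like this (a sketch; the job name is an arbitrary choice and "YOURSERVERIP" is a placeholder, while port 9617 comes from the guide):

```yaml
scrape_configs:
  - job_name: "adguard"
    static_configs:
      - targets: ["YOURSERVERIP:9617"]
```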
ich777 Posted July 9, 2023 (Author)

13 minutes ago, HHUBS said:
@ich777 will this work if it's set to ipvlan instead of macvlan?

Yes, why not? But you have to enable host access.

13 minutes ago, HHUBS said:
Because mine was set to ipvlan due to the recent issue about macvlan and prometheus adguard is not starting.

Then re-enable host access again, or better, restart the Docker service once.
HHUBS Posted July 9, 2023

4 hours ago, ich777 said:
Yes, why not? But you have to enable host access.

It's already enabled. I tried changing to macvlan. Still no joy.

4 hours ago, ich777 said:
Then re-enable host access again, or better, restart the Docker service once.

Tried this but no luck.
ich777 Posted July 9, 2023 (Author)

1 minute ago, HHUBS said:
It's already enabled. I tried changing to macvlan. Still no joy.

Can you describe your setup a bit more in depth, please? Please change back to IPVLAN. On what network does the container run? What are your exact Docker settings now? What are the exporter settings now?
HHUBS Posted July 9, 2023 (edited)

13 minutes ago, ich777 said:
Can you describe your setup a bit more in depth, please? Please change back to IPVLAN.

Yep, I already did.

13 minutes ago, ich777 said:
On what network does the container run?

Prometheus and Grafana are in a custom Docker network. AdGuard is running on br0.

13 minutes ago, ich777 said:
What are your exact Docker settings now?

Here are the current Docker settings:

13 minutes ago, ich777 said:
What are the exporter settings now?

Here it is:

Edited July 9, 2023 by HHUBS
ich777 Posted July 9, 2023 (Author)

8 minutes ago, HHUBS said:
Yep, I already did.

Please open up a terminal from your AdGuard container and try to ping your Unraid IP, and also try to ping your AdGuard container from an Unraid terminal. Maybe you have to install ping first:

apt-get update
apt-get -y install iputils-ping
HHUBS Posted July 9, 2023

3 minutes ago, ich777 said:
Please open up a terminal from your AdGuard container and try to ping your Unraid IP, and also try to ping your AdGuard container from an Unraid terminal.

I can ping either way. I can actually open the AdGuard WebUI with the username and password from my PC, which is on the same subnet as my Unraid.
blub3k Posted July 21, 2023 (edited)

Hey, I found a bug in the nvidia-smi exporter. Power states now start with a `P` prefix.

Driver: 535.54.03
GPU: NVIDIA T400

root@Tower:~# nvidia-smi -q -x | grep state
<performance_state>P0</performance_state>
<power_state>P0</power_state>
<power_state>P0</power_state>
<state>N/A</state>

Currently I am getting the following error in Prometheus:

expected value after metric, got "\n" ("INVALID") while parsing: "nvidiasmi_power_state_int{id=\"00000000:01:00.0\",uuid=\"GPU-9939bddd-4894-5658-1192-eac5d9ce2151\",name=\"NVIDIA T400\"} \n"

The metrics look like this:

nvidiasmi_driver_version{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 535.54
nvidiasmi_cuda_version{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 12.2
nvidiasmi_attached_gpus{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 1
nvidiasmi_pci_pcie_gen_max{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 3
nvidiasmi_pci_pcie_gen_current{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 3
nvidiasmi_pci_link_width_max_multiplicator{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 16
nvidiasmi_pci_link_width_current_multiplicator{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 16
nvidiasmi_pci_replay_counter{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_pci_replay_rollover_counter{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_pci_tx_util_bytes_per_second{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 2e+06
nvidiasmi_pci_rx_util_bytes_per_second{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_fan_speed_percent{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 38
nvidiasmi_performance_state_int{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_fb_memory_usage_total_bytes{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 2.147483648e+09
nvidiasmi_fb_memory_usage_used_bytes{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 6.59554304e+08
nvidiasmi_fb_memory_usage_free_bytes{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 1.298137088e+09
nvidiasmi_bar1_memory_usage_total_bytes{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 2.68435456e+08
nvidiasmi_bar1_memory_usage_used_bytes{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 5.24288e+06
nvidiasmi_bar1_memory_usage_free_bytes{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 2.63192576e+08
nvidiasmi_utilization_gpu_percent{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 3
nvidiasmi_utilization_memory_percent{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 3
nvidiasmi_utilization_encoder_percent{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 8
nvidiasmi_utilization_decoder_percent{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 2
nvidiasmi_encoder_session_count{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 1
nvidiasmi_encoder_average_fps{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 6
nvidiasmi_encoder_average_latency{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 31910
nvidiasmi_fbc_session_count{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_fbc_average_fps{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_fbc_average_latency{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_gpu_temp_celsius{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 47
nvidiasmi_gpu_temp_max_threshold_celsius{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 101
nvidiasmi_gpu_temp_slow_threshold_celsius{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 98
nvidiasmi_gpu_temp_max_gpu_threshold_celsius{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 93
nvidiasmi_memory_temp_celsius{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_gpu_temp_max_mem_threshold_celsius{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_power_state_int{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"}
nvidiasmi_power_draw_watts{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_power_limit_watts{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_default_power_limit_watts{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_enforced_power_limit_watts{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_min_power_limit_watts{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_max_power_limit_watts{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_clock_graphics_hertz{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 1.335e+09
nvidiasmi_clock_graphics_max_hertz{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 2.1e+09
nvidiasmi_clock_sm_hertz{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 1.335e+09
nvidiasmi_clock_sm_max_hertz{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 2.1e+09
nvidiasmi_clock_mem_hertz{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 5e+09
nvidiasmi_clock_mem_max_hertz{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 5.001e+09
nvidiasmi_clock_video_hertz{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 1.23e+09
nvidiasmi_clock_video_max_hertz{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 1.95e+09
nvidiasmi_clock_policy_auto_boost{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_clock_policy_auto_boost_default{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400"} 0
nvidiasmi_process_used_memory_bytes{id="00000000:01:00.0",uuid="GPU-9939bddd-4894-5658-1192-eac5d9ce2151",name="NVIDIA T400",process_pid="3515",process_type="C"} 6.56408576e+08

The error might be in this function:

func filterNumber(value string) string {
	if value == "N/A" {
		return "0"
	}
	r := regexp.MustCompile("[^0-9.]")
	return r.ReplaceAllString(value, "")
}

------------------------------
EDIT: I might have found a solution but cannot test it.
The XML looks different now:

root@Tower:~# nvidia-smi -q -x | grep power_state
<gpu_power_readings>
    <power_state>P0</power_state>
    <power_draw>N/A</power_draw>
    <current_power_limit>31.32 W</current_power_limit>
    <requested_power_limit>31.32 W</requested_power_limit>
    <default_power_limit>31.32 W</default_power_limit>
    <min_power_limit>20.00 W</min_power_limit>
    <max_power_limit>31.32 W</max_power_limit>
</gpu_power_readings>

Instead of "power_readings" it is "gpu_power_readings" now...
https://github.com/e7d/docker-prometheus-nvidiasmi/pull/2/commits

Edited July 21, 2023 by blub3k
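To illustrate the failure mode blub3k describes: when the parser looks up power_state under the old "power_readings" element and the driver now nests it under "gpu_power_readings", the lookup yields an empty string, and the exporter's filterNumber passes that empty string straight through, producing a sample with no value. A more defensive filterNumber could fall back to "0" whenever nothing numeric remains. This is only a sketch of the idea, not the actual upstream patch:

```go
package main

import (
	"fmt"
	"regexp"
)

// nonNumber matches every character that is not a digit or a dot.
var nonNumber = regexp.MustCompile(`[^0-9.]`)

// filterNumber strips non-numeric characters and falls back to "0" when
// nothing numeric is left. That covers "N/A", and also the empty string
// that results when the XML element was not found at all (the
// power_readings -> gpu_power_readings rename).
func filterNumber(value string) string {
	filtered := nonNumber.ReplaceAllString(value, "")
	if filtered == "" {
		return "0"
	}
	return filtered
}

func main() {
	fmt.Println(filterNumber(""))        // element missing: falls back to "0"
	fmt.Println(filterNumber("N/A"))     // nothing numeric: "0"
	fmt.Println(filterNumber("P0"))      // P-prefixed state: "0"
	fmt.Println(filterNumber("31.32 W")) // unit stripped: "31.32"
}
```

This only guarantees a parseable metric value; a real fix would also teach the parser about the renamed gpu_power_readings section, as the linked PR attempts.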
ich777 Posted July 21, 2023 (Author)

40 minutes ago, blub3k said:
Instead of "power_readings" it is "gpu_power_readings" now...

Thank you, this is a known issue with newer drivers and I haven't had time yet to investigate, but I don't think that this PR will be accepted, and even if it is, it will break compatibility with older drivers or even the legacy drivers.
ich777 Posted July 24, 2023 (Author)

@blub3k, @CoolHam, @moarSmokes, @bailey & @Oasistem: The issue with the nvidia-smi exporter should be resolved; just make sure that you are on plugin version 2023.07.23 and everything should work again.
bailey Posted July 28, 2023

On 7/24/2023 at 9:52 AM, ich777 said:
The issue with the nvidia-smi exporter should be resolved; just make sure that you are on plugin version 2023.07.23 and everything should work again.

Can confirm, thank you! I had to uninstall and re-install instead of just updating, but it works as expected now. Appreciate the time spent.