Monitoring
Learn how to monitor an AvalancheGo node.
This tutorial demonstrates how to set up infrastructure to monitor an instance of AvalancheGo. We will use:
- Prometheus to gather and store data
- node_exporter to get information about the machine
- AvalancheGo's Metrics API to get information about the node
- Grafana to visualize data on a dashboard
- A set of pre-made Avalanche dashboards
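To give a sense of what the Metrics API returns, the sketch below counts sample lines in Prometheus text-format output. It assumes the node serves metrics at http://localhost:9650/ext/metrics (the AvalancheGo default, which is not stated above). The helper count_samples and the metric names in the dry run are made up for illustration.

```shell
# Sketch: count metric samples in Prometheus text-format output read from stdin.
# Lines starting with '#' are HELP/TYPE comments and are skipped, as are blank lines.
count_samples() {
  grep -v '^#' | grep -c .
}

# Dry run on a made-up payload (the metric names are hypothetical).
sample='# HELP example_requests_total Hypothetical counter
# TYPE example_requests_total counter
example_requests_total 5
example_peers 12'
printf '%s\n' "$sample" | count_samples   # prints 2
```

Against a live node, this would be run as curl -s http://localhost:9650/ext/metrics | count_samples.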
Prerequisites:
- A running AvalancheGo node
- Shell access to the machine running the node
- Administrator privileges on the machine
This tutorial assumes you have Ubuntu 20.04 running on your node. Other Linux flavors that use systemd for running services and apt-get for package management might work but have not been tested. A community member has reported that it works on Debian 10; it might work on other Debian releases as well.
Caveat: Security
The system as described here should not be opened to the public internet. Neither Prometheus nor Grafana as shown here is hardened against unauthorized access. Make sure that both of them are accessible only over a secured proxy, local network, or VPN. Setting that up is beyond the scope of this tutorial, but exercise caution. Bad security practices could lead to attackers gaining control over your node! It is your responsibility to follow proper security practices.
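One way to reach both web interfaces without opening any ports is SSH port forwarding. The sketch below only prints a candidate command. The helper tunnel_cmd is illustrative and ubuntu@your-node-host-ip is a placeholder login; neither comes from this tutorial. Ports 9090 and 3000 are the Prometheus and Grafana ports used in the steps that follow.

```shell
# Sketch: build an SSH command that forwards Prometheus (9090) and Grafana (3000)
# from the node to the same ports on your workstation.
# -N opens the tunnel without starting a remote shell.
tunnel_cmd() {
  # $1 = SSH login in user@host form (placeholder, supply your own)
  printf 'ssh -N -L 9090:localhost:9090 -L 3000:localhost:3000 %s\n' "$1"
}

# Print the command for review; run it yourself once it looks right.
tunnel_cmd "ubuntu@your-node-host-ip"
```

While such a tunnel is open, the interfaces would be reachable from your workstation at http://localhost:9090/ and http://localhost:3000/.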
Monitoring Installer Script
In order to make node monitoring easier to install, we have made a script that does most of the work for you. To download and run the script, log into the machine the node runs on with a user that has administrator privileges and enter the following command:
This will download the script and make it executable.
The script itself is run multiple times with different arguments, each installing a different tool or part of the environment. To make sure it downloaded and set up correctly, begin by running:
It should display:
Let's get to it.
Step 1: Set up Prometheus
Run the script to execute the first step:
It should produce output something like this:
You may be prompted to confirm additional package installs; do that if asked. The script run should end with instructions on how to check that Prometheus installed correctly. Let's do that by running:
It should output something like:
Note the active (running) status (press q to exit). You can also check the Prometheus web interface, available at http://your-node-host-ip:9090/
You may need to run sudo ufw allow 9090/tcp if the firewall is on, and/or adjust the security settings to allow connections to port 9090 if the node is running on a cloud instance. For AWS, you can look it up here. If the node is on the public internet, make sure to allow only your IP to connect!
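A firewall rule can be scoped to one client address instead of opening the port to everyone. The sketch below prints, without running, a ufw rule of that shape. The helper ufw_allow_from is illustrative, and 203.0.113.10 is a documentation-range address standing in for your own IP.

```shell
# Sketch: print (do not run) a ufw rule that admits a single client address to a TCP port.
ufw_allow_from() {
  # $1 = client IP address, $2 = TCP port
  printf 'sudo ufw allow from %s to any port %s proto tcp\n' "$1" "$2"
}

# 203.0.113.10 is a placeholder address; substitute the IP you connect from.
ufw_allow_from 203.0.113.10 9090   # Prometheus web interface
```

Review the printed rule before running it; sudo ufw status lists the rules currently in effect.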
If everything is OK, let's move on.
Step 2: Install Grafana
Run the script to execute the second step:
It should produce output something like this:
To make sure it's running properly:
which should again show Grafana as active. Grafana should now be available at http://your-node-host-ip:3000/ from your browser. Log in with username: admin, password: admin, and you will be prompted to set up a new, secure password. Do that.
You may need to run sudo ufw allow 3000/tcp if the firewall is on, and/or adjust the cloud instance settings to allow connections to port 3000. If the node is on the public internet, make sure to allow only your IP to connect!
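Grafana also exposes a health endpoint at /api/health that can be checked from the node's shell. The sketch below tests a response body for a healthy database status. The helper grafana_db_ok is illustrative, the sample body is made up, and the match is a plain text search rather than real JSON parsing.

```shell
# Sketch: succeed if a Grafana /api/health body (read from stdin) reports "database": "ok".
# Whitespace is stripped first so spacing differences in the JSON do not matter.
grafana_db_ok() {
  tr -d ' \n' | grep -q '"database":"ok"'
}

# Dry run against a made-up body shaped like a health response.
sample='{"commit": "abc123", "database": "ok", "version": "0.0.0"}'
if printf '%s' "$sample" | grafana_db_ok; then
  echo "grafana: healthy"
else
  echo "grafana: not healthy"
fi
```

On the node itself, this would be run as curl -s http://localhost:3000/api/health | grafana_db_ok.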
Prometheus and Grafana are now installed, so we're ready for the next step.
Step 3: Set up node_exporter
In addition to metrics from AvalancheGo, let's set up monitoring of the machine itself, so we can check CPU, memory, network and disk usage and be aware of any anomalies. For that, we will use node_exporter, a Prometheus exporter.
Run the script to execute the third step:
The output should look something like this:
Again, we check that the service is running correctly:
If the service is running, Prometheus, Grafana and node_exporter should all work together now. To check, visit the Prometheus web interface in your browser at http://your-node-host-ip:9090/targets. You should see three targets enabled:
- Prometheus
- AvalancheGo
- avalanchego-machine
Make sure that all of them have State as UP.
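The same check can be made from the shell through the Prometheus HTTP API, which reports target health at /api/v1/targets. The sketch below counts healthy targets in such a response. The helper count_up_targets is illustrative, the sample body is made up (including its job labels), and the counting is a plain text match rather than real JSON parsing.

```shell
# Sketch: count targets reported as healthy in a /api/v1/targets response read from stdin.
# Whitespace is stripped, then each "health":"up" occurrence is counted.
count_up_targets() {
  tr -d ' \n' | grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Dry run on a made-up response with two healthy targets and one that is down.
sample='{"status":"success","data":{"activeTargets":[
  {"labels":{"job":"prometheus"},"health":"up"},
  {"labels":{"job":"avalanchego"},"health":"up"},
  {"labels":{"job":"avalanchego-machine"},"health":"down"}]}}'
printf '%s' "$sample" | count_up_targets   # prints 2
```

On the node, curl -s http://localhost:9090/api/v1/targets | count_up_targets should print 3 when all three targets are UP.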
If you run your AvalancheGo node with TLS enabled on your API port, you will need to manually edit the /etc/prometheus/prometheus.yml file and change the avalanchego job to look like this:
Mind the spacing (leading spaces too)! You will need admin privileges to do that (use sudo). Restart the Prometheus service afterwards with sudo systemctl restart prometheus.
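As a rough sketch of what a TLS-enabled scrape job can look like, the helper below prints a candidate avalanchego entry for prometheus.yml. Everything in it is an assumption to verify against your own setup: the node API at localhost:9650, the metrics path /ext/metrics, and a self-signed certificate (hence insecure_skip_verify). The helper print_tls_job is illustrative and does not modify any file.

```shell
# Sketch: print a candidate 'avalanchego' scrape job for a node serving its API over TLS.
# Assumed values (not taken from this tutorial): localhost:9650, /ext/metrics,
# and a self-signed certificate, which is why certificate verification is skipped.
print_tls_job() {
  cat <<'EOF'
  - job_name: 'avalanchego'
    metrics_path: '/ext/metrics'
    scheme: 'https'
    tls_config:
      insecure_skip_verify: true
    static_configs:
      - targets: ['localhost:9650']
EOF
}

# Print the fragment so it can be compared with the existing job entry.
print_tls_job
```

If promtool was installed alongside Prometheus, promtool check config /etc/prometheus/prometheus.yml can catch indentation mistakes before the service is restarted.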
All that's left to do now is to provision the data source and install the actual dashboards that will show us the data.
Step 4: Dashboards
Run the script to install the dashboards:
It will produce output something like this:
This will download the latest versions of the dashboards from GitHub and provision Grafana to load them, as well as define Prometheus as a data source. It may take up to 30 seconds for the dashboards to show up. In your browser, go to http://your-node-host-ip:3000/dashboards. You should see 7 Avalanche dashboards:
Select 'Avalanche Main Dashboard' by clicking its title. It should load, and look similar to this:
Some graphs may take some time to populate fully, as they need a series of data points in order to render correctly.
You can bookmark the main dashboard as it shows the most important information about the node at a glance. Every dashboard has a link to all the others as the first row, so you can move between them easily.
Step 5: Additional Dashboards (Optional)
Step 4 installs the basic set of dashboards that make sense to have on any node. Step 5 is for installing additional dashboards that may not be useful for every installation.
Currently, there is only one additional dashboard: Avalanche L1s. If your node is running any Avalanche L1s, you may want to add this as well. Do:
This will add the Avalanche L1s dashboard. It allows you to monitor operational data for any Avalanche L1 that is synced on the node. An Avalanche L1 switcher lets you switch between different Avalanche L1s. As there are many Avalanche L1s and not every node will have all of them, by default it comes populated only with the Spaces and WAGMI Avalanche L1s that exist on the Fuji testnet:
To configure the dashboard and add any Layer 1s that your node is syncing, you will need to edit the dashboard. Select the dashboard settings icon (image of a cog) in the upper right corner of the dashboard display, switch to the Variables section, and select the subnet variable. It should look something like this:
The variable format is:
and the separator between entries is a comma. Entries for Spaces and WAGMI look like:
After editing the values, press Update, then click the Save dashboard button and confirm. Press the back arrow in the upper left corner to return to the dashboard. New values should now be selectable from the dropdown and data for the selected Avalanche L1 will be shown in the panels.
Updating
Available node metrics are updated constantly: new ones are added and obsolete ones removed, so it is good practice to update the dashboards from time to time, especially if you notice any missing data in panels. Updating the dashboards is easy: just run the script with no arguments, and it will refresh the dashboards with the latest available versions. Allow up to 30 seconds for the dashboards to update in Grafana.
If you added the optional extra dashboards (step 5), they will be updated as well.
Summary
Using the script to install node monitoring is easy, and it gives you insight into how your node is behaving and what's going on under the hood. Also, pretty graphs!
If you have feedback on this tutorial, problems with the script or following the steps, send us a message on Discord.