Prometheus is an open source monitoring and alerting system that provides insight into the state and history of a computer, application, or cluster by storing defined metrics in a time-series database. It offers a powerful query language, PromQL, to help you explore and understand the data it stores. Prometheus also includes an Alertmanager that makes it easy to trigger notifications when the metrics you collect cross certain thresholds. Most importantly, Prometheus is flexible and easy to set up to monitor all kinds of metrics from whatever system you need to track.
As site reliability engineers (SREs) on Red Hat's OpenShift Dedicated team, we use Prometheus as a central component of our monitoring and alerting for clusters and other aspects of our infrastructure. Using Prometheus, we can predict when problems may occur by following trends in the data we collect from nodes in the cluster and the services we run, and we can trigger alerts when certain thresholds are crossed or events occur. As a data source for Grafana, Prometheus lets us produce graphs of data over time to see how a cluster or service is behaving.
Prometheus is a strategic piece of infrastructure for us at work, but it is also useful to me at home. Luckily, it's not only powerful and useful but also easy to set up in a home environment, with or without Kubernetes, OpenShift, containers, etc. This article shows you how to build a Prometheus container image and set up the Prometheus Node Exporter to collect data from home computers. It also explains some basic PromQL, the query language Prometheus uses to return data and create graphs.
Build a Prometheus container image
The Prometheus project publishes its own container image, quay.io/prometheus/prometheus. However, I enjoy building my own for home projects and prefer to use the Red Hat Universal Base Image family. These images are freely available for anyone to use. I prefer the Universal Base Image 8 Minimal (ubi8-minimal) image, which is based on Red Hat Enterprise Linux 8. The ubi8-minimal image is a smaller version of the normal ubi8 image. It is larger than the official Prometheus container image's ultra-sparse Busybox base, but since I use the Universal Base Image for other projects, that layer is a wash in terms of disk space for me. (If two images use the same layer, that layer is shared between them and doesn't use any additional disk space after the first image.)
My Containerfile for this project is split into a multi-stage build. The first stage, builder, installs a few tools via DNF packages to make it easier to download and extract a Prometheus release from GitHub, then downloads a specific release for whatever architecture I need (either ARM64 for my Raspberry Pi Kubernetes cluster or AMD64 for running locally on my laptop) and extracts it:
# The first stage build, downloading Prometheus from GitHub and extracting it
FROM registry.access.redhat.com/ubi8/ubi-minimal as builder
LABEL maintainer "Chris Collins <collins.christopher@gmail.com>"

# Install packages needed to download and extract the Prometheus release
RUN microdnf install -y gzip jq tar

# Replace the ARCH for different architecture versions, eg: "linux-arm64.tar.gz"
ENV PROMETHEUS_ARCH="linux-amd64.tar.gz"

# Replace "tags/<tag_name>" with "latest" to build whatever the latest tag is at the time
ENV PROMETHEUS_VERSION="tags/v2.27.0"
ENV PROMETHEUS="https://api.github.com/repos/prometheus/prometheus/releases/${PROMETHEUS_VERSION}"

# The checksum file for the Prometheus project is "sha256sums.txt"
ENV SUMFILE="sha256sums.txt"

RUN mkdir /prometheus
WORKDIR /prometheus

# Download the checksum
RUN /bin/sh -c "curl -sSLf $(curl -sSLf ${PROMETHEUS} -o - | jq -r '.assets[] | select(.name|test(env.SUMFILE)) | .browser_download_url') -o ${SUMFILE}"

# Download the binary tarball
RUN /bin/sh -c "curl -sSLf -O $(curl -sSLf ${PROMETHEUS} -o - | jq -r '.assets[] | select(.name|test(env.PROMETHEUS_ARCH)) | .browser_download_url')"

# Check the binary and checksum match
RUN sha256sum --check --ignore-missing ${SUMFILE}

# Extract the tarball
RUN tar --extract --gunzip --no-same-owner --strip-components=1 --directory /prometheus --file *.tar.gz
The second stage of the multi-stage build copies the extracted Prometheus files to a pristine ubi8-minimal image (there's no need for the extra tools from the first stage to take up space in the final image) and links the binaries into the $PATH:
# The second build stage, creating the final image
FROM registry.access.redhat.com/ubi8/ubi-minimal
LABEL maintainer "Chris Collins <collins.christopher@gmail.com>"

# Get the binary from the builder image
COPY --from=builder /prometheus /prometheus
WORKDIR /prometheus

# Link the binary files into the $PATH
RUN ln prometheus /bin/
RUN ln promtool /bin/

# Validate prometheus binary
RUN prometheus --version

# Add dynamic target (file_sd_config) support to the prometheus config
# https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config
RUN echo -e "\n\
  - job_name: 'dynamic'\n\
    file_sd_configs:\n\
    - files:\n\
      - data/sd_config*.yaml\n\
      - data/sd_config*.json\n\
      refresh_interval: 30s\n\
" >> prometheus.yml

EXPOSE 9090
VOLUME ["/prometheus/data"]

ENTRYPOINT ["prometheus"]
CMD ["--config.file=prometheus.yml"]
Build the image:
# Build the Prometheus image from the Containerfile
podman build --format docker -f Containerfile -t prometheus
I'm using Podman as my container engine at home, but you can use Docker if you prefer. Just replace the podman command with docker above.
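For instance, the equivalent Docker invocation looks roughly like this (Docker builds in its own format by default, so the --format flag isn't needed, but it does require an explicit build context, the trailing dot):

# Build the Prometheus image with Docker instead of Podman
docker build -f Containerfile -t prometheus .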
After building this image, you're ready to run Prometheus locally and start collecting some metrics.
Running Prometheus
# This only needs to be done once
# This directory will store the metrics Prometheus collects so they persist between container restarts
mkdir data

# Run Prometheus locally, using the ./data directory for persistent data storage
# Note that the image name, prometheus:latest, will be whatever image you are using
podman run --mount=type=bind,src=$(pwd)/data,dst=/prometheus/data,relabel=shared --publish=127.0.0.1:9090:9090 --detach prometheus:latest
The Podman command above runs Prometheus in a container, mounting the data directory into the container and allowing you to access the Prometheus web interface with a browser only from the machine running the container. If you want to access Prometheus from other hosts, replace --publish=127.0.0.1:9090:9090 in the command with --publish=9090:9090.
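For reference, a sketch of that more open variant (the image name is still whatever you tagged above):

# Variant: publish the web UI on all interfaces so other hosts can reach it
podman run --mount=type=bind,src=$(pwd)/data,dst=/prometheus/data,relabel=shared --publish=9090:9090 --detach prometheus:latest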
Once the container is running, you should be able to access Prometheus at http://127.0.0.1:9090/graph. There won't be much to look at yet, though. By default, Prometheus knows only to check itself (the Prometheus service) for metrics about itself. For example, navigating to the link above and entering a query for prometheus_http_requests_total will show how many HTTP requests Prometheus has received (most likely, just the ones you have made so far).
This query can also be referenced as a URL:
http://127.0.0.1:9090/graph?g0.expr=prometheus_http_requests_total&g0.tab=1&g0.stacked=0&g0.range_input=1h
Clicking it should take you to the same results. By default, Prometheus scrapes for metrics every 15 seconds, so these metrics will update over time (assuming they have changed since the last scrape).
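That 15-second interval comes from the global block of the prometheus.yml bundled in the release tarball; a sketch of the relevant settings, should you want to tune them:

# Global defaults from the sample prometheus.yml (adjust to taste)
global:
  scrape_interval: 15s      # how often to scrape targets
  evaluation_interval: 15s  # how often to evaluate alerting/recording rules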
You can also graph the data over time by entering a query (as above) and clicking the Graph tab.
Graphs can also be referenced as a URL:
http://127.0.0.1:9090/graph?g0.expr=prometheus_http_requests_total&g0.tab=0&g0.stacked=0&g0.range_input=1h
This internal data isn't very useful by itself, though, so let's add some helpful metrics.
Add some data
The Prometheus project also publishes a program called Node Exporter for exporting useful metrics about the computer or node it is running on. You can use Node Exporter to quickly create a metrics target for your local machine, exporting data such as memory usage and CPU consumption for Prometheus to track.
In the interest of brevity, just run the quay.io/prometheus/node-exporter:latest container image published by the Prometheus project to get started.
Run the following with Podman or your container engine of choice:
podman run --net="host" --pid="host" --mount=type=bind,src=/,dst=/host,ro=true,bind-propagation=rslave --detach quay.io/prometheus/node-exporter:latest --path.rootfs=/host
This will start a Node Exporter on your local machine and begin publishing metrics on port 9100. You can see which metrics are being generated by opening http://127.0.0.1:9100/metrics in your browser. It will look similar to this:
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000176569
go_gc_duration_seconds{quantile="0.25"} 0.000176569
go_gc_duration_seconds{quantile="0.5"} 0.000220407
go_gc_duration_seconds{quantile="0.75"} 0.000220407
go_gc_duration_seconds{quantile="1"} 0.000220407
go_gc_duration_seconds_sum 0.000396976
go_gc_duration_seconds_count 2
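You can also spot-check a particular metric from the command line, for example with curl and grep (a sketch; exact metric names vary somewhat by platform):

# Show the available-memory metric published by the Node Exporter
curl -s http://127.0.0.1:9100/metrics | grep '^node_memory_MemAvailable'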
Now you just need to tell Prometheus that the data is there. Prometheus uses a set of rules called scrape_configs, defined in its configuration file, prometheus.yml, to decide which hosts to check for metrics and how often to check them. The scrape_configs can be set statically in the Prometheus config file, but that doesn't make Prometheus very flexible: every time you add a new target, you would have to update the config file, stop Prometheus manually, and restart it. Prometheus has a better way, called file-based service discovery.
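For comparison, a minimal static configuration might look like the sketch below (the target address here is just an illustration):

# A hypothetical static scrape config in prometheus.yml
# Every new target means editing this file and restarting Prometheus
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.1.10:9100']  # example address; replace with your host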
In the Containerfile above, there's a stanza adding a dynamic file-based service discovery configuration to the Prometheus config file:
RUN echo -e "\n\
  - job_name: 'dynamic'\n\
    file_sd_configs:\n\
    - files:\n\
      - data/sd_config*.yaml\n\
      - data/sd_config*.json\n\
      refresh_interval: 30s\n\
" >> prometheus.yml
This tells Prometheus to look for files named sd_config*.yaml or sd_config*.json in the data directory mounted into the running container, and to check every 30 seconds to see whether there are more config files or whether they have changed at all. Using files with that naming convention, you can tell Prometheus to start looking for other targets, such as the Node Exporter you started earlier.
Create a file named sd_config_01.json in the data directory with the following contents, replacing your_hosts_ip_address with the IP address of the host running the Node Exporter:
[
  {
    "labels": {
      "job": "node"
    },
    "targets": [
      "your_hosts_ip_address:9100"
    ]
  }
]
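Because the stanza added in the Containerfile also matches sd_config*.yaml, a YAML file would work too; a hypothetical equivalent of the JSON above:

# sd_config_01.yaml, equivalent to the JSON file above
- labels:
    job: node
  targets:
    - your_hosts_ip_address:9100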
Check http://127.0.0.1:9090/targets in Prometheus; you should see Prometheus monitoring itself (inside the container) and the target you added for the host with the Node Exporter. Click on the link for this new target to see the raw data Prometheus has scraped. It should look familiar:
# NOTE: Truncated for brevity
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.6547e-05
go_gc_duration_seconds{quantile="0.25"} 0.000107517
go_gc_duration_seconds{quantile="0.5"} 0.00017582
go_gc_duration_seconds{quantile="0.75"} 0.000503352
go_gc_duration_seconds{quantile="1"} 0.008072206
go_gc_duration_seconds_sum 0.029700021
go_gc_duration_seconds_count 55
This is the same data the Node Exporter is exporting:
http://127.0.0.1:9090/graph?g0.expr=rate(node_network_receive_bytes_total%7B%7D%5B5m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m
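If you prefer the command line, the same target information is available from Prometheus' HTTP API; a sketch assuming curl and jq are installed:

# List each scrape target's job label and health
curl -s http://127.0.0.1:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'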
With this information, you can create your own rules and instrument your own applications to provide metrics for Prometheus to consume.
A lightweight introduction to PromQL
PromQL is Prometheus' query language and a powerful way to aggregate the time-series data stored in Prometheus. Prometheus shows you the output of a query as the raw result, or it can be displayed as a graph showing the trend of the data over time, like the node_network_receive_bytes_total example above. PromQL can be daunting to get into, and this article won't dive into a full tutorial on how to use it, but I will cover some basics.
To get started, pull up the query interface for Prometheus:
http://127.0.0.1:9090/graph
Look at the node_network_receive_bytes_total metrics in this example. Enter that string into the query field, and press Enter to display all the collected network metrics from the computer on which the Node Exporter is running. (Note that Prometheus provides an autocomplete feature, making it easy to explore the metrics it collects.) You may see several results, each with labels that have been applied to the data sent by the Node Exporter:
Looking at the image above, you can see eight interfaces, each labeled by the device name (e.g., device="ensp12s0u1"), the instance they were collected from (in this case, all the same node), and the job "node" that was assigned in sd_config_01.json. To the right of these is the latest raw metric data for each device. In the case of the ensp12s0u1 device, it has received 4007938272 bytes of data over the interface since Prometheus started monitoring it.
Note: The "job" label is useful for defining what kind of data is being collected, for example, "node" for metrics sent by the Node Exporter, "cluster" for Kubernetes cluster data, or perhaps an application name for a specific service you may be monitoring.
Click on the Graph tab, and you'll see the metrics for these devices graphed over time (one hour by default). The time period can be adjusted using the - + toggle on the left. Historical data is displayed and graphed along with the current value. This provides valuable insight into how the data changes over time:
You can further refine the displayed data using the labels. This graph displays all the interfaces reported by the Node Exporter, but what if you are interested only in the wireless device? By changing the query to include the label, node_network_receive_bytes_total{device="wlp2s0"}, you can evaluate just the data matching that label. Prometheus automatically adjusts the scale to a more human-readable one after the other devices' data is removed:
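Label matchers aren't limited to exact equality, either; PromQL also supports negated and regular-expression matchers. A couple of illustrative queries (the device names follow the examples above):

node_network_receive_bytes_total{device!="lo"}    # every interface except loopback
node_network_receive_bytes_total{device=~"wl.*"}  # only devices whose names start with "wl"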
This data is helpful in itself, but Prometheus' PromQL also has several query functions that can be applied to the data to provide more information. For example, look again at the rate() function. The rate() function "calculates the per-second average rate of increase of the time series in the range vector." That's a fancy way of saying "shows how quickly the data grew."
Looking at the graph for the wireless device above, you can see a slight curve (a slightly more vertical increase) in the line graph right around 19:00 hours. It doesn't look like much on its own, but using the rate() function, it is possible to calculate just how much larger the growth spike was around that timeframe. Using the query rate(node_network_receive_bytes_total{device="wlp2s0"}[15m]) shows the rate at which the received bytes increased for the wireless device, averaged per second over a 15-minute window:
It is much more evident that around 19:00 hours, the wireless device received almost three times as much traffic for a brief period.
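Functions like rate() also compose with PromQL's aggregation operators; for instance, this sketch sums the per-second receive rate across all interfaces, grouped by instance:

sum by (instance) (rate(node_network_receive_bytes_total[5m]))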
PromQL can do much more than this. Using the predict_linear() function, Prometheus can make an educated guess about when a certain threshold will be crossed. Using the same wireless network_receive_bytes data, you can predict where the value will be over the next four hours based on the data from the previous four hours (or any combination you might be interested in). Try querying predict_linear(node_network_receive_bytes_total{device="wlp2s0"}[4h], 4 * 3600).
The important bit of the predict_linear() function above is [4h], 4 * 3600. The [4h] tells Prometheus to use the past four hours as a dataset and then to predict where the value will be over the next four hours (4 * 3600, since there are 3,600 seconds in an hour). Using the example above, Prometheus predicts that the wireless device will have received almost 95MB of data about an hour from now (your data will differ):
You can start to see how this might be useful, especially in an operations capacity. Kubernetes exports node disk usage metrics and includes a built-in alert using predict_linear() to estimate when a disk might run out of space. You can use all of these queries in conjunction with Prometheus' Alertmanager to notify you when various conditions are met, from network usage being too high to disk space probably running out in the next four hours. Alertmanager is another useful topic that I'll cover in a future article.
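As a small preview, here is a hypothetical alerting rule (not part of this article's setup) that pairs predict_linear() with the Node Exporter's filesystem metrics to warn before a root disk fills:

# Hypothetical rule file: fire when the four-hour trend of free disk space
# predicts exhaustion within the next four hours
groups:
  - name: disk
    rules:
      - alert: DiskFillingUp
        expr: predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[4h], 4 * 3600) < 0
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Disk on {{ $labels.instance }} predicted to fill within 4 hours"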
Conclusion
Prometheus consumes metrics by scraping endpoints for specially formatted data. Data is tracked and can be queried for point-in-time info or graphed to show changes over time. Even better, Prometheus supports, out of the box, alerting rules that can hook in with your infrastructure in a variety of ways. Prometheus can also be used as a data source for other projects, like Grafana, to provide more sophisticated graphing.
In the real world at work, we use Prometheus to track metrics and provide alert thresholds that page us when clusters are unhealthy, and we use Grafana to make dashboards of the data we need to view regularly. We export node data to track our nodes and instrument our operators to track their performance and health. Prometheus is the backbone of all of it.
If you've been interested in Prometheus, keep your eyes peeled for follow-up articles. You'll learn about alerting when certain conditions are met, using Prometheus' built-in Alertmanager and integrations with it, more complicated PromQL, and how to instrument your own application and integrate it with Prometheus.