This document describes best practices for working with the components and resources used by the monitoring-operator.
Grafana dashboards¶
Best practices for Grafana dashboards.
Dashboard UID¶
The unique identifier (UID) of a dashboard can be used to uniquely identify a dashboard between multiple Grafana installs. It is automatically generated if not provided when creating a dashboard. The UID allows having consistent URLs for accessing dashboards and for syncing dashboards between multiple Grafana installs; see dashboard provisioning for more information. This means that changing the title of a dashboard will not break any bookmarked links to that dashboard.
ATTENTION: The uid can have a maximum length of 40 characters.
If you create your own dashboard, we highly recommend using human-readable UIDs in the following format: `<dashboard-namespace>-<short-dashboard-name>`. This format gives you all the advantages of consistent dashboard URLs, for example stable drill-down links. Include the namespace in the UID so that two or more dashboards with the same name do not end up with equal UIDs.
UIDs for out-of-the-box (OOB) monitoring dashboards (managed by monitoring-operator):
Dashboard name | UID (namespace + short name) |
---|---|
alertmanager-overview | <namespace>-alertmanager-overview |
alerts-overview | <namespace>-alerts-overview |
core-dns-dashboard | <namespace>-core-dns |
etcd-dashboard | <namespace>-etcd |
govm-processes | <namespace>-govm-processes |
grafana-overview | <namespace>-grafana-overview |
ha-services | <namespace>-ha-services |
home-dashboard | <namespace>-home-dashboard |
ingress-list-of-ingresses | <namespace>-ing-list-of-ingresses |
ingress-nginx-controller | <namespace>-ing-nginx-controller |
ingress-request-handling-performance | <namespace>-ing-req-handl-perform |
jvm-processes | <namespace>-jvm-processes |
kubernetes-apiserver | <namespace>-k8s-apiserver |
kubernetes-cluster-overview | <namespace>-k8s-cluster-overview |
kubernetes-distribution-by-labels | <namespace>-k8s-distr-by-labels |
kubernetes-kubelet | <namespace>-k8s-kubelet |
kubernetes-namespace-resources | <namespace>-k8s-namespace-resources |
kubernetes-nodes-resources | <namespace>-k8s-nodes-resources |
kubernetes-pod-resources | <namespace>-k8s-pod-resources |
kubernetes-pods-distribution-by-node | <namespace>-k8s-pods-distr-by-node |
kubernetes-pods-distribution-by-zone | <namespace>-k8s-pods-distr-by-zone |
kubernetes-top-resources | <namespace>-k8s-top-resources |
node-details | <namespace>-node-details |
openshift-apiserver | <namespace>-os-apiserver |
openshift-cluster-version-operator | <namespace>-os-cluster-version-operator |
openshift-state-metrics | <namespace>-os-state-metrics |
openshift-haproxy | <namespace>-os-haproxy |
operators-overview | <namespace>-operators-overview |
overall-platform-health | <namespace>-overall-platform-health |
prometheus-cardinality-explorer | <namespace>-prom-cardinality |
prometheus-self-monitoring | <namespace>-prom-self-monitoring |
tls-status | <namespace>-tls-status |
victoriametrics-vmagent | <namespace>-vm-vmagent |
victoriametrics-vmalert | <namespace>-vm-vmalert |
victoriametrics-vmoperator | <namespace>-vm-vmoperator |
victoriametrics-vmsingle | <namespace>-vm-vmsingle |
Several dashboards are managed by Helm. In some cases the name of the dashboard file differs from the dashboard title, so the table below uses dashboard titles. Also note that the `home-dashboard` is present in both places: in the grafana-operator Helm chart as a ConfigMap, and together with the rest of the dashboards managed by the operator:
Dashboard title | Helm subchart | UID (namespace + short name) |
---|---|---|
Blackbox Probes | blackbox-exporter | <namespace>-blackbox-probes |
SSL/TLS Certificates | cert-exporter | <namespace>-ssl-tls-certs |
Kafka Java Clients Monitoring | common-dashboards | <namespace>-kafka-java-clients |
Configurations Streamer | configurations-streamer | <namespace>-configurations-streamer |
Backup Daemon | grafana-operator | <namespace>-backup-daemon |
Home Dashboard | grafana-operator | <namespace>-home-dashboard |
Prometheus / Graphite remote adapter | graphite-remote-adapter | <namespace>-graphite-remote-adapter |
Network Latency Details | network-latency-exporter | <namespace>-network-latency-details |
Network Latency Overview | network-latency-exporter | <namespace>-network-latency-overview |
DR Overview | promxy | <namespace>-dr-overview |
Version overview | version-exporter | <namespace>-version-overview |
If the namespace name is too long, the whole UID of the OOB dashboard is truncated to 40 characters.
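The UID format and the 40-character limit can be illustrated with a minimal sketch. The function name `build_dashboard_uid` is hypothetical, and cutting the tail of the string is an assumption about how the truncation works; the operator may implement it differently.

```python
MAX_UID_LENGTH = 40  # maximum UID length stated in this document


def build_dashboard_uid(namespace: str, short_name: str) -> str:
    """Join namespace and short dashboard name, then cut to the UID limit.

    Assumption: truncation simply drops characters past the limit.
    """
    uid = f"{namespace}-{short_name}"
    return uid[:MAX_UID_LENGTH]


print(build_dashboard_uid("monitoring", "k8s-kubelet"))
```

Under this assumption, a very long namespace name leaves little or no room for the short dashboard name, so different dashboards could end up with identical UIDs. Short namespace names avoid that risk.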
Creating custom dashboard¶
Best practices for creating custom Grafana dashboards.
Tags¶
When you create a dashboard, add tags that describe what the dashboard shows.
NOTE: All tags must be in lowercase. If the tag contains more than one word, words must be in "kebab-case" (separated with hyphens).
Add the tags below to a dashboard if it satisfies the corresponding condition:
- tag `k8s` - if the dashboard shows data about a Kubernetes cluster;
- tag `prometheus` - if the dashboard shows information about services (e.g. Kafka, PostgreSQL, MongoDB, etc.);
- tag `standalone` - if the dashboard shows information about standalone hosts (e.g. Graylog, balancers, etc.);
- tag `self-monitor` - if the dashboard shows information about the monitoring system.
A dashboard should also contain tags that describe the specific information that can be found on it. For example, a dashboard that shows information about Kubernetes namespace resources can contain the tags `k8s` and `k8s-namespaces`, and a dashboard that shows information about PostgreSQL can contain the tags `prometheus` and `postgres`.
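The tag naming rule (lowercase, words separated by hyphens) can be checked automatically before a dashboard is committed. The sketch below is one possible check, not part of the monitoring-operator; the helper name `invalid_tags` is hypothetical, and allowing digits is an assumption made so that tags such as `k8s` pass.

```python
import re

# Lowercase words separated by single hyphens, e.g. "k8s" or "k8s-namespaces".
# Assumption: digits are allowed inside words.
KEBAB_CASE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def invalid_tags(tags):
    """Return the tags that are not lowercase kebab-case."""
    return [tag for tag in tags if not KEBAB_CASE.fullmatch(tag)]


print(invalid_tags(["k8s", "k8s-namespaces", "Postgres", "self_monitor"]))
```

An empty result means that every tag follows the convention.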
Recommendations for creating recording-rules¶
- Do not create recording rules without a clear reason. For example:
  - Good case: a new aggregated metric with a heavy calculation is used in an alert, so the alert uses an already calculated value.
  - Bad case: a new aggregated metric is used on a Grafana dashboard that is opened very rarely. It is better to calculate the value at query time than to spend CPU time on calculating it continuously.
- Do not write recording rules that calculate over a large scope of metrics, for example, CPU usage for the last 15 minutes for all pods in the Cloud.
  - If a product or project wants to use such rules, we suggest adding them only once, on our side, as part of the Monitoring deployment. A good example is moving the rule that calculates CPU usage for the last 15 minutes into the Monitoring deployment.
- Do not duplicate recording rules that calculate metrics over the same scope.
- Use recording rules only to calculate aggregations or to prepare new metrics aggregated from other metrics. For example:
  - Right usage: calculate CPU usage for the last 5, 10, or 15 minutes.
  - Right usage: calculate a new metric that includes metric values and labels from kube_pod_labels.
  - Wrong usage: create a recording rule for a metric only because calculating it requires a long query.
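The recommendation against duplicated recording rules can be supported by a simple check over the rule definitions. The sketch below is an illustration under stated assumptions: rules are represented as dictionaries with `record` and `expr` keys (mirroring the fields of a Prometheus recording rule), the rule names are hypothetical, and expressions are compared after removing whitespace, which is a crude normalization that will miss semantically equal queries written differently.

```python
from collections import defaultdict


def find_duplicate_rules(rules):
    """Group recording rule names by expression, ignoring whitespace.

    Returns only the expressions that are recorded by more than one rule.
    """
    by_expr = defaultdict(list)
    for rule in rules:
        normalized = "".join(rule["expr"].split())
        by_expr[normalized].append(rule["record"])
    return {expr: names for expr, names in by_expr.items() if len(names) > 1}


# Hypothetical rules from two teams; the first two compute the same value.
rules = [
    {"record": "team_a:pod_cpu:rate15m",
     "expr": "sum by (pod) (rate(container_cpu_usage_seconds_total[15m]))"},
    {"record": "team_b:pod_cpu_usage:rate15m",
     "expr": "sum by (pod)(rate(container_cpu_usage_seconds_total[15m]))"},
    {"record": "team_a:pod_memory:sum",
     "expr": "sum by (pod) (container_memory_working_set_bytes)"},
]

for names in find_duplicate_rules(rules).values():
    print("duplicated:", ", ".join(names))
```

Rules reported by such a check are candidates for a single shared rule in the Monitoring deployment.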