
This document provides information about best practices for working with components and resources that can be used in the monitoring-operator.

Grafana dashboards

Best practices for Grafana dashboards.

Dashboard UID

The unique identifier (UID) of a dashboard can be used to uniquely identify it across multiple Grafana installations. If no UID is provided when a dashboard is created, Grafana generates one automatically. A UID gives a dashboard a consistent URL and allows dashboards to be synced between multiple Grafana installations (see dashboard provisioning for more information). As a result, changing the title of a dashboard does not break bookmarked links to it.

ATTENTION: The uid can have a maximum length of 40 characters.

If you create your own dashboard, we strongly recommend using a human-readable UID in the format <dashboard-namespace>-<short-dashboard-name>. This gives you all the advantages of consistent dashboard URLs, e.g. stable drill-down links. Include the namespace in the UID so that dashboards with the same name in different namespaces do not end up with identical UIDs.
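A minimal sketch of the recommended UID scheme. make_dashboard_uid and dashboard_url are hypothetical helpers, not part of monitoring-operator. The sketch assumes Grafana's /d/<uid> dashboard path, which does not depend on the dashboard title and therefore suits stable drill-down links.

```python
MAX_UID_LENGTH = 40  # Limit on dashboard UID length stated in this document


def make_dashboard_uid(namespace: str, short_name: str) -> str:
    """Build a UID in the recommended <namespace>-<short-name> form.

    Raises ValueError instead of truncating, so an over-long UID is noticed
    before the dashboard is deployed.
    """
    uid = f"{namespace}-{short_name}"
    if len(uid) > MAX_UID_LENGTH:
        raise ValueError(
            f"UID {uid!r} has {len(uid)} characters, the maximum is {MAX_UID_LENGTH}"
        )
    return uid


def dashboard_url(grafana_base_url: str, uid: str) -> str:
    """Build a title-independent link to a dashboard (Grafana's /d/<uid> path)."""
    return f"{grafana_base_url.rstrip('/')}/d/{uid}"


uid = make_dashboard_uid("monitoring", "k8s-pod-resources")
print(uid)  # monitoring-k8s-pod-resources
print(dashboard_url("https://grafana.example.com/", uid))
# https://grafana.example.com/d/monitoring-k8s-pod-resources
```

Raising an error rather than silently shortening the UID is a deliberate choice in this sketch: a truncated UID can lose the part that makes it unique.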

UIDs of the out-of-the-box (OOB) monitoring dashboards managed by monitoring-operator:

| Dashboard name | UID (namespace + short name) |
|---|---|
| alertmanager-overview | <namespace>-alertmanager-overview |
| alerts-overview | <namespace>-alerts-overview |
| core-dns-dashboard | <namespace>-core-dns |
| etcd-dashboard | <namespace>-etcd |
| govm-processes | <namespace>-govm-processes |
| grafana-overview | <namespace>-grafana-overview |
| ha-services | <namespace>-ha-services |
| home-dashboard | <namespace>-home-dashboard |
| ingress-list-of-ingresses | <namespace>-ing-list-of-ingresses |
| ingress-nginx-controller | <namespace>-ing-nginx-controller |
| ingress-request-handling-performance | <namespace>-ing-req-handl-perform |
| jvm-processes | <namespace>-jvm-processes |
| kubernetes-apiserver | <namespace>-k8s-apiserver |
| kubernetes-cluster-overview | <namespace>-k8s-cluster-overview |
| kubernetes-distribution-by-labels | <namespace>-k8s-distr-by-labels |
| kubernetes-kubelet | <namespace>-k8s-kubelet |
| kubernetes-namespace-resources | <namespace>-k8s-namespace-resources |
| kubernetes-nodes-resources | <namespace>-k8s-nodes-resources |
| kubernetes-pod-resources | <namespace>-k8s-pod-resources |
| kubernetes-pods-distribution-by-node | <namespace>-k8s-pods-distr-by-node |
| kubernetes-pods-distribution-by-zone | <namespace>-k8s-pods-distr-by-zone |
| kubernetes-top-resources | <namespace>-k8s-top-resources |
| node-details | <namespace>-node-details |
| openshift-apiserver | <namespace>-os-apiserver |
| openshift-cluster-version-operator | <namespace>-os-cluster-version-operator |
| openshift-state-metrics | <namespace>-os-state-metrics |
| openshift-haproxy | <namespace>-os-haproxy |
| operators-overview | <namespace>-operators-overview |
| overall-platform-health | <namespace>-overall-platform-health |
| prometheus-cardinality-explorer | <namespace>-prom-cardinality |
| prometheus-self-monitoring | <namespace>-prom-self-monitoring |
| tls-status | <namespace>-tls-status |
| victoriametrics-vmagent | <namespace>-vm-vmagent |
| victoriametrics-vmalert | <namespace>-vm-vmalert |
| victoriametrics-vmoperator | <namespace>-vm-vmoperator |
| victoriametrics-vmsingle | <namespace>-vm-vmsingle |

Several dashboards are managed by Helm. In some cases the name of the dashboard file differs from the dashboard title, so the table below lists dashboards by title. Note that home-dashboard exists in both places: as a ConfigMap in the grafana-operator Helm chart, and among the dashboards managed by the operator:

| Dashboard title | Helm subchart | UID (namespace + short name) |
|---|---|---|
| Blackbox Probes | blackbox-exporter | <namespace>-blackbox-probes |
| SSL/TLS Certificates | cert-exporter | <namespace>-ssl-tls-certs |
| Kafka Java Clients Monitoring | common-dashboards | <namespace>-kafka-java-clients |
| Configurations Streamer | configurations-streamer | <namespace>-configurations-streamer |
| Backup Daemon | grafana-operator | <namespace>-backup-daemon |
| Home Dashboard | grafana-operator | <namespace>-home-dashboard |
| Prometheus / Graphite remote adapter | graphite-remote-adapter | <namespace>-graphite-remote-adapter |
| Network Latency Details | network-latency-exporter | <namespace>-network-latency-details |
| Network Latency Overview | network-latency-exporter | <namespace>-network-latency-overview |
| DR Overview | promxy | <namespace>-dr-overview |
| Version overview | version-exporter | <namespace>-version-overview |

If the namespace name is too long, the UID of an OOB dashboard is truncated to 40 characters.
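A sketch of the effect, assuming the truncation simply keeps the first 40 characters (the exact rule monitoring-operator applies is not stated here). oob_dashboard_uid is a hypothetical helper and the namespace is made up. Under that assumption, a long namespace can cut off the part of the UID that distinguishes one dashboard from another:

```python
MAX_UID_LENGTH = 40


def oob_dashboard_uid(namespace: str, short_name: str) -> str:
    # Assumption: over-long UIDs are truncated by keeping the first 40 characters.
    return f"{namespace}-{short_name}"[:MAX_UID_LENGTH]


namespace = "platform-monitoring-production-eu-west"  # hypothetical, 38 characters
print(oob_dashboard_uid(namespace, "k8s-pod-resources"))
print(oob_dashboard_uid(namespace, "k8s-pods-distr-by-node"))
# Both lines print: platform-monitoring-production-eu-west-k
```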

Creating custom dashboards

Best practices for creating custom Grafana dashboards.

Tags

When you create a dashboard, add tags that describe what the dashboard shows.

NOTE: All tags must be in lowercase. If the tag contains more than one word, words must be in "kebab-case" (separated with hyphens).

Add the following tags to a dashboard when it satisfies the corresponding condition:

  • tag k8s, if the dashboard shows data about the Kubernetes cluster;
  • tag prometheus, if the dashboard shows information about services (e.g. Kafka, PostgreSQL, MongoDB);
  • tag standalone, if the dashboard shows information about standalone hosts (e.g. Graylog, balancers);
  • tag self-monitor, if the dashboard shows information about the monitoring system.

A dashboard should also contain tags that describe the specific information found on it. For example, a dashboard that shows Kubernetes namespace resources can have the tags k8s and k8s-namespaces, and a dashboard that shows PostgreSQL information can have the tags prometheus and postgres.
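The tag naming rule can be checked automatically, for example in CI before dashboards are committed. This is a minimal sketch with a hypothetical helper, invalid_tags. It assumes digits are allowed inside words, since the tag k8s contains one.

```python
import re

# Lowercase words (letters and digits) separated by single hyphens, e.g. "k8s-namespaces".
TAG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def invalid_tags(tags: list[str]) -> list[str]:
    """Return the tags that violate the lowercase kebab-case convention."""
    return [tag for tag in tags if not TAG_PATTERN.fullmatch(tag)]


print(invalid_tags(["k8s", "k8s-namespaces"]))      # []
print(invalid_tags(["prometheus", "postgres"]))     # []
print(invalid_tags(["Self-Monitor", "k8s_nodes"]))  # ['Self-Monitor', 'k8s_nodes']
```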

Recommendations for creating recording rules

  • Do not create recording rules without a reason. Every recording rule must be justified, for example:
      ◦ Good case: a new aggregated metric that is expensive to calculate will be used in an alert, so the alert uses the already calculated value.
      ◦ Bad case: a new aggregated metric will be used only on a Grafana dashboard that is opened very rarely. It is better to calculate the value at query time than to spend CPU time recalculating it continuously.
  • Do not write recording rules that calculate metrics over a large scope, for example, CPU usage over the last 15 minutes for all pods in the cloud.
      ◦ If a product or project needs such rules, they should be added only once, on the monitoring side, as part of the Monitoring deployment. A good example is moving the rule that calculates CPU usage over the last 15 minutes into the Monitoring deployment.
  • Do not duplicate recording rules that calculate metrics over the same scope.
  • Use recording rules only to calculate aggregations or to prepare new metrics derived from existing ones, for example:
      ◦ Right usage: calculate CPU usage over the last 5, 10, or 15 minutes.
      ◦ Right usage: calculate a new metric that combines metric values with labels from kube_pod_labels.
      ◦ Wrong usage: precalculate a metric only because the query that calculates it is large.
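The two "right usage" cases can be sketched as recording rules. This sketch assumes the rules are delivered as a prometheus-operator PrometheusRule resource and builds the manifest in Python so it can be printed as JSON, which kubectl apply accepts. The namespace my-service-namespace, the rule and group names, and the pod label app are hypothetical. The expressions are limited to one namespace rather than the whole cloud.

```python
import json

NAMESPACE = "my-service-namespace"  # hypothetical product namespace

rule = {
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "PrometheusRule",
    "metadata": {"name": "example-recording-rules", "namespace": NAMESPACE},
    "spec": {
        "groups": [
            {
                "name": "example.cpu",
                "rules": [
                    {
                        # Right usage: CPU usage per pod over the last 5 minutes,
                        # calculated once and reused, e.g. by an alert.
                        "record": "namespace_pod:container_cpu_usage_seconds:rate5m",
                        "expr": (
                            "sum by (namespace, pod) (rate(container_cpu_usage_seconds_total"
                            f'{{namespace="{NAMESPACE}", container!=""}}[5m]))'
                        ),
                    },
                    {
                        # Right usage: enrich the result with a pod label taken
                        # from kube_pod_labels (label_app is hypothetical).
                        "record": "namespace_pod_app:container_cpu_usage_seconds:rate5m",
                        "expr": (
                            "namespace_pod:container_cpu_usage_seconds:rate5m"
                            " * on (namespace, pod) group_left (label_app)"
                            f' kube_pod_labels{{namespace="{NAMESPACE}"}}'
                        ),
                    },
                ],
            }
        ]
    },
}

print(json.dumps(rule, indent=2))
```

The record names follow the Prometheus level:metric:operations naming convention. Note that recent kube-state-metrics versions expose pod labels in kube_pod_labels only when they are allowlisted.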