Configuration

This section describes all the configurations and extension points that are provided by Monitoring.

Frequently used Installation Options

This section describes the most frequently used options and the parameters that should be set during installation.

Use PV as Storage

To maintain data across deployments and version upgrades, the data must be persisted to some volume other than emptyDir, allowing it to be reused by Pods after an upgrade.

Kubernetes supports several kinds of storage volumes. The Monitoring Operator uses the Prometheus Operator to control the Prometheus deployment. The Prometheus Operator works with PersistentVolumeClaims, which allow the underlying PersistentVolume to be provisioned when requested.

This document assumes a basic understanding of PersistentVolumes, PersistentVolumeClaims, and their provisioning.

Storage with Dynamic Provisioning

Automatic provisioning of storage requires an already existing StorageClass.

For best results, use volumes that have high I/O throughput.
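
For example, a StorageClass for NFS-based dynamic provisioning might be defined as follows (a minimal sketch; the provisioner is an assumption and must match the provisioner actually installed in your cluster):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  # This name is referenced below in the Prometheus and Grafana storage sections
  name: nfs-dynamic-provisioning
# Assumption: replace with the provisioner deployed in your cluster
provisioner: nfs.csi.k8s.io
reclaimPolicy: Retain
volumeBindingMode: Immediate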

The StorageClass that was created can be specified in the storage section in the Prometheus resource.

prometheus:
  storage:
    volumeClaimTemplate:
      spec:
        # Specify storage class to create volume
        storageClassName: nfs-dynamic-provisioning
        resources:
          requests:
            # Specify required storage size to create volume
            storage: 10Gi

To configure Grafana to use storage with a specific StorageClass, specify the following.

grafana:
  dataStorage:
    accessModes:
      - ReadWriteOnce
    # Specify required storage size to create volume
    size: 2Gi
    # Specify storage class to create volume
    class: local-storage

Manual Storage Provisioning

The monitoring deploy parameters allow you to support arbitrary storage through a PersistentVolumeClaim.

The easiest way to use a volume that cannot be automatically provisioned (for whatever reason) is to use a label selector alongside a manually created PersistentVolume.

For example, using a hostPath PersistentVolume might be accomplished with the following specification.

prometheus:
  # The vanilla Prometheus image uses the user and group nobody (65534),
  # so when using a PV it is better to run as user 65534
  securityContext:
    fsGroup: 65534
    runAsUser: 65534
  # Because a hostPath PV is created on a specific node, Prometheus must be bound to that node
  nodeSelector:
    kubernetes.io/hostname: worker1
  storage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 10Gi
        selector:
          # Match PV by label on PV
          matchLabels:
            app: prometheus
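
The manually created PersistentVolume selected by the claim above might look like the following (a minimal sketch; the path and capacity are assumptions, and the label must match the matchLabels selector above):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
  labels:
    # This label is matched by the selector in the volumeClaimTemplate above
    app: prometheus
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  # hostPath data exists only on a specific node (worker1 in this example)
  hostPath:
    path: /mnt/data/prometheus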

For Grafana, there is no ability to use an already created hostPath volume; you can use only dynamic provisioning.

Specify custom requests and limits

TBD

Override SecurityContext

TBD

Integration with external systems

TBD

Metrics Collectors and Long Term Storages

TBD

Alerting Systems

TBD

Dashboards

TBD

User Extension Points

This section contains descriptions of all extension points that can be used by users or components.

Custom Resources in application namespaces

Currently, the only extension points that Monitoring provides to applications are Custom Resources that applications can create in their own namespaces. Other extension points also exist, but they reside in the monitoring namespace, to which applications usually have no access.

Difference between ServiceMonitor and PodMonitor

At first sight, ServiceMonitor and PodMonitor look the same and fulfill the same role. However, they are different custom resources.

In simple terms, ServiceMonitor is used when your microservice has a Service (the Kubernetes object that balances traffic over its instances), while PodMonitor is used when your microservice does not have a Service, but consists of a Pod, a DaemonSet, a Job, and so on.

Before looking at how these features work under the hood, it is useful to understand the Prometheus discovery abilities.

Prometheus can discover Kubernetes objects by itself using specified rules. Prometheus supports discovery by the following roles: node, service, pod, endpoints, endpointslice, and ingress.

Depending on the discovery type, Prometheus discovers different resources and adds different meta labels to them.

So when prometheus-operator collects all custom resources and processes them, it converts ServiceMonitor and PodMonitor using different discovery types.

ServiceMonitor is converted into a job that uses discovery by Endpoints.

For more information on Endpoints, refer to the official Kubernetes documentation at https://kubernetes.io/docs/concepts/services-networking/service/. In brief, Endpoints can be generated automatically from Pods linked to a Service, or created manually with static IPs. Manual creation is useful when you want to create a Service in Kubernetes for an application running outside the cluster.
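
For illustration, a Service without a selector plus a manually created Endpoints object can point to a static IP outside the cluster (a minimal sketch; the name, IP, and port are assumptions):

kind: Service
apiVersion: v1
metadata:
  name: external-app
spec:
  # No selector: the Endpoints object below is created manually
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100
---
kind: Endpoints
apiVersion: v1
metadata:
  # Must have the same name as the Service
  name: external-app
subsets:
  - addresses:
      - ip: 1.2.3.4
    ports:
      - name: metrics
        port: 9100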

For example, the following ServiceMonitor config,

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/managed-by: monitoring-operator
    app.kubernetes.io/name: node-exporter-service-monitor
  name: monitoring-node-exporter
  namespace: monitoring
spec:
  endpoints:
    - bearerTokenSecret:
        key: ''
      interval: 30s
      port: metrics
      relabelings:
        - action: replace
          regex: (.*)
          replacement: $1
          sourceLabels:
            - __meta_kubernetes_pod_node_name
          targetLabel: node
      scrapeTimeout: 10s
  jobLabel: node-exporter
  namespaceSelector:
    matchNames:
      - monitoring
  selector:
    matchExpressions:
      - key: platform.monitoring.app
        operator: In
        values:
          - node-exporter

is converted into the following Prometheus job.

- job_name: serviceMonitor/monitoring/monitoring-node-exporter/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
    ...
  kubernetes_sd_configs:
  - role: endpoints  # Used discovery by endpoints
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - monitoring

PodMonitor is converted into a job that uses discovery by Pod.

There are no specific use cases here. Discovery by Pod simply finds all the pods that satisfy the specified conditions and starts metrics collection from them.

For example, the following PodMonitor config,

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/managed-by: monitoring-operator
    app.kubernetes.io/name: alertmanager-pod-monitor
    k8s-app: alertmanager-pod-monitor
  name: monitoring-alertmanager-pod-monitor
  namespace: monitoring
spec:
  jobLabel: k8s-app
  namespaceSelector: {}
  podMetricsEndpoints:
    - interval: 30s
      port: web
      scheme: http
      scrapeTimeout: 10s
  selector:
    matchLabels:
      app: alertmanager

is converted into the following Prometheus job.

- job_name: podMonitor/monitoring/monitoring-alertmanager-pod-monitor/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
    ...
  kubernetes_sd_configs:
  - role: pod  # Used discovery by pods
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - monitoring

ServiceMonitor

The ServiceMonitor Custom Resource (CR) allows you to define how Prometheus should collect metrics from a microservice.

ServiceMonitor is used when:

  • You have to configure metrics collection from microservices that have a Service.
  • You have to configure metrics collection from several microservices behind one Service.

For example, the ServiceMonitor can look like the following.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet-service-monitor
  labels:
    k8s-app: kubelet-service-monitor
    app.kubernetes.io/name: kubelet-service-monitor
    app.kubernetes.io/component: monitoring  # Mandatory label
spec:
  endpoints:
  - honorLabels: true
    interval: 30s
    port: https-metrics
    scheme: https
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      k8s-app: kubelet

In the above example, the metrics are collected with the following settings:

  • The job name is taken from the value of the k8s-app label (jobLabel).
  • The metrics are collected from the endpoints of Services with the k8s-app: kubelet label in the monitoring namespace.
  • The metrics are collected from all discovered pods from the port named https-metrics with a 30s interval.

The following schema illustrates how prometheus-operator discovers ServiceMonitor, processes it, and applies the configuration to Prometheus:

ServiceMonitor

For more information about all available fields in ServiceMonitor, refer to the official documentation at https://github.com/prometheus-operator/prometheus-operator/blob/v0.79.2/Documentation/api.md#servicemonitor.

ServiceMonitor and container label

If you want to see the container label in the metrics collected by Prometheus/VictoriaMetrics, you must expose the port used for metrics not only in the Service but also in the Pod. Otherwise, the metrics will not contain this label.

It means that you need to expose the port:

  • In Kubernetes Service
kind: Service
apiVersion: v1
...
spec:
  ports:
    - name: metrics
      protocol: TCP
      port: 9900
      targetPort: 9900
  • In the Pod
kind: Pod
apiVersion: v1
...
spec:
  containers:
    - name: xxx
      ports:
        - name: metrics
          containerPort: 9900
          protocol: TCP

Why is it needed?

The official Prometheus documentation (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#endpoints) states that discovery by endpoints adds the __meta_kubernetes_pod_container_name meta label only in the following cases:

  • If the endpoints belong to a service, all labels of the role: service discovery are attached
  • For all targets backed by a pod, all labels of the role: pod discovery are attached

The official Kubernetes documentation does not describe roles for Endpoints, so this concept appears to belong to Prometheus Kubernetes service discovery (SD).

So it should work as follows:

  • A Service can expose ports regardless of whether the Pod has already exposed them or not.
  • Therefore, for Prometheus Kubernetes SD there are two cases:
  • If the port is exposed in the Service but not in the Pod, it seems that Prometheus uses the role: service labels.
  • If the port is exposed both in the Service and in the Pod, it seems that Prometheus uses the role: pod labels.

When the port is not exposed in the Pod, Prometheus attaches to the meta labels the information about the Service, which does not contain the container name. In all cases, Prometheus can still collect metrics from the endpoints; the discovery type affects only the set of meta labels and the resulting label list.

PodMonitor

The PodMonitor Custom Resource (CR) allows you to define how Prometheus should collect metrics from a microservice.

PodMonitor is used when:

  • You have to configure metrics collection directly from pods that do not have a Service.

For example, the PodMonitor can look like the following.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  labels:
    app.kubernetes.io/name: monitoring-nginx-ingress-pod-monitor
    app.kubernetes.io/component: monitoring  # Mandatory label
  name: monitoring-nginx-ingress-pod-monitor
spec:
  jobLabel: k8s-app
  namespaceSelector: 
    any: true
  podMetricsEndpoints:
    - interval: 30s
      port: prometheus
      scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx

In the above example, the metrics are collected with the following settings:

  • The job name is taken from the value of the k8s-app label (jobLabel).
  • The metrics are collected from all the pods with an app.kubernetes.io/name: ingress-nginx label.
  • The metrics are collected from all discovered pods from the port named prometheus over HTTP, with a 30s interval.

The following schema illustrates how prometheus-operator discovers PodMonitor, processes it, and applies the configuration to Prometheus:

PodMonitor

For more information about all available fields in PodMonitor, refer to the official documentation at https://github.com/prometheus-operator/prometheus-operator/blob/v0.79.2/Documentation/api.md#podmonitor.

Probe

The Probe Custom Resource (CR) allows you to add a list of static endpoints, or Ingress discovery, to the Prometheus config and check them using the Blackbox exporter.

This custom resource was created because the already existing resources, such as ServiceMonitor and PodMonitor, do not allow specifying more than one endpoint. However, for the Blackbox exporter, it is a common case to specify several static URLs to check.

Note: Probe works only when the Blackbox exporter is deployed. If it is absent, you can configure Probe to use any external Blackbox exporter.

Probe is used when:

  • You have to configure the collection of availability metrics for any static URLs.
  • You have to configure the collection of availability metrics for any Ingresses.

Note: The following examples use the http_2xx module. This module must be pre-configured in the Blackbox exporter before it can be used.

apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: blackbox-ingress-probe
  labels:
    app.kubernetes.io/component: monitoring  # Mandatory label
spec:
  jobName: http-get
  interval: 30s
  module: http_2xx
  prober:
    url: blackbox-exporter.monitoring.svc:9115
    scheme: http
    path: /probe
  targets:
    ingress:
      selector:
        matchLabels:
          name: prometheus
      namespaceSelector:
        matchNames:
          - monitoring

Or with static URLs:

Note: Keep in mind that the URLs must be reachable from the Blackbox exporter for the probes to work correctly.

apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: blackbox-static-urls-probe
  labels:
    app.kubernetes.io/component: monitoring  # Mandatory label
spec:
  jobName: http-get
  interval: 30s
  module: http_2xx
  prober:
    url: blackbox-exporter.monitoring.svc:9115
    scheme: http
    path: /probe
  targets:
    staticConfig:
      static:
        - 'http://example.com'
        - 'https://example.com'
        - 'http://google.com'
        - 'https://google.com'

For more information about all available fields in Probe, refer to the official documentation at https://github.com/prometheus-operator/prometheus-operator/blob/v0.79.2/Documentation/api.md#probe.

AlertmanagerConfig

Note: This resource is described in the user section because a user can add these settings. However, configuring Alertmanager is normally an admin operation. Be careful about what you specify in the settings and do not explicitly specify passwords or other sensitive data.

The AlertmanagerConfig Custom Resource (CR) allows you to add settings to the Alertmanager configuration in separate parts. A user or a component can add route settings and new receivers for Alertmanager. When using VictoriaMetrics VMAlertmanagerConfig, route and receivers are mandatory fields; otherwise, your inhibit rules are not added to the VictoriaMetrics configuration secret and therefore are not applied at all.

AlertmanagerConfig is used when:

  • You have to configure the alerts grouping.
  • You have to configure the notification channels to send alerts.

For example, the AlertmanagerConfig can look like the following.

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: slack-config-example
  labels:
    app.kubernetes.io/component: monitoring  # Mandatory label
spec:
  route:
    groupBy: ['job']
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: 'slack-example'
  receivers:
  - name: 'slack-example'
    slackConfigs:
    - channel: <channel_id>
      apiURL:
        name: 'slack-config'
        key: 'apiUrl'
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: slack-config
data:
  apiUrl: <base64_encoded>  # For example, the base64-encoded value of "https://slack.com/app_redirect?app=...."

For more information about AlertmanagerConfig and its examples, refer to the official documentation at https://github.com/prometheus-operator/prometheus-operator/blob/v0.79.2/Documentation/api.md#alertmanagerconfig.

CustomScaleMetricRule

The CustomScaleMetricRule Custom Resource (CR) allows you to provision custom metrics that are exposed through the Metrics API and become available for the Horizontal Pod Autoscaler (HPA).

This CR was designed for cases when an application wants to use custom metrics in HPA. By default, prometheus-adapter allows only cluster-admin users to add new metrics to the Metrics API, so this resource type lets the operator discover a list of custom metrics and add them automatically.

To configure metrics that are exposed as custom metrics in the Metrics API, you can use the same syntax as in prometheus-adapter: https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/config-walkthrough.md#configuring-the-adapter

For example:

apiVersion: monitoring.qubership.org/v1alpha1
kind: CustomScaleMetricRule
metadata:
  name: {{ .Values.name }}-custom-metric-rule
  labels:
    app.kubernetes.io/component: monitoring
spec:
  rules:
  - seriesQuery: '{__name__=~"prometheus_example_app_load",namespace!="",pod!=""}'
    name:
      matches: "prometheus_example_app_load"
    resources:
      overrides:
        namespace:
          resource: "namespace"
        pod:
          resource: "pod"
    metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
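
After the rule is processed and the metric is exposed through the custom metrics API, a Horizontal Pod Autoscaler can reference it. The following is a minimal sketch, assuming the metric above is exposed under the name prometheus_example_app_load; the Deployment name and target value are illustrative only:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: prometheus-example-app  # Assumption: the workload that exposes the metric
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Pods
      pods:
        metric:
          # Custom metric provisioned by the CustomScaleMetricRule above
          name: prometheus_example_app_load
        target:
          type: AverageValue
          averageValue: "100"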

GrafanaDashboard

The GrafanaDashboard Custom Resource (CR) allows you to provision dashboards into Grafana. The dashboard should be in JSON format, which you can get from the Grafana UI using export.

All GrafanaDashboard custom resources are discovered by the grafana-operator. It collects and processes them, reads the JSON from the custom resources, and uses Grafana's Dashboard API to create or update the dashboard in the attached Grafana instance. For more information, refer to https://grafana.com/docs/grafana/latest/http_api/dashboard/.

GrafanaDashboard is used when:

  • You have to provide a Grafana dashboard with your service, as part of a component.

You can use the following options to provide the dashboard using the GrafanaDashboard CR:

  • Inline JSON
  • Dashboard in ConfigMap
  • Dashboard from Grafana Official Site
  • Dashboard by URL

GrafanaDashboard Known Issues

Currently, one major known issue is related to the dashboard JSON size.

All Kubernetes resources (built-in ones such as ConfigMap, and custom ones such as GrafanaDashboard) that you create are stored in etcd.

etcd has a limit of 1 MB (megabyte) for each stored resource, which means that no resource can have a total size greater than 1 MB.

A typical dashboard has a size of about 100-300 KB, so the total size of a GrafanaDashboard resource is also about 100-300 KB.

When a resource is created or updated with kubectl apply -f ..., kubectl generates a special annotation.

"kubectl.kubernetes.io/last-applied-configuration": "{...}"

The annotation contains a serialized representation of the last applied resource.

This means that the total size of a resource processed with kubectl apply -f ... is roughly twice the size of the original resource. For example, the GrafanaDashboard resource can then have a size of about 200-600 KB or more.

To solve this problem:

  • Decrease the size of the dashboard.
  • Upload the dashboard to any storage (for example Nexus) and refer to it by a URL in GrafanaDashboard. For more information, see Dashboard by URL.
Inline JSON

This option allows you to provide a Grafana dashboard's JSON inline in the custom resource.

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: inline-json-dashboard
  labels:
    app.kubernetes.io/component: monitoring  # Mandatory label
spec:
  # The dashboard JSON is inserted directly into the custom resource
  json: |+
    { ... }
Dashboard in ConfigMap

This option allows the GrafanaDashboard CR to reference a ConfigMap.

For example,

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: dashboard-in-cm  # ConfigMap name
data:
  # Key with data in ConfigMap
  dashboard: |+
    { ... }

---
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: dashboard-from-cm
  labels:
    app.kubernetes.io/component: monitoring  # Mandatory label
spec:
  json: '' # Mandatory field from CRD
  configMapRef:
    name: dashboard-in-cm    # Specify the ConfigMap name
    key: dashboard           # Specify the key in the ConfigMap
Dashboard from Grafana Official Site

This option allows the dashboard to be downloaded by its ID from the official site, https://grafana.com.

Warning: To use this configuration, the grafana-operator must have access from the cloud to the official site, https://grafana.com.

For example,

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: dashboard-from-grafana-com
  labels:
    app.kubernetes.io/component: monitoring  # Mandatory label
spec:
  json: '' # Mandatory field from CRD
  grafanaCom:
    id: 9614     # ID of dashboard on grafana.com
    revision: 1  # Dashboard's revision

In this example, the grafana-operator downloads the dashboard from https://grafana.com/grafana/dashboards/9614.

Dashboard by URL

This option allows the dashboard to be downloaded by URL from any server.

Note: This option can help solve issues with a large GrafanaDashboard size. For more details, see GrafanaDashboard Known Issues.

For example,

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: helm-example-dashboard-by-url
  labels:
    app.kubernetes.io/component: monitoring  # Mandatory label
spec:
  json: '' # Mandatory field from CRD
  # The dashboard is downloaded from the specified URL and then inserted into the custom resource
  # in the 'json' field, so the resulting custom resource contains both the 'url' and 'json' fields
  url: "http://any-host.org/any/path/to/dashboard.json"

Useful examples of Users Extensions

This section contains examples of how to configure Monitoring Custom Resources in various cases.

AlertmanagerConfig for Email

This example shows how you can configure Alertmanager to send a notification by an email.

Note: This example is expected to be used with an SMTP server like Outlook. It has not been tested on other SMTP servers.

Example of AlertmanagerConfig,

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: email-config
data:
  authPassword: <base64_encoded>
---
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: email-config
  labels:
    app.kubernetes.io/component: monitoring  # Mandatory label
spec:
  route:
    groupBy: ['job']
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: 'email-qubership'
  receivers:
  - name: 'email-qubership'
    emailConfigs:
    - smarthost: 'testmail.qubership.com:25'
      authUsername: <integration_smtp_user>
      authPassword:
        name: 'email-config'
        key: 'authPassword'
      tlsConfig:
        insecureSkipVerify: true
      sendResolved: true
      from: alertmanager@testmail.qubership.com
      to: <user_to_send_notifications>

AlertmanagerConfig for Rocket.Chat

This example shows how you can configure Alertmanager to send a notification in Rocket.Chat.

Note: This example is expected to be used on the Rocket.Chat server, https://rc.qubership.com/plt. If you want to use another instance, you have to change the URL.

The URL with the token to send alerts can be copied from Rocket.Chat.

Administration -> Integrations -> New integration (or select an existing one) -> Webhook URL

Example of AlertmanagerConfig,

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: rocketchat-config
  labels:
    app.kubernetes.io/component: monitoring  # Mandatory label
spec:
  route:
    groupBy: ['job']
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: 'rocketchat-qubership'
  receivers:
  - name: 'rocketchat-qubership'
    webhookConfigs:
    - url: 'https://rc.qubership.com/plt/hooks/<token>'
      sendResolved: true

You also have to add a custom script to the integration. It processes the JSON that Alertmanager sends and converts it into a formatted message in Rocket.Chat.

Script for integration
class Script {
    process_incoming_request({
        request
    }) {
        console.log(request.content);

        var alertColor = "warning";
        if (request.content.status == "resolved") {
            alertColor = "good";
        } else if (request.content.status == "firing") {
            alertColor = "danger";
        }

        let finFields = [];
        for (let i = 0; i < request.content.alerts.length; i++) {
            var endVal = request.content.alerts[i];
            var elem = {
                title: "alertname: " + endVal.labels.alertname,
                value: "*instance:* " + endVal.labels.instance,
                short: false
            };

            finFields.push(elem);

            if (!!endVal.annotations.summary) {
                finFields.push({
                    title: "summary",
                    value: endVal.annotations.summary
                });
            }

            if (!!endVal.annotations.severity) {
                finFields.push({
                    title: "severity",
                    value: endVal.annotations.severity
                });
            }

            if (!!endVal.annotations.description) {
                finFields.push({
                    title: "description",
                    value: endVal.annotations.description
                });
            }
        }

        return {
            content: {
                username: "Prometheus Alert",
                attachments: [{
                    color: alertColor,
                    title_link: request.content.externalURL,
                    title: "Prometheus notification",
                    fields: finFields
                }]
            }
        };

        return {
            error: {
                success: false
            }
        };
    }
}

mTLS Config

Usually, TLS traffic is wrapped and terminated on Kubernetes or OpenShift load balancers.

However, some customers or some internal environments may require all connections to be secured with TLS. In this case, Prometheus also needs to know which certificates to use to connect to the microservice.

ServiceMonitor, PodMonitor, and Probe already support TLS/mTLS.

This functionality is designed in such a way that certificates must be kept in Secrets located in the same namespace where the microservice and the Monitoring CR are deployed.

ServiceMonitor example with TLS config,

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: monitoring-operator-monitoring
    meta.helm.sh/release-namespace: monitoring
  labels:
    app.kubernetes.io/component: monitoring  # Mandatory label
    app.kubernetes.io/name: etcd-service-monitor
  name: monitoring-etcd-service-monitor
  namespace: monitoring
spec:
  endpoints:
    - interval: 30s
      port: metrics
      scheme: https
      scrapeTimeout: 10s
      tlsConfig:
        ca:
          secret:
            key: etcd-client-ca.crt       # Name of key with CA cert from secret
            name: kube-etcd-client-certs  # Secret name
        cert:
          secret:
            key: etcd-client.crt          # Name of key with Cert from secret
            name: kube-etcd-client-certs  # Secret name
        keySecret:
          key: etcd-client.key          # Name of key with KeySecret from secret
          name: kube-etcd-client-certs  # Secret name
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      k8s-app: etcd

---
kind: Secret
apiVersion: v1
metadata:
  name: kube-etcd-client-certs
  namespace: monitoring
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/name: kube-etcd-client-certs
data:
  etcd-client-ca.crt: <CA_cert_base64_encoded_content>
  etcd-client.crt: <Cert_base64_encoded_content>
  etcd-client.key: <Key_base64_encoded_content>
type: Opaque

PodMonitor example with TLS config,

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  labels:
    app.kubernetes.io/name: monitoring-nginx-ingress-pod-monitor
    app.kubernetes.io/component: monitoring  # Mandatory label
  name: monitoring-nginx-ingress-pod-monitor
spec:
  jobLabel: k8s-app
  namespaceSelector: 
    any: true
  podMetricsEndpoints:
    - interval: 30s
      port: prometheus
      scheme: http
      tlsConfig:
        ca:
          secret:
            key: nginx-client-ca.crt  # Name of key with CA cert from secret
            name: nginx-client-certs  # Secret name
        cert:
          secret:
            key: nginx-client.crt     # Name of key with Cert from secret
            name: nginx-client-certs  # Secret name
        keySecret:
          key: nginx-client.key     # Name of key with KeySecret from secret
          name: nginx-client-certs  # Secret name
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx

---
kind: Secret
apiVersion: v1
metadata:
  name: nginx-client-certs
  namespace: monitoring
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/name: nginx-client-certs
data:
  nginx-client-ca.crt: <CA_cert_base64_encoded_content>
  nginx-client.crt: <Cert_base64_encoded_content>
  nginx-client.key: <Key_base64_encoded_content>
type: Opaque

Probe example with TLS config,

apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: blackbox-ingress-probe
  labels:
    app.kubernetes.io/component: monitoring  # Mandatory label
spec:
  jobName: http-get
  interval: 30s
  module: http_2xx
  prober:
    url: blackbox-exporter.monitoring.svc:9115
    scheme: http
    path: /probe
  targets:
    ingress:
      selector:
        matchLabels:
          name: prometheus
      namespaceSelector:
        matchNames:
          - monitoring
  tlsConfig:
    ca:
      secret:
        key: monitoring-ingress-client-ca.crt  # Name of key with CA cert from secret
        name: monitoring-ingress-client-certs  # Secret name
    cert:
      secret:
        key: monitoring-ingress-client.crt     # Name of key with Cert from secret
        name: monitoring-ingress-client-certs  # Secret name
    keySecret:
      key: monitoring-ingress-client.key     # Name of key with KeySecret from secret
      name: monitoring-ingress-client-certs  # Secret name
---
kind: Secret
apiVersion: v1
metadata:
  name: monitoring-ingress-client-certs
  namespace: monitoring
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/name: monitoring-ingress-client-certs
data:
  monitoring-ingress-client-ca.crt: <CA_cert_base64_encoded_content>
  monitoring-ingress-client.crt: <Cert_base64_encoded_content>
  monitoring-ingress-client.key: <Key_base64_encoded_content>
type: Opaque

You can also skip the TLS certificate check for all the above resources.

Warning: This option is strongly discouraged in production environments.

To disable the TLS check, specify the following.

tlsConfig:
  insecureSkipVerify: true

For example,

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
...
spec:
  endpoints:
    - interval: 30s
      ...
      tlsConfig:
        insecureSkipVerify: true

Admin Extension Points

This section contains descriptions of all extension points that can be used by an Admin user. For example, an Admin can use these extension points to add custom configurations to Prometheus or Alertmanager.

Custom Resources in monitoring namespace

Currently, Monitoring provides various ways for an Admin to change global Prometheus and other component configurations, or to change the list of deployed components.

PlatformMonitoring

TBD

Prometheus

Important: This custom resource is managed by monitoring-operator. All manual changes are reverted in less than 30 seconds.

The Prometheus Custom Resource (CR) allows you to provide settings for Prometheus that are applied when the Prometheus instance starts.

This custom resource fully describes the Prometheus instance. Prometheus-operator discovers all Prometheus CRs in the namespace where Monitoring (and prometheus-operator in particular) is deployed.

Prometheus instances are managed by prometheus-operator with the following logic:

  • For each separate Prometheus CR, a new Prometheus StatefulSet is created. Its pods have names generated by the prometheus-<CR_name>-0 mask. If more than one replica is specified in the Prometheus CR, the replica count is also applied to the associated StatefulSet.
  • If the Prometheus CR is removed, then prometheus-operator also removes the associated StatefulSet.

For example,

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  labels:
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/component: monitoring
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - prometheus
        topologyKey: "kubernetes.io/hostname"
  additionalScrapeConfigs:
    key: prometheus-additional.yaml
    name: additional-scrape-configs
  ...
  alerting:
    alertmanagers: []
  securityContext: {}
  enableAdminAPI: false
  serviceAccountName: monitoring-prometheus
  ...

For more information about all the available fields in Prometheus, refer to the official documentation at https://github.com/prometheus-operator/prometheus-operator/blob/v0.79.2/Documentation/api.md#prometheus.

Alertmanager

Important: This custom resource is managed by monitoring-operator. All manual changes are reverted in less than 30 seconds.

The Alertmanager Custom Resource (CR) allows you to provide settings for Alertmanager that are applied when the Alertmanager instance starts.

This custom resource fully describes the Alertmanager instance. Prometheus-operator discovers all Alertmanager CRs in the namespace where Monitoring (and prometheus-operator in particular) is deployed.

Alertmanager instances are managed by prometheus-operator with the following logic:

  • For each separate Alertmanager CR, a new Alertmanager StatefulSet is created. Its pods have names generated by the alertmanager-<CR_name>-0 mask. If more than one replica is specified in the Alertmanager CR, the replica count is also applied to the associated StatefulSet.
  • If the Alertmanager CR is removed, then prometheus-operator also removes the associated StatefulSet.

For example,

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: k8s
  labels:
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/component: monitoring
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - alertmanager
        topologyKey: "kubernetes.io/hostname"
  image: "prom/alertmanager:v0.19.0"
  replicas: 1

For more information about all the available fields in Alertmanager, refer to the official documentation at https://github.com/prometheus-operator/prometheus-operator/blob/v0.79.2/Documentation/api.md#alertmanager.

Grafana

Important: This custom resource is managed by monitoring-operator. All manual changes are reverted in less than 30 seconds.

The Grafana Custom Resource (CR) allows you to provide settings for Grafana that are applied when the Grafana instance starts.
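
For reference, a Grafana CR managed by grafana-operator might look like the following minimal sketch. It assumes the integreatly.org/v1alpha1 API group used by the other Grafana resources in this document, and the config values are illustrative only:

apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: grafana
  labels:
    app.kubernetes.io/component: monitoring
spec:
  # Settings in this section are rendered into grafana.ini
  config:
    auth:
      disable_signout_menu: true
    log:
      mode: console
      level: warn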

GrafanaDataSource

The GrafanaDataSource Custom Resource (CR) allows you to provide settings for creating a Grafana DataSource when the Grafana instance starts.

The DataSource in Grafana is a data provider that Grafana uses to fetch data from any source (for example, from Prometheus, ClickHouse, and so on).

Currently, GrafanaDataSource is discovered and read only in the namespace where Monitoring is deployed. You can deploy this custom resource in any other namespace, but grafana-operator will not discover it.

Note: This is a known limitation and the community already has plans to change this behavior. Refer to the following links for more details:

GrafanaDataSource is used when:

  • You want to use any custom DataSources that Monitoring does not create by default. For example, for ClickHouse or Graphite, and so on.

By default, Monitoring creates Grafana DataSources for:

  • Prometheus in cloud (regular cases)
  • Promxy (in case of Kubernetes DR)
  • Jaeger (in case when Jaeger integration is enabled)

An example of a Prometheus DataSource is as follows.

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: platform-monitoring-prometheus
  labels:
    app.kubernetes.io/name: grafana
    app.kubernetes.io/component: monitoring  # Mandatory label
    app.kubernetes.io/managed-by: monitoring-operator
spec:
  name: platform-monitoring.yaml
  datasources:
  - access: proxy
    editable: true
    isDefault: true
    jsonData:
      timeInterval: 5s
      tlsSkipVerify: true
    name: Platform Monitoring Prometheus
    type: prometheus
    url: 'http://prometheus-operated:9090'
    version: 1

Secrets

In addition to custom resources for monitoring, you can also specify settings for Monitoring in special Secrets. These secrets allow you to add raw Prometheus config parts into a common Prometheus config.

Additional Scrape Config

This secret allows you to add a raw Prometheus scrape config through prometheus-operator into the final Prometheus config.

This secret is created during the Monitoring deployment in the namespace where Monitoring is deployed, and has the name additional-scrape-config. It is not rewritten or updated during subsequent Monitoring deployments; however, it is removed if you uninstall Monitoring.

This secret allows you to add only job configs into scrape_configs. For more information, refer to https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config.

There are also other additional secrets, Additional Alert Relabel Config and Additional AlertManager Config, which are described in the following sections.

Important: This config is applied as is, without any validation. If a mistake is made, one of two things happens:

  • If Prometheus has not started yet or is restarted, it fails and writes an error about the incorrect config to its logs.
  • If Prometheus is already running, it cannot apply the new config (but keeps working with the previous one) and writes an error about the incorrect config to its logs.

The additional-scrape-config secret is used when:

  • You have to configure metrics collection from any static target in or outside the current Cloud.
  • You want to add a new parameter in the Prometheus job, but the parameter is not supported by the current prometheus-operator version.

In other cases, you should use Custom Resources such as ServiceMonitor, PodMonitor, or Probe.

Note that using this feature may break Prometheus upgrades. It is advised to review the Prometheus release notes to ensure that no incompatible scrape configs will break Prometheus after the upgrade.

An example of a config to collect metrics from Graylog deployed on a VM is as follows.

- job_name: graylog
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /api/plugins/org.graylog.plugins.metrics.prometheus/metrics
  scheme: http
  static_configs:
  - targets:
    - 1.2.3.4
  basic_auth:
    username: admin
    password: <secret>
  tls_config:
    insecure_skip_verify: true
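
A sketch of how such a job config might be stored in the secret is shown below. The secret name and key are taken from the additionalScrapeConfigs reference in the Prometheus CR example earlier in this document, and stringData is used only for readability:

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: additional-scrape-configs
  namespace: monitoring
stringData:
  prometheus-additional.yaml: |
    - job_name: graylog
      metrics_path: /api/plugins/org.graylog.plugins.metrics.prometheus/metrics
      static_configs:
      - targets:
        - 1.2.3.4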

Additional Alert Relabel Config

This secret allows you to add a raw Prometheus alert relabel config through prometheus-operator into the final Prometheus config.

This secret is created during the Monitoring deployment in the namespace where Monitoring is deployed, and has the name additional-alertrelabel-configs. It is not rewritten or updated during subsequent Monitoring deployments; however, it is removed if you uninstall Monitoring.

This secret allows you to add only relabel configs into alert_relabel_configs. For more information, refer to https://prometheus.io/docs/prometheus/latest/configuration/configuration/#alert_relabel_configs.

There are also other additional secrets, Additional Scrape Config and Additional AlertManager Config, which are described in the neighboring sections.

Important: This config is applied as is, without any validation. If a mistake is made, one of two things happens:

  • If Prometheus has not started yet or is restarted, it fails and writes an error about the incorrect config to its logs.
  • If Prometheus is already running, it cannot apply the new config (but keeps working with the previous one) and writes an error about the incorrect config to its logs.

The additional-alertrelabel-configs secret is used when:

  • You need to configure relabeling for alerts before they are sent to Alertmanager.
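
For example, an alert relabel config that drops a label from all alerts before they are sent to Alertmanager might look like the following (a minimal sketch; the label name is an assumption):

- action: labeldrop
  regex: prometheus_replica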

Additional AlertManager Config

This secret allows you to add a raw Prometheus Alertmanager config through prometheus-operator into the final Prometheus config.

This secret is created during the Monitoring deployment in the namespace where Monitoring is deployed, and has the name additional-alertmanager-configs. It is not rewritten or updated during subsequent Monitoring deployments; however, it is removed if you uninstall Monitoring.

This secret allows you to add only Alertmanager configs into alertmanager_config. For more information, refer to https://prometheus.io/docs/prometheus/latest/configuration/configuration/#alertmanager_config.

There are also other additional secrets, Additional Scrape Config and Additional Alert Relabel Config, which are described in the previous sections.

Important: This config is applied as is, without any validation. If a mistake is made, one of two things happens:

  • If Prometheus has not started yet or is restarted, it fails and writes an error about the incorrect config to its logs.
  • If Prometheus is already running, it cannot apply the new config (but keeps working with the previous one) and writes an error about the incorrect config to its logs.

The additional-alertmanager-configs secret is used when:

  • You need to configure any specific settings for Alertmanager.
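
For example, an entry that points Prometheus to an additional, externally managed Alertmanager might look like the following (a minimal sketch; the host and port are assumptions):

- scheme: http
  timeout: 10s
  api_version: v2
  static_configs:
  - targets:
    - external-alertmanager.example.com:9093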

In other cases, you should use the AlertmanagerConfig Custom Resource.