Skip to content

Prometheus Self Monitoring

There is no description on this dashboard

Tags

  • k8s
  • prometheus
  • self-monitor

Panels

Name Description Thresholds Repeat
Uptime Show the prometheus instance uptime Default:
Mode: absolute
Level 1: 3600

Last Successful Config Reload Show last successful config reload
CPU usage Shows CPU usage for the prometheus instance Default:
Mode: absolute
Level 1: 0.85
Level 2: 1.2
Level 3: 2

Resident memory The amount of memory the Prometheus process is using from the kernel Default:
Mode: absolute
Level 1: 6442450944
Level 2: 9663676416

Data The number of bytes that are currently used for local storage by all blocks
Oldest data Show the earliest saved data
Count of Jobs Shows the number of active jobs Default:
Mode: absolute
Level 1: 300
Level 2: 500

Series Shows the number of active time series Default:
Mode: absolute
Level 1: 500000
Level 2: 1000000

Points per second Show number of metrics per second stored by Prometheus as taken from the last $interval.
Build, uptime and runtime instance info Prometheus instance overview Default:
Mode: absolute
Level 1: 80

Main info

Name Description Thresholds Repeat
Number of time series Shows the number of active time series
$quantile quantile of interval length between scrapes per job Actual amount of time between target scrapes
Prometheus errors in $interval Shows the total number of Prometheus errors, including different non-default behavior.

For example, if you are sending not all the metrics to the remoteStorage, then the panel will also display the number of unsent points (prometheus_remote_storage_samples_dropped_total).

Exporters/Targets

Name Description Thresholds Repeat
Ready jobs/ All jobs Shows the number of jobs(Ready/All)
kubelet Show number of targets for kubelet Default:
Mode: absolute
Level 1: 1

node-exporter Show number of targets for node-exporter Default:
Mode: absolute
Level 1: 1

kube-state-metrics Show number of targets for kube-state-metrics Default:
Mode: absolute
Level 1: 1

etcd Show number of targets for etcd Default:
Mode: absolute
Level 1: 1

kube-apiserver Show number of targets for kube-apiserver Default:
Mode: absolute
Level 1: 1

nginx-ingress Show number of targets for nginx-ingress

* Works only in kubernetes
Default:
Mode: absolute
Level 1: 1

Scrape targets Show current number of targets in this scrape pool
Discovered targets Show current number of discovered targets

Cardinality

Name Description Thresholds Repeat
Number of series by job Show number of series by job
New series by job Show how many new series were created
Highest cardinality for $job Show 10 metrics with the highest cardinality for $job target Default:
Mode: absolute
Level 1: 1000
Level 2: 5000
Level 3: 10000

Panel is multiplied by parameter job
Count metrics for $job Show number of metrics for $job Default:
Mode: absolute
Level 1: 5000
Level 2: 10000
Level 3: 50000

Panel is multiplied by parameter job

Requests & queries

Name Description Thresholds Repeat
HTTP requests/s Average request duration for the last 1m.
HTTP request latency Latencies for HTTP requests
Time spent in HTTP requests/s Time spent in HTTP requests/s
HTTP request count by handler in $interval Counter of HTTP requests
$quantile quantile of HTTP request duration per handler HTTP request duration per handler
$quantile of request size by handler Request size by handler

Resources

Name Description Thresholds Repeat
CPU usage Shows CPU usage for the prometheus instance
Memory usage Resident and allocated memory size in bytes.
Allocations per second in $interval Total number of bytes allocated, even if freed

TSDB stats

Name Description Thresholds Repeat
Oldest data Show the earliest saved data
Size of the storage The number of bytes that are currently used for local storage by all blocks

Query engine & API

Name Description Thresholds Repeat
Number of concurent queries and it's limit The current and max number of queries being executed or waiting
$quantile quantile of query engine evaluation duration per slice Query engine evaluation duration per slice
Number of queries on remote read API The current number of remote read queries being executed or waiting

Rule evaluation

Name Description Thresholds Repeat
Percentage rule group evaluation duration from the rule group evaluation interval (top 20) Percentage rule group evaluation duration from the rule group evaluation interval (top 20)
$quantile quantile of rule evaluation duration Rule evaluation duration (for quantile => 0.5)
Number of rules per group (top 20 groups) Number of rules for the first 20 groups

Alerting

Name Description Thresholds Repeat
Notification queue size and capacity The number of alert notifications in the queue. And the capacity of the alert notifications queue.
Number of sent notifictions per alertmanager in $interval Total number of alerts sent
$quantile of notification latency per alertmanager Latency quantiles for sending alert notifications

Scraping

Name Description Thresholds Repeat
Scrape Interval Intervals between scrapes
Number of target errors Number of scrapes that hit the sample limit and were rejected. And the number of samples rejected due to duplicate timestamps but different values, not being out of the expected order, timestamp falling outside of the time bounds.
Metadata Cache Size The number of bytes that are currently used for storing metric metadata in the cache

Service discovery

Name Description Thresholds Repeat
Number of discovered targets per Type and config Current number of discovered targets
Service discovery sync count by mechanism in $interval Service discovery sync count by mechanism
$quantile quantile of refresh duration per SD mechanism

Compaction and retention

Name Description Thresholds Repeat
$quantile quantile of compaction duration Duration of compaction runs
Number of vertical/horizontal compactions in last $interval Number of vertical/horizontal compactions
Number of time/size retention cutoffs in $interval The number of times that blocks were deleted because the maximum time limit or number of bytes was exceeded.

WAL

Name Description Thresholds Repeat
$quantile quantile of WAL fsync duration Duration of WAL fsync ( for quantile => 0.5 )
Number of completed pages in $interval Number of completed pages
Average duration of WAL truncation TODO: Fill panel description

Remote storage

Name Description Thresholds Repeat
Number of queries to remote storage per client Number of remote read queries
Number of samles successfuly sent to remote storage per queue Total number of samples successfully sent to remote storage.
$quantile quantile of batch send call duration to remote storage Call duration to remote storage
Number of used shards for concurrent sending of data to remote storage and it's capacity Number of used shards for concurrent sending of data to remote storage and it's capacity

Go stats

Name Description Thresholds Repeat
Number of gorutines Number of goroutines that currently exist.
Duration of Go garbage collection Max duration of garbage collection cycles.
Go system memory allocations Go system memory allocations

Network

Name Description Thresholds Repeat
Accepted/closed inbound connections per listener Accepted/closed inbound connections per listener of a given name.
Established/closed outbound connections per dialer Established/closed outbound connections per dialer a given name. Default:
Mode: absolute
Level 1: 80

Rule Group: $RuleGroup

Row Rule Group: $RuleGroup is multiplied by parameter RuleGroup

Name Description Thresholds Repeat
$RuleGroup: Duration The interval and last duration evaluation of a rule group.
$RuleGroup: Rules The number of rules