Alertmanager Overview

Dashboard showing Prometheus Alertmanager metrics for observing the status of the cluster and for debugging.

Tags

  • alertmanager
  • self-monitor

Panels

Overview

Name Description Thresholds Repeat
Number of instances Number of Alertmanager instances Default:
Mode: absolute
Level 1: 80

Cluster size Number of peers in the Alertmanager cluster. Default:
Mode: absolute
Level 1: 80

Instance versions and up time Table listing the Alertmanager instances with their version, uptime, last reload time, and whether the last reload was successful. Default:
Mode: absolute
Level 1: 80

Number of active alerts Current number of active alerts. Default:
Mode: absolute
Level 1: 80

Number of suppressed alerts Current number of suppressed alerts. Default:
Mode: absolute
Level 1: 80

Number of active silences Current number of active silences. Default:
Mode: absolute
Level 1: 80
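
Every panel above lists the same threshold settings (mode absolute, level 1 at 80). As a rough sketch, such settings could map to a Grafana-style threshold block like the one built below. The colors and the helper names `build_thresholds` and `threshold_color` are assumptions for illustration and are not taken from this dashboard's definition.

```python
import json


def build_thresholds(levels, mode="absolute"):
    """Build a threshold block: one base step plus one step per level."""
    # Assumption: the base step has no value and is green; higher steps are red.
    steps = [{"color": "green", "value": None}]
    steps += [{"color": "red", "value": level} for level in sorted(levels)]
    return {"mode": mode, "steps": steps}


def threshold_color(thresholds, value):
    """Return the color of the highest step whose value the reading has reached."""
    color = thresholds["steps"][0]["color"]
    for step in thresholds["steps"][1:]:
        if value >= step["value"]:
            color = step["color"]
    return color


thresholds = build_thresholds([80])
print(json.dumps(thresholds, indent=2))
print(threshold_color(thresholds, 3))    # reading below level 1
print(threshold_color(thresholds, 120))  # reading at or above level 1
```

In absolute mode the level is compared against the raw value, so a panel counting a handful of instances or silences would stay in the base step unless the count reaches 80.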

Notifications

Name Description Thresholds Repeat
Notifications sent from $instance Number of notifications sent to each integration, such as PagerDuty or Slack. Failed notifications are shown on the negative axis. Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Notification durations per integration on $instance Duration of notification sends at the 0.99 and 0.9 quantiles, per integration. Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
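
The 0.99 and 0.9 quantiles in the duration panel are typically estimated from cumulative histogram buckets rather than from raw samples. The sketch below shows one common way to do that (linear interpolation inside the bucket that contains the target rank), assuming the durations are exported as a cumulative histogram. The bucket bounds and counts are made up, and `estimate_quantile` is a hypothetical helper, not the dashboard's actual query.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound,
    with the last upper bound being +inf.
    """
    if not buckets or not 0 <= q <= 1:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # The rank falls in the overflow bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return prev_bound
            # Interpolate linearly between the bucket's lower and upper bounds.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return math.nan


# Hypothetical notification latencies in seconds: (upper bound, cumulative count).
example_buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
for q in (0.9, 0.99):
    print(q, estimate_quantile(q, example_buckets))
```

Because the estimate can only be as precise as the bucket boundaries, a quantile that lands in the overflow bucket is reported as the highest finite bound.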

Alerts

Name Description Thresholds Repeat
Active alerts in $instance Number of alerts by state, such as active or suppressed. Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Received alerts by status for $instance Number of alerts received from Prometheus by status: firing on the positive axis and resolved on the negative axis. Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
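
Panels marked "Panel is multiplied by parameter instance" are rendered once per value of the instance parameter, with `$instance` replaced in the title. A minimal sketch of that expansion, assuming plain `$name` substitution. The instance addresses are invented and `expand_repeated_panel` is a hypothetical helper, not part of any dashboard tooling.

```python
from string import Template


def expand_repeated_panel(title_template, instances):
    """Return one panel description per instance, with $instance filled in."""
    template = Template(title_template)
    return [
        {"instance": instance, "title": template.substitute(instance=instance)}
        for instance in instances
    ]


# Hypothetical values of the instance parameter.
instances = ["alertmanager-0:9093", "alertmanager-1:9093"]
panels = expand_repeated_panel("Active alerts in $instance", instances)
for panel in panels:
    print(panel["title"])
```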

Cluster members

Name Description Thresholds Repeat
Cluster health score for $instance Shows a score representing cluster health. From HashiCorp's official documentation:
> This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy".

For more info see https://www.consul.io/docs/agent/telemetry.html#cluster-health
Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Cluster members count on $instance Shows the number of gossip cluster members over time, with failing peers, if any, shown in red. Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Cluster peers left/joined on $instance The positive axis shows the number of peers that joined the cluster and the negative axis the number of peers that left it. Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Cluster reconnections on $instance The positive axis shows the number of attempts to reconnect to the cluster. The negative axis shows the number of failed attempts. Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Cluster messages count on $instance The positive axis shows the number of sent cluster messages by type (update or full_state) and the negative axis shows the same for received messages. Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Cluster messages size on $instance The positive axis shows the size of sent cluster messages by type (update or full_state) and the negative axis shows the same for received messages. Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Cluster messages queue on $instance The positive axis shows the number of queued cluster messages and the negative axis the number of pruned messages. Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance

Gossip messages

Name Description Thresholds Repeat
Count of oversized gossip messages on $instance Number of oversized gossip messages sent by $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Number of propagated gossip messages on $instance Number of propagated gossip messages on $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Duration of oversized gossip messages on $instance Duration of oversized gossip message requests on $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance

Nflog

Name Description Thresholds Repeat
Nf log queries count for $instance Number of notification log queries for $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Nf log query duration for $instance Nf log query duration for $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Nf log snapshot size for $instance Snapshot size for NF log on $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Nf log Go GC time for $instance Duration of the last notification log garbage collection cycle for $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Nf log snapshot duration for $instance Duration of creating a snapshot of the NF log on $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance

Silences

Name Description Thresholds Repeat
Silences count by state on $instance Number of silences by state on $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Silences query count on $instance Number of silence queries and query errors on $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Silences query duration on $instance Silences query duration on $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Silences snapshot size on $instance Size of the silence snapshot in bytes on $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Silences snapshot duration on $instance Duration of the silence snapshot on $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance
Silences GC duration on $instance Duration of the silence garbage collection cycle on $instance Default:
Mode: absolute
Level 1: 80

Panel is multiplied by parameter instance