Kubernetes / Etcd¶

Etcd Dashboard for Prometheus metrics scraper

Tags¶

k8s
etcd

Panels¶

Name	Description	Thresholds
Etcd has a leader?	Indicates whether the members have a leader. Yes - all members have a leader. No - one or more members haven't a leader.	Default: Mode: absolute Level 1: 1
Leader name	Indicate which server is the leader	Default: Mode: absolute Level 1: 80
Disk operations wal_fsync duration	The latency distributions of fsync called by wal
Count of leader changes per $interval	Show the number of leader changes per $interval
The sum of leader changes	Show the sum of leader changes per $interval	Default: Mode: absolute Level 1: 80
Etcd nodes status	Shows status of each etcd pod in the cluster.	Default: Mode: absolute Level 1: 80
The sum rate of failed proposals seen	Show the sum rate of failed proposals seen per $interval	Default: Mode: absolute Level 1: 80
RPC Rate	Total number of gRPC's started and failed on the server
Active Streams	Show active streams
DB Size	Total size of the underlying database
Disk Sync Duration	The latency distributions of fsync called by wal and backend
Memory	Resident memory size
Client Traffic In	The total number of bytes received from grpc clients
Client Traffic Out	The total number of bytes sent to grpc clients
Peer Traffic In	The total number of bytes received from peers
Peer Traffic Out	The total number of bytes sent to peers
Raft Proposals	The total number of failed proposals seen The current number of pending proposals to commit The total number of consensus proposals applied, committed
Total Leader Elections Per Day	The number of leader changes seen per day
The total number of consensus proposals committed	proposals_committed_total records the total number of consensus proposals committed. This gauge should increase over time if the cluster is healthy. Several healthy members of an etcd cluster may have different total committed proposals at once. This discrepancy may be due to recovering from peers after starting, lagging behind the leader, or being the leader and therefore having the most commits. It is important to monitor this metric across all the members in the cluster; a consistently large lag between a single member and its leader indicates that member is slow or unhealthy. proposals_applied_total records the total number of consensus proposals applied. The etcd server applies every committed proposal asynchronously. The difference between proposals_committed_total and proposals_applied_total should usually be small (within a few thousands even under high load). If the difference between them continues to rise, it indicates that the etcd server is overloaded. This might happen when applying expensive queries like heavy range queries or large txn operations.
Proposals pending	indicates how many proposals are queued to commit. Rising pending proposals suggests there is a high client load or the member cannot commit proposals.
Disks operations	The sum rate latency distributions of fsync called by wal and backend
Slow Operations	The count of slow apply and read index operations
Network	The rate number of bytes received and sent from grpc clients
Snapshot Duration	Abnormally high snapshot duration (snapshot_save_total_duration_seconds) indicates disk issues and might cause the cluster to be unstable.
Snapshot Fsync Duration	The latency distributions of fsync called by snap
Snapshot DB File Duration	The latency distributions of saving and fsyncing `.snap.db` files.