Overall Cloud Status¶
This dashboard shows the health status of applications deployed in the cloud platform, of Kubernetes/OpenShift nodes, and of applications deployed outside the cloud.
Tags¶
k8s
health
Panels¶
Kubernetes overview¶
Name | Description | Thresholds | Repeat |
---|---|---|---|
API server status | Shows the status of the Kubernetes API server. | Default: Mode: absolute, Level 1: 1 | |
API servers | Shows the number of API servers. | Default: Mode: absolute, Level 1: 2, Level 2: 3 | |
API server requests | Shows the number of requests to the API server, in requests per minute. | | |
API server errors | Shows errors in requests to the API server. | Default: Mode: absolute, Level 1: 1, Level 2: 3 | |
ETCD status | Shows the status of the etcd cluster. May contain no data for PaaS clouds. | | |
ETCD servers | Shows the number of active etcd servers. May contain no data for PaaS clouds. | Default: Mode: absolute, Level 1: 1, Level 2: 3 | |
ETCD requests | Shows the number of requests per second to the etcd servers. May contain no data for PaaS clouds. | Default: Mode: absolute, Level 1: 1, Level 2: 500 | |
ETCD server request error | Shows the percentage of failed requests to the etcd server. May contain no data for PaaS clouds. | Default: Mode: absolute, Level 1: 1, Level 2: 3 | |
API server nodes status | Shows the status of each API server in the cluster: 1 = OK, 0 = problem. | | |
API server failed requests | Shows errors in requests to the API server, in operations per minute. | | |
Etcd nodes status | Shows the status of each etcd pod in the cluster: 1 = OK, 0 = problem. May contain no data for PaaS clouds. | | |
ETCD failed requests | Shows the number of errors per minute in requests to the etcd server. May contain no data for PaaS clouds. | | |
Total CPU usage | Shows the overall CPU usage. | Default: Mode: absolute, Level 1: 75, Level 2: 90 | |
Total Memory usage | Shows the overall RAM usage across all nodes against the total RAM available on all nodes. | Default: Mode: absolute, Level 1: 75, Level 2: 90 | |
Total Filesystem usage | Shows the summary filesystem usage on the Kubernetes cluster nodes. | Default: Mode: absolute, Level 1: 75, Level 2: 90 | |
Used cores | Shows the CPU used in the cloud, in cores (1 core = 1000 millicores). | | |
Total cores | Shows the total number of cores available in the cloud. | | |
Used memory | Shows the total memory used in the cloud. | | |
Total memory | Shows the total memory available in the cloud. | | |
Used space | Shows the total space used by directories and files on all nodes in the cloud, counting only filesystems whose fstype is xfs or an ext variant. Filesystems such as tmpfs and rootfs are therefore excluded from the value. | | |
Total space | Shows the total space available for directories and files on all nodes in the cloud, counting only filesystems whose fstype is xfs or an ext variant. Filesystems such as tmpfs and rootfs are therefore excluded from the value. | | |
Number of nodes | Shows the number of active Kubernetes cluster nodes. | Default: Mode: absolute, Level 1: 1, Level 2: 3 | |
Nodes Unavailable | Shows the number of unavailable nodes. | Default: Mode: absolute, Level 1: 1 | |
Running Pods | Shows the total number of running pods in the cluster. Only pods with status = ready are counted. | | |
Running containers | Shows the total number of running containers in pods. | | |
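
The filesystem filter described for the Used space and Total space panels can be illustrated with a short Python sketch. This is not the dashboard's actual query: the regular expression `xfs|ext.*`, the function name `sum_disk_bytes`, and the sample records are all assumptions made for illustration, based on reading the filter as "xfs, or any type starting with ext".

```python
import re

# Assumed pattern: "xfs, or anything that starts with ext" (ext3, ext4, ...).
# The real panel query may use a different expression.
DISK_FSTYPE = re.compile(r"xfs|ext.*")


def sum_disk_bytes(samples):
    """Sum the byte values of samples whose fstype passes the filter."""
    return sum(
        s["bytes"] for s in samples if DISK_FSTYPE.fullmatch(s["fstype"])
    )


# Hypothetical per-filesystem samples collected from cluster nodes.
samples = [
    {"fstype": "xfs", "bytes": 500},
    {"fstype": "ext4", "bytes": 300},
    {"fstype": "tmpfs", "bytes": 50},   # excluded: in-memory filesystem
    {"fstype": "rootfs", "bytes": 20},  # excluded: not xfs or ext
]

used = sum_disk_bytes(samples)  # 800: only the xfs and ext4 samples count
```

With these sample values, tmpfs and rootfs contribute nothing to the sum, which matches the exclusion described for both panels.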
Node health¶
Name | Description | Thresholds | Repeat |
---|---|---|---|
Node State | Shows the running state of all nodes in the selected cloud. | Default: Mode: absolute, Level 1: 1, Level 2: 2 | |
Nodes Overview | Shows an overview of cluster nodes: node uptime, total available CPU and RAM on each node, and overall resource usage on each node. Can be grouped by node_label. | Default: Mode: absolute, Level 1: 80 | |
Applications health¶
Name | Description | Thresholds | Repeat |
---|---|---|---|
Total pods | Shows the total number of pods in the cluster. | | |
Running pods | Shows the total number of running pods in the cluster. Only pods with status = ready are counted. | | |
Not running pods | Shows the total number of not running or unhealthy pods in the cluster. | Default: Mode: absolute, Level 1: 1 | |
Help | Shows information about the panels in the current section. | | |
Not Healthy Pods | Shows the reason why a container is currently in the waiting or terminated state. | Default: Mode: absolute, Level 1: 80 | |
Last Terminated Status | Shows the reason why a container was last in the terminated state. | Default: Mode: absolute, Level 1: 80 | |
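
The Thresholds columns on this page list absolute levels such as "Level 1: 75, Level 2: 90". As a rough illustration of how such levels could be read, the Python sketch below maps a value to the highest level whose threshold it has reached. It assumes the common convention that each threshold is a lower bound; the function name `threshold_level` is hypothetical, and the dashboard itself may evaluate thresholds differently.

```python
def threshold_level(value, levels):
    """Return 0 when the value is below every threshold, otherwise the
    1-based index of the highest threshold that the value has reached.

    Assumption: thresholds are absolute lower bounds, compared with >=.
    """
    level = 0
    for index, threshold in enumerate(sorted(levels), start=1):
        if value >= threshold:
            level = index
    return level


# Default levels listed for the "Total CPU usage" panel.
CPU_LEVELS = [75, 90]

threshold_level(60, CPU_LEVELS)  # 0: below Level 1
threshold_level(80, CPU_LEVELS)  # 1: reached Level 1 only
threshold_level(95, CPU_LEVELS)  # 2: reached Level 2
```

Under this reading, a panel with a single level (for example "Level 1: 1" on Nodes Unavailable) changes state as soon as the value reaches 1.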