Overall Cloud Status
The dashboard shows the health status of applications deployed in the cloud platform, of Kubernetes/OpenShift nodes, and of applications deployed outside the cloud.
Tags
k8shealth
Panels
Kubernetes overview
| Name | Description | Thresholds | Repeat |
|---|---|---|---|
| API server status | Shows the status of the Kubernetes API server. | Default: Mode: absolute; Level 1: 1 | |
| API servers | Shows the number of API servers. | Default: Mode: absolute; Level 1: 2; Level 2: 3 | |
| API server requests | Shows the number of requests to the API server, in requests per minute. | | |
| API server errors | Shows errors in requests to the API server. | Default: Mode: absolute; Level 1: 1; Level 2: 3 | |
| ETCD status | Shows the status of the etcd cluster. May contain no data for PaaS clouds. | | |
| ETCD servers | Shows the number of active etcd servers. May contain no data for PaaS clouds. | Default: Mode: absolute; Level 1: 1; Level 2: 3 | |
| ETCD requests | Shows the number of requests per second to the etcd servers. May contain no data for PaaS clouds. | Default: Mode: absolute; Level 1: 1; Level 2: 500 | |
| ETCD server request error | Shows the percentage of failed requests to the etcd server. May contain no data for PaaS clouds. | Default: Mode: absolute; Level 1: 1; Level 2: 3 | |
| API server nodes status | Shows the status of each API server in the cluster: 1 = OK, 0 = problem. | | |
| API server failed requests | Shows errors in requests to the API server, in operations per minute. | | |
| Etcd nodes status | Shows the status of each etcd pod in the cluster: 1 = OK, 0 = problem. May contain no data for PaaS clouds. | | |
| ETCD failed requests | Shows the number of errors per minute in requests to the etcd server. May contain no data for PaaS clouds. | | |
| Total CPU usage | Shows overall CPU usage. | Default: Mode: absolute; Level 1: 75; Level 2: 90 | |
| Total Memory usage | Shows overall RAM usage across all nodes against the total RAM available on all nodes. | Default: Mode: absolute; Level 1: 75; Level 2: 90 | |
| Total Filesystem usage | Shows summary file system usage on the Kubernetes cluster nodes. | Default: Mode: absolute; Level 1: 75; Level 2: 90 | |
| Used cores | Shows the CPU used in the cloud, in cores (1 core = 1000 millicores). | | |
| Total cores | Shows the total number of cores available in the cloud. | | |
| Used memory | Shows the total memory used in the cloud. | | |
| Total memory | Shows the total memory available in the cloud. | | |
| Used space | Shows the total space used by directories and files on all cloud nodes (fstype == xfs \| ext.). File systems such as tmpfs and rootfs are excluded from the value. | | |
| Total space | Shows the total space available for directories and files on all cloud nodes (fstype == xfs \| ext.). File systems such as tmpfs and rootfs are excluded from the value. | | |
| Number of nodes | Shows the number of active Kubernetes cluster nodes. | Default: Mode: absolute; Level 1: 1; Level 2: 3 | |
| Nodes Unavailable | Shows the number of unavailable nodes. | Default: Mode: absolute; Level 1: 1 | |
| Running Pods | Shows the total number of running pods in the cluster. Only pods with status = ready are counted. | | |
| Running containers | Shows the total number of running containers in pods. | | |
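
The Thresholds column uses absolute mode, so each level is compared with the raw panel value rather than with a percentage of the value range. The following is a minimal sketch of that comparison, assuming Grafana-style thresholds in which a level applies once the value is greater than or equal to it. `threshold_level` is a hypothetical helper written for illustration, not part of the dashboard, and the color or severity attached to each level is not stated in this document.

```python
from bisect import bisect_right


def threshold_level(value: float, levels: list[float]) -> int:
    """Return 0 for the base state, or N once the value has reached Level N.

    Assumption: absolute mode, where a level applies when value >= level.
    """
    # bisect_right counts how many sorted levels are <= value.
    return bisect_right(sorted(levels), value)


# Levels taken from the "Total CPU usage" row: Level 1: 75, Level 2: 90.
cpu_levels = [75, 90]
for reading in (60, 75, 95):
    print(reading, "->", threshold_level(reading, cpu_levels))
```

With the levels of the Total CPU usage panel, a reading of 60 stays in the base state, 75 reaches Level 1, and 95 reaches Level 2.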
Node health
| Name | Description | Thresholds | Repeat |
|---|---|---|---|
| Node State | Shows the running state of all nodes in the selected cloud. | Default: Mode: absolute; Level 1: 1; Level 2: 2 | |
| Nodes Overview | Shows an overview of the cluster nodes: node uptime, total CPU and RAM available on each node, and overall resource usage on each node. Can be grouped by node_label. | Default: Mode: absolute; Level 1: 80 | |
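
The queries behind the node panels are not shown in this document. As a rough illustration of what a node state means in Kubernetes, the sketch below reads the `Ready` condition from Node objects in the shape returned by `kubectl get nodes -o json`. It assumes that an unavailable node is one whose `Ready` condition is not `"True"`; `node_is_ready` and the sample data are hypothetical.

```python
def node_is_ready(node: dict) -> bool:
    """Return True if the Node object reports the Ready condition as "True"."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            # The status is a string: "True", "False", or "Unknown".
            return cond.get("status") == "True"
    # No Ready condition reported: treat the node as not ready.
    return False


# Hypothetical sample, trimmed to the fields used above.
sample_nodes = [
    {"metadata": {"name": "worker-1"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "worker-2"},
     "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
]

unavailable = [n["metadata"]["name"] for n in sample_nodes if not node_is_ready(n)]
print(unavailable)
```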
Applications health
| Name | Description | Thresholds | Repeat |
|---|---|---|---|
| Total pods | Shows the total number of pods in the cluster. | | |
| Running pods | Shows the total number of running pods in the cluster. Only pods with status = ready are counted. | | |
| Not running pods | Shows the total number of pods in the cluster that are not running or not healthy. | Default: Mode: absolute; Level 1: 1 | |
| Help | Shows information about the panels in the current section. | | |
| Not Healthy Pods | Shows the reason why a container is currently in the waiting or terminated state. | Default: Mode: absolute; Level 1: 80 | |
| Last Terminated Status | Shows the most recent reason why a container was in the terminated state. | Default: Mode: absolute; Level 1: 80 | |
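
The waiting and terminated reasons described for the Not Healthy Pods and Last Terminated Status panels correspond to fields of a pod's container statuses in the Kubernetes API. The sketch below shows where those fields live in a Pod object, in the shape returned by `kubectl get pod -o json`. It is an illustration only: the dashboard's own queries are not shown in this document, and `container_reasons` and the sample pod are hypothetical.

```python
def container_reasons(pod: dict) -> list[dict]:
    """List the current and last-terminated reasons for each container of a pod."""
    rows = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        state = cs.get("state", {})
        # A container is in exactly one of: running, waiting, terminated.
        current = state.get("waiting") or state.get("terminated") or {}
        last = cs.get("lastState", {}).get("terminated") or {}
        rows.append({
            "container": cs.get("name"),
            "current_reason": current.get("reason"),      # None while running
            "last_terminated_reason": last.get("reason"),  # None if never terminated
        })
    return rows


# Hypothetical sample, trimmed to the fields used above.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "app",
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {
                "name": "sidecar",
                "state": {"running": {"startedAt": "2024-01-01T00:00:00Z"}},
                "lastState": {},
            },
        ]
    }
}

for row in container_reasons(sample_pod):
    print(row)
```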