This document describes kubelet metrics list and how to collect them.
Metrics¶
kubelet already exposes its metrics in Prometheus format and doesn't require using specific exporters.
Name | Metrics Port | Metrics Endpoint | Need Exporter? | Auth? | Is Exporter Third Party? |
---|---|---|---|---|---|
Prometheus (kubelet) | 10250 |
/metrics |
No | Require, RBAC | N/A |
Prometheus (cAdvisor) | 10250 |
/metrics/cadvisor |
No | Require, RBAC | N/A |
How to Collect¶
Metrics expose on port 10250
and endpoint /metrics
for kubelet and /metrics/cadvisor
for cadvisor metrics.
It requires kubernetes authentication which placed in /var/run/secrets/kubernetes.io/serviceaccount/token
.
Before configure Prometheus or VMAgent to collect these metrics you need to create a Service that will route to
kubelet
. It can be created in any namespace. In case of using the prometheus-operator
it automatically create
a Service in a namespace where will be deploy the Prometheus.
It config:
apiVersion: v1
kind: Service
metadata:
name: kubelet
labels:
k8s-app: kubelet
spec:
ports:
- name: https-metrics
protocol: TCP
port: 10250
targetPort: 10250
- name: http-metrics
protocol: TCP
port: 10255
targetPort: 10255
- name: cadvisor
protocol: TCP
port: 4194
targetPort: 4194
clusterIP: None
clusterIPs:
- None
type: ClusterIP
sessionAffinity: None
ipFamilies:
- IPv4
- IPv6
ipFamilyPolicy: RequireDualStack
Config ServiceMonitor
for prometheus-operator
to collect kubelet metrics:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: kubelet-service-monitor
labels:
k8s-app: kubelet-service-monitor
app.kubernetes.io/name: kubelet-service-monitor
app.kubernetes.io/component: monitoring
app.kubernetes.io/managed-by: monitoring-operator
spec:
endpoints:
- interval: 30s
scrapeTimeout: 10s
metricRelabelings:
- sourceLabels: ['pod_name']
targetLabel: pod
regex: (.+)
- sourceLabels: ['container_name']
targetLabel: container
regex: (.+)
- action: labeldrop
regex: pod_name
- action: labeldrop
regex: container_name
- regex: 'kubelet_running_pods'
replacement: 'kubelet_running_pod_count'
sourceLabels: ['__name__']
targetLabel: __name__
- regex: 'kubelet_running_containers'
replacement: 'kubelet_running_container_count'
sourceLabels: ['__name__']
targetLabel: __name__
relabelings: []
bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
honorLabels: true
port: https-metrics
scheme: https
tlsConfig:
caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecureSkipVerify: true
- interval: 30s
scrapeTimeout: 10s
metricRelabelings:
- sourceLabels: ['pod_name']
targetLabel: pod
regex: (.+)
- sourceLabels: ['container_name']
targetLabel: container
regex: (.+)
- action: labeldrop
regex: pod_name
- action: labeldrop
regex: container_name
- regex: 'kubelet_running_pods'
replacement: 'kubelet_running_pod_count'
sourceLabels: ['__name__']
targetLabel: __name__
- regex: 'kubelet_running_containers'
replacement: 'kubelet_running_container_count'
sourceLabels: ['__name__']
targetLabel: __name__
relabelings: []
bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
honorLabels: true
path: /metrics/cadvisor
port: https-metrics
scheme: https
tlsConfig:
caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecureSkipVerify: true
jobLabel: k8s-app
selector:
matchLabels:
k8s-app: kubelet
To collect (or just to check) metrics manually you can use the following command:
curl -v -k -L -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://<kube_node_ip_or_dns>:10250/metrics
You can't use wget
because it doesn't allow to add headers for authorization.
Metrics List¶
Metrics list of kubelet
you can find in files:
Volumes usage metrics¶
Some kubelet metrics kubelet_volume_stats_*
may not work properly because get information from different CSI drivers
of volumes.
The most CSI drivers, or Provisioners support collecting of the metrics by default. Some of its have ability to turn it on.
Built-in storages¶
There are some type of volumes that built-in in the Kubernetes.
Storage Type | Support metrics | Turned on by default | How to turn on collecting |
---|---|---|---|
HostPath | yes | yes | Can't be disabled |
ConfigMap | no | no | Metrics are not supported |
Secret | no | no | Metrics are not supported |
Projected | no | no | Metrics are not supported |
EmptyDir | no | no | Metrics are not supported |
CSI Drivers¶
There is a list of the CSI drivers and information about support the metrics of volume statistics collecting:
NOTE: Some CSI Drivers can expose only some of kubelet_volume_stats_*
metrics. For example, AWS EFS
expose
only kubelet_volume_stats_used_bytes
(capacity
and available
metrics are not available). So please keep in
mind it and verify before using.
Storage Type | CSI Driver | Support metrics | Turned on by default | How to turn on collecting |
---|---|---|---|---|
AWS EBS | ebs.csi.aws.com |
yes | yes | - |
AWS EFS | efs.csi.aws.com |
yes | no | Set in helm values.yaml controller.volMetricsOptIn=true set --vol-metrics-opt-in=true in DaemonSet sci node |
Azure Blob | blob.csi.azure.com |
yes | no | Set in helm values.yaml feature.enableGetVolumeStats=true , or flag in DaemonSet sci node container's args --enable-get-volume-stats=true |
Azure Disk | disk.csi.azure.com |
yes | yes | - |
Azure File | file.csi.azure.com |
yes | yes | Set in helm values.yaml feature.enableGetVolumeStats=true , or flag in DaemonSet sci node container's args --enable-get-volume-stats=true |
Ceph RBD | rbd.csi.ceph.com |
yes | yes | - |
CephFS | cephfs.csi.ceph.com |
yes | yes | - |
Cinder | cinder.csi.openstack.org |
yes | yes | - |
GCE Persistent Disk | pd.csi.storage.gke.io |
yes | yes | - |
Google Cloud Filestore | com.google.csi.filestore |
yes | yes | - |
Google Cloud Storage | gcs.csi.ofek.dev |
no | - | - |
NFS | nfs.csi.k8s.io |
yes | yes | - |
External Provisioners¶
Expect of built-in volume providers and CDI Drivers also exists External Provisioners:
Storage Type | Provisioner | Support metrics | Turned on by default | How to turn on collecting |
---|---|---|---|---|
LocalPath | rancher.io/local-path |
no | no | Currently not supported. See the issue volumes usage collecting |