Victoria Metrics / VmAgent
Overview for VictoriaMetrics vmagent v1.80.0 or higher
Tags
self-monitor
victoriametrics
vmagent
Panels
Overview
Name | Description | Thresholds | Repeat |
---|---|---|---|
Samples scraped/s | Shows the rate of samples scraped from configured targets. | ||
Samples ingested/s | Shows the rate of ingested samples. | ||
Scrape targets up | Shows the total number of configured scrape targets in the "up" state. See http://vmagent-host:8429/targets for the list of all targets. | ||
Scrape targets down | Shows the total number of configured scrape targets in the "down" state. See http://vmagent-host:8429/targets for the list of all targets. | Default: Mode: absolute Level 1: 1 | |
Log errors (30m) | Shows the number of error messages written to the logs over the last 30m. A non-zero value may be a sign of connectivity or misconfiguration errors. | Default: Mode: absolute Level 1: 1 | |
Persistent queue size | Shows the size, in bytes, of pending samples that have not yet been flushed to remote storage. A growing value might be a sign of connectivity issues. In such cases, vmagent starts flushing pending data to disk and attempts to send it later, once the connection is restored. | Default: Mode: absolute Level 1: 10485760 | |
TODO: Add panel name | TODO: Fill panel description | Default: Mode: absolute Level 1: 80 | |
Uptime | TODO: Fill panel description | ||
Samples rate ($instance) | Shows the in/out samples rate, including both push and pull models. The out-rate can differ from the in-rate because of replication or additional time series added by vmagent for every scraped target. | ||
Requests rate ($instance) | Shows the rate of requests served by vmagent HTTP server. | ||
Errors rate ($instance) | Shows the rate of multiple metrics that track possible errors in vmagent, such as network or parsing errors. | ||
Persistent queue size ($instance) to ($url) | Shows the size, in bytes, of pending samples in the persistent queue that have not yet been flushed to remote storage. A growing value might be a sign of connectivity issues. In such cases, vmagent starts flushing pending data to disk and attempts to send it later, once the connection is restored. Remote write URLs are hidden by default but can be revealed by setting -remoteWrite.showURL to true. | ||
Data blocks dropped ($instance) to ($url) | Shows the rate of data blocks dropped when remote storage replies with 400 Bad Request or 409 Conflict HTTP responses. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1149 | ||
Persistent queue dropped rate ($instance) | Shows the rate of samples dropped from the persistent queue. vmagent drops samples from the queue if both the in-memory and on-disk queues are full and it is unable to flush them to remote storage. The maximum size of the on-disk queue is configured by the -remoteWrite.maxDiskUsagePerURL flag. | ||
Rows dropped by relabeling ($instance) to ($url) | Shows the rate of samples dropped due to relabeling. The metric tracks drops for the -remoteWrite.relabelConfig configuration only. | ||
Logging rate ($instance) | Shows the rate of logged messages by level. An unexpected spike in the rate is a good reason to check the logs. | ||
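
The absolute thresholds listed in the Overview table can be read as simple lower bounds. The Python sketch below is an illustration only: the dictionary keys and the `breached` and `to_mib` helpers are hypothetical names, and treating a value equal to the threshold as a breach is an assumption about how the dashboard evaluates absolute thresholds.

```python
# Level 1 thresholds copied from the Overview table (mode: absolute).
# The dictionary and the helpers below are illustrative, not part of vmagent.
OVERVIEW_THRESHOLDS = {
    "Scrape targets down": 1,
    "Log errors (30m)": 1,
    "Persistent queue size": 10485760,  # bytes
}


def breached(panel: str, value: float) -> bool:
    """Return True when value reaches the panel's Level 1 threshold.

    Assumption: a value equal to the threshold counts as a breach.
    """
    return value >= OVERVIEW_THRESHOLDS[panel]


def to_mib(size_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return size_bytes / (1024 * 1024)


if __name__ == "__main__":
    print(to_mib(OVERVIEW_THRESHOLDS["Persistent queue size"]))
    print(breached("Persistent queue size", 4 * 1024 * 1024))
    print(breached("Scrape targets down", 2))
```

Expressed in mebibytes, the persistent queue threshold of 10485760 bytes is 10 MiB, which may be easier to compare against disk usage figures.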
Scraping
Name | Description | Thresholds | Repeat |
---|---|---|---|
Scrape targets UP | TODO: Fill panel description | ||
Scrape targets DOWN | TODO: Fill panel description | ||
Scrape rate ($instance) | TODO: Fill panel description | ||
Scrape fails ($instance) | TODO: Fill panel description | ||
Scrape response size ($instance) | TODO: Fill panel description | ||
Scrape duration ($instance) | This panel uses MetricsQL and works only when VictoriaMetrics is used as a datasource. | ||
Ingestion
Name | Description | Thresholds | Repeat |
---|---|---|---|
Requests rate ($instance) | Shows the rate of write requests served by ingestserver (UDP, TCP connections) and HTTP server. | ||
Error rate ($instance) | Shows the rate of write errors in ingestserver (UDP, TCP connections) and HTTP server. | ||
Rows rate ($instance) | Shows the rate of parsed rows from write or scrape requests. | ||
Invalid rows rate ($instance) | Tracks the rate of invalid rows dropped because of errors while unmarshaling write requests. The exact error messages are printed in the logs. | ||
Remote write
Name | Description | Thresholds | Repeat |
---|---|---|---|
Requests rate ($instance) to ($url) | Shows the rate of requests to the configured remote write endpoints by URL and status code. Remote write URLs are hidden by default but can be revealed by setting -remoteWrite.showURL to true. | ||
Bytes write rate ($instance) | Shows the global rate of bytes written via remote write connections. | ||
Retry rate ($instance) to ($url) | Shows the request retry rate by URL. The number of retries is unlimited, with delays of up to 1m between attempts. Remote write URLs are hidden by default but can be revealed by setting -remoteWrite.showURL to true. | ||
Connections ($instance) | Shows the current number of established connections to remote write endpoints. | ||
Push duration ($instance) to ($url) | Shows the remote write request duration distribution in seconds. Value depends on block size, network quality and remote storage performance. | ||
Remote write connection saturation ($instance) | Shows the saturation of every connection to remote storage. If the 90% threshold is reached, the connection is busy or slow more than 90% of the time, so vmagent won't be able to keep up and can start buffering data. This usually means that the -remoteWrite.queues command-line flag must be increased in order to raise the number of connections to each remote storage. | ||
Block size rows ($instance) | Shows the remote write request block size distribution in rows. | ||
Block size bytes ($instance) | Shows the remote write request block size distribution in bytes. | ||
Hourly series limit | Shows the current usage of the limit on unique series over an hourly period. vmagent starts dropping series once the limit is reached. Note that the panel is blank if -remoteWrite.maxHourlySeries is not set. | ||
Daily series limit | Shows the current usage of the limit on unique series over a daily period. vmagent starts dropping series once the limit is reached. Note that the panel is blank if -remoteWrite.maxDailySeries is not set. | ||
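
The hourly and daily series limit panels show how much of a configured limit is in use. A minimal sketch of that ratio follows, assuming the current number of unique series and the configured limit are already known. The function and parameter names are hypothetical, and returning `None` for an unset limit only mirrors the blank panel described in the table, not vmagent's internals.

```python
from typing import Optional


def series_limit_usage(current_series: int, max_series: int) -> Optional[float]:
    """Return the share of the series limit in use, as a percentage.

    Returns None when no limit is configured (max_series <= 0), which
    corresponds to the panel being blank.
    """
    if max_series <= 0:
        return None
    return 100.0 * current_series / max_series


if __name__ == "__main__":
    print(series_limit_usage(750_000, 1_000_000))
    print(series_limit_usage(10, 0))
```

A value approaching 100 would suggest that series are about to be dropped, so it may be worth watching well before the limit is reached.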
Troubleshooting
Name | Description | Thresholds | Repeat |
---|---|---|---|
Top 5 jobs by unique samples | Shows the top 5 jobs by the number of new series registered by vmagent over a 5min range. These jobs generate most of the churn rate. | ||
Top 5 instances by unique samples | Shows the top 5 instances by the number of new series registered by vmagent over a 5min range. These instances generate most of the churn rate. | ||
Persistent queue write saturation ($instance) | Shows the saturation of the persistent queue for writes. If the threshold of 0.9sec is reached, the persistent queue is saturated by more than 90% and vmagent won't be able to keep up with flushing data to disk. In this case, consider decreasing the load on vmagent or improving the disk throughput. | ||
Persistent queue read saturation ($instance) | Shows the saturation of the persistent queue for reads. If the threshold of 0.9sec is reached, the persistent queue is saturated by more than 90% and vmagent won't be able to keep up with reading data from disk. In this case, consider decreasing the load on vmagent or improving the disk throughput. | ||
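
The saturation panels express how many seconds per second the persistent queue spends on reads or writes, which is why 0.9sec corresponds to 90%. The sketch below derives such a value from two readings of a cumulative busy-time counter. The `Sample` type, the function names, and the example numbers are hypothetical, counter resets are not handled, and the dashboard's actual query may differ.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a cumulative busy-time counter (hypothetical type)."""
    timestamp: float     # Unix time of the reading, in seconds
    busy_seconds: float  # total seconds spent on I/O so far


def saturation(prev: Sample, curr: Sample) -> float:
    """Return busy seconds per elapsed second between two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr.busy_seconds - prev.busy_seconds) / elapsed


def is_saturated(value: float, threshold: float = 0.9) -> bool:
    """Apply the 0.9 threshold from the panel description."""
    return value >= threshold


if __name__ == "__main__":
    # 57 busy seconds within a 60 second window.
    s = saturation(Sample(0.0, 100.0), Sample(60.0, 157.0))
    print(s, is_saturated(s))
```

The same reading applies to any panel on this page that reports saturation as a fraction of time spent busy.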
Resource usage
Name | Description | Thresholds | Repeat |
---|---|---|---|
CPU ($instance) | Shows the CPU usage percentage per vmagent instance. If you think that usage is abnormal or unexpected, please file an issue and attach a CPU profile if possible. | ||
Memory usage ($instance) | Shows the amount of memory used. If you think that usage is abnormal or unexpected, please file an issue and attach a memory profile if possible. | ||
Disk writes/reads ($instance) | Shows the number of bytes read from and written to the storage layer when vmagent has to buffer data on disk or read already buffered data. | ||
Network usage ($instance) | Shows the byte rate for data accepted by vmagent and pushed via the remote write protocol. Discrepancies are possible because different protocols are used for ingesting, scraping and writing data. | ||
Open FDs ($instance) | Shows the percentage of open file descriptors in the OS per instance. Reaching the limit of open files (100%) can cause various issues and must be prevented. See how to change the limits here: https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a | ||
Goroutines ($instance) | TODO: Fill panel description | ||
GC duration ($instance) | TODO: Fill panel description | ||
Threads ($instance) | TODO: Fill panel description |
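
The Open FDs panel reports open file descriptors as a percentage of the allowed maximum. As a rough illustration of that calculation, the sketch below computes the same percentage for the current Python process on Linux. It does not inspect vmagent itself, the function names are hypothetical, and it assumes the soft `RLIMIT_NOFILE` limit is finite.

```python
import os
import resource


def fd_usage_percent(open_fds: int, max_fds: int) -> float:
    """Return open file descriptors as a percentage of the limit."""
    if max_fds <= 0:
        raise ValueError("max_fds must be positive")
    return 100.0 * open_fds / max_fds


def current_process_fd_usage() -> float:
    """Measure descriptor usage of the current process (Linux only).

    Counts entries in /proc/self/fd and compares them with the soft
    RLIMIT_NOFILE limit. Raises ValueError if the limit is unlimited.
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))
    return fd_usage_percent(open_fds, soft_limit)


if __name__ == "__main__":
    print(fd_usage_percent(512, 1024))
```

Keeping the arithmetic in `fd_usage_percent` separate from the measurement makes it possible to feed in values from any source, for example numbers read off the dashboard.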