Victoria Metrics / VmSingle
Overview for single-node VictoriaMetrics v1.79.0 or higher
Tags
self-monitor
victoriametrics
vmsingle
Panels
Stats
Name | Description | Thresholds | Repeat |
---|---|---|---|
Version | TODO: Fill panel description | ||
Total datapoints | How many datapoints are in storage | ||
Disk space usage | Total amount of used disk space | ||
Bytes per point | Average disk usage per datapoint. | ||
Allowed memory | Total size of memory allowed via the -memory.allowedPercent flag | ||
Uptime | TODO: Fill panel description | Default: Mode: absolute Level 1: 1800 | |
Active series | Shows the number of active time series with new data points inserted during the last hour. A high value may slow down ingestion. See https://docs.victoriametrics.com/FAQ.html#what-is-an-active-time-series for details. | ||
Min free disk space | The minimum free disk space left | ||
Available CPU | Total number of CPUs available to the VM process | Default: Mode: absolute Level 1: 80 | |
Available memory | Total size of memory available to the VM process | ||
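The Bytes per point and Allowed memory stats are simple ratios. Below is a minimal Python sketch of the arithmetic. It assumes you already have the raw values, and its function names are hypothetical, not part of VictoriaMetrics. The sketch treats the memory base that -memory.allowedPercent applies to as a plain parameter. The default of 60 is assumed from the documented flag default, so verify it for your version.

```python
def bytes_per_point(disk_usage_bytes: float, total_datapoints: int) -> float:
    """Average on-disk bytes per datapoint: Disk space usage / Total datapoints."""
    if total_datapoints <= 0:
        raise ValueError("total_datapoints must be positive")
    return disk_usage_bytes / total_datapoints


def allowed_memory_bytes(memory_limit_bytes: float, allowed_percent: float = 60.0) -> float:
    """Memory share permitted by -memory.allowedPercent (60 assumed as the default)."""
    if not 0 < allowed_percent <= 100:
        raise ValueError("allowed_percent must be in (0, 100]")
    return memory_limit_bytes * allowed_percent / 100


if __name__ == "__main__":
    # Hypothetical values read from the Disk space usage and Total datapoints stats.
    print(bytes_per_point(1_200_000_000, 1_000_000_000))  # 1.2
    # Hypothetical host with 16 GiB of memory and the assumed default flag value.
    print(allowed_memory_bytes(16 * 1024**3) / 1024**3)  # 9.6
```

A lower Bytes per point value means better compression. The allowed-memory figure is only as accurate as the memory base you pass in.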
Performance
Name | Description | Thresholds | Repeat |
---|---|---|---|
Requests rate ($instance) | Request rate per path: "*" - unsupported query path; /write - insert into VM; /metrics - query VM system metrics; /query - query instant values; /query_range - query over a range of time; /series - match a certain label set; /label/{}/values - query a list of label values (mostly used by variables). | ||
Query duration ($instance) | Lower is better. Per-path breakdown: "*" - unsupported query path; /write - insert into VM; /metrics - query VM system metrics; /query - query instant values; /query_range - query over a range of time; /series - match a certain label set; /label/{}/values - query a list of label values (mostly used by variables). | ||
Active time series ($instance) | Shows the number of active time series with new data points inserted during the last hour. A high value may slow down ingestion. See the following link for details: | ||
Requests error rate ($instance) | Request error rate per path: "*" - unsupported query path; /write - insert into VM; /metrics - query VM system metrics; /query - query instant values; /query_range - query over a range of time; /series - match a certain label set; /label/{}/values - query a list of label values (mostly used by variables). | ||
Concurrent flushes on disk ($instance) | Shows how many insertions into on-disk storage (not API /write calls) are in progress, where: max equals the number of CPUs; current is the number of goroutines busy inserting rows into the underlying storage. Every successful API /write call results in a flush to disk, but the two actions are separate and controlled by different concurrency limiters. The max on this panel can't be changed and always equals the number of CPUs. When current constantly hits max, the storage is overloaded and requires more CPU. | ||
Rows read per query ($instance) | 99th percentile of the number of raw samples read per query. | ||
Series read per query ($instance) | 99th percentile of the number of series read per query. | ||
Rows scanned per series ($instance) | 99th percentile of the number of raw samples scanned per query. This number can exceed RowsReadPerQuery if the step query arg passed to /api/v1/query_range is smaller than the lookbehind window set in square brackets of the rollup function. For example, if increase(some_metric[1h]) is executed with step=5m, then the same raw samples on an hour time range are scanned 1h/5m=12 times. See this article for details. | ||
Rows read per series ($instance) | 99th percentile of the number of raw samples read per queried series. | ||
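The Python sketch below checks two rules from the Performance table. The first is the scan amplification from the Rows scanned per series example. The second is the "current constantly hits max" rule from Concurrent flushes on disk. The function names are hypothetical. The code approximates amplification as window/step. It treats "constantly" as the share of observed samples where current >= max, which is an assumption and not the dashboard's own query.

```python
from datetime import timedelta
from typing import Sequence


def scan_amplification(window: timedelta, step: timedelta) -> float:
    """Rough number of times each raw sample is scanned by a range query.

    Each step re-evaluates the rollup over the full lookbehind window, so when
    step < window consecutive windows overlap and a sample is read about
    window/step times. When the windows don't overlap (step >= window), each
    sample is read at most once and 1.0 is returned.
    """
    if window <= timedelta(0) or step <= timedelta(0):
        raise ValueError("window and step must be positive")
    return max(1.0, window / step)


def flush_saturation(current_samples: Sequence[int], max_concurrency: int) -> float:
    """Share of observed samples where 'current' concurrent flushes reached 'max'."""
    if not current_samples:
        return 0.0
    saturated = sum(1 for c in current_samples if c >= max_concurrency)
    return saturated / len(current_samples)


if __name__ == "__main__":
    # increase(some_metric[1h]) evaluated via /api/v1/query_range with step=5m
    print(scan_amplification(timedelta(hours=1), timedelta(minutes=5)))  # 12.0
    # Hypothetical samples of the "current" line on an 8-CPU host
    print(flush_saturation([8, 8, 7, 8], max_concurrency=8))  # 0.75
```

A saturation share that stays close to 1.0 over a long window corresponds to the overloaded-storage case described in the table.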
Caches
Name | Description | Thresholds | Repeat |
---|---|---|---|
Cache size ($instance) | VictoriaMetrics keeps various caches in RAM. The memory size of these caches may be limited with the -memory.allowedPercent flag. The max allowed line shows the maximum memory size allowed for caches. | ||
Cache usage % ($instance) | Shows, per cache type, the used cache size as a percentage of the allowed size. Values close to 100% indicate maximum utilization; values close to 0% indicate that the cache is underutilized. | ||
Cache hit ratio ($instance) | Cache hit ratio shows cache efficiency. The higher the hit ratio, the better. | ||
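Cache usage % and Cache hit ratio both reduce to simple ratios. This Python sketch assumes that request and miss counts cover the same time interval, for example increases of monotonically growing counters over one window. The function names are hypothetical.

```python
from typing import Optional


def cache_usage_percent(used_bytes: int, allowed_bytes: int) -> float:
    """Used cache size as a percentage of the allowed size for that cache type."""
    if allowed_bytes <= 0:
        raise ValueError("allowed_bytes must be positive")
    return 100.0 * used_bytes / allowed_bytes


def cache_hit_ratio(requests: int, misses: int) -> Optional[float]:
    """Fraction of lookups served from the cache; None when there were no lookups."""
    if requests <= 0:
        return None
    if not 0 <= misses <= requests:
        raise ValueError("misses must be between 0 and requests")
    return (requests - misses) / requests


if __name__ == "__main__":
    # Hypothetical numbers for one cache type over one interval.
    print(cache_usage_percent(768 * 1024**2, 1024 * 1024**2))  # 75.0
    print(cache_hit_ratio(requests=10_000, misses=250))  # 0.975
```

Returning None for an idle interval avoids reporting a misleading 0% or 100% hit ratio when the cache was never queried.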
Storage
Name | Description | Thresholds | Repeat |
---|---|---|---|
Datapoints ingestion rate ($instance) | How many datapoints are inserted into storage per second | ||
Storage full ETA ($instance) | Shows the estimated time until the disk reaches 100% of its capacity, based on the following parameters: free disk space, row ingestion rate, dedup rate, and compression. Use this panel for capacity planning to estimate how much time remains before the disk runs out of space. | ||
Datapoints ($instance) | Shows the number of datapoints in storage and the average disk usage per datapoint. | ||
Pending datapoints ($instance) | Number of datapoints in the RAM queue waiting to be written to storage. The number of pending datapoints should stay between 0 and 2*<ingestion_rate>, since VictoriaMetrics pushes pending data to persistent storage every second. | ||
Disk space usage - datapoints ($instance) | Shows the on-disk space occupied by datapoints and the remaining disk space at -storageDataPath. | ||
LSM parts ($instance) | Data parts of the LSM tree. A high number of parts may indicate slow merge performance; check resource utilization. Part types: indexdb - inverted index; storage/small - recently added parts of data ingested into storage (hot data); storage/big - small parts gradually merged into big parts (cold data). | ||
Disk space usage - index ($instance) | Shows the on-disk space occupied by the inverted index. | ||
Active merges ($instance) | The number of ongoing merges in storage nodes. High values are expected for storage/small. | ||
Rows ignored ($instance) | Shows how many rows were ignored on insertion due to corrupted or out-of-retention timestamps. | ||
Merge speed ($instance) | The number of rows merged per second by storage nodes. | ||
Logging rate ($instance) | Shows the rate of logged messages by level. An unexpected spike in the rate is a good reason to check the logs. | ||
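The Storage full ETA inputs listed above can be combined in several ways. This Python sketch makes three assumptions. Deduplicated rows are not stored, so the dedup rate is subtracted from the ingestion rate. Compression is folded into an average bytes-per-point figure. Free space divided by the resulting growth rate gives the ETA. The function names are hypothetical, and the dashboard's actual query may differ. The second helper encodes the 0 to 2*<ingestion_rate> range given for Pending datapoints.

```python
from typing import Optional


def storage_full_eta_seconds(
    free_disk_bytes: float,
    ingestion_rate: float,
    dedup_rate: float,
    bytes_per_point: float,
) -> Optional[float]:
    """Estimated seconds until free disk space runs out at the current write rate.

    ingestion_rate and dedup_rate are in rows per second; deduplicated rows are
    assumed not to be stored, so they are subtracted. Compression is folded into
    bytes_per_point (average on-disk bytes per stored datapoint).
    Returns None when stored data is not growing.
    """
    stored_rows_per_sec = ingestion_rate - dedup_rate
    growth_bytes_per_sec = stored_rows_per_sec * bytes_per_point
    if growth_bytes_per_sec <= 0:
        return None
    return free_disk_bytes / growth_bytes_per_sec


def pending_in_expected_range(pending_datapoints: float, ingestion_rate: float) -> bool:
    """True if the RAM queue holds between 0 and 2 * ingestion_rate datapoints."""
    return 0 <= pending_datapoints <= 2 * ingestion_rate


if __name__ == "__main__":
    # Hypothetical inputs: 500 GiB free, 200k rows/s in, 20k rows/s deduplicated, 0.8 B/point.
    eta = storage_full_eta_seconds(500 * 1024**3, 200_000, 20_000, 0.8)
    if eta is not None:
        print(f"disk full in ~{eta / 86400:.1f} days")
    print(pending_in_expected_range(350_000, 200_000))  # True
```

Retention-based deletion also frees space over time and is ignored here, so treat the result as a conservative planning figure.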
Troubleshooting
Name | Description | Thresholds | Repeat |
---|---|---|---|
Churn rate ($instance) | Shows the rate and total number of new series created over the last 24h. A high churn rate is tightly connected with database performance and may result in unexpected OOMs or slow queries. It is recommended to always keep an eye on this metric to avoid unexpected cardinality "explosions". The higher the churn rate, the more resources are required to handle it, so keep the churn rate as low as possible. Good references to read: https://www.robustperception.io/cardinality-is-key and https://www.robustperception.io/using-tsdb-analyze-to-investigate-churn-and-cardinality | ||
IndexDB items rate ($instance) | Shows the rate at which new items are added to the index. It should correlate with the Slow inserts and Churn rate graphs and can help determine the pressure on indexdb. | ||
Slow inserts ($instance) | The percentage of slow inserts relative to the total insertion rate during the last 5 minutes. Lower is better. If the percentage stays high (>10%) for extended periods of time, more RAM is likely needed to handle the current number of active time series optimally. In general, VictoriaMetrics requires ~1KB of RAM per active time series, so the RAM required for the current workload can be estimated according to the capacity planning docs. However, the result may be far from the real number, because the required amount of memory depends on many other factors, such as the number of labels per time series and the length of label values. | ||
Slow queries rate ($instance) | Rate of slow queries, as defined by the -search.logSlowQueryDuration flag (5s by default). | ||
Cache usage % ($instance) | Shows, per cache type, the used cache size as a percentage of the allowed size. Values close to 100% indicate maximum utilization; values close to 0% indicate that the cache is underutilized. | ||
Labels limit exceeded ($instance) | VictoriaMetrics limits the number of labels per metric with the -maxLabelsPerTimeseries command-line flag. This prevents ingesting metrics with too many labels. The value of -maxLabelsPerTimeseries must be adjusted for your workload. When the limit is exceeded (the graph is > 0), extra labels are dropped, which may result in unexpectedly identical time series. | ||
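This Python sketch illustrates three ideas from the Troubleshooting table. It counts churn as series seen now but not before. It applies the ~1KB-per-active-series rule of thumb. It shows how dropping labels past a limit can collapse distinct series into one. All names are hypothetical. Which labels VictoriaMetrics actually drops is not stated here, so the sketch assumes labels past the limit are dropped in insertion order.

```python
from typing import Dict, Iterable, Set, Tuple

# A series identity: ordered (name, value) label pairs.
Labels = Tuple[Tuple[str, str], ...]


def new_series_count(seen_before: Set[str], seen_now: Iterable[str]) -> int:
    """Series present now but never seen before: the churn for one interval."""
    return len(set(seen_now) - seen_before)


def estimate_ram_bytes(active_series: int, bytes_per_series: int = 1024) -> int:
    """Rule-of-thumb RAM for active series (~1KB each); a rough estimate only."""
    return active_series * bytes_per_series


def truncate_labels(labels: Dict[str, str], max_labels: int) -> Labels:
    """Keep only the first `max_labels` labels.

    Assumption for illustration: labels past the limit are dropped in insertion order.
    """
    return tuple(list(labels.items())[:max_labels])


if __name__ == "__main__":
    print(new_series_count({'up{pod="a"}'}, ['up{pod="a"}', 'up{pod="b"}']))  # 1
    print(estimate_ram_bytes(2_000_000) / 1024**3, "GiB (rough)")
    a = truncate_labels({"job": "api", "instance": "h1", "pod": "p1"}, max_labels=2)
    b = truncate_labels({"job": "api", "instance": "h1", "pod": "p2"}, max_labels=2)
    print(a == b)  # True: two distinct series collapse into one identity
```

As the Slow inserts description notes, the RAM figure can be far from reality when series have many labels or long label values.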
Resource usage
Name | Description | Thresholds | Repeat |
---|---|---|---|
Memory usage ($instance) | TODO: Fill panel description | ||
CPU ($instance) | TODO: Fill panel description | ||
Open FDs ($instance) | Shows the number of open file descriptors in the OS. Reaching the open files limit can cause various issues and must be prevented. See how to change the limits: https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a | ||
Disk writes/reads ($instance) | Shows the number of bytes read from and written to the storage layer. | ||
Goroutines ($instance) | TODO: Fill panel description | ||
GC duration ($instance) | Shows the average GC duration. | ||
Threads ($instance) | TODO: Fill panel description | ||
TCP connections ($instance) | TODO: Fill panel description | ||
TCP connections rate ($instance) | TODO: Fill panel description | ||
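To see how close a VictoriaMetrics process is to its open files limit outside the dashboard on Linux, you can read /proc directly. This Python sketch counts entries in /proc/<pid>/fd and parses the soft "Max open files" value from /proc/<pid>/limits. The helper names are hypothetical. Reading another process's fd directory usually requires running as the same user or as root.

```python
import os
from typing import Optional


def open_fd_count(pid: int) -> int:
    """Number of open file descriptors of process `pid` (Linux only, via /proc).

    When pid is the current process, the count may include the descriptor
    used to list the directory itself.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def max_open_files(pid: int) -> Optional[int]:
    """Soft 'Max open files' limit of `pid` from /proc/<pid>/limits; None if unlimited."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # columns: Max open files <soft> <hard> files
                return None if soft == "unlimited" else int(soft)
    raise RuntimeError("'Max open files' not found")


def fd_usage_ratio(open_fds: int, soft_limit: Optional[int]) -> float:
    """Fraction of the soft limit in use; 0.0 when the limit is unlimited."""
    if soft_limit is None:
        return 0.0
    return open_fds / soft_limit
```

For example, `fd_usage_ratio(open_fd_count(pid), max_open_files(pid))` with the VictoriaMetrics PID returns the share of the soft limit in use. A value that keeps approaching 1.0 is the situation the Open FDs panel warns about.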