Victoria Metrics / VmSingle

Overview for single-node VictoriaMetrics v1.79.0 or higher

Tags

self-monitor
victoriametrics
vmsingle

Panels

Stats

Name	Description	Thresholds
Version	TODO: Fill panel description
Total datapoints	How many datapoints are in storage
Disk space usage	Total amount of used disk space
Bytes per point	Average disk usage per datapoint.
Allowed memory	Total size of allowed memory via flag `-memory.allowedPercent`
Uptime	TODO: Fill panel description	Default: Mode: absolute Level 1: 1800
Active series	Shows the number of active time series with new data points inserted during the last hour. A high value may result in ingestion slowdown. For more details, see https://docs.victoriametrics.com/FAQ.html#what-is-an-active-time-series
Min free disk space	The minimum free disk space left
Available CPU	Total number of available CPUs for VM process	Default: Mode: absolute Level 1: 80
Available memory	Total size of available memory for VM process

Performance

Name	Description	Thresholds	Repeat
Requests rate ($instance)	* `*` - unsupported query path * `/write` - insert into VM * `/metrics` - query VM system metrics * `/query` - query instant values * `/query_range` - query over a range of time * `/series` - match a certain label set * `/label/{}/values` - query a list of label values (variables mostly)
Query duration ($instance)	Lower is better. * `*` - unsupported query path * `/write` - insert into VM * `/metrics` - query VM system metrics * `/query` - query instant values * `/query_range` - query over a range of time * `/series` - match a certain label set * `/label/{}/values` - query a list of label values (variables mostly)
Active time series ($instance)	Shows the number of active time series with new data points inserted during the last hour. A high value may result in ingestion slowdown. For more details, see https://docs.victoriametrics.com/FAQ.html#what-is-an-active-time-series
Requests error rate ($instance)	* `*` - unsupported query path * `/write` - insert into VM * `/metrics` - query VM system metrics * `/query` - query instant values * `/query_range` - query over a range of time * `/series` - match a certain label set * `/label/{}/values` - query a list of label values (variables mostly)
Concurrent flushes on disk ($instance)	Shows how many insertions to disk (not API /write calls) are in progress, where: * `max` - equal to the number of CPUs; * `current` - the current number of goroutines busy inserting rows into the underlying storage. Every successful API /write call results in a flush to disk. However, these two actions are separated and controlled by different concurrency limiters. The `max` on this panel can't be changed and is always equal to the number of CPUs. When `current` constantly hits `max`, the storage is overloaded and requires more CPU.
Rows read per query ($instance)	99th percentile of the number of raw samples read per query.
Series read per query ($instance)	99th percentile of the number of series read per query.
Rows scanned per series ($instance)	99th percentile of the number of raw samples scanned per query. This number can exceed the number of RowsReadPerQuery if the `step` query arg passed to /api/v1/query_range is smaller than the lookbehind window set in square brackets of the rollup function. For example, if `increase(some_metric[1h])` is executed with `step=5m`, then the same raw samples on an hour time range are scanned `1h/5m=12` times. See this article for details.
Rows read per series ($instance)	99th percentile of the number of raw samples read per queried series.
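
The `1h/5m=12` arithmetic from the "Rows scanned per series" description can be sketched in a few lines. This is only an illustration of the ratio between the lookbehind window and the `step`; the function name `scans_per_sample` is hypothetical and is not part of VictoriaMetrics.

```python
from datetime import timedelta


def scans_per_sample(window: timedelta, step: timedelta) -> float:
    """Approximate upper bound on how often one raw sample is scanned.

    `window` is the lookbehind window in square brackets (e.g. [1h]),
    `step` is the `step` query arg passed to /api/v1/query_range.
    """
    if window <= timedelta(0) or step <= timedelta(0):
        raise ValueError("window and step must be positive")
    # If step >= window, consecutive windows do not overlap,
    # so a sample is scanned at most once.
    return max(1.0, window / step)


# The example from the panel description: increase(some_metric[1h]) with step=5m.
print(scans_per_sample(timedelta(hours=1), timedelta(minutes=5)))
```

A ratio well above 1 suggests that increasing `step` or shortening the lookbehind window may reduce the number of scanned samples.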

Caches

Name	Description	Thresholds	Repeat
Cache size ($instance)	VictoriaMetrics stores various caches in RAM. The memory size for these caches may be limited with the `-memory.allowedPercent` flag. The `max allowed` line shows the maximum allowed memory size for the cache.
Cache usage % ($instance)	Shows the percentage of used cache size from the allowed size by type. Values close to 100% show the maximum potential utilization. Values close to 0% show that cache is underutilized.
Cache hit ratio ($instance)	Cache hit ratio shows cache efficiency. The higher the hit ratio, the better.
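
As a rough illustration of how the two ratios above are usually derived, the sketch below computes a usage percentage and a hit ratio from raw counters. The function names and inputs are assumptions made for this example; the dashboard's actual queries are not shown in this document.

```python
def cache_usage_percent(size_bytes: float, max_size_bytes: float) -> float:
    """Used cache size as a percentage of the allowed size."""
    if max_size_bytes <= 0:
        raise ValueError("max_size_bytes must be positive")
    return 100.0 * size_bytes / max_size_bytes


def cache_hit_ratio(requests: float, misses: float) -> float:
    """Fraction of cache requests served from the cache (0.0 to 1.0)."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return 1.0 - misses / requests


# Hypothetical numbers: 750 MiB used out of 1 GiB allowed, 250 misses per 1000 requests.
print(cache_usage_percent(750 * 2**20, 2**30))
print(cache_hit_ratio(1000, 250))
```

A cache with usage close to 100% and a low hit ratio is the combination worth investigating first.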

Storage

Name	Description	Thresholds	Repeat
Datapoints ingestion rate ($instance)	How many datapoints are inserted into storage per second
Storage full ETA ($instance)	Shows the time needed to reach 100% of disk capacity based on the following params: * free disk space; * row ingestion rate; * dedup rate; * compression. Use this panel for capacity planning to estimate the time remaining before disk space runs out.
Datapoints ($instance)	Shows how many datapoints are in the storage and what is average disk usage per datapoint.
Pending datapoints ($instance)	How many datapoints are in the RAM queue waiting to be written into storage. The number of pending data points should be in the range from 0 to `2*<ingestion_rate>`, since VictoriaMetrics pushes pending data to persistent storage every second.
Disk space usage - datapoints ($instance)	Shows the amount of on-disk space occupied by data points and the remaining disk space at `-storageDataPath`
LSM parts ($instance)	Data parts of the LSM tree. A high number of parts may indicate slow merge performance - check the resource utilization. * `indexdb` - inverted index * `storage/small` - recently added parts of data ingested into storage (hot data) * `storage/big` - small parts gradually merged into big parts (cold data)
Disk space usage - index ($instance)	Shows the amount of on-disk space occupied by the inverted index.
Active merges ($instance)	The number of on-going merges in storage nodes. It is expected to have high numbers for `storage/small` metric.
Rows ignored ($instance)	Shows how many rows were ignored on insertion due to corrupted or out of retention timestamps.
Merge speed ($instance)	The number of rows merged per second by storage nodes.
Logging rate ($instance)	Shows the rate of logged messages by level. An unexpected spike in the rate is a good reason to check the logs.
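
The capacity-planning arithmetic behind "Storage full ETA", "Datapoints" and "Pending datapoints" can be approximated as follows. This is a minimal sketch under stated assumptions, not the dashboard's actual query: all function and parameter names are hypothetical, compression is assumed to be already reflected in the measured bytes per datapoint, and `dedup_fraction` is an assumed share of ingested rows removed by deduplication.

```python
import math


def bytes_per_datapoint(disk_usage_bytes: float, total_datapoints: float) -> float:
    """Average on-disk size of one stored datapoint."""
    if total_datapoints <= 0:
        raise ValueError("total_datapoints must be positive")
    return disk_usage_bytes / total_datapoints


def storage_full_eta_seconds(free_bytes: float, ingestion_rate: float,
                             bytes_per_point: float,
                             dedup_fraction: float = 0.0) -> float:
    """Seconds until the disk is full at the current growth rate."""
    # Rows that survive deduplication, multiplied by their on-disk size.
    growth = ingestion_rate * (1.0 - dedup_fraction) * bytes_per_point
    if growth <= 0:
        return math.inf  # no growth, so the disk never fills up
    return free_bytes / growth


def pending_in_expected_range(pending: float, ingestion_rate: float) -> bool:
    """Check the 0..2*<ingestion_rate> range from the panel description."""
    return 0 <= pending <= 2 * ingestion_rate


# Hypothetical workload: 1 TB free, 100k datapoints/s, 0.5 bytes per datapoint.
eta = storage_full_eta_seconds(1e12, 100_000, 0.5)
print(f"disk full in about {eta / 86400:.1f} days")
```

The estimate is linear, so it is only as good as the assumption that the ingestion rate and compression ratio stay close to their current values.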

Troubleshooting

Name	Description	Thresholds	Repeat
Churn rate ($instance)	Shows the rate and total number of new series created over the last 24h. A high churn rate is tightly connected with database performance and may result in unexpected OOMs or slow queries. It is recommended to always keep an eye on this metric to avoid unexpected cardinality "explosions". The higher the churn rate, the more resources are required to handle it. Try to keep the churn rate as low as possible. Good references to read: * https://www.robustperception.io/cardinality-is-key * https://www.robustperception.io/using-tsdb-analyze-to-investigate-churn-and-cardinality
IndexDB items rate ($instance)	Shows the rate of adding new items to the index. It should correlate with `Slow inserts` and `Churn rate` graphs and could help to determine the pressure on indexdb.
Slow inserts ($instance)	The percentage of slow inserts compared to the total insertion rate during the last 5 minutes. Lower is better. If the percentage remains high (>10%) for extended periods of time, then it is likely that more RAM is needed for optimal handling of the current number of active time series. In general, VictoriaMetrics requires ~1KB of RAM per active time series, so it should be easy to calculate the required amount of RAM for the current workload according to the capacity planning docs. However, the resulting number may be far from the real one, because the required amount of memory depends on many other factors, such as the number of labels per time series and the length of label values.
Slow queries rate ($instance)	Slow queries rate according to the `-search.logSlowQueryDuration` flag, which is `5s` by default.
Cache usage % ($instance)	Shows the percentage of used cache size from the allowed size by type. Values close to 100% show the maximum potential utilization. Values close to 0% show that cache is underutilized.
Labels limit exceeded ($instance)	VictoriaMetrics limits the number of labels per metric with the `-maxLabelsPerTimeseries` command-line flag. This prevents ingestion of metrics with too many labels. The value of `-maxLabelsPerTimeseries` must be adjusted for your workload. When the limit is exceeded (graph is > 0), extra labels are dropped, which could result in unexpected identical time series.
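
The "Slow inserts" rule of thumb above (~1KB of RAM per active time series, with more than 10% slow inserts treated as a warning sign) can be turned into a quick back-of-the-envelope check. The helper names below are hypothetical, and the result is only a starting point, since real memory usage also depends on the number and length of labels.

```python
BYTES_PER_ACTIVE_SERIES = 1024  # ~1KB per series, rule of thumb from the panel text
SLOW_INSERTS_WARN_PERCENT = 10.0  # threshold mentioned in the panel text


def estimate_ram_bytes(active_series: int,
                       bytes_per_series: int = BYTES_PER_ACTIVE_SERIES) -> int:
    """Rough RAM needed to track the given number of active series."""
    return active_series * bytes_per_series


def slow_inserts_percent(slow_inserts: float, total_inserts: float) -> float:
    """Share of slow inserts in the total insertion rate, in percent."""
    if total_inserts <= 0:
        raise ValueError("total_inserts must be positive")
    return 100.0 * slow_inserts / total_inserts


# Hypothetical workload: 1M active series, 50 slow inserts per 1000 inserts.
ram = estimate_ram_bytes(1_000_000)
pct = slow_inserts_percent(50, 1000)
print(f"~{ram / 2**30:.2f} GiB of RAM for active series")
print("needs attention" if pct > SLOW_INSERTS_WARN_PERCENT else "ok", f"({pct:.1f}% slow)")
```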

Resource usage

Name	Description	Thresholds	Repeat
Memory usage ($instance)	TODO: Fill panel description
CPU ($instance)	TODO: Fill panel description
Open FDs ($instance)	Shows the number of open file descriptors in the OS. Reaching the limit of open files can cause various issues and must be prevented. See how to change the limits here: https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a
Disk writes/reads ($instance)	Shows the number of bytes read from and written to the storage layer.
Goroutines ($instance)	TODO: Fill panel description
GC duration ($instance)	Shows the average GC duration.
Threads ($instance)	TODO: Fill panel description
TCP connections ($instance)	TODO: Fill panel description
TCP connections rate ($instance)	TODO: Fill panel description