Victoria Metrics / VmSingle

Overview for single-node VictoriaMetrics v1.79.0 or higher

Tags

  • self-monitor
  • victoriametrics
  • vmsingle

Panels

Stats

Name Description Thresholds Repeat
Version TODO: Fill panel description
Total datapoints How many datapoints are in storage
Disk space usage Total amount of used disk space
Bytes per point Average disk usage per datapoint.
Allowed memory Total size of allowed memory via flag -memory.allowedPercent
Uptime TODO: Fill panel description Default:
Mode: absolute
Level 1: 1800

Active series Shows the number of active time series that received new data points during the last hour. A high value may slow down ingestion.

See more details here https://docs.victoriametrics.com/FAQ.html#what-is-an-active-time-series
Min free disk space The minimum free disk space left
Available CPU Total number of available CPUs for VM process Default:
Mode: absolute
Level 1: 80

Available memory Total size of available memory for VM process
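
The Bytes per point stat above relates two other stats in this row: disk space usage and total datapoints. A minimal sketch of that arithmetic follows; the function name and the sample numbers are illustrative assumptions, not the dashboard's actual query.

```python
def bytes_per_point(disk_usage_bytes: float, total_datapoints: float) -> float:
    """Average on-disk bytes per stored datapoint.

    Both arguments are hypothetical inputs standing in for the
    "Disk space usage" and "Total datapoints" stats.
    """
    if total_datapoints <= 0:
        raise ValueError("total_datapoints must be positive")
    return disk_usage_bytes / total_datapoints


# Made-up example: 8 GiB on disk holding 10 billion datapoints.
print(round(bytes_per_point(8 * 1024**3, 10_000_000_000), 3))
```

A lower value means better compression for the stored data.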

Performance

Name Description Thresholds Repeat
Requests rate ($instance) * * - unsupported query path
* /write - insert into VM
* /metrics - query VM system metrics
* /query - query instant values
* /query_range - query over a range of time
* /series - match a certain label set
* /label/{}/values - query a list of label values (variables mostly)
Query duration ($instance) Lower is better.
* * - unsupported query path
* /write - insert into VM
* /metrics - query VM system metrics
* /query - query instant values
* /query_range - query over a range of time
* /series - match a certain label set
* /label/{}/values - query a list of label values (variables mostly)
Active time series ($instance) Shows the number of active time series that received new data points during the last hour. A high value may slow down ingestion.

See following link for details:
Requests error rate ($instance) * * - unsupported query path
* /write - insert into VM
* /metrics - query VM system metrics
* /query - query instant values
* /query_range - query over a range of time
* /series - match a certain label set
* /label/{}/values - query a list of label values (variables mostly)
Concurrent flushes on disk ($instance) Shows how many ongoing insertions (not API /write calls) on disk are taking place, where:
* max - equal to number of CPUs;
* current - current number of goroutines busy with inserting rows into underlying storage.

Every successful API /write call results in a flush to disk. However, these two actions are separate and are controlled by different concurrency limiters. The max on this panel can't be changed and is always equal to the number of CPUs.

When current constantly hits max, the storage is overloaded and requires more CPU.

Rows read per query ($instance) 99th percentile of number of raw samples read per query.
Series read per query ($instance) 99th percentile of number of series read per query.
Rows scanned per series ($instance) 99th percentile of number of raw samples scanned per query.

This number can exceed the number of RowsReadPerQuery if the step query arg passed to /api/v1/query_range is smaller than the lookbehind window set in the square brackets of the rollup function. For example, if increase(some_metric[1h]) is executed with step=5m, then the same raw samples on a one-hour time range are scanned 1h/5m=12 times. See this article for details.
Rows read per series ($instance) 99th percentile of number of raw samples read per queried series.
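
The 1h/5m=12 example from the Rows scanned panel can be checked with a small helper. This is a rough sketch of the ratio described above, not VictoriaMetrics code; the function name is hypothetical, and treating the case where step is at least as large as the window as a factor of 1 is a simplifying assumption.

```python
def scan_multiplier(window_seconds: float, step_seconds: float) -> float:
    """Roughly how many times each raw sample is scanned by a rollup query.

    window_seconds: lookbehind window in square brackets, e.g. 3600 for [1h].
    step_seconds:   the step query arg passed to /api/v1/query_range.
    """
    if window_seconds <= 0 or step_seconds <= 0:
        raise ValueError("window and step must be positive")
    # Simplification: when step >= window, a sample is scanned at most once.
    return max(1.0, window_seconds / step_seconds)


# increase(some_metric[1h]) executed with step=5m
print(scan_multiplier(3600, 300))
```

Raising the step or shrinking the lookbehind window reduces the overlap between adjacent evaluation points.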

Caches

Name Description Thresholds Repeat
Cache size ($instance) VictoriaMetrics stores various caches in RAM. The memory size for these caches may be limited with the -memory.allowedPercent flag. The max allowed line shows the maximum memory size allowed for the cache.
Cache usage % ($instance) Shows the percentage of used cache size from the allowed size by type.
Values close to 100% show the maximum potential utilization.
Values close to 0% show that cache is underutilized.
Cache hit ratio ($instance) Cache hit ratio shows cache efficiency. The higher the hit ratio, the better.
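
The two ratios on the cache panels above can be sketched as plain arithmetic. The function and parameter names here are hypothetical; the dashboard derives these values from VictoriaMetrics metrics with its own queries, which are not reproduced here.

```python
def cache_usage_percent(used_bytes: float, allowed_bytes: float) -> float:
    """Percentage of the allowed cache size that is currently used."""
    if allowed_bytes <= 0:
        raise ValueError("allowed_bytes must be positive")
    return 100.0 * used_bytes / allowed_bytes


def cache_hit_ratio(requests: float, misses: float) -> float:
    """Fraction of cache requests served from the cache (0.0 to 1.0)."""
    if requests <= 0:
        return 0.0  # no requests yet, so there is nothing to measure
    return 1.0 - misses / requests


# Made-up numbers for a single cache type.
print(cache_usage_percent(50, 200))
print(round(cache_hit_ratio(1000, 50), 2))
```

A cache near 100% usage with a low hit ratio is a more useful signal than either number on its own.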

Storage

Name Description Thresholds Repeat
Datapoints ingestion rate ($instance) How many datapoints are inserted into storage per second
Storage full ETA ($instance) Shows the time needed to reach 100% of disk capacity based on the following params:
* free disk space;
* row ingestion rate;
* dedup rate;
* compression.

Use this panel for capacity planning to estimate how much time remains before the disk space runs out.

Datapoints ($instance) Shows how many datapoints are in the storage and what is average disk usage per datapoint.
Pending datapoints ($instance) How many datapoints are in the RAM queue waiting to be written into storage. The number of pending data points should be in the range from 0 to 2*<ingestion_rate>, since VictoriaMetrics pushes pending data to persistent storage every second.
Disk space usage - datapoints ($instance) Shows amount of on-disk space occupied by data points and the remaining disk space at -storageDataPath
LSM parts ($instance) Data parts of LSM tree.
A high number of parts may indicate slow merge performance - check the resource utilization.
* indexdb - inverted index
* storage/small - recently added parts of data ingested into storage (hot data)
* storage/big - small parts gradually merged into big parts (cold data)
Disk space usage - index ($instance) Shows amount of on-disk space occupied by inverted index.
Active merges ($instance) The number of ongoing merges in storage nodes. High numbers are expected for the storage/small metric.
Rows ignored ($instance) Shows how many rows were ignored on insertion due to corrupted or out-of-retention timestamps.
Merge speed ($instance) The number of rows merged per second by storage nodes.
Logging rate ($instance) Shows the rate of logged messages by level. An unexpected spike in the rate is a good reason to check the logs.
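
The Storage full ETA panel above combines free disk space, ingestion rate, dedup rate, and compression. One way those inputs could fit together is sketched below. This is an assumption about the arithmetic, not the panel's actual expression; all names are hypothetical, and compression is represented by the average bytes per stored datapoint.

```python
def storage_full_eta_seconds(
    free_bytes: float,
    rows_per_second: float,
    dedup_ratio: float,
    bytes_per_point: float,
) -> float:
    """Estimated seconds until the disk is full.

    dedup_ratio: assumed fraction (0.0 to 1.0) of ingested rows that
                 deduplication removes before they occupy disk space.
    bytes_per_point: average compressed size of one stored datapoint.
    """
    if not 0.0 <= dedup_ratio <= 1.0:
        raise ValueError("dedup_ratio must be between 0 and 1")
    stored_rows_per_second = rows_per_second * (1.0 - dedup_ratio)
    growth_bytes_per_second = stored_rows_per_second * bytes_per_point
    if growth_bytes_per_second <= 0:
        return float("inf")  # disk usage is not growing
    return free_bytes / growth_bytes_per_second


# Made-up numbers: 500 GiB free, 100k rows/s, 10% deduplicated, 1 byte per point.
eta = storage_full_eta_seconds(500 * 1024**3, 100_000, 0.1, 1.0)
print(f"{eta / 86400:.1f} days")
```

The estimate is linear, so it does not account for retention freeing space or for changes in the ingestion rate.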

Troubleshooting

Name Description Thresholds Repeat
Churn rate ($instance) Shows the rate and total number of new series created over the last 24h.

A high churn rate is tightly connected with database performance and may result in unexpected OOMs or slow queries. It is recommended to always keep an eye on this metric to avoid unexpected cardinality "explosions".

The higher the churn rate, the more resources are required to handle it. Try to keep the churn rate as low as possible.

Good references to read:
* https://www.robustperception.io/cardinality-is-key
* https://www.robustperception.io/using-tsdb-analyze-to-investigate-churn-and-cardinality
IndexDB items rate ($instance) Shows the rate of adding new items to the index. It should correlate with Slow inserts and Churn rate graphs and could help to determine the pressure on indexdb.
Slow inserts ($instance) The percentage of slow inserts compared to the total insertion rate during the last 5 minutes.

Lower is better. If the percentage remains high (>10%) for extended periods of time, then it is likely that more RAM is needed for optimal handling of the current number of active time series.

In general, VictoriaMetrics requires ~1KB of RAM per active time series, so it should be easy to calculate the required amount of RAM for the current workload according to the capacity planning docs. But the resulting number may be far from the real number, because the required amount of memory depends on many other factors, such as the number of labels per time series and the length of label values.
Slow queries rate ($instance) Slow queries rate according to the -search.logSlowQueryDuration flag, which is 5s by default.
Cache usage % ($instance) Shows the percentage of used cache size from the allowed size by type.
Values close to 100% show the maximum potential utilization.
Values close to 0% show that cache is underutilized.
Labels limit exceeded ($instance) VictoriaMetrics limits the number of labels per metric with the -maxLabelsPerTimeseries command-line flag.

This prevents ingestion of metrics with too many labels. The value of -maxLabelsPerTimeseries must be adjusted for your workload.

When the limit is exceeded (graph is > 0), extra labels are dropped, which could result in unexpected identical time series.
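
The rule of thumb from the Slow inserts panel above (~1KB of RAM per active time series, with more RAM likely needed when slow inserts stay above 10%) can be sketched as follows. The names and the 1024-byte constant are illustrative assumptions; as the panel description notes, real memory usage depends on other factors such as label count and label value length.

```python
BYTES_PER_ACTIVE_SERIES = 1024  # assumption: "~1KB" taken as 1024 bytes
SLOW_INSERTS_THRESHOLD_PERCENT = 10.0  # the ">10%" mentioned in the panel text


def estimate_ram_bytes(active_series: int) -> int:
    """Rough lower-bound RAM estimate for a number of active time series."""
    return active_series * BYTES_PER_ACTIVE_SERIES


def slow_insert_percent(slow_inserts: float, total_inserts: float) -> float:
    """Percentage of slow inserts relative to all inserts in the same window."""
    if total_inserts <= 0:
        return 0.0  # nothing inserted, so nothing was slow
    return 100.0 * slow_inserts / total_inserts


# Made-up numbers: 2 million active series, 150 slow inserts out of 1000.
print(estimate_ram_bytes(2_000_000) / 1024**3, "GiB")
print(slow_insert_percent(150, 1000) > SLOW_INSERTS_THRESHOLD_PERCENT)
```

Treat the result as a starting point for capacity planning rather than a precise requirement.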

Resource usage

Name Description Thresholds Repeat
Memory usage ($instance) TODO: Fill panel description
CPU ($instance) TODO: Fill panel description
Open FDs ($instance) Panel shows the number of open file descriptors in the OS.
Reaching the limit of open files can cause various issues and must be prevented.

See how to change limits here https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a
Disk writes/reads ($instance) Shows the number of bytes read from and written to the storage layer.
Goroutines ($instance) TODO: Fill panel description
GC duration ($instance) Shows the average GC duration.
Threads ($instance) TODO: Fill panel description
TCP connections ($instance) TODO: Fill panel description
TCP connections rate ($instance) TODO: Fill panel description