Architecture¶
This document describes the detailed architecture of the Qubership Monitoring Operator, a Kubernetes operator that manages the deployment and configuration of a comprehensive monitoring stack. It covers the core components, their relationships, control flows, and integration points.
Overview¶
The Qubership Monitoring Operator serves as a centralized controller for managing multiple monitoring components within a Kubernetes environment. It orchestrates the deployment and configuration of Prometheus, VictoriaMetrics, Grafana, AlertManager, and various exporters to create a complete monitoring solution.
graph TB
subgraph "Kubernetes Cluster"
MO[Monitoring Operator]
PM[PlatformMonitoring CR]
TSDB[VictoriaMetrics OR Prometheus]
GRAF[Grafana]
GO[Grafana Operator]
AM[AlertManager OR VMAlert]
KSM[kube-state-metrics]
NE[node-exporter]
EXPORTERS[Various Exporters]
subgraph "Application Namespaces"
APP1[Application 1]
APP2[Application 2]
SM[ServiceMonitors]
end
end
MO -->|Manages| PM
PM -->|Configures| TSDB
PM -->|Configures| GRAF
PM -->|Configures| AM
GO -->|Manages| GRAF
TSDB -->|Scrapes| KSM
TSDB -->|Scrapes| NE
TSDB -->|Scrapes| EXPORTERS
TSDB -->|Scrapes| APP1
TSDB -->|Scrapes| APP2
GRAF -->|Queries| TSDB
SM -->|Configures| TSDB
Operator Architecture¶
The Monitoring Operator is the central component that manages the entire monitoring stack. It watches for and processes a custom resource called PlatformMonitoring
, which defines the desired state of the monitoring setup.
Operator Controller Pattern¶
The operator follows the Kubernetes operator pattern, using the controller-runtime library to watch for changes to the PlatformMonitoring
resource and reconcile the current state with the desired state.
graph LR
subgraph "Operator Controller"
WATCH[Watch Events]
RECONCILE[Reconcile Logic]
APPLY[Apply Changes]
end
CR[PlatformMonitoring CR] -->|Change Event| WATCH
WATCH --> RECONCILE
RECONCILE --> APPLY
APPLY -->|Creates/Updates| RESOURCES[K8s Resources]
Component Architecture¶
Time Series Databases¶
Prometheus Stack¶
The Prometheus stack includes Prometheus itself, AlertManager, and related components. The operator deploys and configures these components using the Prometheus Operator.
graph TB
subgraph "Prometheus Stack"
PO[Prometheus Operator]
PROM[Prometheus Server]
AM[AlertManager]
CR[Config Reloader]
subgraph "Monitoring Configuration"
SM[ServiceMonitor CRs]
PM[PodMonitor CRs]
PR[PrometheusRule CRs]
end
end
PO -->|Manages| PROM
PO -->|Manages| AM
SM -->|Configures| PROM
PM -->|Configures| PROM
PR -->|Configures| PROM
CR -->|Reloads Config| PROM
The Prometheus Operator handles the deployment and configuration of Prometheus and AlertManager instances. It automatically generates scrape configurations based on ServiceMonitor and PodMonitor custom resources.
VictoriaMetrics Integration¶
VictoriaMetrics can be used as an alternative or complement to Prometheus for storing metrics.
graph TB
subgraph "VictoriaMetrics Stack"
VMO[VM Operator]
VMAGENT[VMAgent]
VMSINGLE[VMSingle]
VMALERT[VMAlert]
VMAUTH[VMAuth]
VMALERTMGR[VMAlertManager]
end
VMO -->|Manages| VMAGENT
VMO -->|Manages| VMSINGLE
VMO -->|Manages| VMALERT
VMO -->|Manages| VMAUTH
VMO -->|Manages| VMALERTMGR
VMAGENT -->|Writes| VMSINGLE
VMALERT -->|Queries| VMSINGLE
VictoriaMetrics provides a similar but more resource-efficient alternative to Prometheus, with its own set of custom resources for configuration.
Grafana Stack¶
The Grafana stack is responsible for visualization of metrics collected by Prometheus or VictoriaMetrics.
graph TB
subgraph "Grafana Stack"
GO[Grafana Operator]
GRAF[Grafana Instance]
subgraph "Grafana Resources"
GD[GrafanaDashboard CRs]
GDS[GrafanaDataSource CRs]
end
end
GO -->|Manages| GRAF
GD -->|Provides| GRAF
GDS -->|Configures| GRAF
The Grafana Operator manages Grafana instances, datasources, and dashboards. It automatically discovers and applies GrafanaDashboard custom resources.
Custom Resource Architecture¶
The monitoring system uses various custom resources to configure its components:
PlatformMonitoring¶
This is the main custom resource that defines the overall monitoring setup. It's watched by the Monitoring Operator.
apiVersion: monitoring.qubership.org/v1alpha1
kind: PlatformMonitoring
metadata:
name: monitoring-stack
spec:
prometheus:
install: true
retention: "7d"
grafana:
install: true
persistence:
enabled: true
victoriametrics:
vmOperator:
install: true
ServiceMonitor and PodMonitor¶
These custom resources define what metrics should be collected by Prometheus or VictoriaMetrics:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: example-app
spec:
selector:
matchLabels:
app: example-app
endpoints:
- port: metrics
interval: 30s
PrometheusRule and AlertmanagerConfig¶
These resources define alerting and recording rules:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: example-alerts
spec:
groups:
- name: example.rules
rules:
- alert: HighErrorRate
expr: rate(http_requests_total{status="500"}[5m]) > 0.1
for: 5m
labels:
severity: warning
annotations:
summary: "High error rate detected"
Metrics Collection Architecture¶
The system collects metrics from various sources:
graph TB
subgraph "Metric Sources"
KUBELET[Kubelet]
KSM[kube-state-metrics]
NE[node-exporter]
APPS[Applications]
EXPORTERS[External Exporters]
end
subgraph "Time Series Database"
TSDB[VictoriaMetrics OR Prometheus]
end
KUBELET -->|/metrics| TSDB
KSM -->|/metrics| TSDB
NE -->|/metrics| TSDB
APPS -->|/metrics| TSDB
EXPORTERS -->|/metrics| TSDB
Cloud Provider Integration¶
The system integrates with various cloud providers for metrics collection:
graph TB
subgraph "Cloud Providers"
AWS[AWS CloudWatch]
AZURE[Azure Monitor]
GCP[Google Cloud Operations]
end
subgraph "Cloud Exporters"
CWE[CloudWatch Exporter]
PROMITOR[Promitor Agent]
SDE[Stackdriver Exporter]
end
subgraph "Monitoring Stack"
TSDB[VictoriaMetrics OR Prometheus]
end
AWS -->|API| CWE
AZURE -->|API| PROMITOR
GCP -->|API| SDE
CWE -->|/metrics| TSDB
PROMITOR -->|/metrics| TSDB
SDE -->|/metrics| TSDB
The monitoring operator can deploy specialized exporters for each cloud platform to collect metrics from cloud services and make them available to Prometheus/VictoriaMetrics.
Deployment Architecture¶
The system is deployed using Helm charts with a set of configurable values:
graph TB
subgraph "Deployment Process"
HELM[Helm Chart]
VALUES[values.yaml]
subgraph "Generated Resources"
PM[PlatformMonitoring CR]
OPERATORS[Operator Deployments]
CONFIGS[ConfigMaps/Secrets]
end
end
subgraph "Operator Controllers"
MO[Monitoring Operator]
TSDB_OP[VM Operator OR Prometheus Operator]
GO[Grafana Operator]
end
subgraph "Monitoring Components"
TSDB[VictoriaMetrics OR Prometheus]
GRAF[Grafana]
AM[AlertManager]
end
HELM -->|Creates| PM
HELM -->|Creates| OPERATORS
HELM -->|Creates| CONFIGS
VALUES -->|Configures| HELM
MO -->|Watches| PM
MO -->|Manages| TSDB
MO -->|Manages| GRAF
MO -->|Manages| AM
TSDB_OP -->|Manages| TSDB
GO -->|Manages| GRAF
The deployment can be customized through various configuration options in the Helm chart's values.yaml file, which controls aspects like storage, authentication, resource limits, and cloud provider integration.
Extension Architecture¶
The system can be extended through various custom resources and configuration options:
graph TB
subgraph "User Extensions"
SM[ServiceMonitor]
PM[PodMonitor]
GD[GrafanaDashboard]
PR[PrometheusRule]
AC[AlertmanagerConfig]
end
subgraph "Admin Extensions"
PLATFORMMON[PlatformMonitoring]
HELM[Helm Values]
CRD[Custom CRDs]
end
subgraph "Monitoring Stack"
TSDB[VictoriaMetrics OR Prometheus]
GRAF[Grafana]
AM[AlertManager]
end
SM -->|Configures| TSDB
PM -->|Configures| TSDB
GD -->|Provides| GRAF
PR -->|Configures| TSDB
AC -->|Configures| AM
PLATFORMMON -->|Controls| TSDB
PLATFORMMON -->|Controls| GRAF
PLATFORMMON -->|Controls| AM
HELM -->|Configures| PLATFORMMON
This architecture allows for flexible extensions by both users (who can add monitoring for their applications) and administrators (who can configure the overall monitoring system).
Security Architecture¶
The system includes security features such as authentication for the monitoring components:
graph TB
subgraph "Authentication Layer"
OAUTH[OAuth2/OIDC]
BASIC[Basic Auth]
LDAP[LDAP]
end
subgraph "Monitoring UIs"
GRAF[Grafana]
TSDB[VictoriaMetrics OR Prometheus]
AM[AlertManager]
end
subgraph "Proxy Layer"
OAUTH2PROXY[oauth2-proxy]
end
OAUTH -->|Native| GRAF
LDAP -->|Native| GRAF
BASIC -->|Native| GRAF
OAUTH -->|via Proxy| OAUTH2PROXY
OAUTH2PROXY -->|Protects| TSDB
OAUTH2PROXY -->|Protects| AM
The system supports various authentication methods, including OAuth/OIDC, basic auth, and token-based authentication, as well as TLS encryption for secure communications.
Component Relationships¶
The following table shows the relationships between different components:
Component | Managed By | Configures | Provides Data To |
---|---|---|---|
Prometheus | Prometheus Operator | ServiceMonitor, PodMonitor | Grafana, AlertManager |
VictoriaMetrics | VM Operator | VMServiceScrape, VMPodScrape | Grafana, VMAlert |
Grafana | Grafana Operator | GrafanaDashboard, GrafanaDataSource | Users |
AlertManager | Prometheus Operator | AlertmanagerConfig | Notification channels |
kube-state-metrics | Monitoring Operator | Built-in config | Prometheus, VictoriaMetrics |
node-exporter | Monitoring Operator | Built-in config | Prometheus, VictoriaMetrics |
Benefits of This Architecture¶
The Qubership Monitoring Operator architecture provides several key benefits:
- Simplified Management: Automates the deployment and configuration of complex monitoring components
- Comprehensive Monitoring: Collects metrics from various sources including Kubernetes, applications, and cloud providers
- Scalability: Supports both Prometheus and VictoriaMetrics for metrics storage, allowing for scalable monitoring solutions
- Visualization: Integrates with Grafana for metrics visualization and dashboarding
- Alerting: Provides alerting capabilities through AlertManager and VMAlertManager
- Cloud Integration: Supports integration with major cloud providers
- Extensibility: Allows users and administrators to extend functionality through custom resources
- Security: Provides multiple authentication and authorization options
This architecture enables organizations to deploy and maintain a production-ready monitoring stack with minimal operational overhead while providing the flexibility to customize and extend the system as needed.