Monitoring¶
Overview¶
The controller exposes Prometheus metrics on port 9090 at the /metrics endpoint. This page covers how to enable and configure metrics collection via the Helm chart.
For the complete metrics reference, alerting rules, and dashboard examples, see the Monitoring Guide in the controller documentation.
Metrics Overview¶
The controller exposes 11 Prometheus metrics covering:
- Reconciliation: Cycles, errors, and duration
- Deployment: Operations, errors, and duration
- Validation: Total validations and errors
- Resources: Tracked resource counts by type
- Events: Event bus activity and subscribers
Quick Access¶
Access metrics directly via port-forward:
kubectl port-forward -n haptic deployment/haptic-controller 9090:9090
curl http://localhost:9090/metrics
Prometheus ServiceMonitor¶
Enable Prometheus Operator integration:
monitoring:
serviceMonitor:
enabled: true
interval: 30s
scrapeTimeout: 10s
labels:
prometheus: kube-prometheus # Match your Prometheus selector
With NetworkPolicy¶
If using NetworkPolicy, allow Prometheus to scrape metrics. See Networking for details.
Advanced ServiceMonitor Configuration¶
Add custom labels and relabeling:
monitoring:
serviceMonitor:
enabled: true
interval: 15s
labels:
prometheus: kube-prometheus
team: platform
# Add cluster label to all metrics
relabelings:
- sourceLabels: [__address__]
targetLabel: cluster
replacement: production
# Drop specific metrics
metricRelabelings:
- sourceLabels: [__name__]
regex: 'haptic_event_subscribers'
action: drop
Example Prometheus Queries¶
# Reconciliation rate (per second)
rate(haptic_reconciliation_total[5m])
# Error rate
rate(haptic_reconciliation_errors_total[5m])
# 95th percentile reconciliation duration
histogram_quantile(0.95, rate(haptic_reconciliation_duration_seconds_bucket[5m]))
# Current HAProxy pod count
haptic_resource_count{type="haproxy-pods"}
Grafana Dashboard¶
Create dashboards using these key metrics:
- Operations Overview: reconciliation_total, deployment_total, validation_total
- Error Tracking: *_errors_total counters
- Performance: *_duration_seconds histograms
- Resource Utilization: resource_count gauge
For complete metric definitions and more queries, see pkg/controller/metrics/README.md in the repository.