Performance Guide¶
This guide covers performance tuning and optimization for the HAProxy Template Ingress Controller.
Overview¶
Performance optimization involves three areas:
- Controller performance - Template rendering, reconciliation cycles
- HAProxy performance - Load balancer throughput and latency
- Kubernetes integration - Resource watching and event handling
Controller Resource Sizing¶
Recommended Resources¶
| Deployment Size | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| Small (<50 Ingresses) | 50m | 200m | 64Mi | 256Mi |
| Medium (50-200 Ingresses) | 100m | 500m | 128Mi | 512Mi |
| Large (200+ Ingresses) | 200m | 1000m | 256Mi | 1Gi |
Configure via Helm values:
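For example, the medium profile expressed as Helm values (a sketch; the exact value keys depend on the chart version you use):
# values.yaml - illustrative key names
controller:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi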
Memory Considerations¶
Memory usage scales with:
- Number of watched resources (Ingresses, Services, Endpoints)
- Size of template library
- Event buffer size (default 1000 events)
- Number of HAProxy pods being managed
Monitor memory usage:
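For example, with standard cAdvisor metrics or a quick kubectl check (the pod name pattern and label are assumptions; adjust them to your release):
# Controller working set memory (Prometheus)
container_memory_working_set_bytes{pod=~"haptic-controller-.*"}
# Quick check without Prometheus
kubectl top pod -l app.kubernetes.io/name=haptic-controller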
CPU Considerations¶
CPU spikes occur during:
- Template rendering (complex templates with many resources)
- Initial resource synchronization (startup)
- Burst of resource changes (rolling updates)
Monitor CPU usage:
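For example (same caveat about the pod selector being an assumption):
# Controller CPU usage in cores
rate(container_cpu_usage_seconds_total{pod=~"haptic-controller-.*"}[5m])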
Reconciliation Tuning¶
Debounce Interval¶
The controller debounces resource changes to avoid excessive reconciliation:
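As an illustration only - the field names below are assumptions, not the controller's documented schema:
spec:
  reconciliation:
    debounceInterval: 500ms  # assumed field name; default value shown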
Tuning guidelines:
- Lower (100-300ms): Faster response to changes, higher CPU usage
- Default (500ms): Balanced for most workloads
- Higher (1-5s): Better for high-churn environments with many changes
Reconciliation Metrics¶
Monitor reconciliation performance:
# Average reconciliation duration
rate(haptic_reconciliation_duration_seconds_sum[5m]) /
rate(haptic_reconciliation_duration_seconds_count[5m])
# Reconciliation rate
rate(haptic_reconciliation_total[5m])
# P95 reconciliation latency
histogram_quantile(0.95, rate(haptic_reconciliation_duration_seconds_bucket[5m]))
Target metrics:
- Average reconciliation: <500ms
- P95 reconciliation: <2s
- Error rate: <1%
Template Optimization¶
Efficient Template Patterns¶
Use early filtering:
{#- GOOD: Filter early, process less data -#}
{%- var matching_ingresses = []any{} %}
{%- for _, ingress := range resources.ingresses.List() %}
{%- if ingress.spec.ingressClassName == "haproxy" %}
{%- matching_ingresses = append(matching_ingresses, ingress) %}
{%- end %}
{%- end %}
{%- for _, ingress := range matching_ingresses %}
...
{%- end %}
{#- ALTERNATIVE: Process with inline filtering -#}
{%- for _, ingress := range resources.ingresses.List() %}
{%- if ingress.spec.ingressClassName == "haproxy" %}
...
{%- end %}
{%- end %}
Use caching for expensive operations:
{%- if !has_cached("sorted_routes") %}
{#- Expensive computation only runs once per render -#}
{%- var sorted_routes = []any{} %}
{%- for _, route := range resources.httproutes.List() %}
{%- sorted_routes = append(sorted_routes, route) %}
{%- end %}
{%- set_cached("sorted_routes", sorted_routes) %}
{%- end %}
{%- var analysis_routes = get_cached("sorted_routes") %}
Avoid nested loops when possible:
{#- AVOID: O(n*m) complexity -#}
{%- for _, ingress := range ingresses %}
{%- for _, service := range services %}
{%- if ingress.spec.backend.service.name == service.metadata.name %}
...
{%- end %}
{%- end %}
{%- end %}
{#- BETTER: Use indexing or filtering -#}
{%- var service_map = map[string]any{} %}
{%- for _, service := range services %}
{%- service_map[service.metadata.name] = service %}
{%- end %}
{%- for _, ingress := range ingresses %}
{%- var service = service_map[ingress.spec.backend.service.name] %}
...
{%- end %}
Template Debugging¶
Profile template rendering:
# Enable template tracing
./bin/haptic-controller validate -f config.yaml --trace
# View trace output
cat /tmp/template-trace.log
HAProxy Optimization¶
Configuration Parameters¶
Key HAProxy parameters for performance:
global
maxconn {{ fallback(controller.config.haproxy.maxconn, 2000) }}
nbthread {{ fallback(controller.config.haproxy.nbthread, 4) }}
tune.bufsize {{ fallback(controller.config.haproxy.bufsize, 16384) }}
tune.ssl.default-dh-param 2048
defaults
timeout connect 5s
timeout client 50s
timeout server 50s
timeout http-request 10s
timeout queue 60s
Connection Limits¶
Calculate maxconn based on expected load:
Example:
- Expected: 10,000 concurrent connections
- Safety factor: 1.5
- HAProxy pods: 3
- maxconn = (10,000 * 1.5) / 3 = 5,000
Thread Configuration¶
Match nbthread to available CPU cores:
# HAProxy pod resources
resources:
requests:
cpu: 2
limits:
cpu: 4
# HAProxy config
global
nbthread 4 # Match CPU limit
Buffer Sizing¶
Increase buffers for large headers or payloads:
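For example, in the global section (these are standard HAProxy tunables; the values shown are illustrative starting points, not universal recommendations):
global
    tune.bufsize 32768       # default 16384; room for larger headers/payloads per buffer
    tune.maxrewrite 2048     # headroom reserved in each buffer for header rewrites
    tune.http.maxhdr 128     # maximum number of headers per request (default 101)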
Password Hash Performance¶
HAProxy validates password hash formats during configuration parsing by running the full hashing algorithm. This can significantly slow down config validation when using expensive hash algorithms.
Hash algorithm validation times:
| Algorithm | Example | Time per hash |
|---|---|---|
| MD5 | `$1$salt$hash` | ~0.004ms |
| SHA-256 | `$5$salt$hash` | ~3ms |
| SHA-512 | `$6$salt$hash` | ~3ms |
| bcrypt (cost 10) | `$2y$10$salt$hash` | ~85ms |
bcrypt with high cost factors is expensive
A configuration with 200 bcrypt passwords at cost factor 10 adds ~17 seconds to every config validation. This directly impacts reconciliation time and webhook validation latency.
Recommendations:
- Prefer SHA-512 (`$6$`) for password hashes - cryptographically strong with fast validation
- Avoid bcrypt cost factors above 8 in high-frequency validation scenarios
- Consolidate userlists to avoid duplicate password entries - HAProxy validates each occurrence separately, even for identical hashes
- Consider external authentication (OAuth, OIDC) for large user bases instead of embedding passwords in config
Checking your config:
# Count expensive bcrypt hashes
BCRYPT_COUNT=$(grep -c '\$2[aby]\$' /path/to/haproxy.cfg)
# Estimate validation overhead (bcrypt count × 85ms)
echo "~$(( BCRYPT_COUNT * 85 ))ms of validation overhead"
Scaling Strategies¶
Horizontal Scaling¶
Scale HAProxy pods for increased traffic:
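For example (the Deployment name is an assumption; use your HAProxy workload's actual name):
# Scale the HAProxy data plane to 5 replicas
kubectl scale deployment haproxy --replicas=5
A HorizontalPodAutoscaler can do the same automatically based on CPU or request metrics.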
The controller automatically discovers new pods and deploys configuration.
Controller Scaling (HA Mode)¶
For high availability, run multiple controller replicas:
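A sketch, assuming the chart exposes a standard replica count value (the key name may differ):
# values.yaml
controller:
  replicaCount: 2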
Only the leader performs deployments; followers maintain hot-standby status.
Resource Watching Optimization¶
Reduce watched resources to minimize controller load:
# Only watch specific namespaces
spec:
watchedResources:
ingresses:
apiVersion: networking.k8s.io/v1
resources: ingresses
namespaceSelector:
matchNames:
- production
- staging
# Use label selectors
spec:
watchedResources:
ingresses:
apiVersion: networking.k8s.io/v1
resources: ingresses
labelSelector:
matchLabels:
managed-by: haptic
Deployment Performance¶
Deployment Latency¶
Monitor deployment time:
# Average deployment duration
rate(haptic_deployment_duration_seconds_sum[5m]) /
rate(haptic_deployment_duration_seconds_count[5m])
# P95 deployment latency
histogram_quantile(0.95, rate(haptic_deployment_duration_seconds_bucket[5m]))
Target metrics:
- Average deployment: <1s per HAProxy pod
- P95 deployment: <3s
Parallel Deployment¶
The controller deploys to multiple HAProxy pods in parallel. If deployment is slow:
- Check DataPlane API responsiveness
- Verify network connectivity to HAProxy pods
- Consider reducing config complexity
Drift Prevention¶
Configure drift prevention to avoid unnecessary deployments:
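For illustration only - the field names below are assumptions used to show the idea, not the controller's actual schema:
spec:
  driftPrevention:
    enabled: true
    auditInterval: 60s  # assumed field: how often deployed config is compared with desired state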
Event Processing¶
Event Buffer Sizing¶
The controller maintains event buffers for debugging:
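For illustration (the field name is an assumption; the default of 1000 events is noted above):
spec:
  events:
    bufferSize: 1000  # assumed field name; increase only if you need longer event history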
Increase for high-throughput environments if you need more event history.
Subscriber Performance¶
Monitor event subscriber health:
# Event publishing rate
rate(haptic_events_published_total[5m])
# Subscriber count (should be constant)
haptic_event_subscribers
If subscriber count drops, components may be failing.
Profiling¶
Go Profiling¶
Access pprof endpoints for profiling:
# CPU profile (30 seconds)
curl "http://localhost:6060/debug/pprof/profile?seconds=30" > cpu.pprof
go tool pprof -http=:8080 cpu.pprof
# Memory profile
curl http://localhost:6060/debug/pprof/heap > heap.pprof
go tool pprof -http=:8080 heap.pprof
# Goroutine dump
curl "http://localhost:6060/debug/pprof/goroutine?debug=1"
Profile-Guided Optimization (PGO)¶
The controller is built with Go's Profile-Guided Optimization (PGO) for improved performance. PGO typically provides 2-7% CPU improvement by optimizing frequently-called functions.
How it works:
A baseline CPU profile (cmd/controller/default.pgo) is committed to the repository. Go automatically uses this profile during builds to optimize hot paths.
Updating the profile:
To collect a fresh profile from the development environment:
1. Start the dev environment.
2. Port-forward to the controller's debug port.
3. Generate workload (trigger reconciliation by modifying resources).
4. Collect a 30-second CPU profile.
5. Rebuild with the new profile (see the command sketch below).
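A sketch of steps 2-5, assuming the controller runs as deploy/haptic-controller in the haptic-system namespace and exposes pprof on port 6060 as in the profiling section above:
# Port-forward the debug port (deployment name and namespace are assumptions)
kubectl -n haptic-system port-forward deploy/haptic-controller 6060:6060 &
# Collect a 30-second CPU profile straight into the committed profile path
curl "http://localhost:6060/debug/pprof/profile?seconds=30" > cmd/controller/default.pgo
# Rebuild; Go's default -pgo=auto picks up default.pgo in the main package directory
go build ./cmd/controller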
Production profiles:
For optimal results, collect profiles from production during representative workloads. Merge multiple profiles for broader coverage:
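go tool pprof can merge several profiles into a single file suitable for PGO:
# Merge CPU profiles collected at different times into the committed profile
go tool pprof -proto cpu-1.pprof cpu-2.pprof cpu-3.pprof > cmd/controller/default.pgo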
Common Performance Issues¶
High memory usage:
- Check for memory leaks: growing heap over time
- Reduce event buffer size
- Limit watched resources
High CPU usage:
- Profile to find hot spots
- Optimize template complexity
- Increase debounce interval
Slow deployments:
- Check DataPlane API health
- Verify network latency to HAProxy pods
- Consider reducing config size
Performance Checklist¶
Initial Deployment¶
- [ ] Set appropriate resource requests/limits
- [ ] Configure debounce interval for workload
- [ ] Set HAProxy maxconn based on expected load
- [ ] Match nbthread to CPU allocation
Ongoing Optimization¶
- [ ] Monitor reconciliation latency
- [ ] Monitor deployment latency
- [ ] Watch for memory growth
- [ ] Track event subscriber count
High-Load Environments¶
- [ ] Scale HAProxy pods horizontally
- [ ] Enable HA mode for controller
- [ ] Limit watched namespaces
- [ ] Use label selectors to filter resources
- [ ] Profile and optimize templates
See Also¶
- Monitoring Guide - Performance metrics and alerting
- High Availability - HA deployment patterns
- Debugging Guide - Performance troubleshooting