Overview
Production deployments of UDS Core need redundancy, autoscaling, and fault tolerance to meet uptime requirements. This section provides per-component guides for configuring high availability across the platform stack.
These guides assume you already have UDS Core deployed and are familiar with UDS bundle overrides. Where relevant, guides also cover how to adjust resource allocations for production workloads. For background on each component, see the Core Features concepts.
HA capabilities at a glance
| Component | HA Mechanism | External Dependency | Default Behavior |
|---|---|---|---|
| Keycloak | HPA (2–5 replicas) | PostgreSQL | Single replica (devMode) |
| Grafana | HPA (2–5 replicas) | PostgreSQL | Single replica |
| Loki | Multi-replica (SimpleScalable) | S3-compatible storage | 3 replicas per tier |
| Vector | DaemonSet | None | One pod per node |
| Prometheus | Resource tuning | External TSDB (for multi-replica) | Single replica |
| Authservice | HPA (1–3 replicas) | Redis / Valkey | Single replica |
| Falcosidekick | Static replicas | None | 2 replicas |
| Istio (istiod) | HPA + pod anti-affinity | None | HPA (1–5 replicas) |
| Istio (gateways) | HPA | None | HPA (1–5 replicas) |
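
As a rough illustration of what an entry such as "HPA (2–5 replicas)" in the table means at the Kubernetes level, the sketch below shows a generic `autoscaling/v2` HorizontalPodAutoscaler. It is not the manifest UDS Core renders: the resource names, namespace, target workload, and CPU threshold are hypothetical placeholders chosen for the example. In UDS Core the equivalent settings would normally be applied through bundle overrides, as described in the per-component guides.

```yaml
# Generic Kubernetes HPA sketch (assumption: not taken from UDS Core charts).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa            # hypothetical name
  namespace: example           # hypothetical namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment           # assumed workload kind; some components may use a StatefulSet
    name: example-app          # hypothetical target workload
  minReplicas: 2               # lower bound, as in "HPA (2–5 replicas)"
  maxReplicas: 5               # upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # assumed threshold, for illustration only
```

With `minReplicas` set to 2 or higher, at least two pods stay scheduled even when load is low, which is what lets a component tolerate the loss of a single pod or node.
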
Related documentation
These external resources provide foundational Kubernetes and component-specific HA guidance that complements the UDS Core guides below:
- Kubernetes: Running in multiple zones — distributing workloads across failure domains
- Kubernetes: Disruptions and PodDisruptionBudgets — protecting availability during voluntary disruptions
- Kubernetes: Horizontal Pod Autoscaling — scaling workloads based on resource utilization
- EKS Best Practices: Reliability — AWS-specific resilience patterns
- AKS Best Practices: Reliability — Azure-specific resilience patterns
- GKE Best Practices: Scalability — GCP-specific scaling and HA guidance
Component guides
- Keycloak: External PostgreSQL, HPA autoscaling, and waypoint proxy scaling.
- Logging (Loki & Vector): Loki replica tuning with external S3 storage and Vector production resource configuration.
- Monitoring (Grafana & Prometheus): HA Grafana with external PostgreSQL and Prometheus resource tuning.
- Authservice: External Redis session store and replica scaling for SSO proxy resilience.
- Runtime Security: Falcosidekick replica tuning for resilient alert delivery.
- Service Mesh: Istio control plane and ingress gateway scaling, resource tuning, and anti-affinity verification.