
Service Mesh

You’ll configure Istio’s control plane (istiod) and ingress gateways for production high availability by increasing minimum replica counts, tuning resource allocation, and verifying that pod anti-affinity is spreading replicas across nodes.

Istio’s control plane manages service discovery, certificate rotation, and configuration distribution for the entire mesh. If istiod becomes unavailable, new connections cannot be established and configuration changes stop propagating. The ingress gateways are the entry point for all external traffic — if a gateway goes down, traffic to the applications it serves is interrupted.

Prerequisites:

  • UDS CLI installed
  • Access to a Kubernetes cluster (multi-node, multi-AZ recommended)

UDS Core configures istiod with two HA mechanisms out of the box:

  • Horizontal Pod Autoscaler (HPA): enabled by default, scaling between 1 and 5 replicas based on CPU utilization
  • Pod anti-affinity: preferredDuringSchedulingIgnoredDuringExecution anti-affinity, which tells Kubernetes to prefer scheduling istiod replicas on different nodes

With the default autoscaleMin: 1, the HPA may scale istiod down to a single replica during low-traffic periods — creating a temporary single point of failure.
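The soft anti-affinity applied to istiod is roughly equivalent to the following term on the istiod Deployment. This is an illustrative sketch only; the exact weight and label selector come from the upstream Istio chart:

```yaml
# Illustrative sketch of a soft (preferred) anti-affinity term.
# The actual weight and labels are controlled by the upstream Istio chart.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: istiod
          topologyKey: kubernetes.io/hostname
```

Because this is `preferred` rather than `required`, the scheduler will still co-locate replicas on one node if no other node can accept them.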

  1. Increase the minimum replica count for HA

    Set autoscaleMin to 2 (or higher) to ensure at least two istiod replicas are always running:

    uds-bundle.yaml
    packages:
      - name: core
        repository: registry.defenseunicorns.com/public/core
        ref: x.x.x-upstream
        overrides:
          istio-controlplane:
            istiod:
              values:
                # Minimum istiod replicas (default: 1)
                - path: autoscaleMin
                  value: 2
                # Maximum istiod replicas (default: 5)
                - path: autoscaleMax
                  value: 5
  2. Tune istiod resources

    The default istiod resource allocation (500m CPU, 2Gi memory) is sized for moderate clusters. For larger clusters with many services or high configuration complexity, increase the allocation:

    uds-bundle.yaml
    packages:
      - name: core
        repository: registry.defenseunicorns.com/public/core
        ref: x.x.x-upstream
        overrides:
          istio-controlplane:
            istiod:
              values:
                # istiod resources (adjust for your environment)
                - path: resources
                  value:
                    requests:
                      cpu: 500m
                      memory: 2Gi
                    limits:
                      cpu: 1000m
                      memory: 4Gi
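To size these values against real usage rather than guessing, you can inspect istiod's live consumption in your cluster. These commands assume the default `istiod` Deployment name in `istio-system` and require metrics-server to be running:

```shell
# Inspect current CPU/memory consumption of istiod pods (requires metrics-server)
uds zarf tools kubectl top pod -n istio-system -l app=istiod

# Compare against the configured requests/limits on the Deployment
uds zarf tools kubectl get deploy istiod -n istio-system \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'
```

If observed usage is consistently near the requests, raise them; if it is far below, the defaults are likely sufficient.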
  3. Scale the admin and tenant ingress gateways

    UDS Core deploys separate ingress gateways for admin and tenant traffic. Both use the upstream Istio gateway chart with HPA enabled by default (min 1, max 5). For production, increase the minimum replicas and tune resources for both gateways:

    uds-bundle.yaml
    packages:
      - name: core
        repository: registry.defenseunicorns.com/public/core
        ref: x.x.x-upstream
        overrides:
          istio-admin-gateway:
            gateway:
              values:
                # Admin gateway minimum replicas (default: 1)
                - path: autoscaling.minReplicas
                  value: 2
                # Admin gateway maximum replicas (default: 5)
                - path: autoscaling.maxReplicas
                  value: 8
                # Admin gateway resources (adjust for your environment)
                - path: resources.requests.cpu
                  value: 750m
                - path: resources.requests.memory
                  value: 1024Mi
                - path: resources.limits.cpu
                  value: 2000m
                - path: resources.limits.memory
                  value: 4Gi
                # Scale based on CPU and memory request utilization
                - path: autoscaling.targetCPUUtilizationPercentage
                  value: 100
                - path: autoscaling.targetMemoryUtilizationPercentage
                  value: 100
          istio-tenant-gateway:
            gateway:
              values:
                # Tenant gateway minimum replicas (default: 1)
                - path: autoscaling.minReplicas
                  value: 2
                # Tenant gateway maximum replicas (default: 5)
                - path: autoscaling.maxReplicas
                  value: 8
                # Tenant gateway resources (adjust for your environment)
                - path: resources.requests.cpu
                  value: 750m
                - path: resources.requests.memory
                  value: 1024Mi
                - path: resources.limits.cpu
                  value: 2000m
                - path: resources.limits.memory
                  value: 4Gi
                # Scale based on CPU and memory request utilization
                - path: autoscaling.targetCPUUtilizationPercentage
                  value: 100
                - path: autoscaling.targetMemoryUtilizationPercentage
                  value: 100
                # Optional: customize scaling behavior
                - path: autoscaling.autoscaleBehavior
                  value:
                    scaleUp:
                      stabilizationWindowSeconds: 30
                      policies:
                        - type: Percent
                          value: 50
                          periodSeconds: 15
                    scaleDown:
                      stabilizationWindowSeconds: 300
                      policies:
                        - type: Percent
                          value: 20
                          periodSeconds: 60
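The `autoscaleBehavior` override maps onto the `behavior` field of the rendered HorizontalPodAutoscaler. As a rough sketch (metadata and metrics are controlled by the gateway chart, not shown here), the resulting object looks like:

```yaml
# Approximate shape of the HPA rendered from the overrides above.
# Actual names, labels, and metric stanzas come from the gateway chart.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 2
  maxReplicas: 8
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30   # react to load spikes within ~30s
      policies:
        - type: Percent
          value: 50                    # add at most 50% more pods per period
          periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before scaling in
      policies:
        - type: Percent
          value: 20                    # remove at most 20% of pods per period
          periodSeconds: 60
```

The asymmetry is deliberate: scaling up quickly absorbs traffic spikes, while the long scale-down window avoids flapping and dropped connections when load is bursty.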
  4. Create and deploy your bundle

    uds create <path-to-bundle-dir>
    uds deploy uds-bundle-<name>-<arch>-<version>.tar.zst
Verify the deployment:

# Confirm istiod pods are on different nodes
uds zarf tools kubectl get pods -n istio-system -l app=istiod -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase
# Check istiod HPA status
uds zarf tools kubectl get hpa -n istio-system
# Check admin gateway HPA and pods
uds zarf tools kubectl get hpa -n istio-admin-gateway
uds zarf tools kubectl get pods -n istio-admin-gateway -o wide
# Check tenant gateway HPA and pods
uds zarf tools kubectl get hpa -n istio-tenant-gateway
uds zarf tools kubectl get pods -n istio-tenant-gateway -o wide

Success criteria:

  • istiod has at least 2 replicas Running, distributed across different nodes (on 3+ node clusters)
  • Admin and tenant gateways each have at least 2 replicas Running
  • All HPAs show the expected min/max replica range

Problem: istiod pods scheduled on the same node

Symptoms: All istiod replicas are on a single node, creating a single point of failure.

Solution: The anti-affinity is a soft preference — Kubernetes will co-locate pods when it has no better option. Verify you have at least 3 schedulable nodes:

uds zarf tools kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

If nodes have taints preventing istiod scheduling, add appropriate tolerations via bundle overrides for the istiod chart under the istio-controlplane component.
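As an illustrative sketch of such an override, the toleration below is a placeholder; replace the key, value, and effect with whatever taint your nodes actually carry:

```yaml
# Hypothetical example: tolerate a "dedicated=infra:NoSchedule" taint.
# Adjust key/value/effect to match your cluster's taints.
packages:
  - name: core
    repository: registry.defenseunicorns.com/public/core
    ref: x.x.x-upstream
    overrides:
      istio-controlplane:
        istiod:
          values:
            - path: tolerations
              value:
                - key: "dedicated"
                  operator: "Equal"
                  value: "infra"
                  effect: "NoSchedule"
```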

Problem: HPA not scaling

Symptoms: HPA shows <unknown> for current metrics or replicas stay at minimum.

Solution: Ensure the metrics-server is running and healthy:

uds zarf tools kubectl get pods -n kube-system -l k8s-app=metrics-server
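If metrics-server is healthy but the HPA still reports `<unknown>`, the HPA's conditions and events usually name the failing metric. This assumes the default HPA name `istiod` in `istio-system`:

```shell
# Inspect HPA conditions and events for metric-collection errors
uds zarf tools kubectl describe hpa istiod -n istio-system
```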
