
Create metric alerting rules

Define alerting conditions based on Prometheus metrics using PrometheusRule custom resources. The Prometheus Operator loads these rules into Prometheus automatically, and firing alerts are routed to Alertmanager.

  • UDS CLI installed
  • Access to a Kubernetes cluster with UDS Core deployed
  • Familiarity with PromQL

UDS Core ships default alerting rules from the upstream kube-prometheus-stack chart covering cluster health, node conditions, and platform components. Runbooks for these default rules are available at runbooks.prometheus-operator.dev. This guide covers creating custom rules for your applications and optionally tuning the defaults.

  1. Create a PrometheusRule

    Define a PrometheusRule custom resource containing one or more alert rules. The Prometheus Operator watches for these CRs and loads them into Prometheus automatically.

    my-app-alerts.yaml

    ```yaml
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: my-app-alerts
      namespace: my-app
    spec:
      groups:
        - name: my-app
          rules:
            - alert: PodRestartingFrequently
              expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "Pod {{ $labels.pod }} is restarting frequently"
                runbook: "https://example.com/runbooks/pod-restart"
                description: "Pod restarted {{ $value }} times in the last hour"
            - alert: HighMemoryUsage
              expr: |
                (container_memory_working_set_bytes / on(namespace, pod, container) kube_pod_container_resource_limits{resource="memory"}) * 100 > 80
              for: 15m
              labels:
                severity: warning
              annotations:
                summary: "High memory usage detected"
                runbook: "https://example.com/runbooks/high-memory-usage"
                description: "Container using {{ $value }}% of memory limit"
    ```

    Key fields in each rule:

    • expr — PromQL expression that defines the alerting condition. When this expression returns results, the alert becomes active.
    • for — How long the condition must be continuously true before the alert fires. Prevents flapping on transient spikes.
    • labels.severity — Used by Alertmanager for routing. Common values are critical, warning, and info.
    • annotations — Human-readable context attached to the alert. Include a summary, description, and runbook URL to make alerts actionable.
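To build intuition for the first rule's expr, here is a minimal Python sketch of how PromQL's `increase()` behaves over a window of counter samples. It is a simplified illustration, not the actual server implementation (real Prometheus also extrapolates to the window boundaries), but it shows the key behavior: counters only go up, so a drop in value means the counter reset (for example, the pod restarted) and the post-reset value is counted from zero.

```python
def increase(samples):
    """Approximate PromQL increase() over a chronological list of counter
    samples. Simplified: real Prometheus also extrapolates to the window
    boundaries, so its result can be slightly larger."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # Counter reset: count the post-reset value from zero.
            total += curr
    return total

# Restart counter sampled over the last hour: +3 restarts, then a counter
# reset (counter climbs back to 2), then up to 6 -> 3 + 2 + 4 = 9 total.
samples = [10, 11, 13, 2, 4, 6]
print(increase(samples))      # 9.0
print(increase(samples) > 5)  # True -> PodRestartingFrequently would go Pending
```

A plain delta (`6 - 10`) would be negative here; handling resets is why `increase()` is the right function for restart counters.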
  2. Deploy the rule

    (Recommended) Include the PrometheusRule manifest in your Zarf package, then create and deploy the package. See Packaging applications for general packaging guidance.

    ```shell
    uds zarf package create --confirm
    uds zarf package deploy zarf-package-*.tar.zst --confirm
    ```

    Or apply the PrometheusRule directly for quick testing:

    ```shell
    uds zarf tools kubectl apply -f my-app-alerts.yaml
    ```

    The Prometheus Operator picks up PrometheusRule CRs automatically.
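Before applying a manifest, it can be worth catching structural mistakes that the API server will accept but that produce broken alerts, such as a rule with no `expr` or no severity label. A minimal sketch of such a pre-deploy check, operating on a manifest already parsed into a dict (for example with `yaml.safe_load`, not shown); the checks here are illustrative, not an official schema:

```python
REQUIRED_RULE_FIELDS = {"alert", "expr"}

def check_rules(manifest):
    """Return a list of problems found in a parsed PrometheusRule manifest."""
    problems = []
    for group in manifest.get("spec", {}).get("groups", []):
        for rule in group.get("rules", []):
            missing = REQUIRED_RULE_FIELDS - rule.keys()
            if missing:
                problems.append(f"group {group.get('name')}: rule missing {sorted(missing)}")
            if "severity" not in rule.get("labels", {}):
                problems.append(f"rule {rule.get('alert')}: no severity label for Alertmanager routing")
    return problems

manifest = {
    "kind": "PrometheusRule",
    "spec": {"groups": [{"name": "my-app", "rules": [
        {"alert": "PodRestartingFrequently",
         "expr": "increase(kube_pod_container_status_restarts_total[1h]) > 5",
         "labels": {"severity": "warning"}},
        {"alert": "Broken"},  # missing expr and severity -> two findings
    ]}]},
}
for problem in check_rules(manifest):
    print(problem)
```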

  3. Optional: Disable or tune default alert rules

    If default kube-prometheus-stack alerts are too noisy or not relevant to your environment, you can disable individual rules or entire rule groups through bundle overrides.

    uds-bundle.yaml

    ```yaml
    packages:
      - name: core
        repository: registry.defenseunicorns.com/public/core
        ref: x.x.x-upstream
        overrides:
          kube-prometheus-stack:
            kube-prometheus-stack:
              values:
                # Disable specific individual rules by name
                - path: defaultRules.disabled
                  value:
                    KubeControllerManagerDown: true
                    KubeSchedulerDown: true
                # Disable entire rule groups with boolean toggles
                - path: defaultRules.rules.kubeControllerManager
                  value: false
                - path: defaultRules.rules.kubeSchedulerAlerting
                  value: false
    ```

    Use defaultRules.disabled for fine-tuned control over individual rules. Use defaultRules.rules.* to disable entire rule groups when broader changes are needed.
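If you want to replace a disabled default with a tuned version rather than silence it outright, the kube-prometheus-stack chart's `additionalPrometheusRulesMap` value can carry replacement rules in the same `values` list. A sketch only; the rule name, expression, and threshold below are illustrative, not the chart's defaults:

```yaml
# Illustrative fragment: append under the same `values:` list as above.
- path: additionalPrometheusRulesMap
  value:
    tuned-defaults:
      groups:
        - name: tuned-scheduler
          rules:
            - alert: KubeSchedulerDownTuned
              expr: absent(up{job="kube-scheduler"} == 1)
              for: 30m   # longer grace period than the stock rule
              labels:
                severity: warning
```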

    Create and deploy your bundle:

    ```shell
    uds create <path-to-bundle-dir>
    uds deploy uds-bundle-<name>-<arch>-<version>.tar.zst
    ```

Open Grafana and navigate to Alerting > Alert rules. Filter by the Prometheus datasource. Confirm your custom rules appear in the list.

Check the rule state to understand its current status:

  • Inactive — condition is not met
  • Pending — condition is met but the for duration has not elapsed
  • Firing — active alert being sent to Alertmanager
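The transitions above can be sketched as a tiny state machine: the condition must hold continuously for the full `for` duration before Pending becomes Firing, and any false evaluation resets the clock. This is a simplified model of Prometheus's evaluation loop, not the actual implementation:

```python
def alert_state(condition_history, for_duration, interval=1):
    """Return the alert state after a chronological list of boolean
    evaluations, one per `interval` minutes. Any False evaluation resets
    the clock, so transient spikes never reach Firing."""
    held = 0  # minutes the condition has been continuously true
    for condition_true in condition_history:
        held = held + interval if condition_true else 0
    if held == 0:
        return "Inactive"
    return "Firing" if held >= for_duration else "Pending"

# `for: 5m` with one evaluation per minute:
print(alert_state([True] * 3, for_duration=5))    # Pending
print(alert_state([True] * 5, for_duration=5))    # Firing
print(alert_state([True, True, False, True], 5))  # Pending (clock reset by the False)
print(alert_state([False, False], 5))             # Inactive
```

This is why a rule with `for: 5m` that keeps flipping between true and false will sit in Pending indefinitely rather than fire.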

Symptom: Custom alert rules do not show up in the Grafana alerting UI.

Solution: Verify the PrometheusRule CR was created successfully and check for YAML syntax errors:

```shell
uds zarf tools kubectl get prometheusrule -A
uds zarf tools kubectl describe prometheusrule <name> -n <namespace>
```

Symptom: The PromQL expression should match, but the alert stays in Inactive state.

Solution: Verify the PromQL expression returns results in the Prometheus UI:

```shell
uds zarf connect prometheus
```

Navigate to the Graph tab and run your expr query directly. If it returns results, check that the for duration has elapsed — the alert will remain in Pending state until the condition is continuously true for that period.

These guides and concepts may be useful to explore next: