Create metric alerting rules
What you’ll accomplish
Section titled “What you’ll accomplish”Define alerting conditions based on Prometheus metrics using PrometheusRule CRDs. Alerts are automatically picked up by the Prometheus Operator and routed to Alertmanager.
Prerequisites
Section titled “Prerequisites”Before you begin
Section titled “Before you begin”UDS Core ships default alerting rules from the upstream kube-prometheus-stack chart covering cluster health, node conditions, and platform components. Runbooks for these default rules are available at runbooks.prometheus-operator.dev. This guide covers creating custom rules for your applications and optionally tuning the defaults.
-
Create a PrometheusRule
Define a
PrometheusRulecustom resource containing one or more alert rules. The Prometheus Operator watches for these CRs and loads them into Prometheus automatically.my-app-alerts.yaml apiVersion: monitoring.coreos.com/v1kind: PrometheusRulemetadata:name: my-app-alertsnamespace: my-appspec:groups:- name: my-apprules:- alert: PodRestartingFrequentlyexpr: increase(kube_pod_container_status_restarts_total[1h]) > 5for: 5mlabels:severity: warningannotations:summary: "Pod {{ $labels.pod }} is restarting frequently"runbook: "https://example.com/runbooks/pod-restart"description: "Pod restarted {{ $value }} times in the last hour"- alert: HighMemoryUsageexpr: |(container_memory_working_set_bytes / on(namespace, pod, container) kube_pod_container_resource_limits{resource="memory"}) * 100 > 80for: 15mlabels:severity: warningannotations:summary: "High memory usage detected"runbook: "https://example.com/runbooks/high-memory-usage"description: "Container using {{ $value }}% of memory limit"Key fields in each rule:
expr— PromQL expression that defines the alerting condition. When this expression returns results, the alert becomes active.for— How long the condition must be continuously true before the alert fires. Prevents flapping on transient spikes.labels.severity— Used by Alertmanager for routing. Common values arecritical,warning, andinfo.annotations— Human-readable context attached to the alert. Include asummary,description, andrunbookURL to make alerts actionable.
-
Deploy the rule
(Recommended) Include the PrometheusRule in your Zarf package and create/deploy. See Packaging applications for general packaging guidance.
Terminal window uds zarf package create --confirmuds zarf package deploy zarf-package-*.tar.zst --confirmOr apply the PrometheusRule directly for quick testing:
Terminal window uds zarf tools kubectl apply -f my-app-alerts.yamlThe Prometheus Operator picks up PrometheusRule CRs automatically.
-
Optional: Disable or tune default alert rules
If default kube-prometheus-stack alerts are too noisy or not relevant to your environment, you can disable individual rules or entire rule groups through bundle overrides.
uds-bundle.yaml packages:- name: corerepository: registry.defenseunicorns.com/public/coreref: x.x.x-upstreamoverrides:kube-prometheus-stack:kube-prometheus-stack:values:# Disable specific individual rules by name- path: defaultRules.disabledvalue:KubeControllerManagerDown: trueKubeSchedulerDown: true# Disable entire rule groups with boolean toggles- path: defaultRules.rules.kubeControllerManagervalue: false- path: defaultRules.rules.kubeSchedulerAlertingvalue: falseUse
defaultRules.disabledfor fine-tuned control over individual rules. UsedefaultRules.rules.*to disable entire rule groups when broader changes are needed.Create and deploy your bundle:
Terminal window uds create <path-to-bundle-dir>uds deploy uds-bundle-<name>-<arch>-<version>.tar.zst
Verification
Section titled “Verification”Open Grafana and navigate to Alerting > Alert rules. Filter by the Prometheus datasource. Confirm your custom rules appear in the list.
Check the rule state to understand its current status:
- Inactive — condition is not met
- Pending — condition is met but the
forduration has not elapsed - Firing — active alert being sent to Alertmanager
Troubleshooting
Section titled “Troubleshooting”Rule not appearing in Grafana
Section titled “Rule not appearing in Grafana”Symptom: Custom alert rules do not show up in the Grafana alerting UI.
Solution: Verify the PrometheusRule CR was created successfully and check for YAML syntax errors:
uds zarf tools kubectl get prometheusrule -Auds zarf tools kubectl describe prometheusrule <name> -n <namespace>Alert not firing when expected
Section titled “Alert not firing when expected”Symptom: The PromQL expression should match, but the alert stays in Inactive state.
Solution: Verify the PromQL expression returns results in the Prometheus UI:
uds zarf connect prometheusNavigate to the Graph tab and run your expr query directly. If it returns results, check that the for duration has elapsed — the alert will remain in Pending state until the condition is continuously true for that period.
Related Documentation
Section titled “Related Documentation”- Prometheus: Alerting rules — PromQL alerting rule syntax
- Prometheus: Alerting best practices — guidance on alert design
- Prometheus Operator: PrometheusRule API — full CRD field reference
- Default rule runbooks — troubleshooting guides for kube-prometheus-stack alerts
Next steps
Section titled “Next steps”These guides and concepts may be useful to explore next: