
Alert Management

UDS Core deploys Alertmanager as a part of the kube-prometheus-stack Helm chart. Alertmanager is responsible for handling alerts sent by Prometheus/Loki and managing notification routing and silencing.

It is recommended to configure Alertmanager to send alerts to a location that is actively monitored by your team. Common options include email, Slack, Mattermost, Microsoft Teams, or a paging service like PagerDuty or OpsGenie.

You can configure Alertmanager by providing bundle overrides. The example below shows how to configure Alertmanager to send critical and warning alert notifications to a Slack channel:

packages:
  - name: uds-core
    repository: ghcr.io/defenseunicorns/packages/uds/core
    ref: x.x.x
    overrides:
      kube-prometheus-stack:
        uds-prometheus-config:
          values:
            # Open up network egress to the Slack API for Alertmanager
            - path: additionalNetworkAllow
              value:
                - direction: Egress
                  selector:
                    app.kubernetes.io/name: alertmanager
                  ports:
                    - 443
                  remoteHost: api.slack.com
                  remoteProtocol: TLS
                  description: "Allow egress Alertmanager to Slack API"
        kube-prometheus-stack:
          values:
            # Setup Alertmanager receivers
            # These are the destinations that alerts can be sent to
            # See: https://prometheus.io/docs/alerting/latest/configuration/#general-receiver-related-settings
            - path: alertmanager.config.receivers
              value:
                - name: slack
                  slack_configs:
                    - api_url: <YOUR_SLACK_WEBHOOK_SECRET_URL> # e.g. "https://hooks.slack.com/services/XXX/YYY/ZZZ"
                      channel: <YOUR_SLACK_CHANNEL> # e.g. "#alerts"
                      send_resolved: true
                - name: empty # Default receiver to catch any alerts that don't match a route
            # Setup Alertmanager routing
            # This defines how alerts are grouped and routed to receivers
            # See: https://prometheus.io/docs/alerting/latest/configuration/#route-related-settings
            - path: alertmanager.config.route
              value:
                group_by: ["alertname", "job"] # group by alertname and job
                receiver: empty # Default receiver if no routes match
                # Routes contains a route chain for matching alerts to receivers
                routes:
                  # Send the always-firing Watchdog alert to the empty receiver to avoid noise
                  # (you could also point this to a Dead Man's Snitch-like service to detect if Alertmanager is down)
                  - matchers:
                      - alertname = Watchdog
                    receiver: empty
                  # Send critical and warning alerts to Slack
                  - matchers:
                      - severity =~ "warning|critical"
                    receiver: slack
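
As the comment on the Watchdog route notes, you can point that always-firing alert at a dead man's switch service instead of the empty receiver so that you are notified if the alerting pipeline itself stops working. The following is a minimal sketch of such a receiver, assuming a hypothetical check-in URL (you would also change the Watchdog route's receiver to this name and open egress to the service's host):

- name: deadmansswitch
  webhook_configs:
    - url: <YOUR_DEAD_MANS_SWITCH_URL> # hypothetical check-in/ping endpoint
      send_resolved: false # the Watchdog alert is expected to fire continuously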

You can find more information on configuring Alertmanager in the official documentation.

By default, UDS Core configures Alertmanager as a data source in Grafana. This means you can view and manage Alertmanager alerts by navigating to the Alerting section in the Grafana UI.

To view alerts, go to Alerting -> Alert rules in the left-hand menu. Here you can see a list of all alerts. You can filter alerts by data source (Prometheus or Loki), severity, and status (firing or resolved). You can also click on an individual alert to see more details, including the alert expression, labels, and annotations.
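
The severity labels and annotations you see in those details come from the alert rules themselves. If you want to define additional alerts, the Prometheus Operator deployed with kube-prometheus-stack watches for PrometheusRule resources; the following is a minimal sketch (the name, namespace, and expression are illustrative, and whether the rule is picked up depends on your Prometheus ruleSelector configuration):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alerts # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: example.rules
      rules:
        - alert: ExampleHighErrorRate
          expr: sum(rate(http_requests_total{code=~"5.."}[5m])) > 1 # illustrative expression
          for: 10m
          labels:
            severity: warning # matches the Slack route in the example above
          annotations:
            summary: "High rate of 5xx responses"
            description: "More than one server error per second for 10 minutes."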

Sometimes you may want to temporarily mute or silence certain alerts, for example during maintenance windows or when investigating an issue. You can do this by creating a silence in Alertmanager via the Grafana UI.

To create a silence, go to Alerting -> Silences in the left-hand menu (ensure Choose Alertmanager is set to Alertmanager rather than Grafana) and click the New Silence button. Here you can specify the matchers for the alerts you want to silence, the duration of the silence, and an optional comment. This silence is applied to Alertmanager via the Grafana Alertmanager data source.
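
For recurring maintenance windows, an alternative to creating silences by hand is Alertmanager's time intervals feature, which mutes matching routes on a schedule. The following is a minimal sketch using the same bundle override pattern as the example above (the interval name and schedule are placeholders, and each route you want muted must also list the interval under mute_time_intervals):

- path: alertmanager.config.time_intervals
  value:
    - name: maintenance-window # hypothetical name
      time_intervals:
        - weekdays: ["saturday"]
          times:
            - start_time: "02:00"
              end_time: "06:00"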