Observability Overview

UDS Core provides an observability stack that covers logging, monitoring, and alerting. It is built from the following components:

  • Prometheus: A powerful metrics collection and alerting system that scrapes metrics from various sources and stores them in a time-series database.
  • Loki: A log aggregation system designed for storing and querying logs from various sources.
  • Grafana: A popular open-source platform for monitoring and observability that provides rich visualization capabilities for both metrics and logs.
  • Alertmanager: A component of the Prometheus ecosystem that handles alerts sent by Prometheus and manages notification channels.

There are many ways you can configure and use the observability stack. Here are some best practices that align with UDS Core’s architecture and design principles:

  • Use PrometheusRule CRs to define alerting rules and thresholds for your specific use cases. Deploy these rules alongside the applications they monitor so that each alert stays tied to the workload it describes. See Metrics Alerting for more details.
  • Configure Alertmanager to route alerts to appropriate notification channels such as email, Slack, or paging services. This ensures that the right teams are notified promptly when issues arise. See Alert Management for more details.
  • Use Loki Ruler recording and alerting rules to create log-based alerts that complement your metrics-based alerts. This can help you detect issues that metrics alone do not capture. See Log Alerting for more details.
  • Avoid using Grafana Managed Alerts when possible. Evaluating alerts at the source (Prometheus and Loki) is more efficient and provides better context for alerting. Reserve Grafana Managed Alerts for advanced use cases such as correlating multiple data sources.
  • Use Grafana Dashboards to create visualizations that provide insights into the health and performance of your applications and infrastructure. Dashboards should be tailored to the needs of different teams and stakeholders. See Dashboards for more details.
  • Declaratively manage all your observability resources with GitOps practices. Store your PrometheusRule CRs, Loki Rule ConfigMaps, and Grafana Dashboard ConfigMaps in a version-controlled repository. This keeps your observability configuration consistent and reproducible across environments.
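
As an illustration of the PrometheusRule and GitOps practices above, the following is a minimal sketch that builds a PrometheusRule manifest in Python and renders it deterministically so it can be committed next to an application. The application name, namespace, metric name, threshold, and the helper functions prometheus_rule and render are all hypothetical and not taken from UDS Core. Only the resource shape (apiVersion monitoring.coreos.com/v1, kind PrometheusRule, spec.groups[].rules[]) follows the Prometheus Operator CRD. JSON is used here only to avoid a third-party YAML dependency; Kubernetes accepts JSON manifests as well.

```python
import json


def prometheus_rule(name, namespace, alert, expr, duration, severity, summary):
    """Build a PrometheusRule custom resource as a plain dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": f"{name}.rules",
                    "rules": [
                        {
                            "alert": alert,
                            "expr": expr,
                            "for": duration,
                            "labels": {"severity": severity},
                            "annotations": {"summary": summary},
                        }
                    ],
                }
            ]
        },
    }


def render(manifest):
    """Serialize with sorted keys so diffs in version control stay stable."""
    return json.dumps(manifest, indent=2, sort_keys=True) + "\n"


# Hypothetical application, namespace, and metric names, for illustration only.
rule = prometheus_rule(
    name="my-app-alerts",
    namespace="my-app",
    alert="MyAppHighErrorRate",
    expr=(
        'sum(rate(http_requests_total{job="my-app",status=~"5.."}[5m]))'
        ' / sum(rate(http_requests_total{job="my-app"}[5m])) > 0.05'
    ),
    duration="10m",
    severity="warning",
    summary="my-app is returning more than 5% 5xx responses",
)

if __name__ == "__main__":
    # Redirect the output to a file in the application's repository.
    print(render(rule), end="")
```

Keeping the rule in the same repository as the application means the alert definition is reviewed and versioned together with the workload it watches. Whether your deployment needs extra labels or annotations for the rule to be picked up depends on your cluster configuration and is not shown here.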