Prometheus
Prometheus is the de facto standard open-source system for metrics collection and alerting. It scrapes metrics from HTTP endpoints, stores them as time-series data, and evaluates alerting rules against them. It forms the backbone of most Kubernetes observability stacks.
Why Prometheus?
You're running Kubernetes and need metrics collection
You want open-source, self-hosted monitoring with no vendor lock-in
You need fine-grained custom metrics with PromQL
Tradeoffs & Caveats
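Know before you commit. Consider an alternative if:
You want a managed, zero-ops solution (Datadog or Grafana Cloud)
You need long-term metrics storage: by default Prometheus keeps only 15 days of data locally (configurable via the --storage.tsdb.retention.time flag)
Your team lacks the operations experience to manage retention and scaling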
Pricing
Free tier & paid plans
100% free, open-source
Grafana Cloud (managed): free up to 10K metrics
Get Started
Repository and installation options
github.com/prometheus/prometheus
brew install prometheus
docker run -p 9090:9090 prom/prometheus
Quick Start
Copy and adapt to get going fast
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'my-app'
    static_configs:
      - targets: ['localhost:3000']  # scraped at /metrics by default
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']
rule_files:
  - 'alerts.yml'
Code Examples
Common usage patterns
Alert rules
Fire an alert when a route's 5xx error rate stays above 5% for 2 minutes
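# alerts.yml
groups:
  - name: my-app
    rules:
      - alert: HighErrorRate
        # Aggregate both sides by route so 5xx and total series match one-to-one;
        # dividing the raw series would pair each 5xx series with itself (ratio 1).
        expr: |
          sum by (route) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (route) (rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.route }}"
          description: "Error rate is {{ $value | humanizePercentage }}"
PromQL queries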
Common queries for dashboards and alerts
# Request rate (per second over 5 min)
rate(http_requests_total[5m])
# 95th percentile latency (aggregate across series, keeping the le bucket label)
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
# Error rate (fraction of requests returning 5xx)
sum(rate(http_requests_total{status=~"5.."}[5m])) /
sum(rate(http_requests_total[5m]))
# Top 10 routes by request rate
topk(10, sum by (route) (rate(http_requests_total[5m])))
Docker Compose stack
Run Prometheus + Grafana together locally
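services:
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      # rule_files paths resolve relative to prometheus.yml, so mount alerts.yml beside it
      - ./alerts.yml:/etc/prometheus/alerts.yml
    # Inside this container, 'localhost:3000' in prometheus.yml means the container itself;
    # target an address it can reach instead (e.g. host.docker.internal:3000 on Docker Desktop).
  grafana:
    image: grafana/grafana:latest
    ports: ["3001:3000"]  # host port 3001 keeps 3000 free for the app being scraped
    environment:
      GF_SECURITY_ADMIN_PASSWORD: secret
    # In Grafana, add a Prometheus data source with URL http://prometheus:9090
    depends_on: [prometheus]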
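With the Compose stack running, Prometheus answers PromQL over its HTTP API on port 9090, and instant queries go to /api/v1/query. The stdlib-only Python sketch below builds that request and flattens an instant-vector response into (labels, value) pairs. The function names and the hand-written sample response are illustrative assumptions, not captured server output.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

# Port published by the Docker Compose file above (assumption: default setup).
PROMETHEUS_URL = "http://localhost:9090"

Sample = Tuple[Dict[str, str], float]


def query_url(expr: str, base: str = PROMETHEUS_URL) -> str:
    """Build a GET URL for the instant-query endpoint /api/v1/query."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(body: str) -> List[Sample]:
    """Flatten an instant-vector response into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's "value" is [unix_timestamp, "value as a string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(expr: str) -> List[Sample]:
    """Send a query to a running Prometheus (needs the stack to be up)."""
    with urllib.request.urlopen(query_url(expr), timeout=10) as resp:
        return parse_vector(resp.read().decode("utf-8"))


# Offline check against a hand-written response of the documented shape.
sample_body = (
    '{"status":"success","data":{"resultType":"vector","result":'
    '[{"metric":{"route":"/api"},"value":[1700000000.123,"12.5"]}]}}'
)
print(parse_vector(sample_body))  # [({'route': '/api'}, 12.5)]
```

Against a live server, instant_query("topk(10, sum by (route) (rate(http_requests_total[5m])))") would return one pair per route.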
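In the alert rule example, for: 2m means the error-rate condition must hold on every evaluation (every 15s with the Quick Start's evaluation_interval) for two minutes before the alert fires. Until then the alert is pending. The Python below is a simplified model of that pending/firing logic. It assumes a single series, evenly spaced evaluations, and a threshold comparison standing in for the PromQL filter, and it ignores Alertmanager grouping and notification. The function name alert_states is hypothetical.

```python
from typing import List, Optional


def alert_states(values: List[float], threshold: float,
                 for_seconds: float, interval: float) -> List[str]:
    """Simplified model of a rule with `for:`; returns the state per evaluation.

    values[i] is the expression's result at evaluation i, and evaluations
    are assumed to run exactly `interval` seconds apart.
    """
    states: List[str] = []
    active_since: Optional[float] = None
    for i, value in enumerate(values):
        now = i * interval
        if value > threshold:
            if active_since is None:
                active_since = now  # condition first true: alert becomes pending
            held = now - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # condition false: the pending/firing alert clears
            states.append("inactive")
    return states


# Quick Start timing: evaluation_interval 15s, rule `for: 2m`, threshold 0.05.
steady = alert_states([0.07] * 9, threshold=0.05, for_seconds=120, interval=15)
print(steady.count("pending"), steady[-1])  # 8 firing

# A single dip below the threshold restarts the two-minute clock.
print(alert_states([0.07, 0.07, 0.01, 0.07], 0.05, 120, 15))
# ['pending', 'pending', 'inactive', 'pending']
```

The first evaluation that sees the condition starts the clock, so with these settings the alert fires on the ninth consecutive evaluation, 120 seconds later.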
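The 95th-percentile query among the PromQL examples relies on how histogram_quantile estimates a quantile from cumulative le buckets. It finds the bucket containing the target rank and interpolates linearly inside it. If the rank lands in the +Inf bucket, it returns the highest finite bound. The Python below is a simplified sketch of that idea for a single histogram. It assumes the buckets are already sorted, have non-decreasing counts and end with +Inf. It skips details the real implementation handles, such as repairing non-monotonic counts and grouping many series by their labels.

```python
import bisect
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Simplified sketch of PromQL's histogram_quantile for one histogram.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound,
    ending with (math.inf, total). Counts are assumed non-decreasing.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank.
    counts = [count for _, count in buckets[:-1]]
    b = bisect.bisect_left(counts, rank)
    if b == len(counts):
        # Rank falls in the +Inf bucket: return the highest finite bound.
        return buckets[-2][0]
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    lower, upper, count = 0.0, buckets[b][0], buckets[b][1]
    if b > 0:
        # Work relative to the previous bucket's bound and cumulative count.
        lower = buckets[b - 1][0]
        count -= buckets[b - 1][1]
        rank -= buckets[b - 1][1]
    if count == 0:
        return math.nan  # only reachable when q == 0 and the first bucket is empty
    # Assume observations are spread uniformly inside the bucket.
    return lower + (upper - lower) * (rank / count)


# Per-bucket values, e.g. from rate(http_request_duration_seconds_bucket[5m]).
latency = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, latency))  # 0.5
print(histogram_quantile(0.9, latency))   # about 0.4167: 2/3 of the way from 0.25 to 0.5
```

Because the result is interpolated, its accuracy depends on the bucket boundaries. A p95 that lands in a wide bucket is only a rough estimate.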