Grafana + Prometheus + Loki: Build a Complete Observability Stack
Metrics without logs are useless. Logs without metrics are noise. You need both, correlated, in one place. That's what the Grafana observability stack gives you.
The combination of Prometheus (metrics), Loki (logs), and Grafana (visualization) has become the de facto standard for self-hosted observability. It's what Datadog does, except you own it, and it costs a fraction of the price.
Here's how to set it up properly.
The Architecture
Prometheus scrapes metrics from your services and stores time-series data. Loki aggregates logs using the same label-based approach as Prometheus. Grafana sits on top, providing unified dashboards where you can jump from a metric spike directly to the relevant logs.
The beauty is in the correlation: same labels across metrics and logs mean you can click on a CPU spike and immediately see what your application was logging at that moment.
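For example, assuming your metrics carry a container label that matches the one the Promtail config below attaches to logs (on the metrics side you would add it via relabeling or instrumentation), the same selector drives both query languages:

PromQL:  sum(rate(http_requests_total{container="your-app"}[5m]))
LogQL:   {container="your-app"} |= "error"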
Docker Compose Setup
Here's a complete configuration; pin the image tags and change the Grafana password before treating it as production-ready:
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "172.17.0.1:9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'
    restart: always

  loki:
    image: grafana/loki:latest
    container_name: loki
    ports:
      - "172.17.0.1:3100:3100"
    volumes:
      - ./loki-config.yml:/etc/loki/local-config.yaml
      - loki_data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: always
  promtail:
    image: grafana/promtail:latest
    container_name: promtail
    volumes:
      - ./promtail-config.yml:/etc/promtail/config.yml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      # required for the docker_sd_configs service discovery in promtail-config.yml
      - /var/run/docker.sock:/var/run/docker.sock
    command: -config.file=/etc/promtail/config.yml
    restart: always
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "172.17.0.1:3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=your-secure-password
      - GF_USERS_ALLOW_SIGN_UP=false
    restart: always

volumes:
  prometheus_data:
  loki_data:
  grafana_data:
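With docker-compose.yml and the three config files from the next sections saved in the same directory, bring the stack up with:

docker compose up -d

Grafana is then reachable on 172.17.0.1:3000 with the admin password set above.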
Prometheus Configuration
Create prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'docker'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'your-app'
    static_configs:
      - targets: ['your-app:8080']
    metrics_path: /metrics
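The node and docker jobs assume node-exporter and cAdvisor containers are reachable on the same Compose network; they are not part of the Compose file above, so here is a minimal sketch of the two extra services (standard upstream images and mounts, adjust as needed):

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: always

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    restart: always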
Loki Configuration
Create loki-config.yml:
auth_enabled: false

server:
  http_listen_port: 3100

common:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /loki

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  filesystem:
    directory: /loki/chunks
Promtail Configuration
Create promtail-config.yml to ship logs to Loki:
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        target_label: 'container'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'stream'
Essential PromQL Queries
Once the stack is running, use these queries in Grafana:
CPU usage percentage:
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Memory usage:
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
Request rate:
sum(rate(http_requests_total[5m])) by (status_code)
Essential LogQL Queries
Query logs in Grafana using LogQL:
Filter by container:
{container="your-app"} |= "error"
Parse JSON logs:
{container="your-app"} | json | level="error"
Count errors over time:
count_over_time({container="your-app"} |= "error" [5m])
Connecting Metrics and Logs
The magic happens when you link them. In Grafana, create a data link from your Prometheus panels to Loki:
- Edit your metrics panel
- Go to "Field" tab → "Data links"
- Add link with URL:
/explore?left=["now-1h","now","Loki",{"expr":"{container=\"your-app\"}"}]
Now clicking a metric spike opens the corresponding logs.
Setting Up Alerts
Create a Prometheus alert rules file (for example, alert-rules.yml) with rules that fire when things go wrong:
groups:
  - name: infrastructure
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5%"

      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} is down"
These rules are evaluated by Prometheus, so notification routing needs either an Alertmanager wired in via the alerting block of prometheus.yml or the same rules recreated as Grafana-managed alerts; in Grafana's Alerting section you can then add Slack, PagerDuty, or email contact points.
Troubleshooting
Loki not receiving logs: Check that Promtail is running and can reach Loki. Verify with curl http://172.17.0.1:3100/ready (matching the bind address in the Compose file above).
High cardinality warnings: Avoid labels with unbounded values. Container name is fine, request ID is not.
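For example, keep request IDs in the log line and filter them at query time instead of promoting them to a label in a Promtail pipeline stage (which would create one stream per request):

{container="your-app"} |= "req-4fca91"

(The request ID here is just an illustrative value.)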
Grafana can't connect to Prometheus: Use the Docker service name (http://prometheus:9090), not localhost, when configuring data sources.
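If you prefer configuration over clicking, a minimal data source provisioning sketch, assuming it's mounted into the grafana container at /etc/grafana/provisioning/datasources/datasources.yml:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100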
Logs delayed: Promtail batches logs before sending. Reduce batchwait in the clients section of the config for near-real-time ingestion.
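A sketch of the relevant clients section in promtail-config.yml, with illustrative values:

clients:
  - url: http://loki:3100/loki/api/v1/push
    batchwait: 200ms
    batchsize: 1048576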
Dashboard slow to load: Use recording rules in Prometheus to pre-compute expensive queries. This dramatically improves dashboard performance.
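A sketch of one recording rule, assuming the request-rate panel is the expensive one; it lives in a rule file loaded via rule_files just like the alerts above:

groups:
  - name: dashboard_precompute
    rules:
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)

Dashboards then query job:http_requests:rate5m instead of re-evaluating the raw expression.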
Deploy on Elestio
Setting up and maintaining this stack takes effort. Elestio offers managed versions of all three components:
- Grafana for visualization and dashboards
- Prometheus for metrics collection
- Loki for log aggregation
Each service comes with automated backups, updates, and pre-built configurations. Get your full observability stack running in minutes, starting at ~$16/month per service.
What's Next
Add Tempo for distributed tracing to complete the stack. With metrics, logs, and traces unified in Grafana, you can trace a request from the frontend, through your microservices, to the database, seeing exactly where time is spent and what went wrong.
The days of paying $50,000+/year for Datadog are over. Self-hosted observability works, and it works well.
Thanks for reading!