Grafana + Prometheus + Loki: Build a Complete Observability Stack

Metrics without logs are useless. Logs without metrics are noise. You need both, correlated, in one place. That's what the Grafana observability stack gives you.

The combination of Prometheus (metrics), Loki (logs), and Grafana (visualization) has become the de facto standard for self-hosted observability. It's what Datadog does, except you own it, and it costs a fraction of the price.

Here's how to set it up properly.

The Architecture

Prometheus scrapes metrics from your services and stores time-series data. Loki aggregates logs using the same label-based approach as Prometheus. Grafana sits on top, providing unified dashboards where you can jump from a metric spike directly to the relevant logs.

The beauty is in the correlation: same labels across metrics and logs mean you can click on a CPU spike and immediately see what your application was logging at that moment.
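
The same correlation can be expressed directly as queries. As a sketch (the metric name and the name label below assume cAdvisor is exporting container metrics; the container label on the log side comes from the Promtail config later in this post):

Metric side (PromQL):

rate(container_cpu_usage_seconds_total{name="your-app"}[5m])

Log side (LogQL), same container over the same time range:

{container="your-app"} |= "error"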

Docker Compose Setup

Here's a production-ready configuration. Note that the port mappings bind to 172.17.0.1 (the Docker bridge address), so the Prometheus, Loki, and Grafana UIs are reachable from the host but not exposed on public interfaces; adjust the bindings if you need external access:

version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "172.17.0.1:9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'
    restart: always

  loki:
    image: grafana/loki:latest
    container_name: loki
    ports:
      - "172.17.0.1:3100:3100"
    volumes:
      - ./loki-config.yml:/etc/loki/local-config.yaml
      - loki_data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: always

  promtail:
    image: grafana/promtail:latest
    container_name: promtail
    volumes:
      - ./promtail-config.yml:/etc/promtail/config.yml
      # The Docker socket is needed for the docker_sd_configs discovery used below
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yml
    restart: always

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "172.17.0.1:3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=your-secure-password
      - GF_USERS_ALLOW_SIGN_UP=false
    restart: always

volumes:
  prometheus_data:
  loki_data:
  grafana_data:

Prometheus Configuration

Create prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'docker'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'your-app'
    static_configs:
      - targets: ['your-app:8080']
    metrics_path: /metrics
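
The node and docker jobs above assume node-exporter and cAdvisor containers exist on the same Docker network; they are not part of the Compose file earlier, so here is a minimal sketch of the two extra services to add to it (image tags and mounts are common defaults, adjust for your host):

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    restart: always

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    restart: always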

Loki Configuration

Create loki-config.yml:

auth_enabled: false

server:
  http_listen_port: 3100

common:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /loki

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  filesystem:
    directory: /loki/chunks
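
Unlike the Prometheus service, which caps retention at 30 days, this Loki config keeps logs indefinitely. If you want a matching 30-day window, a sketch of the extra blocks (field names shift slightly between Loki releases, so check them against the version you deploy):

compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem  # required by newer Loki releases when retention is enabled

limits_config:
  retention_period: 720h  # 30 days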

Promtail Configuration

Create promtail-config.yml to ship logs to Loki:

server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        target_label: 'container'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'stream'
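
With all four files in place (docker-compose.yml, prometheus.yml, loki-config.yml, promtail-config.yml), bring the stack up and confirm every container stays healthy:

docker compose up -d
docker compose ps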

Essential PromQL Queries

Once running, use these queries in Grafana:

CPU usage percentage:

100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Memory usage:

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

Request rate:

sum(rate(http_requests_total[5m])) by (status_code)
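
95th percentile request latency (assuming your app exports a duration histogram; http_request_duration_seconds is the common Prometheus client default, substitute your own metric name):

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))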

Essential LogQL Queries

Query logs in Grafana using LogQL:

Filter by container:

{container="your-app"} |= "error"

Parse JSON logs:

{container="your-app"} | json | level="error"

Count errors over time:

count_over_time({container="your-app"} |= "error" [5m])
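
Error rate per container (uses the container label that the Promtail relabel config above attaches):

sum by (container) (rate({container=~".+"} |= "error" [5m]))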

Connecting Metrics and Logs

The magic happens when you link them. In Grafana, create a data link from your Prometheus panels to Loki:

  1. Edit your metrics panel
  2. Go to "Field" tab → "Data links"
  3. Add link with URL: /explore?left=["now-1h","now","Loki",{"expr":"{container=\"your-app\"}"}] (the exact left= syntax changes between Grafana versions, so the safest approach is to open Explore with your Loki query and copy the URL it generates)

Now clicking a metric spike opens the corresponding logs.
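
All of this assumes Prometheus and Loki are already configured as datasources in Grafana. You can add them by hand in the data sources UI, or provision them from a file; a sketch, assuming you mount it into the grafana container at /etc/grafana/provisioning/datasources/datasources.yml:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100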

Setting Up Alerts

Create alerting rules in Prometheus that fire when things go wrong:

groups:
  - name: infrastructure
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5%"

      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} is down"

Prometheus routes firing alerts through Alertmanager, which handles deduplication and delivery. Alternatively, recreate these rules as Grafana-managed alerts and send notifications to Slack, PagerDuty, or email through contact points in the Alerting section of the Grafana UI.

Troubleshooting

Loki not receiving logs: Check that Promtail is running and can reach Loki. Verify Loki is ready with curl http://172.17.0.1:3100/ready from the host (matching the bind address in the Compose file above), or curl http://loki:3100/ready from another container on the same network.

High cardinality warnings: Avoid labels with unbounded values. Container name is fine, request ID is not.

Grafana can't connect to Prometheus: Ensure you're using the Docker service name (prometheus:9090) not localhost when configuring datasources.

Logs delayed: Promtail batches logs before sending. Reduce batchwait in the clients section of the Promtail config for near-real-time ingestion.

Dashboard slow to load: Use recording rules in Prometheus to pre-compute expensive queries. This dramatically improves dashboard performance.
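
A recording rule evaluates an expression on a schedule and stores the result as a new series that dashboards can query cheaply. A sketch using the CPU expression from earlier (the recorded metric name is just a convention, pick your own); it goes in the same rule file loaded via rule_files:

groups:
  - name: dashboard_precompute
    rules:
      - record: instance:node_cpu_usage:irate5m
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Dashboard panels then query instance:node_cpu_usage:irate5m instead of the raw expression.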

Deploy on Elestio

Setting up and maintaining this stack takes effort. Elestio offers managed versions of all three components: Prometheus, Loki, and Grafana.

Each service comes with automated backups, updates, and pre-built configurations. Get your full observability stack running in minutes, starting at ~$16/month per service.

What's Next

Add Tempo for distributed tracing to complete the stack. With metrics, logs, and traces unified in Grafana, you can trace a request from the frontend, through your microservices, to the database, seeing exactly where time is spent and what went wrong.

The days of paying $50,000+/year for Datadog are over. Self-hosted observability works, and it works well.

Thanks for reading!