Grafana + Loki: Set Up Centralized Logging on Elestio
Picture the 2 a.m. version of you. Something is down, you have five servers, and your "logging strategy" is SSH plus grep plus a lot of hope. You tail one box, then another, then lose track of which terminal is which. Meanwhile the actual error scrolled past on server three ten minutes ago.
Centralized logging fixes that, and you do not need a giant Elasticsearch cluster to get it. Grafana Loki gives you one place to search every log from every service, and it is cheap to run because of one clever design choice. Let me show you how to stand it up.
Why Loki instead of the usual log stack
Most log systems full-text index everything you send them. That is powerful and also why your storage bill looks like a phone number. Loki does the opposite: it indexes only a small set of labels (like job, host, level) and stores the raw log lines compressed in cheap object storage. You query by label first to narrow things down, then filter the text.
The result is a system that feels like Prometheus but for logs, and it usually costs a fraction of an ELK setup at the same volume.
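To make the "labels first, text second" idea concrete, here is a toy model in Python. It is only a sketch of the concept, not how Loki is implemented: the `TinyLogStore` class and the sample labels and log lines are made up for illustration.

```python
from collections import defaultdict


class TinyLogStore:
    """Toy model of label-indexed log storage; not Loki's real implementation."""

    def __init__(self):
        # The "index": one entry per unique label set (a stream).
        self.streams = defaultdict(list)

    def push(self, labels, line):
        # frozenset makes the label set hashable so it can key the index.
        self.streams[frozenset(labels.items())].append(line)

    def query(self, matchers, contains=""):
        wanted = set(matchers.items())
        results = []
        for label_set, lines in self.streams.items():
            # Step 1: cheap label match against the small index.
            if wanted <= label_set:
                # Step 2: brute-force text filter, but only on selected streams.
                results.extend(line for line in lines if contains in line)
        return results


store = TinyLogStore()
store.push({"job": "api", "host": "server-1"}, "GET /health 200")
store.push({"job": "api", "host": "server-3"}, "error: upstream timeout")
store.push({"job": "worker", "host": "server-3"}, "error: job failed")

api_errors = store.query({"job": "api"}, contains="error")
print(api_errors)
```

The index only ever holds one key per label combination, no matter how many lines arrive. That is why the label set has to stay small for this design to stay cheap.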
| Piece | Job |
|---|---|
| Grafana Alloy | Collector. Tails files and containers, adds labels, ships logs to Loki |
| Loki | Storage and query engine. Indexes labels, keeps raw lines compressed |
| Grafana | The UI. Search logs with LogQL, build dashboards, wire up alerts |
One heads-up before you copy any old tutorial: the classic Loki agent, Promtail, reached end of life in March 2026. Do not build anything new on it. The replacement is Grafana Alloy, and if you already run a Promtail pipeline, we wrote a full migration guide here. Everything below uses Alloy.
Stand up the stack
The quickest route is the managed Grafana Loki deploy on Elestio, which starts around $11/month and handles SSL, backups, and updates for you. If you would rather run it yourself, here is a working Docker Compose for all three pieces:
services:
  loki:
    image: grafana/loki:latest
    command: -config.file=/etc/loki/loki-config.yaml
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/loki-config.yaml
      - loki-data:/loki
    restart: unless-stopped

  alloy:
    image: grafana/alloy:latest
    command:
      - run
      - /etc/alloy/config.alloy
    volumes:
      - ./config.alloy:/etc/alloy/config.alloy
      - /var/log:/var/log:ro
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    restart: unless-stopped

volumes:
  loki-data:
  grafana-data:
I am using latest here to keep it short. In production, pin each image to a specific version so an upgrade never surprises you.
Loki needs a small config. This single-binary setup writes to the local filesystem, which is fine for one node:
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
Now the Alloy collector. This tails everything in /var/log, labels it, and pushes to Loki:
local.file_match "system" {
  path_targets = [{
    __path__ = "/var/log/*.log",
    job      = "varlogs",
    host     = "server-1",
  }]
}

loki.source.file "system" {
  targets    = local.file_match.system.targets
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
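Before you start the stack, it can help to confirm that the glob matches what you think it does. The sketch below uses Python's own `glob` module on the host, which I am assuming behaves like Alloy's matcher for a simple `*.log` pattern. The helper name `readable_matches` is mine, not part of any tool.

```python
import glob
import os


def readable_matches(pattern):
    """Return files matching the glob that the current user can read."""
    return sorted(
        path
        for path in glob.glob(pattern)
        if os.path.isfile(path) and os.access(path, os.R_OK)
    )


# Same pattern as __path__ in the Alloy config above.
matches = readable_matches("/var/log/*.log")
print(f"{len(matches)} readable file(s) match")
for path in matches:
    print(" ", path)
```

Note that `*.log` does not descend into subdirectories, so something like `/var/log/nginx/access.log` will not show up. If you need those files, add a second target or a recursive pattern, and check the `local.file_match` documentation for the exact glob syntax it accepts.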
Run docker compose up -d, open Grafana at http://your-server:3000, and add a Loki data source pointing at http://loki:3100. That is the whole pipeline.
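If Grafana shows nothing yet, you can rule out Loki itself by pushing one line by hand. This is a minimal sketch against Loki's HTTP push endpoint using only the standard library. The `smoke-test` label, the helper names, and the base URL you pass in are assumptions for illustration.

```python
import json
import time
import urllib.request


def build_push_payload(labels, line, timestamp_ns=None):
    """Build the JSON body for Loki's /loki/api/v1/push endpoint."""
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    return {
        "streams": [
            {
                "stream": labels,
                # Each entry is [nanosecond timestamp as a string, log line].
                "values": [[str(timestamp_ns), line]],
            }
        ]
    }


def push_line(base_url, labels, line):
    """POST one log line to Loki and return the HTTP status code."""
    body = json.dumps(build_push_payload(labels, line)).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/loki/api/v1/push",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status  # Loki answers 204 on a successful push


# Building the payload needs no network, so it is safe to inspect first.
payload = build_push_payload({"job": "smoke-test"}, "hello from python", timestamp_ns=1)
print(json.dumps(payload, indent=2))
```

From the Docker host, `push_line("http://localhost:3100", {"job": "smoke-test"}, "hello")` should return 204, and the line should then appear in Explore under `{job="smoke-test"}`. If that works but `{job="varlogs"}` stays empty, the problem is on the Alloy side.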
Actually finding things with LogQL
This is where centralized logging pays off. In Grafana's Explore view, start broad and narrow down:
# Everything from a label set
{job="varlogs"}
# Only lines containing "error"
{job="varlogs"} |= "error"
# Parse JSON logs, then filter on a field
{job="varlogs"} | json | level="error"
# Turn logs into a metric: error rate over 5 minutes
sum(rate({job="varlogs"} |= "error" [5m]))
That last query is the trick people miss. LogQL can compute metrics from raw log lines, so you can graph error rates and alert on them without shipping a separate metric. Point one query at all your servers by using a shared label, and the "which terminal was that" problem disappears.
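The same queries work outside Grafana too, which is handy for scripts and cron jobs. Here is a rough sketch that calls Loki's `query_range` HTTP endpoint with the standard library. The helper names and the `http://localhost:3100` base URL are assumptions, and the response parsing only covers log queries (result type `streams`), not metric queries.

```python
import json
import time
import urllib.parse
import urllib.request


def build_query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a URL for Loki's /loki/api/v1/query_range endpoint."""
    params = urllib.parse.urlencode(
        {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    )
    return base_url.rstrip("/") + "/loki/api/v1/query_range?" + params


def extract_lines(response_json):
    """Flatten a log-query response into a plain list of log lines."""
    return [
        line
        for stream in response_json["data"]["result"]
        for _timestamp, line in stream["values"]
    ]


def fetch_lines(url):
    """Run the query and return the matching lines (needs a reachable Loki)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return extract_lines(json.load(response))


now_ns = time.time_ns()
one_hour_ns = 60 * 60 * 1_000_000_000
url = build_query_range_url(
    "http://localhost:3100", '{job="varlogs"} |= "error"', now_ns - one_hour_ns, now_ns
)
print(url)
```

Calling `fetch_lines(url)` returns the last hour of matching lines across every host that ships the `varlogs` job.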
Alerting before users notice
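Loki ships with a ruler that evaluates LogQL on a schedule and fires through Alertmanager. The minimal config above does not point the ruler at an Alertmanager, so set ruler.alertmanager_url before you expect a page. With auth_enabled: false, Loki runs as the single tenant fake, so a file-based rule should go under /loki/rules/fake/. A rule that pages you when errors spike looks like this: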
groups:
  - name: app-errors
    rules:
      - alert: HighErrorRate
        expr: sum(rate({job="varlogs"} |= "error" [5m])) > 5
        for: 2m
        labels:
          severity: critical
Now the 2 a.m. version of you gets paged instead of paged-then-blamed.
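One thing worth checking before you trust that threshold: `rate()` is a per-second average over the window, not a count. A bit of arithmetic shows what `> 5` over `[5m]` actually demands. The function names here are just for illustration.

```python
def per_second_rate(matching_lines, window_seconds):
    """Average matching lines per second over the window, like LogQL rate()."""
    return matching_lines / window_seconds


def lines_to_cross(threshold_per_second, window_seconds):
    """How many matching lines the window must hold to reach the threshold."""
    return threshold_per_second * window_seconds


WINDOW_SECONDS = 5 * 60  # the [5m] range in the rule
THRESHOLD = 5            # the "> 5" in the rule

needed = lines_to_cross(THRESHOLD, WINDOW_SECONDS)
print(needed)                               # lines per window needed to hit 5/s
print(per_second_rate(20, WINDOW_SECONDS))  # 20 errors in 5 minutes, as a rate
```

So the rule fires only above 1500 error lines in five minutes, sustained for two more. If what you want is "more than 5 errors in 5 minutes", swap `rate` for `count_over_time` in the expression, or lower the threshold to match your traffic.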
Troubleshooting
- No logs showing up. Check that Alloy can actually read the files. In Docker that means mounting `/var/log` (read-only is fine) and confirming your `__path__` glob matches real files. Alloy's own UI on port 12345 shows component health. Inside a container, publish that port and start Alloy with `--server.http.listen-addr=0.0.0.0:12345`, because the UI listens on localhost by default.
- "per-stream rate limit" errors. You are sending too many lines under one label set, which usually means your labels are too coarse. Add a distinguishing label, or raise `limits_config.per_stream_rate_limit` in the Loki config.
- Query returns nothing older than a few hours. Check retention and storage. Loki keeps data indefinitely by default and only deletes old chunks when `limits_config.retention_period` is set and the compactor runs with retention enabled. If you configured retention, raise the period. If you did not, confirm that the `loki-data` volume survived the last restart.
- Too many label values. Never put high-cardinality values (user IDs, request IDs) in labels. That blows up the index and kills performance. Keep those in the log line and filter with `|=` or `| json`.
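The cardinality warning is easier to feel with numbers. Each distinct combination of label values is its own stream, so the worst case is the product of the distinct values per label. The counts below are invented for illustration. Real deployments sit below this upper bound because not every combination occurs.

```python
from math import prod


def max_streams(distinct_values_per_label):
    """Upper bound on streams: every combination of label values is one stream."""
    return prod(distinct_values_per_label.values())


sane = {"job": 4, "host": 5, "level": 4}
risky = {**sane, "user_id": 100_000}  # one high-cardinality label added

print(max_streams(sane))
print(max_streams(risky))
```

One extra label takes the ceiling from 80 streams to 8 million, which is exactly the index blow-up described above.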
Worth it?
If you run more than one server, yes, without question. You stop grepping across machines, you get error-rate graphs for free, and you can alert on log patterns before customers open a ticket. Spin up a managed instance on Elestio in a couple of minutes, point Alloy at your services, and give the 2 a.m. version of you a fighting chance.
Thanks for reading ❤️ See you in the next one 👋