175,000 Ollama Servers Are Wide Open Right Now (Is Yours One of Them?)

Researchers from SentinelOne and Censys just dropped a report that should make every self-hosted AI enthusiast nervous: 175,108 Ollama instances across 130 countries are sitting on the public internet with zero authentication. No password, no API key, no nothing.

Nearly half of them have tool-calling capabilities enabled. Attackers are already exploiting them through a campaign called "Operation Bizarre Bazaar," hijacking exposed instances for free AI compute and reselling access at 40-60% discounts through underground marketplaces.

If you're running Ollama, here's how to check if you're one of them, and how to fix it.

Why This Keeps Happening

Ollama's default behavior is actually safe. It binds to 127.0.0.1:11434, meaning localhost only. The problem starts when developers change that default to access Ollama from other machines or Docker containers:

# This one line exposes your Ollama to the entire internet
OLLAMA_HOST=0.0.0.0
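
If you genuinely need LAN access on a bare-metal Linux install, a safer pattern is to bind to one private interface instead of every interface. A minimal sketch, assuming the standard systemd service the install script sets up (192.168.1.10 is a placeholder for your LAN IP, and you should still firewall the port to trusted clients):

# Run: sudo systemctl edit ollama
# Then add under [Service]:
[Service]
Environment="OLLAMA_HOST=192.168.1.10:11434"

# Apply the change
sudo systemctl daemon-reload && sudo systemctl restart ollama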

Or in Docker Compose, the default port mapping is the culprit:

# DANGEROUS - binds to all interfaces
ports:
  - "11434:11434"

# SAFE - binds to localhost only
ports:
  - "127.0.0.1:11434:11434"

Here's the part that catches most people: Docker bypasses UFW when publishing ports, because it inserts its own iptables rules ahead of UFW's chains. You can have a perfectly configured firewall that blocks port 11434, and Docker will happily punch right through it. Your firewall logs show nothing wrong while your Ollama API responds to the entire internet.
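
You can see exactly what Docker has opened by inspecting its own chain in iptables rather than UFW's status output:

# Any ACCEPT rule for 11434 here is reachable from outside, whatever UFW says
sudo iptables -L DOCKER -n -v | grep 11434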

Check If You're Exposed (Right Now)

Run this on your server:

# Check what interface Ollama is listening on
ss -tlnp | grep 11434

If you see 0.0.0.0:11434, [::]:11434, or *:11434 in the local address column, you're exposed. You want to see 127.0.0.1:11434.
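
If you want something you can script (a cron job, a deploy check), here's a minimal one-liner built on the same ss output:

ss -tln | grep -qE '(0\.0\.0\.0|\[::\]|\*):11434' \
  && echo "WARNING: Ollama is listening on all interfaces" \
  || echo "OK: no wildcard binding on port 11434"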

For Docker deployments, check your port bindings:

docker ps --format "table {{.Names}}\t{{.Ports}}" | grep ollama
# SAFE: 127.0.0.1:11434->11434/tcp
# DANGEROUS: 0.0.0.0:11434->11434/tcp

Then test from outside your network:

curl -s http://<your-public-ip>:11434/api/tags

If you get a JSON response listing your models, anyone on the internet can see them too. And it gets worse: they can also exfiltrate your model weights, push poisoned models into your instance, and use your GPU for their own inference workloads.
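
If that tags call answers, it's worth seeing just how much an anonymous caller learns. These two probes hit other standard Ollama API routes:

# Exact Ollama version - handy for matching it against known CVEs
curl -s http://<your-public-ip>:11434/api/version

# Models currently loaded into memory
curl -s http://<your-public-ip>:11434/api/ps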

What Attackers Actually Do With Exposed Ollama

This isn't theoretical. SentinelOne documented 7.23 million observations over 293 days. GreyNoise captured 91,403 attack sessions in just four months. Here's what's happening:

Compute hijacking (LLMjacking): Attackers run inference on your hardware while you pay the electricity bill. The Operation Bizarre Bazaar campaign resells stolen compute through a marketplace called silver.inc, offering 30+ LLM providers at steep discounts.

Model theft: The /api/push endpoint has no authorization. Attackers can exfiltrate all your model weights to their own registry with a single API call.

Model poisoning: The /api/pull endpoint accepts HTTP sources. Attackers can inject malicious models into your instance.

Remote code execution: CVE-2024-37032 ("Probllama") allowed attackers to write arbitrary files through path traversal in the pull endpoint. On unpatched instances, that's full system compromise.

The Fix: Nginx Reverse Proxy with Auth

The proper solution is never exposing Ollama directly. Put it behind a reverse proxy with authentication and block dangerous endpoints:

version: "3"
services:
  ollama:
    image: ollama/ollama
    # NO ports exposed - internal only
    volumes:
      - ollama_data:/root/.ollama
    restart: always

  nginx:
    image: nginx:latest
    ports:
      - "172.17.0.1:8443:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./.htpasswd:/etc/nginx/.htpasswd:ro
    depends_on:
      - ollama
    restart: always

volumes:
  ollama_data:

Generate your auth credentials:

docker run --rm httpd:alpine htpasswd -nb ollama-user 'YourStr0ngP@ss' > .htpasswd

And the Nginx config:

events {}
http {
    limit_req_zone $binary_remote_addr zone=ollama:10m rate=10r/s;

    server {
        listen 80;
        limit_req zone=ollama burst=20 nodelay;

        location / {
            auth_basic "Restricted";
            auth_basic_user_file /etc/nginx/.htpasswd;
            proxy_pass http://ollama:11434;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_connect_timeout 300s;
            proxy_send_timeout 300s;
            proxy_read_timeout 300s;
            proxy_buffering off;
        }

        # Block dangerous endpoints
        location /api/push { return 403; }
        location /api/pull { return 403; }
        location /api/delete { return 403; }
        location /api/create { return 403; }
    }
}

This gives you authentication and rate limiting, and it blocks the endpoints that enable model theft and poisoning. The extended timeouts (300s) are necessary because LLM inference takes far longer than a typical HTTP request.
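
Once nginx.conf and .htpasswd sit next to your docker-compose.yml, bring the stack up and verify the proxy is doing its job. The checks below assume the bridge-IP binding from the compose file above, so run them from the Docker host itself:

docker compose up -d

# Should return nothing - Ollama is internal to the compose network
ss -tlnp | grep 11434

# 401 - auth required
curl -s -o /dev/null -w "%{http_code}\n" http://172.17.0.1:8443/api/tags

# JSON model list with valid credentials
curl -s -u ollama-user:'YourStr0ngP@ss' http://172.17.0.1:8443/api/tags

# 403 - blocked endpoint, even with valid credentials
curl -s -o /dev/null -w "%{http_code}\n" -u ollama-user:'YourStr0ngP@ss' http://172.17.0.1:8443/api/push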

The Easier Path

If you'd rather not manage reverse proxies and auth configs yourself, Elestio deploys Ollama with all of this handled out of the box. Every instance binds to the Docker bridge network (172.17.0.1), sits behind an Nginx reverse proxy with automated SSL via Let's Encrypt, and never exposes the raw Ollama port to the internet. Starting at $16/month on NVMe storage, with automated backups, updates, and monitoring included.

Troubleshooting

"Connection refused" after binding to localhost: If your Open WebUI or other frontend can't reach Ollama after fixing the binding, use Docker internal networking instead of port mapping. Put both services on the same Docker network and reference Ollama by container name (http://ollama:11434).

Docker bypassing your firewall: Add "iptables": false to /etc/docker/daemon.json to prevent Docker from manipulating iptables rules. Then manage port exposure manually.
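
The change itself is a tiny file, but be aware it disables all of Docker's automatic NAT, so published ports and outbound container traffic will need your own iptables rules afterwards:

# /etc/docker/daemon.json
{
  "iptables": false
}

sudo systemctl restart docker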

Performance drop after adding Nginx: Ensure proxy_buffering off is set in your Nginx config. Buffering breaks streaming responses and can cause timeouts during long inference operations.

The 175,000 exposed instances aren't running some exotic misconfiguration. They're running the most common Docker setup you'll find in any tutorial. The fix takes five minutes. Don't be one of them.

Thanks for reading! See you in the next one.