Airflow + Supabase: Build an Automated Data Pipeline on Elestio

If you've ever found yourself writing cron jobs to move data between services, you already know why you need a proper orchestrator. Apache Airflow is the industry standard for defining, scheduling, and monitoring data workflows — and Supabase gives you a production-ready PostgreSQL backend with a REST API, auth, and real-time subscriptions out of the box.

Together, they're a powerful self-hosted stack for building automated data pipelines without paying for Fivetran, Airbyte Cloud, or any other managed ETL service. Here's how to wire them up.

Why This Combo Works

Airflow handles the "when" and "how" — scheduling tasks, managing dependencies, retrying failures, and alerting you when something breaks. It uses DAGs (Directed Acyclic Graphs) to define workflows as Python code, which means your pipeline logic lives in version control, not in a drag-and-drop UI you can't reproduce.

Supabase handles the "where" — it's your data destination (and sometimes your source). Under the hood, it's PostgreSQL with batteries included: auto-generated REST and GraphQL APIs, row-level security, real-time change streams, and built-in auth. Your Airflow DAGs can write directly to Supabase via its API or connect to the underlying Postgres instance.

The result? A fully self-hosted data platform where you own every component, pay only for infrastructure, and can scale each piece independently.

Setting Up the Pipeline

Prerequisites

You'll need both services running. On Elestio, both Apache Airflow and Supabase can be deployed with one click from the service catalog.

Once deployed, grab your Supabase project URL and API key from the Elestio dashboard.

Step 1: Create a Supabase Connection in Airflow

In your Airflow instance, add a new connection via the admin UI:

Connection Id: supabase_postgres
Connection Type: Postgres
Host: your-supabase-instance.elest.io
Schema: postgres
Login: postgres
Password: your-db-password
Port: 5432
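
If you'd rather script this than click through the admin UI, the same connection can be registered with the Airflow CLI — a sketch, with the host and password as placeholders matching the values above:

```shell
airflow connections add supabase_postgres \
    --conn-type postgres \
    --conn-host your-supabase-instance.elest.io \
    --conn-schema postgres \
    --conn-login postgres \
    --conn-password 'your-db-password' \
    --conn-port 5432
```

This is handy for baking the connection into a deployment script instead of configuring it by hand on every environment.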

Alternatively, use the Supabase REST API via Airflow's HttpHook if you prefer API-based access over direct database connections.
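
For reference, here's what the API route looks like under the hood. Supabase's REST layer is PostgREST, which accepts a JSON array POSTed to /rest/v1/&lt;table&gt;; the base URL, key, and table name below are placeholders, and this sketch uses only the standard library:

```python
import json
import urllib.request


def build_insert_request(base_url: str, api_key: str, table: str, rows: list) -> urllib.request.Request:
    """Build a PostgREST bulk-insert request: POST a JSON array to /rest/v1/<table>."""
    url = f"{base_url.rstrip('/')}/rest/v1/{table}"
    headers = {
        "apikey": api_key,
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Prefer": "return=minimal",  # don't echo inserted rows back
    }
    return urllib.request.Request(url, data=json.dumps(rows).encode(), headers=headers, method="POST")


def load_rows(base_url: str, api_key: str, table: str, rows: list) -> int:
    """Send the insert and return the HTTP status code."""
    req = build_insert_request(base_url, api_key, table, rows)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```

For server-side writes like this, you'd typically use the service-role key rather than the anon key so row-level security policies don't block the inserts.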

Step 2: Write Your First DAG

Here's a minimal DAG that extracts data from an external API and loads it into a Supabase table:

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook
from datetime import datetime
import requests

def extract_and_load():
    # Extract: fetch data from external API
    response = requests.get("https://api.example.com/data", timeout=30)
    response.raise_for_status()  # fail the task (and trigger retries) on HTTP errors
    records = response.json()

    # Load: insert into Supabase (PostgreSQL)
    hook = PostgresHook(postgres_conn_id="supabase_postgres")
    hook.insert_rows(
        table="raw_data",
        rows=[(r["id"], r["value"], r["timestamp"]) for r in records],
        target_fields=["id", "value", "created_at"]
    )

with DAG(
    "api_to_supabase",
    start_date=datetime(2026, 1, 1),
    schedule="@hourly",  # "schedule_interval" on Airflow versions before 2.4
    catchup=False,
) as dag:
    extract_load = PythonOperator(
        task_id="extract_and_load",
        python_callable=extract_and_load,
    )

This runs every hour, pulls data from an API, and inserts it into your Supabase PostgreSQL database. Airflow handles retries, logging, and scheduling automatically.

Step 3: Add Transformation Steps

For more complex pipelines, chain multiple tasks:

extract = PythonOperator(task_id="extract", ...)
transform = PythonOperator(task_id="transform", ...)
load = PythonOperator(task_id="load", ...)

extract >> transform >> load

Airflow's dependency management ensures each step runs only after the previous one succeeds. If transform fails, load won't execute — and you'll get an alert.
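
Retries and alerting are configured through default_args, which every task in the DAG inherits. A minimal sketch — the retry counts and email address are placeholder choices, and email alerts additionally require SMTP settings in airflow.cfg:

```python
from datetime import timedelta

# Hypothetical retry/alerting settings; pass this dict as default_args
# when constructing the DAG so every task inherits it.
default_args = {
    "retries": 2,                         # re-run a failed task up to twice
    "retry_delay": timedelta(minutes=5),  # wait 5 minutes between attempts
    "email": ["alerts@example.com"],      # placeholder address
    "email_on_failure": True,             # needs SMTP configured in airflow.cfg
}
```

Pass it as DAG("api_to_supabase", default_args=default_args, ...) in the DAG from Step 2.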

Real-World Use Cases

  • SaaS metrics dashboard — Pull data from Stripe, HubSpot, and GitHub APIs hourly, transform it, and load into Supabase tables that power a Metabase dashboard.
  • IoT data pipeline — Ingest sensor data via MQTT, process it in Airflow, and store aggregated results in Supabase with real-time subscriptions for live monitoring.
  • Content sync — Keep your Supabase database in sync with a headless CMS by running Airflow DAGs that detect changes and update records.

Troubleshooting

  • Connection refused errors: Make sure your Supabase instance allows external connections. On Elestio, check that the PostgreSQL port (5432) is accessible from your Airflow instance. If both services run on the same Elestio account, use the internal Docker network IP.
  • DAG not appearing in Airflow UI: Ensure your Python file is in the dags/ folder and has no syntax errors. Check Airflow scheduler logs with docker-compose logs -f airflow-scheduler.
  • Slow inserts: For bulk data loads, use COPY commands via PostgresHook.bulk_load() instead of row-by-row inserts. This can be 10-100x faster for large datasets.
  • Airflow worker memory issues: Increase worker memory in your Elestio configuration if processing large datasets. The 4 CPU / 8 GB RAM tier handles most production workloads.
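
The bulk-load path from the "slow inserts" bullet can be sketched like this. PostgresHook.bulk_load expects a tab-delimited file, which it feeds to Postgres COPY; the table name and connection id carry over from the earlier examples, and the Airflow import is kept inside the function so the file-writing half works anywhere:

```python
import csv
import tempfile


def rows_to_tsv(rows: list) -> str:
    """Write rows (a list of tuples) to a temp TSV file and return its path.
    This is the format PostgresHook.bulk_load hands to Postgres COPY."""
    f = tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False, newline="")
    csv.writer(f, delimiter="\t", lineterminator="\n").writerows(rows)
    f.close()
    return f.name


def bulk_load_to_supabase(rows: list, table: str = "raw_data") -> None:
    """COPY-based load using the supabase_postgres connection from Step 1."""
    from airflow.providers.postgres.hooks.postgres import PostgresHook

    hook = PostgresHook(postgres_conn_id="supabase_postgres")
    hook.bulk_load(table, rows_to_tsv(rows))
```

Swapping this in for insert_rows in the DAG above is usually all that's needed to make large loads fast.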

The Bottom Line

Airflow + Supabase gives you a self-hosted data platform that rivals managed services costing hundreds of dollars per month. You get Airflow's battle-tested orchestration (used by Airbnb, Lyft, and thousands of data teams) paired with Supabase's developer-friendly PostgreSQL backend — all running on your infrastructure, with your data never leaving your servers.

Total cost on Elestio? Starting at around $45/month for both services. Compare that to Fivetran's per-connector pricing or Airbyte Cloud's usage-based billing, and the math isn't even close.

Deploy both on Elestio and start building your first pipeline today.

Thanks for reading ❤️ See you in the next one 👋