Airflow + Supabase: Build an Automated Data Pipeline on Elestio
If you've ever found yourself writing cron jobs to move data between services, you already know why you need a proper orchestrator. Apache Airflow is the industry standard for defining, scheduling, and monitoring data workflows — and Supabase gives you a production-ready PostgreSQL backend with a REST API, auth, and real-time subscriptions out of the box.
Together, they're a powerful self-hosted stack for building automated data pipelines without paying for Fivetran, Airbyte Cloud, or any other managed ETL service. Here's how to wire them up.
Why This Combo Works
Airflow handles the "when" and "how" — scheduling tasks, managing dependencies, retrying failures, and alerting you when something breaks. It uses DAGs (Directed Acyclic Graphs) to define workflows as Python code, which means your pipeline logic lives in version control, not in a drag-and-drop UI you can't reproduce.
Supabase handles the "where" — it's your data destination (and sometimes your source). Under the hood, it's PostgreSQL with batteries included: auto-generated REST and GraphQL APIs, row-level security, real-time change streams, and built-in auth. Your Airflow DAGs can write directly to Supabase via its API or connect to the underlying Postgres instance.
The result? A fully self-hosted data platform where you own every component, pay only for infrastructure, and can scale each piece independently.
Setting Up the Pipeline
Prerequisites
You'll need both services running. On Elestio, deploy each with one click:
- Apache Airflow on Elestio — Starting at $29/month (4 CPU, 8 GB RAM recommended for Airflow's scheduler and workers)
- Supabase on Elestio — Starting at $16/month (2 CPU, 4 GB RAM)
Once deployed, grab your Supabase project URL and API key from the Elestio dashboard.
Step 1: Create a Supabase Connection in Airflow
In your Airflow instance, add a new connection via the admin UI:
```
Connection Id:   supabase_postgres
Connection Type: Postgres
Host:            your-supabase-instance.elest.io
Schema:          postgres
Login:           postgres
Password:        your-db-password
Port:            5432
```
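If you'd rather script this than click through the UI, the same connection can be created with Airflow's CLI (the host and password below are placeholders for your own values):

```shell
airflow connections add supabase_postgres \
    --conn-type postgres \
    --conn-host your-supabase-instance.elest.io \
    --conn-schema postgres \
    --conn-login postgres \
    --conn-password your-db-password \
    --conn-port 5432
```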
Alternatively, use the Supabase REST API via Airflow's HttpHook if you prefer API-based access over direct database connections.
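As a rough sketch of the API-based route, a task can POST JSON straight to Supabase's auto-generated PostgREST endpoint. The host, key, and `raw_data` table below are placeholders for your own values:

```python
import requests

SUPABASE_URL = "https://your-supabase-instance.elest.io"  # placeholder host
SUPABASE_KEY = "your-service-role-key"                    # placeholder key

def build_insert_request(table):
    """Build the URL and headers for a PostgREST bulk insert."""
    url = f"{SUPABASE_URL}/rest/v1/{table}"
    headers = {
        "apikey": SUPABASE_KEY,
        "Authorization": f"Bearer {SUPABASE_KEY}",
        "Content-Type": "application/json",
        "Prefer": "return=minimal",  # don't echo inserted rows back
    }
    return url, headers

def load_via_rest(table, rows):
    """POST a list of dicts; PostgREST inserts one row per dict."""
    url, headers = build_insert_request(table)
    resp = requests.post(url, headers=headers, json=rows, timeout=30)
    resp.raise_for_status()
```

Either path works; direct Postgres connections are simpler for bulk loads, while the REST route lets row-level security policies apply to your pipeline's writes.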
Step 2: Write Your First DAG
Here's a minimal DAG that extracts data from an external API and loads it into a Supabase table:
```python
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


def extract_and_load():
    # Extract: fetch data from the external API
    response = requests.get("https://api.example.com/data", timeout=30)
    response.raise_for_status()  # fail the task (and trigger retries) on HTTP errors
    records = response.json()

    # Load: insert into Supabase (PostgreSQL)
    hook = PostgresHook(postgres_conn_id="supabase_postgres")
    hook.insert_rows(
        table="raw_data",
        rows=[(r["id"], r["value"], r["timestamp"]) for r in records],
        target_fields=["id", "value", "created_at"],
    )


with DAG(
    "api_to_supabase",
    start_date=datetime(2026, 1, 1),
    schedule="@hourly",  # use schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    extract_load = PythonOperator(
        task_id="extract_and_load",
        python_callable=extract_and_load,
    )
```
This runs every hour, pulls data from an API, and inserts it into your Supabase PostgreSQL database. Airflow handles retries, logging, and scheduling automatically.
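One caveat worth planning for: if a task retries after a partial insert, plain `INSERT`s will duplicate rows. A sketch of an idempotent variant (assuming `id` is the primary key of `raw_data`) builds an upsert statement and runs it through the same hook:

```python
def build_upsert_sql(table, columns, conflict_col):
    """Build an INSERT ... ON CONFLICT DO UPDATE (upsert) statement."""
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(
        f"{c} = EXCLUDED.{c}" for c in columns if c != conflict_col
    )
    return (
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict_col}) DO UPDATE SET {updates}"
    )

# Hypothetical usage inside the task, replacing insert_rows:
# sql = build_upsert_sql("raw_data", ["id", "value", "created_at"], "id")
# hook = PostgresHook(postgres_conn_id="supabase_postgres")
# for row in rows:
#     hook.run(sql, parameters=row)
```

With this, re-running an hour's DAG run simply overwrites the same rows instead of duplicating them.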
Step 3: Add Transformation Steps
For more complex pipelines, chain multiple tasks:
```python
extract = PythonOperator(task_id="extract", ...)
transform = PythonOperator(task_id="transform", ...)
load = PythonOperator(task_id="load", ...)

extract >> transform >> load
```
Airflow's dependency management ensures each step runs only after the previous one succeeds. If transform fails, load won't execute — and you'll get an alert.
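A minimal sketch of what those three callables might look like, with Airflow stripped away so the shape is clear (the field names like `amount_cents` are hypothetical). In a real DAG, small results pass between tasks via XCom, or you stage intermediate data in a Supabase table between steps:

```python
def extract():
    """Pull raw records from the source API (stubbed with static data here)."""
    return [{"id": 1, "amount_cents": 1250}, {"id": 2, "amount_cents": None}]

def transform(records):
    """Drop incomplete rows and convert cents to dollars."""
    return [
        {"id": r["id"], "amount": r["amount_cents"] / 100}
        for r in records
        if r["amount_cents"] is not None
    ]

def load(rows):
    """In the real task, this would call PostgresHook.insert_rows."""
    print(f"would insert {len(rows)} rows")

load(transform(extract()))
```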
Real-World Use Cases
- SaaS metrics dashboard — Pull data from Stripe, HubSpot, and GitHub APIs hourly, transform it, and load into Supabase tables that power a Metabase dashboard.
- IoT data pipeline — Ingest sensor data via MQTT, process it in Airflow, and store aggregated results in Supabase with real-time subscriptions for live monitoring.
- Content sync — Keep your Supabase database in sync with a headless CMS by running Airflow DAGs that detect changes and update records.
Troubleshooting
- Connection refused errors: Make sure your Supabase instance allows external connections. On Elestio, check that the PostgreSQL port (5432) is accessible from your Airflow instance. If both services run on the same Elestio account, use the internal Docker network IP.
- DAG not appearing in Airflow UI: Ensure your Python file is in the `dags/` folder and has no syntax errors. Check the Airflow scheduler logs with `docker-compose logs -f airflow-scheduler`.
- Slow inserts: For bulk data loads, use `COPY` commands via `PostgresHook.bulk_load()` instead of row-by-row inserts. This can be 10-100x faster for large datasets.
- Airflow worker memory issues: Increase worker memory in your Elestio configuration if processing large datasets. The 4 CPU / 8 GB RAM tier handles most production workloads.
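For the `bulk_load()` route, rows must first be staged as a tab-delimited file. A sketch, assuming values contain no tabs or newlines (the `raw_data` table and hook usage in the comments are illustrative):

```python
import csv
import tempfile

def rows_to_tsv(rows):
    """Write rows to a temp tab-delimited file suitable for PostgreSQL COPY."""
    f = tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False, newline="")
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    f.close()
    return f.name

# Hypothetical usage inside a task:
# path = rows_to_tsv([(1, "a", "2026-01-01"), (2, "b", "2026-01-02")])
# hook = PostgresHook(postgres_conn_id="supabase_postgres")
# hook.bulk_load("raw_data", path)  # one COPY instead of thousands of INSERTs
```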
The Bottom Line
Airflow + Supabase gives you a self-hosted data platform that rivals managed services costing hundreds of dollars per month. You get Airflow's battle-tested orchestration (used by Airbnb, Lyft, and thousands of data teams) paired with Supabase's developer-friendly PostgreSQL backend — all running on your infrastructure, with your data never leaving your servers.
Total cost on Elestio? Starting at around $45/month for both services. Compare that to Fivetran's per-connector pricing or Airbyte Cloud's usage-based billing, and the math isn't even close.
Deploy both on Elestio and start building your first pipeline today.
Thanks for reading ❤️ See you in the next one 👋