Airflow 3 on Elestio: Build Production Data Pipelines with TaskFlow API, Dynamic Task Mapping, and Deferrable Operators

Every data team I've talked to in the past year has the same story: they started with cron jobs, graduated to a janky Python script runner, and eventually hit a wall when pipeline number forty-seven failed silently at 3 AM. That's the moment most teams discover Apache Airflow.

With Airflow 3, the project took a massive leap forward. The TaskFlow API makes DAGs feel like actual Python code, dynamic task mapping eliminates copy-paste parallelism, and deferrable operators stop your workers from burning resources while waiting on external systems. If you've been putting off upgrading (or deploying Airflow entirely), now's the time.

Here's how to get a production-ready Airflow 3 stack running on Elestio in minutes, and how to use its three most powerful features.

Deploy Airflow 3 on Elestio

Skip the hours of Docker Compose YAML wrangling. Elestio gives you a fully managed Airflow instance with automated SSL, backups, and monitoring out of the box.

  1. Go to the Airflow service page and click Create Service
  2. Pick your cloud provider (Hetzner, Netcup, DigitalOcean, Vultr, etc.)
  3. Select at least 4 CPU / 8 GB RAM for production workloads. Airflow's scheduler and workers are memory-hungry, especially with CeleryExecutor
  4. Click Deploy and wait a few minutes

Once deployed, you'll get full SSH access to the VM, a running Docker Compose stack, and the Airflow web UI behind an Nginx reverse proxy with automatic SSL.

You can access your instance's configuration by running:

docker exec -it airflow-webserver bash
cd /opt/airflow

The TaskFlow API: DAGs That Read Like Python

Before TaskFlow, writing an Airflow DAG meant defining operators, wiring XCom pushes and pulls manually, and ending up with code that looked nothing like the data logic you were trying to express. TaskFlow fixes that entirely.

Here's a production-ready ETL pipeline using TaskFlow:

from airflow.decorators import dag, task
from datetime import datetime

@dag(schedule="@daily", start_date=datetime(2026, 1, 1), catchup=False)
def customer_etl():

    @task
    def extract():
        """Pull raw data from your source."""
        import requests
        response = requests.get("https://api.example.com/customers")
        return response.json()

    @task
    def transform(raw_data: dict):
        """Clean and normalize records."""
        return [
            {"name": r["name"].strip().title(), "email": r["email"].lower()}
            for r in raw_data["results"]
        ]

    @task
    def load(clean_data: list):
        """Insert into your database."""
        from airflow.providers.postgres.hooks.postgres import PostgresHook
        hook = PostgresHook(postgres_conn_id="my_db")
        for record in clean_data:
            hook.run(
                "INSERT INTO customers (name, email) VALUES (%s, %s)",
                parameters=(record["name"], record["email"])
            )

    raw = extract()
    cleaned = transform(raw)
    load(cleaned)

customer_etl()

No XCom boilerplate. No operator wiring. Dependencies are inferred from function calls. If you know Python, you already know TaskFlow.
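Because TaskFlow tasks are plain Python functions, the transformation logic is easy to unit-test outside Airflow entirely. A minimal sketch of testing the transform step's logic in isolation (the function name and sample payload here are hypothetical):

```python
# Standalone version of the transform step's logic, testable without Airflow.
def normalize_customers(raw_data: dict) -> list:
    """Clean and normalize raw customer records."""
    return [
        {"name": r["name"].strip().title(), "email": r["email"].lower()}
        for r in raw_data["results"]
    ]

# A sample payload shaped like the API response assumed above.
sample = {"results": [{"name": "  ada LOVELACE ", "email": "Ada@Example.COM"}]}
print(normalize_customers(sample))
# [{'name': 'Ada Lovelace', 'email': 'ada@example.com'}]
```

Keeping task bodies as pure functions like this means your CI can exercise the business logic without spinning up a scheduler.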

Dynamic Task Mapping: Parallel Work Without the Copy-Paste

Here's a scenario that used to be painful: you need to process a variable number of files that land in your S3 bucket every day. Some days there are 3 files, some days 300. Before dynamic task mapping, you'd either hardcode a max number of tasks or write custom logic to generate the DAG file at parse time.

Now you just use .expand() inside your DAG:

from airflow.decorators import dag, task
from datetime import datetime

@dag(schedule="@daily", start_date=datetime(2026, 1, 1), catchup=False)
def s3_file_processor():

    @task
    def list_files():
        """Return whatever files are available today."""
        from airflow.providers.amazon.aws.hooks.s3 import S3Hook
        hook = S3Hook(aws_conn_id="my_s3")
        return hook.list_keys(bucket_name="incoming-data", prefix="daily/")

    @task
    def process_file(file_key: str):
        """Process a single file."""
        from airflow.providers.amazon.aws.hooks.s3 import S3Hook
        hook = S3Hook(aws_conn_id="my_s3")
        content = hook.read_key(file_key, bucket_name="incoming-data")
        # Your processing logic here
        return f"Processed {file_key}"

    files = list_files()
    process_file.expand(file_key=files)

s3_file_processor()

Airflow creates one task instance per file at runtime. Three files? Three tasks. Three hundred? Three hundred tasks. Zero changes to your DAG code.
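When some arguments are fixed and others fan out, Airflow pairs .expand() with .partial() — e.g. if process_file also took a bucket_name parameter, you'd write process_file.partial(bucket_name="incoming-data").expand(file_key=files). Conceptually this mirrors functools.partial plus map in plain Python; a rough sketch of the fan-out semantics (function and bucket names hypothetical):

```python
from functools import partial

def process_file(bucket_name: str, file_key: str) -> str:
    # Stand-in for the real per-file processing logic.
    return f"Processed s3://{bucket_name}/{file_key}"

# Fixed argument bound once via partial(), one call per mapped input —
# analogous to .partial(bucket_name=...).expand(file_key=...) at runtime.
files = ["daily/a.csv", "daily/b.csv", "daily/c.csv"]
results = list(map(partial(process_file, "incoming-data"), files))
print(results)
# ['Processed s3://incoming-data/daily/a.csv',
#  'Processed s3://incoming-data/daily/b.csv',
#  'Processed s3://incoming-data/daily/c.csv']
```

The difference in Airflow, of course, is that each mapped call becomes its own task instance with independent retries, logs, and a map index in the UI.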

Deferrable Operators: Stop Wasting Worker Slots

This one's subtle but impactful. Traditional operators that wait on external systems (an API call, a file sensor, a database query) keep a worker slot occupied the entire time. If your sensor is polling every 30 seconds for a file that arrives hours later, that's a worker doing nothing useful.

Deferrable operators hand off the waiting to a lightweight triggerer process. The worker gets freed up immediately.

from airflow.providers.http.sensors.http import HttpSensor

check_api = HttpSensor(
    task_id="wait_for_api",
    http_conn_id="my_api",
    endpoint="/status",
    response_check=lambda response: response.json()["ready"] is True,
    deferrable=True,
    poke_interval=60,
    timeout=7200,
)

The triggerer uses asyncio under the hood, so a single triggerer process can handle hundreds of deferred tasks simultaneously. For teams running lots of sensors or external waits, this alone can cut your infrastructure costs significantly.
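Under the hood, a trigger is essentially an async coroutine that yields control while waiting, which is what lets one event loop multiplex many waits. A toy sketch of that pattern with plain asyncio (this is an illustration of the concurrency model, not the actual Airflow trigger API):

```python
import asyncio

async def wait_for(check, poke_interval: float) -> bool:
    # Poll without blocking the event loop; other waits run in between polls.
    while not check():
        await asyncio.sleep(poke_interval)
    return True

async def main() -> bool:
    state = {"calls": 0}

    def ready() -> bool:
        # Becomes true on the third poll — stands in for an external API check.
        state["calls"] += 1
        return state["calls"] >= 3

    # Hundreds of these coroutines could share one event loop,
    # just as deferred tasks share one triggerer process.
    return await wait_for(ready, poke_interval=0.01)

print(asyncio.run(main()))  # True
```

While each coroutine is parked on await asyncio.sleep(), it consumes no thread and no worker slot — which is exactly the resource win deferrable operators deliver.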

Make sure you have the triggerer running in your Docker Compose setup:

airflow-triggerer:
  image: apache/airflow:3.0.1
  command: triggerer
  restart: always
  environment:
    - AIRFLOW__CORE__EXECUTOR=CeleryExecutor

Monitoring Your Pipelines

Airflow emits metrics in StatsD format (pushed to a StatsD server, not exposed as a pull endpoint). On Elestio, you can pair it with a Grafana + Prometheus stack, using Prometheus's statsd_exporter as the bridge, for full pipeline observability.

Key metrics to track:

Metric                            What it tells you
-------------------------------   -----------------------------------------------
dag_processing.total_parse_time   How long it takes to parse and load your DAGs
executor.open_slots               Available worker capacity
scheduler.tasks.running           Currently executing tasks
scheduler.tasks.starving          Tasks waiting for a free slot

If scheduler.tasks.starving stays high, you need more workers. If dag_processing.total_parse_time keeps growing, you have too many DAG files or heavy top-level imports slowing the scheduler down.
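Note that StatsD emission is off by default; it has to be enabled in airflow.cfg (or via the matching AIRFLOW__METRICS__* environment variables). A minimal example, assuming a statsd_exporter or StatsD server is listening on localhost:8125:

```ini
[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
```

After restarting the scheduler and workers, metrics should start flowing under the airflow.* prefix.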

Troubleshooting

Tasks stuck in "queued" state: Usually means your Celery workers can't connect to the message broker (Redis). Verify connectivity from inside the worker container (adjust the host to match your Compose service name):

docker exec -it airflow-worker bash
python -c "import redis; print(redis.Redis(host='redis', port=6379).ping())"

A healthy broker prints True; a connection error here means the broker URL or network is the problem.

DAGs not appearing in the UI: The scheduler parses DAGs from the dags/ folder. Make sure your Python files are in the correct mounted volume and have no import errors:

docker exec -it airflow-scheduler airflow dags list

High memory usage on workers: Large XCom payloads between tasks are the usual culprit. For datasets over a few MB, write to external storage (S3, database) instead of passing through XCom.
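The standard fix is to push only a reference (an S3 key or table name) through XCom and let the downstream task fetch the payload itself. A sketch of the pattern, with a dict standing in for real object storage (all names here are hypothetical; in a real DAG you'd use S3Hook or similar):

```python
import json

# Stand-in for S3 / blob storage.
object_store = {}

def extract_to_storage(run_id: str) -> str:
    """Write the big payload to external storage; return only its key."""
    rows = [{"id": i} for i in range(10_000)]  # large payload
    key = f"staging/{run_id}/customers.json"
    object_store[key] = json.dumps(rows)
    return key  # a small string travels through XCom, not the data

def load_from_storage(key: str) -> int:
    """Downstream task fetches the payload by reference."""
    rows = json.loads(object_store[key])
    return len(rows)

key = extract_to_storage("2026-01-01")
print(load_from_storage(key))  # 10000
```

XCom then carries a few dozen bytes per task instead of megabytes, and the metadata database stays lean.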

Triggerer not picking up deferred tasks: Confirm the triggerer service is running and connected to the same metadata database:

docker compose logs airflow-triggerer

Wrapping Up

Airflow 3 isn't just an incremental update. TaskFlow, dynamic task mapping, and deferrable operators represent a fundamental shift in how you build data pipelines: less boilerplate, more flexibility, better resource efficiency.

And you don't need to spend a week configuring Docker Compose, Celery, Redis, and Postgres yourself. Deploy Airflow on Elestio and get a production-ready stack in minutes, starting at $29/month for a 4 CPU / 8 GB RAM instance.

Thanks for reading! See you in the next one.