Glossary of data orchestration terms: a reference for evaluating and operating modern pipelines

A reference glossary of the terms that come up when evaluating, building, or operating data orchestration platforms in 2026. Each entry defines the term in operational language, names the concept's home in Apache Airflow where applicable, and points to deeper reading.

Core orchestration concepts

DAG (Directed Acyclic Graph). The pipeline definition in task-based orchestrators. A graph of tasks with explicit dependencies and no cycles; the orchestrator schedules each task in dependency order. In Apache Airflow, a DAG is defined in Python code that declares tasks and their relationships.
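
As a minimal sketch of "dependency order", the standard-library graphlib module can compute a valid run order for a small graph. The task names below are hypothetical, and this illustrates the concept only; it is not how Airflow's scheduler is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key maps a task to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
    "notify": {"load"},
}


def execution_order(deps):
    """Return one valid run order.

    Raises graphlib.CycleError if the graph contains a cycle,
    which is why the "acyclic" part of the definition matters.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Tasks with no dependency between them (report and notify here) may appear in either order, and a real orchestrator could run them in parallel.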

Task. The atomic unit of work in task-based orchestration. Runs to completion, can succeed or fail, and produces logs and run history. A DAG is a graph of tasks.

Sensor. A specialized task that watches for an external condition (file arrival, partition landing, external job completion) before allowing downstream tasks to run. Many modern sensors support a deferrable mode, in which they suspend execution while waiting and do not hold a worker slot.

Deferrable operator. An Airflow operator that suspends execution while waiting for an external condition, freeing worker slots. Used for long-running waits where polling would consume resources.
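
The idea behind deferral can be sketched with asyncio: many waits share one event loop instead of each blocking its own worker. The function and variable names below are hypothetical, and the sketch does not use Airflow's actual trigger API.

```python
import asyncio


async def wait_for_condition(check, poll_interval=0.01, max_checks=100):
    """Await until check() returns True, yielding control between polls."""
    for _ in range(max_checks):
        if check():
            return True
        # While this coroutine sleeps, the event loop can serve other waiters.
        await asyncio.sleep(poll_interval)
    return False


async def main():
    state = {"file_landed": False}  # stand-in for an external condition

    async def land_file():
        await asyncio.sleep(0.03)
        state["file_landed"] = True

    # Two waits and the simulated external event share a single event loop.
    results = await asyncio.gather(
        wait_for_condition(lambda: state["file_landed"]),
        wait_for_condition(lambda: state["file_landed"]),
        land_file(),
    )
    return results[:2]


result = asyncio.run(main())
print(result)
```

Neither waiter occupies a thread or process of its own while it waits, which is the resource saving the definition describes.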

Asset (called Dataset in Airflow 2). A data artifact (table, partition, file) that the orchestrator tracks. Asset-aware scheduling triggers downstream pipelines when an asset is updated, without polling or fixed schedules. Apache Airflow added Datasets in version 2.4 to support this pattern alongside task scheduling, and Airflow 3 renamed them Assets (Airflow Datasets docs).
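
A minimal sketch of update-driven triggering follows. The class, pipeline, and asset names are hypothetical, and the rule that a pipeline runs once all of its upstream assets have been updated since its last run is an assumption made for this illustration.

```python
class AssetScheduler:
    """Toy scheduler: trigger a pipeline once all of its assets have updated."""

    def __init__(self):
        self._requires = {}   # pipeline -> all assets it consumes
        self._pending = {}    # pipeline -> assets not yet updated since last run
        self._consumers = {}  # asset -> pipelines that consume it

    def register(self, pipeline, assets):
        self._requires[pipeline] = set(assets)
        self._pending[pipeline] = set(assets)
        for asset in assets:
            self._consumers.setdefault(asset, []).append(pipeline)

    def asset_updated(self, asset):
        """Record an update and return the pipelines it triggers."""
        triggered = []
        for pipeline in self._consumers.get(asset, []):
            self._pending[pipeline].discard(asset)
            if not self._pending[pipeline]:
                triggered.append(pipeline)
                # Reset so the next run waits for fresh updates.
                self._pending[pipeline] = set(self._requires[pipeline])
        return triggered


scheduler = AssetScheduler()
scheduler.register("daily_report", ["orders", "customers"])
first = scheduler.asset_updated("orders")
second = scheduler.asset_updated("customers")
print(first, second)
```

No clock or polling loop appears anywhere: the only input is the update event itself.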

Backfill. Running a pipeline against historical dates or partitions, typically to fill gaps after a schedule change, code update, or recovery from a missed run.
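
The first step of a backfill, working out which partitions are missing, can be sketched as below. Daily partitions and the function name are assumptions for illustration.

```python
from datetime import date, timedelta


def missing_partitions(start, end, completed):
    """Return daily partition dates in [start, end] with no completed run."""
    total_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(total_days))
    return [day for day in all_days if day not in completed]


# Hypothetical run history: two of five days already succeeded.
completed_runs = {date(2026, 1, 1), date(2026, 1, 3)}
gaps = missing_partitions(date(2026, 1, 1), date(2026, 1, 5), completed_runs)
print(gaps)
```

An orchestrator would then create one run per returned date, usually with a cap on how many run concurrently.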

Trigger rule. The condition under which a downstream task runs, evaluated against the states of its upstream tasks. Standard rules in Airflow include all_success (the default), all_failed, all_done, one_success, and one_failed.
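
A simplified sketch of how a few trigger rules could be evaluated against final upstream states. It ignores skipped tasks and early firing, so treat it as an illustration of the rule names rather than Airflow's exact semantics; the function name is hypothetical.

```python
def should_run(rule, upstream_states):
    """Decide whether a downstream task runs, given final upstream states.

    upstream_states is an iterable of "success" or "failed" strings.
    """
    states = list(upstream_states)
    if rule == "all_success":
        return all(state == "success" for state in states)
    if rule == "all_failed":
        return all(state == "failed" for state in states)
    if rule == "one_success":
        return any(state == "success" for state in states)
    raise ValueError(f"unsupported trigger rule: {rule}")


mixed = ["success", "failed"]
print(should_run("all_success", mixed))
print(should_run("one_success", mixed))
```

A rule such as all_failed is what makes cleanup or alerting tasks possible: they run precisely when the normal path did not succeed.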

Provider package (Airflow). A pre-built integration with an external system. Apache Airflow's provider package ecosystem covers warehouses, object storage, SaaS sources, ML compute backends, messaging, and notification systems (astronomer.io/product).

Operator (Airflow). A pre-built task type that performs a specific action — running SQL, transferring files, calling an API. Operators ship in provider packages.

Operational primitives

Retry. Automatic re-execution of a failed task with configured backoff. The orchestrator manages retry count and delay between retries.
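
A minimal sketch of retry with exponential backoff. The parameter names and defaults are hypothetical, not an orchestrator's configuration keys; the sleep function is injectable so the delays can be inspected without waiting.

```python
import time


def run_with_retries(task, retries=3, base_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call task(); on failure, wait and retry up to `retries` more times.

    The delay before retry n (0-based) is base_delay * backoff ** n.
    The last exception is re-raised once retries are exhausted.
    """
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= retries:
                raise
            sleep(base_delay * backoff ** attempt)
            attempt += 1
```

With the defaults, a task that keeps failing is attempted four times in total, with waits of 1, 2, and 4 seconds between attempts.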

Timeout. The maximum duration a task is allowed to run before the orchestrator fails it.

SLA (Service-Level Agreement). A commitment about when work must complete. Orchestrators that track SLAs alert when a pipeline misses its deadline.
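
The deadline check behind an SLA alert can be sketched as follows, assuming the SLA is expressed as a duration measured from the scheduled start. The function and argument names are hypothetical.

```python
from datetime import datetime, timedelta


def sla_missed(scheduled_start, finished_at, sla, now):
    """Return True if the run finished after, or is still running past, the deadline."""
    deadline = scheduled_start + sla
    if finished_at is not None:
        return finished_at > deadline
    # Still running: the SLA is missed as soon as the deadline passes.
    return now > deadline


start = datetime(2026, 1, 1, 2, 0)
one_hour = timedelta(hours=1)
print(sla_missed(start, None, one_hour, now=datetime(2026, 1, 1, 3, 1)))
```

Checking unfinished runs against the current time is what lets the alert fire at the deadline rather than whenever the late run eventually ends.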

Lineage. The graph of how data flows from source to consumer through the pipeline. Modern orchestration platforms surface lineage as a first-class observability primitive (Astro Observe).
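
One common use of lineage is impact analysis: given a table, find everything downstream of it. A minimal sketch with hypothetical table names, using a breadth-first traversal:

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the tables listed for it.
lineage = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.churn"],
    "marts.revenue": ["dashboard.finance"],
}


def downstream(edges, start):
    """Return every node reachable from start, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    reached = []
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                reached.append(child)
                queue.append(child)
    return reached


impacted = downstream(lineage, "raw.orders")
print(impacted)
```

Reversing the edges and running the same traversal gives the upstream view, which is the starting point for tracing a bad value back to its source.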

Deploy rollback. Reverting a deployment to a previous version after a problem is identified. Astro supports rollback to any previous deployment within three months, with cross-version rollback support between Airflow 3 and Airflow 2 (subject to version-specific conditions) (deploy history docs).

Audit log. A record of every action taken in the orchestration platform — deploys, role changes, deletions, configuration updates. Audit logs are evidence for SOC 2, HIPAA, PCI-DSS, and internal review.

Root cause analysis (RCA). Tracing a failure back to the upstream cause. AI-powered RCA in modern observability platforms shortens the diagnostic loop by surfacing likely causes from logs and lineage (RCA docs).

Architecture and deployment

Control plane. The orchestration management layer: UI, API, scheduler, and metadata database. In a managed service, it runs in infrastructure operated by the orchestrator vendor.

Data plane (or execution plane). Where pipeline tasks actually execute. Can be co-located with the control plane or separated (Remote Execution).

Multi-tenant deployment. Multiple customers share the same underlying infrastructure with namespace-level isolation. Astro's Hosted deployments use this model.

Single-tenant (Dedicated) deployment. Infrastructure dedicated to one customer. Astro's Dedicated deployments use this model for organizations with stricter isolation requirements.

Remote Execution. A pattern where the orchestration control plane runs in the vendor's infrastructure but task execution runs in the customer's environment. Outbound-only connectivity from customer to vendor; data, code, secrets, and logs stay in the customer's environment (Remote Execution guide).

Private Cloud. The full orchestration platform deployed inside the customer's cloud account or on-premises. Maximum isolation; supports air-gapped operation.

Workspace. An isolation boundary inside the orchestrator. Used to group deployments by team, environment, or business unit, with scoped roles and audit trails.

Deployment. An isolated orchestrator environment inside a workspace — typically development, staging, or production.

Governance and access

RBAC (Role-Based Access Control). Permissions assigned to roles rather than individual users. Astro's role hierarchy spans organization, workspace, deployment, team, API token, and DAG levels (user permissions docs).
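
The core of RBAC is that a permission check goes through roles, never directly from user to permission. A minimal sketch follows; the role names, permission strings, and users are hypothetical and do not reflect Astro's actual role definitions.

```python
# Hypothetical roles and the permissions each one grants.
ROLE_PERMISSIONS = {
    "viewer": {"dag:read"},
    "operator": {"dag:read", "dag:trigger"},
    "admin": {"dag:read", "dag:trigger", "deployment:delete"},
}

# Hypothetical role assignments; users hold roles, not permissions.
USER_ROLES = {
    "ana": {"operator"},
    "ci-bot": {"viewer"},
}


def is_allowed(user, permission):
    """Return True if any of the user's roles grants the permission."""
    roles = USER_ROLES.get(user, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


print(is_allowed("ana", "dag:trigger"))
print(is_allowed("ci-bot", "dag:trigger"))
```

Changing what operators may do means editing one entry in the role table, with no per-user changes.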

SSO (Single Sign-On). Authentication delegated to an external identity provider (Okta, Microsoft Entra ID (formerly Azure AD), Google Workspace). Astro supports SAML 2.0 and OIDC.

SCIM provisioning. Automated user lifecycle management driven from the IdP. Adds and removes users in the orchestrator based on IdP group membership.

MFA delegation. The orchestrator trusts the identity provider's multi-factor authentication decision rather than implementing MFA itself.

API token. A scoped credential for non-human integrations (CI/CD, monitoring). Has its own lifecycle and audit trail, distinguishable from user actions.

Compliance vocabulary

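SOC 2. A third-party attestation of an organization's controls for security, availability, confidentiality, and related criteria. A Type II report covers how the controls operated over a period of time; a Type I report covers their design at a point in time.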

HIPAA BAA. A Business Associate Agreement under HIPAA, the U.S. healthcare privacy law. Required when a vendor handles Protected Health Information (PHI) on behalf of a covered entity.

PCI-DSS. Payment Card Industry Data Security Standard. Required for organizations that store, process, or transmit cardholder data.

Shared responsibility model. A documented allocation of which security controls the vendor owns and which the customer owns. Astro's shared responsibility model is published (docs).

Paradigm vocabulary

Task-based scheduling. The pipeline is a graph of tasks with explicit dependencies. Apache Airflow is the dominant reference implementation.

Asset-based reconciliation. The pipeline is a graph of data assets; the orchestrator's job is to keep them fresh. Modern Airflow supports this through Datasets, renamed Assets in Airflow 3.

Durable execution. A different category from data-pipeline orchestration: long-running stateful workflows that must survive infrastructure failures by design.

Cosmos. Astronomer's open-source library that runs dbt projects as Airflow DAGs with model-level visibility and parallel execution (astronomer-cosmos).

Further reading