“Reverse ETL.” We’ve created an entire category of tooling to acknowledge that data flowing in one direction is so fundamental to our thinking that moving it the other way requires special designation. This is not innovation. This is evidence of a problem so deeply embedded in our data architectures that we can no longer see it.
The pipeline is the unifying building block of modern data infrastructure. It shapes how we conceptualize data and reason about how it moves. But this same framework constrains us. By thinking in pipelines, we limit our imagination for what data infrastructure could be and how it might work differently.
When data warehousing emerged as a concept in the mid-1980s, it solved a real problem. Running analytics queries against production databases would grind operations to a halt. Storage was prohibitively expensive, measured in thousands of dollars per megabyte. Systems were designed to conserve it at all costs. The solution was elegant for its time: Extract data from the source, transform and model it to make it efficient for analytical queries, load it into a dedicated analytical system. Point to point. Single purpose. One way.
These constraints made sense 40 years ago. They make no sense today.
This is conceptual inertia. The pipeline model carries embedded assumptions from an era when getting data was genuinely hard, when storage was genuinely expensive, when use cases were stable, and when businesses changed more slowly. None of these conditions remain true. But the pipeline endures and, with it, all of its constraints.
Listen to how people talk about their data infrastructure. They say “the BI pipeline” or “the ML pipeline” or “the analytics pipeline.” The use case is baked into the architecture itself. This is not a feature. This is a fundamental limitation masquerading as organization.
The pipeline encodes assumptions that now constrain every data organization:
Single direction. Data flows one way. Upstream to downstream. Source to destination. The existence of reverse ETL as a category proves how deeply this unidirectional thinking has penetrated. We treat bidirectional data flow as an exception requiring specialized tooling rather than a natural pattern of how organizations actually use data.
Single destination. Each pipeline connects system A to system B. Point to point. This made sense when you had one production database and one data warehouse. It makes no sense in a world of distributed systems, specialized data stores, AI models, vector databases, and constantly evolving use cases. Yet tools like Fivetran, Airbyte, and Hightouch have built entire businesses on top of this model, and we persist in building data architectures as collections of bespoke point-to-point connections, each engineered for a specific target.
Extraction as the model. The “E” in ETL is not neutral. It encodes a power relationship. The source is something to be extracted from, a bone from which marrow must be pulled. The team managing that source system is not a partner in the data flow. They are merely an obstacle to be worked around. This extraction model creates organizational friction and technical complexity that compounds with every new integration.
Batch processing as default. We still do nightly dumps. In 2026. We batch data because that is what pipelines did in the 1980s, when moving data was expensive and computing resources were scarce. Streaming data continuously makes more sense for most use cases today, especially AI systems that need fresh, accurate data to perform. But batch processing remains the default because it’s what the pipeline model assumes.
Transformation as hidden choices. In traditional ETL, transformations occur inside the pipeline, opaque and intermingled with connection logic. Application-specific business logic becomes embedded in infrastructure code. The result is data architectures where no one can fully reason about what transformations have been applied, in what order, or why. This hidden complexity is a primary driver of the brittleness that every data team experiences when trying to change or extend their systems, and it’s precisely why AI projects stall long before a model is ever trained.
Mutable state as the foundation. Current-state architectures require coordinating updates across every system where a given piece of data exists. When Alice updates her email address, every system must overwrite the old value with the new one, at the same time. This coordination is complex, error-prone, and fundamentally at odds with how distributed systems actually want to work. Yet we treat it as inevitable because the earliest data systems, designed when storage was expensive, made this trade-off.
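To make the contrast concrete, here is a minimal Python sketch. The system names and fact shape are invented for illustration, not drawn from any particular product: the mutable-state approach must push Alice's new email into every system that holds it, while the immutable approach appends a single fact that any system can consume on its own schedule.

```python
from datetime import datetime, timezone

# --- Mutable-state approach: coordinate an overwrite everywhere ---
crm = {"alice": {"email": "alice@old.example"}}
billing = {"alice": {"email": "alice@old.example"}}
warehouse = {"alice": {"email": "alice@old.example"}}

def update_email_everywhere(user, new_email):
    # Every system that holds the value must be touched, in step.
    # Miss one, or fail partway through, and the systems disagree.
    for system in (crm, billing, warehouse):
        system[user]["email"] = new_email

# --- Immutable approach: append one fact to a time-ordered log ---
log = []  # append-only; nothing is ever overwritten

def record_fact(user, field, value):
    log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "field": field,
        "value": value,
    })

record_fact("alice", "email", "alice@old.example")
record_fact("alice", "email", "alice@new.example")

# Each consumer derives current state from the log when it needs it.
def current_email(user):
    value = None
    for fact in log:  # the log is time-ordered, so the last fact wins
        if fact["user"] == user and fact["field"] == "email":
            value = fact["value"]
    return value
```

The overwrite path requires every system to participate in every change; the append path requires no coordination at all, because "Alice changed her email" is a fact that is true forever.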
A CEO asks to change how monthly active users are calculated. The head of data says this takes six months. The CEO asks why. “Don’t we already have this data? How is this possibly six months?”
This is the data paradox. We have more data than ever. We have more powerful tools than ever. We have larger data teams than ever. Yet organizations consistently report that using data remains painfully difficult. A large athletic wear company employs over 800 data engineers and still reports deep dissatisfaction with its ability to use data at scale.
The problem is not the people. The problem is not the tools. The problem is the foundational pattern. When you build everything on pipelines, every new use case requires a new pipeline. Every change requires modifying existing pipelines. Every experiment requires engineering resources to build, test, and maintain yet another bespoke integration. Organizations become afraid to change their data systems because the cost and complexity of coordination is too high.
This fear is not irrational. It is the rational response to infrastructure built on assumptions that no longer hold.
Consider what becomes possible if you deconstruct the pipeline. Separate collection from emission. Store data immutably in time-ordered logs rather than as current state. Keep data in source-aligned domain models rather than pre-transforming it for specific use cases.
You can now collect data without a use case. You can pause destinations without stopping collection. You can add new uses without building new pipelines. You can replay data to populate systems that did not exist when the data was originally generated, including AI models, vector stores, and RAG pipelines your team is building today. You can run multiple transformations of the same data simultaneously. You can experiment freely because adding a new model or destination does not require coordinating with source systems or modifying existing flows.
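A minimal Python sketch of that decoupling (the `Destination` class and event names are illustrative, not a real API): consumers track their own offsets into an append-only log, so a destination added long after collection began simply replays from offset zero, without touching the source or any existing flow.

```python
# An append-only log, collected independently of any consumer.
log = [
    {"offset": 0, "event": "signup", "user": "alice"},
    {"offset": 1, "event": "login", "user": "alice"},
    {"offset": 2, "event": "signup", "user": "bob"},
]

class Destination:
    """A consumer that materializes its own view at its own pace."""

    def __init__(self, name):
        self.name = name
        self.offset = 0   # position in the log, owned by the consumer
        self.view = []

    def consume(self, log):
        # Read everything not yet seen; "pausing" is simply not calling this.
        for record in log[self.offset:]:
            self.view.append(record["event"])
            self.offset = record["offset"] + 1

bi = Destination("bi")
bi.consume(log)             # an existing destination, fully caught up

# A destination that did not exist when the data was collected:
vector_store = Destination("vector-store")
vector_store.consume(log)   # replays the entire history from offset 0
```

Adding `vector_store` required no new pipeline, no change to the source, and no coordination with `bi`; it just read the same facts from the beginning.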
This is not theoretical. Organizations that have moved beyond pipeline thinking report completing projects in months instead of two years. New use cases that would have required weeks of engineering effort become available in hours. And AI teams that were previously blocked on data access (submitting tickets, waiting for sprints) suddenly ship models instead. Legacy migrations become trivial because new systems can be populated from immutable logs without touching existing pipelines. Product strategies change because data that was locked in specific systems can suddenly be reused in unexpected ways.
The pattern breaks when you stop thinking about data as something that lives in places and start thinking about it as something that flows through time. Current state is just a materialization of accumulated facts. If you store the facts immutably, you can materialize any view you need, whenever you need it, without coordinating with the original source.
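In code, "current state is a materialization of accumulated facts" is just a fold over the log. A hedged Python sketch, with an invented fact shape, to show the idea:

```python
from functools import reduce

# Facts, in time order. Each records something that happened.
facts = [
    {"user": "alice", "field": "email", "value": "alice@old.example"},
    {"user": "alice", "field": "plan",  "value": "free"},
    {"user": "alice", "field": "email", "value": "alice@new.example"},
]

def apply_fact(state, fact):
    # Later facts win: the fold replays history to produce "now".
    return {**state, fact["field"]: fact["value"]}

# Current state is derived, not stored.
current = reduce(apply_fact, facts, {})

# And any historical view is just a fold over a prefix of the log.
as_of_second_fact = reduce(apply_fact, facts[:2], {})
```

Any consumer can run this fold, or a different one, over the same facts, which is why no coordination with the source is needed.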
Breaking free of pipeline thinking requires recognizing that the constraints that shaped our data architectures no longer apply. Storage and computation are cheap and abundant. Use cases change constantly. AI has added an entirely new class of consumers to your data: consumers that are insatiable, real-time, and deeply unforgiving of stale or poorly shaped data. Organizations are genuinely distributed systems now, not single databases with occasional reporting needs.
The question is not whether to continue using pipelines. The question is whether to continue allowing 40-year-old assumptions to constrain what is possible with your data.
Every time you hear someone say “that will take six months” to answer a simple question about data you already have, you are witnessing the cost of conceptual inertia. Every time you need to build a new data pipeline for AI experiments, new use cases, or existing data, you are paying the price of point-to-point thinking. Every time you hesitate to experiment because of the complexity of data integration, you are constrained by assumptions from an era that no longer exists.
The splinter in your mind is this: You already know something is wrong. You have felt it every time a simple change or ad hoc request required weeks of engineering work. You have experienced it every time experimenting with data in a new system required months of pipeline development. You have witnessed it in the gap between what should be possible and what your current architecture allows.
The pipeline is not inevitable. It’s a choice. And like all choices made under constraints that no longer exist, it can be unmade.
Ready to see what an AI-ready data infrastructure looks like without the pipeline? Talk to a Matterbeam engineer and ask about our AI Data Prep Guarantee. If your AI experiments aren’t running faster in the first 60 days, we refund 100%.