Picture this: You’re in an executive meeting. The company just acquired another business, and the CEO wants to change how you calculate monthly active users to include the new customer base. Simple request, right?
“That’ll be six months,” comes the response from the data team.
Six months?! To change a KPI?! For data that already exists?!
This isn’t incompetence. It’s not a staffing problem. It’s the inevitable result of building modern data systems on mental models that haven’t evolved since the 1980s.
Every piece of data infrastructure today is built around the same core concept: the pipeline. Extract, transform, load. (Or extract, load, transform.) Point A to point B. Single purpose, one direction, one destination.
This made sense when databases were expensive, storage was scarce, and compute was precious. You planned your data flows carefully because changing them was costly. You built pipelines for specific use cases because building them at all was a major undertaking.
But that world doesn’t exist anymore. Today’s requirements change constantly. Use cases evolve daily. Teams need data in formats that weren’t imagined when the original pipeline was built.
We’re still building systems as if it’s 1985.
The pipeline paradigm embeds three assumptions that seem reasonable but create cascading problems:
Assumption 1: Getting data is the hard part. This leads us to centralize everything in massive warehouses and lakes. Once we’ve done the hard work of extracting data, we never want to do it again. So we build these gravitational centers where all data must flow.
Assumption 2: You know your use case upfront. Pipelines are built for specific destinations and specific transformations. They encode business logic, data models, and assumptions about how the data will be used. Change any of these, and you’re re-architecting or rebuilding the pipeline.
Assumption 3: Transformation is understood at pipeline creation time. Traditional ETL transforms data when it’s extracted. This locks the data into a specific shape for a specific purpose. Want the same data in a different format? Build another pipeline.
These assumptions made sense forty years ago. They’re actively harmful today.
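To make the assumptions concrete, here is a minimal, hypothetical sketch of a traditional pipeline in Python. Every name in it is illustrative, but the structure is the point: the KPI definition, the data model, and the destination table are all fixed when the job is written, so changing any of them means editing, re-testing, and re-running the pipeline.

```python
# Hypothetical traditional ETL job. All names and objects are illustrative.
from datetime import date


def extract_login_events(source, month: date) -> list[dict]:
    # Extraction is treated as the expensive step, so it is done once,
    # from one source, into one central store.
    return source.fetch_events(table="logins", month=month)


def transform_monthly_active_users(events: list[dict]) -> list[dict]:
    # The KPI definition is baked in at pipeline-creation time.
    # Counting the newly acquired customer base, or redefining "active",
    # means changing this function and reprocessing history.
    active_ids = {e["user_id"] for e in events if e["event_type"] == "login"}
    return [{"user_id": user_id, "month_active": True} for user_id in active_ids]


def load(warehouse, rows: list[dict]) -> None:
    # One fixed destination table, shaped for one anticipated use case.
    warehouse.insert("analytics.monthly_active_users", rows)
```

Nothing in this sketch is wrong in isolation. The problem is that every choice is frozen at build time.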
When your mental model is wrong, every solution you build reinforces the problem. Consider what happens when you accept pipeline thinking:
You centralize data because extracting it is “hard.” This creates data gravity — everything gets pulled toward the center because moving it again is expensive. Teams can’t get data in the formats they need, so they create shadow copies. Those copies drift out of sync, creating data quality problems. You build governance frameworks to manage the chaos instead of questioning why the chaos exists.
Meanwhile, simple changes become major engineering projects. Want to add a field to a report? Modify the pipeline, test the changes, coordinate deployments. Want to experiment with a new data model? Plan for months of migration work.
The system optimizes for stability over agility. Change is the enemy. Innovation slows to a crawl.
The reality of how we use data today: It’s messy, unpredictable, and constantly changing. Marketing needs customer data in one format for campaign analysis, another for attribution modeling, and yet another for predictive scoring. Sales wants the same customer data joined with opportunity records for pipeline forecasting. Product needs it combined with usage analytics for feature planning.
None of these teams know exactly what they’ll need next month. And now add AI to the mix: Data scientists need training sets rebuilt from historical data, ML engineers need real-time features for inference, and every team wants to experiment with embeddings, agents, or retrieval systems. Requirements evolve with business conditions, competitive pressure, and new opportunities, and, increasingly, with which AI capabilities become possible next.
We need systems that embrace this reality instead of fighting it.
What if we flipped our assumptions? What if getting data was easy, transformation was dynamic, and systems were built for change rather than stability?
This isn’t theoretical. Consider how we think about read replicas in databases. Creating a new replica is trivial. If it falls behind, we understand it. If it gets corrupted, we rebuild it. There’s no fear around read replicas because the mental model is sound: They’re disposable, replaceable, and designed for their specific purpose.
Now imagine applying this thinking to all data flows. Instead of permanent pipelines that lock data into specific shapes, you have late transformations that create data in the format you need at the moment you need it. Instead of fearing change, you embrace it because transformation is mechanical and predictable.
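Here is a minimal sketch of what that could look like, assuming the raw events are simply kept around in their original shape. Everything in it (the field names, the example events, the helper function) is hypothetical, but it shows the shift: the KPI becomes a read-time transformation, so redefining monthly active users to include the acquired company’s customers is a parameter change, not a migration.

```python
# Hypothetical late-transformation approach: raw events keep their original
# shape, and each consumer applies its own definition at read time.
from datetime import date

RAW_EVENTS = [
    {"user_id": 1, "source": "legacy", "event_type": "login", "day": date(2024, 5, 3)},
    {"user_id": 7, "source": "acquired_co", "event_type": "login", "day": date(2024, 5, 9)},
]


def monthly_active_users(events, year: int, month: int, sources=("legacy",)) -> set[int]:
    # The KPI is computed when it is read, from untransformed events.
    return {
        e["user_id"]
        for e in events
        if e["day"].year == year
        and e["day"].month == month
        and e["event_type"] == "login"
        and e["source"] in sources
    }


# The old and new definitions run side by side against the same raw data.
before = monthly_active_users(RAW_EVENTS, 2024, 5)
after = monthly_active_users(RAW_EVENTS, 2024, 5, sources=("legacy", "acquired_co"))
```

Nothing was re-extracted or migrated; the six-month change becomes a different argument to the same read-time transformation.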
The industry’s response to pipeline problems has been to build better pipelines. Smarter ETL tools. Shuffle the E’s, T’s, and L’s. More sophisticated orchestration. Automated data catalogs. Each promises to solve the problems created by pipeline thinking, but they all accept the fundamental model.
This is like trying to solve transportation problems by building faster horses. The issue isn’t execution — it’s the mental model itself.
Real progress requires stepping back and questioning the assumptions we’ve inherited. Why do we accept that changing a KPI takes six months? Why do we build systems that make experimentation expensive? Why do we create architectures where simple changes become complex engineering projects?
Every organization has projects sitting in backlogs marked “impractical” or “impossible.” Not because the technology doesn’t exist, but because the cost and complexity of implementation are too high given current architectures.
Every team has workarounds and shadow systems because getting data through official channels is too slow or too rigid. Every data engineering team spends more time maintaining existing pipelines than building new capabilities. Meanwhile, AI teams wait months for training data that should take hours to prepare.
This isn’t the inevitable cost of working with data. It’s the cost of working with data using mental models designed for a different era.
The assumptions that shaped data infrastructure in the 1980s served their purpose. Computing was expensive, storage was limited, and requirements changed slowly. Rigid, purpose-built pipelines got the job done.
But we’re not living in the 1980s anymore. We have abundant compute, cheap storage, and requirements that change daily. We need mental models that match this reality.
The question isn’t whether your current data infrastructure works. The question is whether it’s built on assumptions that still make sense. And if it’s not, how much longer are you willing to accept the cost of outdated thinking?
If six-month timelines for simple changes sound familiar, we should talk.