Data Unchained

Unleash the full potential of your data.

Latest Posts

The Tyranny of the Pipeline: Breaking Free from 40 Years of Conceptual Inertia

“Reverse ETL.” We’ve created an entire category of tooling to acknowledge that data flowing in one direction is so fundamental to our thinking that moving it the other way requires special designation. This is not innovation. This is evidence of a problem so deeply embedded in our data architectures…

How to Stop Rebuilding Pipelines Every Time AI Techniques Change

Challenge: Your Pipelines Can’t Keep Up With AI’s Pace. Two years ago, your team built pipelines to chunk documents for RAG. Carefully tuned: 512 tokens per chunk, 20% overlap, semantic boundaries respected. It took three months to get right. Then context windows exploded to 200K tokens. Suddenly, aggressive…
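The fixed-size chunking this excerpt describes can be sketched roughly as follows. This is a minimal illustration, not the post's actual code: the function name and the use of a plain token list (rather than any particular tokenizer) are assumptions for the example.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=0.2):
    """Split a token list into fixed-size chunks with fractional overlap.

    With chunk_size=512 and overlap=0.2, consecutive chunks share
    roughly 20% of their tokens (the stride is ~80% of chunk_size).
    """
    step = max(1, int(chunk_size * (1 - overlap)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last chunk already covers the tail
    return chunks
```

The point of the excerpt is that parameters like these get baked into pipelines, so when model context windows grow, the pipeline itself has to be rebuilt rather than just reconfigured.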

How to Turn Dark Data Into AI Training Gold

Challenge: Your Best AI Training Data Is Gathering Dust. Most companies have been hoarding data for years. Event logs from 2019. Customer interactions from three product versions ago. Raw sensor data that nobody ever queried. This dark data sits in S3 buckets and data lakes, technically accessible but practically useless.

An Honest Architecture

For decades, the language around data has barely changed. Every few years a new architecture or philosophy rises. We hear about data lakes, warehouses, meshes, fabrics, and observability platforms. Each is a promise to finally tame the chaos of data management. Billions have been invested across multiple generations of tooling…

We Lost the Thread on the Data Lake

In 2014, my last startup was acquired. We joined a fast-growing organization with a top-notch data team. They had invested heavily in data infrastructure. Data was strategic. They had "the hub," a Hadoop cluster built on HDFS. I thought: here's a company doing things right.

From Data Hoarding to AI-Ready: Making Your Data Actually Useful

Live Webinar · January 8 · 11 AM PT / 2 PM ET. You’re storing terabytes of data “just in case.” But when AI initiatives launch, that data is inaccessible, poorly formatted, or locked behind a six-month pipeline project. Sound familiar? You’re paying thousands in storage costs for data…

How to Make AI Training Data Reproducible and Debuggable

The Challenge: 70% of AI projects fail on data quality and integration, not models. But even teams with clean data struggle when they can’t reproduce their training runs. Your model’s accuracy dropped from 87% to 64%. But why? Was it bad training data? A schema change upstream? Preprocessing…

You’re Not Bad at Data. Your Infrastructure Just Makes You Think You Are.

I wrote a post about thinking past medallion architectures. That one went a little deeper into the architectural characteristics that make thinking in “medallions” unnecessary. You don’t need to internalize all that. I’m guessing you sense that data just doesn’t work, even with the fancy medallion architecture.

Your Teams Are Making Shadow Copies of Everything

Let’s talk about something nobody wants to admit. Your marketing team has their own copy of customer data. Sales has a different version. Product is maintaining yet another extract. Finance built their own dashboard using data they pulled last month. Each team has created their own shadow copy of…

Popular Tags