"Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…"
Let's trace the evolution of the "data lake" concept. The term "data lake" was coined by James Dixon, then CTO of Pentaho, in 2010. Dixon used the metaphor of a lake to contrast with the more structured "data mart" (which he compared
Generative AI has advanced at an amazing pace over the past few years. AI agents are promised to revolutionize everything from customer service to software development. Now they're coming for your data infrastructure. The pitch is seductive: AI agents will automatically handle data integration, quality, and governance.
You’re Doing ETL Wrong – Here’s a Better Way
If you’re still managing your ETL/ELT pipelines like it’s 2015, it’s time to rethink your approach. The recent pricing change by Fivetran is just another glaring example of why the current trend in data integration is
The lakehouse idea springs from a common pain point: warehouses excel at handling structured data and delivering strong analytics performance, but they falter when faced with unstructured data and scalability challenges. On the other hand, lakes shine with unstructured data and flexibility but struggle with governance, consistency, and transactional integrity.
“It’s just a quick ask.” Anyone who has worked with data has heard this phrase—a seemingly simple request: “Can we get a slightly modified report?” “What’s the regional breakdown for this other region?” “Just pull the data.” It sounds like something that should take minutes, maybe hours
The way we handle data is fundamentally broken. Across industries, businesses are trapped in a cycle of centralization—pushing everything into a data lake, warehouse, or some shiny new “lakehouse.” Why? Because getting data out of systems is so painful that the instinct is to store it all in one
Data is the lifeblood of modern business, driving everything from customer insights to operational efficiency. Yet, despite all the shiny tools, tech, and talent, most companies are stumbling. They’ve built impressive systems, but when it comes to making data work, they’re failing. Badly. Let’s be clear: this