Opinion & Analysis
Written by: David Tuppen | Group Chief Data Officer, Enstar Group
Updated 7:34 PM UTC, Thu December 19, 2024
The data management landscape has come full circle. From the structured and business-focused world of traditional data warehousing to the flexible but chaotic era of big data and back again, today’s data strategies reflect a blending of old principles with new technologies.
The rise of democratized architectures like the data mesh builds on the strengths of traditional approaches, such as the domain-driven methodologies of Ralph Kimball, one of the original architects of data warehousing, while addressing modern scalability and autonomy challenges.
But this journey hasn’t been a straight line. Along the way, worn down by slow-moving and failing data warehouse projects, we detoured into the era of big data, where the promise of flexibility and faster data ingestion often outpaced the reality of usability. As organizations learn from the past, the industry is rediscovering the value of domain-led thinking while striving to modernize its application.
In the early 2010s, big data technologies like Hadoop shifted the industry’s focus. The appeal was clear: a distributed, scalable way to store and process vast amounts of data, with schema-on-read enabling flexibility and experimentation. Organizations embraced centralized data lakes to house everything: structured, semi-structured, and unstructured data alike.
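To make schema-on-read concrete, here is a minimal, library-free sketch in Python; the record shapes are invented for illustration. Raw records land in whatever shape they arrive, and structure is imposed only at query time:

```python
import json

# A raw "landing zone": heterogeneous records, no schema enforced on ingest.
raw_lake = [
    '{"policy_id": "P-100", "premium": 1200, "region": "EMEA"}',
    '{"policy_id": "P-101", "premium": 950}',                      # missing region
    '{"claim_id": "C-7", "policy_id": "P-100", "amount": 300.0}',  # different entity
]

def read_with_schema(lines, fields):
    """Schema-on-read: impose structure at query time, keeping only
    records that carry the fields this particular query needs."""
    for line in lines:
        record = json.loads(line)
        if all(f in record for f in fields):
            yield {f: record[f] for f in fields}

# The "schema" lives in the query, not in the storage layer.
premiums = list(read_with_schema(raw_lake, ["policy_id", "premium"]))
print(premiums)  # [{'policy_id': 'P-100', 'premium': 1200}, {'policy_id': 'P-101', 'premium': 950}]
```

Note what the sketch leaves out: nothing stops inconsistent or unusable records from piling up, which is exactly where the trouble began.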
Yet, this flexibility came with challenges:
Lack of structure: Without predefined schemas or strong governance, data lakes often lacked the organization needed for meaningful insights.
Data swamps: Over time, many lakes devolved into swamps, where the sheer volume of raw, uncurated data made it nearly impossible to navigate or use effectively.
Limited business relevance: The focus on technology over alignment with business needs left many big data implementations disconnected from decision-making processes.
While these platforms addressed scalability, they sacrificed the clarity and usability that had been hallmarks of traditional data warehousing.
As organizations faced the realities of big data’s limitations, they began to revisit the principles of traditional data warehousing. Among these, for me, Kimball’s approach stands out for its enduring relevance. His methodologies emphasized:
Domain-specific data marts: Organizing data by business function (e.g., finance, claims…) ensured alignment with specific needs.
Shared dimensions: Common dimensions like time, product, and policy created consistency across domains, enabling cross-functional analysis (see the sketch after this list).
Business-driven design: Close collaboration between IT and business stakeholders ensured that data models reflected real-world processes and language.
These principles addressed many of the issues that plagued data lakes, offering a way to structure data for both usability and scalability.
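To ground the idea of conformed, shared dimensions, here is a minimal sketch in Python with pandas; the tables and figures are invented, but the pattern is the classic Kimball one, with domain-specific fact tables joined through a common dimension:

```python
import pandas as pd

# A conformed (shared) date dimension used by every domain mart.
dim_date = pd.DataFrame({
    "date_key": [20240101, 20240201],
    "quarter":  ["2024-Q1", "2024-Q1"],
})

# Domain-specific fact tables: one per business function.
fact_claims = pd.DataFrame({
    "date_key": [20240101, 20240201],
    "paid":     [300.0, 125.0],
})
fact_finance = pd.DataFrame({
    "date_key": [20240101, 20240201],
    "premium":  [1200.0, 950.0],
})

# Because both marts share the same date dimension, a cross-domain
# metric (loss ratio by quarter) is just a pair of ordinary joins.
claims  = fact_claims.merge(dim_date, on="date_key").groupby("quarter")["paid"].sum()
premium = fact_finance.merge(dim_date, on="date_key").groupby("quarter")["premium"].sum()
print((claims / premium).rename("loss_ratio"))
```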
Modern architectures like the data mesh borrow heavily from the domain-driven thinking of traditional data warehousing (although I don’t think many like to admit it). They aim to address the challenges of centralized models while introducing new capabilities:
Domain ownership: Data mesh decentralizes responsibility, making individual domains accountable for their own data products (sketched in code below).
Distributed infrastructure: Instead of a central repository, data is stored and processed across a distributed network, leveraging cloud-native tools.
Self-service for users: Democratized architectures prioritize accessibility, enabling business users to directly interact with data without relying on IT.
While these innovations solve many scalability and agility issues, they also introduce challenges, particularly around consistency and alignment.
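Before turning to those challenges, it helps to see what domain ownership and self-service look like in practice. As a rough sketch, and not any particular platform’s API (all names and fields here are hypothetical), a domain-owned data product publishes an explicit, discoverable contract:

```python
from dataclasses import dataclass

# A minimal "data product" contract: the mesh's unit of ownership.
# All fields and names are hypothetical, for illustration only.
@dataclass
class DataProduct:
    name: str            # discoverable product name
    domain: str          # owning domain, accountable for quality
    owner: str           # named product owner, not a central IT team
    schema: dict         # published contract: column name -> type
    freshness_sla: str   # how stale the product is allowed to be

claims_paid = DataProduct(
    name="claims_paid_daily",
    domain="claims",
    owner="claims-data-team@example.com",
    schema={"date_key": "int", "policy_id": "str", "paid": "float"},
    freshness_sla="24h",
)

# Self-service: a consumer reads the contract directly, no ticket to IT.
print(claims_paid.domain, claims_paid.schema)
```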
One of the biggest drawbacks of the data mesh is its lack of shared dimensions, or shared central master data, a cornerstone of Kimball’s methodology. In a distributed model, each domain often develops its own definitions and data structures, leading to:
Inconsistencies across domains: Without shared dimensions, it becomes difficult to compare or aggregate data between functions.
Reinvention of common concepts: Domains may duplicate efforts to create their own versions of common dimensions, like customer or product.
Fragmented insights: The absence of unified dimensions can hinder cross-domain reporting and strategic decision-making.
This is where modern architectures can learn from the past. By incorporating shared dimensions into the data mesh model, organizations can maintain the consistency needed for cross-functional analysis while still benefiting from decentralized ownership.
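As a minimal sketch of what that reconciliation could look like, assume a simple, hypothetical central registry of conformed dimensions that every domain’s data product validates against:

```python
# Hypothetical central registry: conformed dimensions are defined once,
# governed centrally, and referenced by every domain rather than
# re-invented per domain.
SHARED_DIMENSIONS = {
    "dim_policy": {"policy_key": "int", "policy_id": "str", "product": "str"},
    "dim_date":   {"date_key": "int", "quarter": "str"},
}

def validate_against_dimension(product_schema: dict, dimension: str) -> None:
    """Reject a domain schema that redefines a shared dimension's columns."""
    canonical = SHARED_DIMENSIONS[dimension]
    for column in set(product_schema) & set(canonical):
        if product_schema[column] != canonical[column]:
            raise ValueError(
                f"{column!r} conflicts with {dimension!r}: "
                f"{product_schema[column]} != {canonical[column]}"
            )

# The claims domain stays autonomous over its facts, but its foreign
# keys must conform to the shared dimensions.
validate_against_dimension(
    {"date_key": "int", "policy_key": "int", "paid": "float"}, "dim_date"
)
print("claims product conforms to dim_date")
```

The domains keep full autonomy over their facts; only the shared keys and dimensions are centrally governed.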
The evolution of data platforms, from traditional warehouses to data lakes, and now to democratized architectures, reflects the industry’s constant search for balance. While big data highlighted the potential of flexibility, it also showed us the dangers of abandoning structure.
Many organizations found themselves overwhelmed by data swamps, with massive repositories of information but no practical way to use it.
Today, the pendulum is swinging back toward structure and business alignment. Democratized architectures like data mesh offer the scalability and autonomy demanded by modern enterprises, but they succeed best when paired with the lessons of traditional warehousing, particularly the need for shared dimensions and strong governance.
The return to domain-led architectures represents a reconciliation of old and new. By revisiting principles like Kimball’s shared master data and integrating them into modern frameworks, organizations can build data platforms that are both scalable and aligned with business goals.
This isn’t about rejecting the lessons of big data or embracing the rigid models of the past. Instead, it’s about finding balance: a way to harness the flexibility of modern tools while maintaining the structure and clarity that make data useful.
The future of data management will be built on this foundation, where the best ideas from every era come together to create platforms that truly deliver value to each unique business unit within an organization. There is no one-size-fits-all.
About the Author:
At the helm of Enstar Group’s data strategy, David Tuppen serves as Group Chief Data Officer, spearheading data platform modernization and guiding enterprise-wide transformation initiatives. Championing enterprise data management and client solutions, he drives the realization of data’s full potential to underpin decision-making and operational excellence.
Tuppen has held previous leadership roles at organizations such as GFT Technologies, Wipro, and Athene, particularly in Data & AI and the insurance sector, experience that has enriched his approach to customer-centric data solutions. His team’s commitment to delivering transformative data solutions aligns with his goal of propelling organizations to the forefront of data-driven innovation.