Opinion & Analysis
Written by: Keerthi Penmatsa | Sr. Data Solution Architect, McDonald's
Updated 5:26 PM UTC, April 30, 2026

Over the last decade, most large enterprises have adopted modern data platforms such as Snowflake, Databricks, and Google BigQuery to support expanding analytics and AI use cases. Early platform decisions were often driven by feature comparisons. Over time, those feature differences have narrowed, while a more structural issue has emerged.
Data platform sprawl is no longer a byproduct of growth. It is becoming a hidden risk to enterprise data strategy, with no clear ownership.
In global enterprises, different teams and markets have selected platforms based on local needs, skills, and vendor relationships. What started as flexibility has evolved into fragmentation.
In the short term, this flexibility helped teams roll out new features faster because they could use familiar tools and platforms. However, as enterprise data needs grow and AI becomes more central to business strategy, the costs of a fragmented platform landscape begin to compound.
In a multi-platform enterprise, different markets adopt different platforms to host their customer data lakes, often with overlapping and interdependent subject areas. For example, engineering teams might land all customer data for a particular market in Snowflake, while data science teams pull the same data into Databricks to power their ML/AI pipelines.
In another common scenario, involving the same global mobile app backend data, some markets store their transaction data in a cloud-native SaaS solution while others keep the same subject area on Snowflake.
Across organizations, the following three patterns emerge repeatedly:
1. Duplicated data and redundant pipelines: To keep platforms synchronized, teams build and maintain cross-platform ingestion and replicate data engineering jobs. Data moves from operational systems into one platform, is transformed, and is then copied again into another platform for a different team’s needs (see the first sketch after this list).
Each step introduces cost, latency, and risk of inconsistency. For the consumption and ML/AI layers, maintaining a single source of truth becomes increasingly difficult when the “truth” exists in three different engines.
2. Slower delivery through cross-team dependencies: Teams become tightly coupled, and a change to a core customer record in one platform ripples through multiple downstream workloads in others. Project timelines stretch as teams coordinate schema changes and releases.
Strict data-completeness checks must be in place to ensure no data is lost between these layers. What was once an easy local decision now slows the entire organization.
3. Governance and compliance that don’t keep up: Meeting regulatory obligations around privacy, data residency, or model explainability becomes harder when data is scattered across multiple clouds and engines. Governance controls for access, lineage, and data quality monitoring must be applied consistently across every platform (see the second sketch after this list).
Without them, stakeholders lose confidence in the integrity of the data powering critical decisions and AI products.
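To make the first pattern concrete, here is a minimal sketch of the copy-twice pipeline it describes. The extraction and loading helpers are hypothetical stand-ins, not real platform APIs; the point is that the same records are landed, and must then be maintained, once per platform.

```python
# A minimal sketch of the copy-twice pattern; extract_from_operational_db
# and load are hypothetical stand-ins, not real platform APIs.
import pandas as pd

def extract_from_operational_db() -> pd.DataFrame:
    """Stub: pull the latest customer records from the source system."""
    return pd.DataFrame({"customer_id": [1, 2], "market": ["US", "DE"]})

def load(df: pd.DataFrame, platform: str) -> None:
    """Stub for a platform-specific loader (Snowflake, Databricks, ...)."""
    print(f"loading {len(df)} rows into {platform}")

customers = extract_from_operational_db()

# The same records land twice, once per platform, so every schema change,
# quality check, and backfill now has to happen in two places.
load(customers, "Snowflake")    # analytics teams consume from here
load(customers, "Databricks")   # ML/AI pipelines consume from here
```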
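And for the third pattern, a minimal policy-as-code sketch of what consistent cross-platform governance can look like. The helper functions are hypothetical, and the GRANT statements are simplified illustrations of Snowflake-style and Databricks-style syntax; the idea is that one central policy definition is rendered into each engine’s dialect so the two never drift apart.

```python
# A minimal, hypothetical policy-as-code sketch: one central definition,
# rendered into each platform's SQL dialect. The helpers and syntax are
# simplified illustrations, not real client libraries.

POLICIES = [
    ("analyst_role", "SELECT", "crm.customers"),
]

def render_snowflake(role: str, privilege: str, obj: str) -> str:
    # Snowflake-style grant (simplified).
    return f"GRANT {privilege} ON TABLE {obj} TO ROLE {role};"

def render_databricks(role: str, privilege: str, obj: str) -> str:
    # Databricks Unity Catalog-style grant (simplified).
    return f"GRANT {privilege} ON TABLE {obj} TO `{role}`;"

# The same logical policy must be expressed once per engine; generating
# every statement from one definition keeps the platforms from drifting.
for role, privilege, obj in POLICIES:
    print(render_snowflake(role, privilege, obj))
    print(render_databricks(role, privilege, obj))
```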
To make cross-platform integration more seamless and reduce data duplication, many organizations have begun adopting open table formats such as Apache Iceberg. These formats allow different platforms to read from, and write to, a single shared storage layer.
The approach can deliver real benefits: it reduces the number of full copies of large data sets and makes it easier to share data across platforms. However, important trade-offs remain, and some of them may only be resolved with time, as the format and its surrounding tooling mature.
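Below is a minimal sketch of the shared-storage idea, assuming PySpark with the Apache Iceberg runtime available and an object storage bucket the team controls; the catalog name, bucket, and table are hypothetical. Once the table lives in open storage, any Iceberg-capable engine can query the same files and metadata instead of maintaining its own full copy.

```python
# A minimal sketch, assuming PySpark with the Apache Iceberg runtime on
# the classpath; the catalog name, bucket, and table are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("shared-iceberg-table")
    # Register an Iceberg catalog backed by shared object storage.
    .config("spark.sql.catalog.shared", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.shared.type", "hadoop")
    .config("spark.sql.catalog.shared.warehouse", "s3://enterprise-lake/warehouse")
    .getOrCreate()
)

# Create the customer table once, in open storage, rather than per platform.
spark.sql("""
    CREATE TABLE IF NOT EXISTS shared.crm.customers (
        customer_id BIGINT,
        market      STRING,
        updated_at  TIMESTAMP
    ) USING iceberg
""")

# Any Iceberg-capable engine (Spark, Trino, and, with the right
# integration, Snowflake or Databricks) can now query the same files
# and metadata instead of maintaining its own full copy.
spark.table("shared.crm.customers").show()
```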
As AI evolves and the SaaS platforms mature and converge in capabilities, core data strategy is shifting. Leaders should weigh enterprise-level choices that go beyond technical features alone.
In the AI era, choosing a data platform is no longer an engineering-only decision. It is a strategic bet that shapes how quickly the organization can turn data into business outcomes, how predictably it can manage costs, and how credibly it can demonstrate control to regulators and partners.
The following is a three-lens framework that organizations can put at the heart of data strategy and platform decisions:
1. Architecture and scalability: These choices set the foundation for how scalable, resilient, and adaptable the platform will be as enterprise data and AI demands evolve.
2. Cost transparency and FinOps: This lens focuses on whether platform costs are understandable, predictable, and tied to business value.
3. Governance and trust: This lens focuses on how well the platform strengthens the organization’s capacity to manage data responsibly across all business units.
Together, these three lenses offer a more realistic way for executives to think about their data platform landscape than any static feature comparison can.
About the Author:
Keerthi Penmatsa is a Senior Platform Engineer and strategic advisor specializing in the architecture of large-scale data and analytics ecosystems. She partners with product and technology leaders to eliminate data silos and build high-efficiency cloud ecosystems.
With a focus on modernizing enterprise data lakes and streamlining fragmented pipelines, Penmatsa bridges the gap between technical execution and business outcomes. Her work is grounded in the “Triple Threat” of modern data leadership: operational efficiency, FinOps discipline, and rigorous enterprise governance. A passionate advocate for scalable innovation, she helps organizations transform technology investments into sustainable competitive advantages.