Opinion & Analysis

How to Fix Data Platform Sprawl: 3 Patterns and 3 Steps for Better Platform Decisions

Written by: Keerthi Penmatsa | Sr. Data Solution Architect, McDonald's

Updated 5:26 PM UTC, April 30, 2026


Over the last decade, most large enterprises have adopted modern data platforms such as Snowflake, Databricks, and Google BigQuery to support expanding analytics and AI use cases. Early platform decisions were often driven by feature comparisons. Over time, those differences have narrowed, while a more structural issue has emerged.

Data platform sprawl is no longer a byproduct of growth. It is becoming a hidden risk to enterprise data strategy, with no clear ownership.

In global enterprises, different teams and markets have selected platforms based on local needs, skills, and vendor relationships. What started as flexibility has evolved into fragmentation.

  • Data scientists gravitated toward a platform that offers a unified experience across engineering, ML/AI capabilities, and notebooks.
  • Data engineering teams preferred a platform optimized for SQL workloads with broader system integrations.
  • Other business units and market agencies made independent platform decisions aligned with their local vendor strategies.

In the short term, this flexibility helped teams bring new features to market faster, because they used familiar tools and platforms. However, as enterprise data needs grow and AI becomes more central to business strategy, the costs of a fragmented platform landscape start to compound.

The hidden costs of platform sprawl

In a multi-platform enterprise, different markets adopt different platforms to host their customer data lakes, often with overlapping and interdependent subject areas. For example, engineering teams might land all the customer data for a particular market on Snowflake, while data science teams need to pull the same data into Databricks to power their ML/AI pipelines.

Another common scenario involves the same global mobile app backend data: some markets adopt a cloud-native SaaS solution to store their transaction data, while others store the same subject area on Snowflake.

Across organizations, the following three patterns emerge repeatedly:

1. Duplicated data and redundant pipelines: To keep platforms synchronized, teams create and maintain cross-platform ingestion pipelines and replicated data engineering jobs. Data moves from operational systems into one platform, is transformed, and is then copied again into another platform for a different team’s needs.

Each step introduces cost, latency, and risk of inconsistency. For the consumption and ML/AI layers, maintaining a single source of truth becomes increasingly difficult when the “truth” exists in several different engines.
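The cost of this duplication can be approximated with simple arithmetic. The sketch below is a toy model: the per-terabyte storage and transfer rates are illustrative assumptions, not real vendor pricing, and it assumes one full cross-platform refresh per month for each extra copy.

```python
# Toy model of cross-platform duplication cost.
# All rates are illustrative assumptions, not real vendor pricing.

def duplication_cost(dataset_tb: float, platforms: int,
                     storage_per_tb_month: float = 23.0,
                     egress_per_tb: float = 90.0) -> dict:
    """Estimate monthly cost of keeping one dataset copied to N platforms,
    assuming one full refresh per month for each copy beyond the first."""
    storage = dataset_tb * platforms * storage_per_tb_month
    transfer = dataset_tb * (platforms - 1) * egress_per_tb
    return {
        "copies": platforms,
        "storage_usd_month": round(storage, 2),
        "refresh_transfer_usd": round(transfer, 2),
    }

single = duplication_cost(dataset_tb=10, platforms=1)
triple = duplication_cost(dataset_tb=10, platforms=3)
print(single)
print(triple)
```

Even this crude model makes the pattern visible: a third platform copy triples the storage bill and adds a recurring transfer cost that a single-copy architecture never pays.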

2. Slower delivery through cross-team dependencies: Teams become tightly coupled through interdependencies, and a change to a core customer record in one platform ripples through multiple downstream workloads in others. Project timelines stretch as teams coordinate schema changes and releases.

Strict data completeness checks must be in place to ensure there is no data loss between these layers. What was once an easy decision now slows the entire organization.

3. Governance and compliance that don’t keep up: Implementing regulatory obligations around privacy, data residency, or model explainability becomes harder when data is scattered across multiple clouds and engines. Governance frameworks for access control, lineage, and data quality monitoring must be implemented consistently across all platforms.

Without these, stakeholders lose confidence in the integrity of the data powering critical decisions and AI products.

Why open table formats help but don’t solve the problem 

To make cross-platform integration seamless and reduce data duplication, many organizations have adopted open table formats such as Apache Iceberg and similar technologies. These formats let different platforms read and write a single shared storage layer.

The approach can deliver real benefits: it reduces the number of full copies of large datasets and makes it easier to share data across platforms. However, there are still important trade-offs:

  • Performance and reliability can lag behind best-of-breed native table implementations, especially for complex workloads or strict SLAs.
  • Operational complexity can increase, because the organization must manage multiple catalogs and coordinate governance across several compute engines.
  • Governance, lineage, and cost management still need to be consistently implemented across the ecosystem, not just at the file/table layer.

Some of these challenges may diminish as open table formats mature, but that maturity will take time.
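The core idea behind open table formats can be illustrated without any vendor library. The sketch below is a stdlib-only toy model, not the Apache Iceberg API: one writer commits data files and registers them in a metadata manifest, and any number of “engines” read the same snapshot through that manifest, so no engine needs its own full copy.

```python
# Toy model of a shared table layer: data files plus a metadata manifest
# on common storage, readable by any engine. Not the Apache Iceberg API.
import json
import tempfile
from pathlib import Path

class SharedTable:
    def __init__(self, root: Path):
        self.root = root
        self.manifest = root / "manifest.json"
        self.manifest.write_text(json.dumps({"snapshots": []}))

    def commit(self, rows: list) -> int:
        """Writer appends a data file and registers it in the manifest."""
        meta = json.loads(self.manifest.read_text())
        snap_id = len(meta["snapshots"])
        data_file = self.root / f"data-{snap_id}.json"
        data_file.write_text(json.dumps(rows))
        meta["snapshots"].append(data_file.name)
        self.manifest.write_text(json.dumps(meta))
        return snap_id

    def scan(self) -> list:
        """Any engine resolves the current snapshots via the manifest."""
        meta = json.loads(self.manifest.read_text())
        rows = []
        for name in meta["snapshots"]:
            rows.extend(json.loads((self.root / name).read_text()))
        return rows

table = SharedTable(Path(tempfile.mkdtemp()))
table.commit([{"id": 1, "market": "US"}])
table.commit([{"id": 2, "market": "DE"}])

# Two different "engines" read the same storage; no second full copy exists.
engine_a = table.scan()
engine_b = table.scan()
print(engine_a == engine_b, len(engine_a))
```

The trade-offs in the list above live precisely in what this toy omits: concurrent commits, catalog coordination across engines, and governance enforced above the file layer.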

As AI evolves and SaaS platforms mature and converge in capabilities, the core of data strategy is shifting. Leaders should weigh enterprise-level choices that go beyond technical features alone.

A three-lens framework for enterprise-level choices

In the AI era, choosing a data platform is no longer an engineering-only decision. It is a strategic bet that shapes how quickly the organization can turn data into business outcomes, how predictably it can manage costs, and how credibly it can demonstrate control to regulators and partners.

The following three-lens framework should sit at the heart of data strategy and platform decisioning:

1. Business considerations and operational fit

These choices set the foundation for how scalable, resilient, and adaptable the platform will be as enterprise data and AI demands evolve.

  • Assess the platform’s operational complexity and maintainability. Determine to what extent the platform should be vendor-managed versus supported by internal SRE and platform engineering teams, and plan ahead for the additional headcount, skill sets, and support coverage required.
  • Assess platform observability, including built-in capabilities and agentic workflows for auditing, data quality monitoring, and alerting, so issues can be detected and resolved before they impact customers or AI products.
  • Weigh the benefits of a cloud-native platform integrated with the provider’s broader AI and analytics ecosystem against vendor lock-in and the cost and effort of migrating off the platform.

2. Economics and FinOps alignment

This lens focuses on whether the platform costs are understandable, predictable, and tied to business value.

  • Modern platforms offer consumption-based pricing, which provides flexibility, but as data volumes, queries, and AI workloads grow, costs can rise quickly. Leaders should focus on how easily spend can be tracked and allocated to business units.
  • Leaders should also examine how predictable costs are and how well the platform integrates with enterprise FinOps practices, including centralized governance of platform spend and budget alerts.
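The spend-allocation and budget-alert ideas above can be sketched in a few lines. The example below is hypothetical: the business-unit tags, credit prices, and budgets are invented for illustration, not taken from any platform’s billing model.

```python
# Hypothetical FinOps helper: allocate consumption-based spend to
# business-unit tags and flag units nearing their monthly budget.
# All tags, rates, and budgets are invented for illustration.
from collections import defaultdict

def allocate_spend(usage_events: list) -> dict:
    """Sum credit-based cost per business-unit tag."""
    totals = defaultdict(float)
    for event in usage_events:
        totals[event["business_unit"]] += event["credits"] * event["credit_price_usd"]
    return dict(totals)

def budget_alerts(spend: dict, budgets: dict, threshold: float = 0.8) -> list:
    """Return alerts for any unit past `threshold` of its monthly budget."""
    alerts = []
    for unit, amount in spend.items():
        budget = budgets.get(unit)
        if budget and amount >= threshold * budget:
            alerts.append(f"{unit}: ${amount:.2f} of ${budget:.2f} budget")
    return alerts

events = [
    {"business_unit": "marketing", "credits": 120, "credit_price_usd": 3.0},
    {"business_unit": "data-science", "credits": 40, "credit_price_usd": 3.0},
]
spend = allocate_spend(events)
alerts = budget_alerts(spend, {"marketing": 400.0, "data-science": 1000.0})
print(spend)
print(alerts)
```

The design point is that allocation only works if every workload carries a business-unit tag; platforms that make tagging optional make this kind of chargeback report unreliable.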

3. Data governance and security

This lens focuses on how well the platform strengthens the organization’s ability to manage data responsibly across all business units.

  • First, examine the platform’s capabilities around access control and policy enforcement: how quickly can governance be rolled out, and how easily can policies be changed?
  • Next, consider regulatory and privacy restrictions around the use of AI, and assess the platform’s native out-of-the-box capabilities for control and audit over data residency, consent, and data retention.
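One concrete way to evaluate “how easily can policies be changed” is to model access decisions as data rather than code. The sketch below is a hypothetical policy check, combining role-based access with a data-residency rule; the datasets, roles, and regions are invented for illustration.

```python
# Hypothetical policy-as-data check: role-based access combined with a
# data-residency rule. Datasets, roles, and regions are invented.
POLICIES = {
    "customer_pii": {
        "allowed_roles": {"analyst", "data-steward"},
        "allowed_regions": {"eu-west-1"},  # residency: EU data stays in EU
    },
    "sales_aggregates": {
        "allowed_roles": {"analyst", "executive"},
        "allowed_regions": {"eu-west-1", "us-east-1"},
    },
}

def can_access(role: str, region: str, dataset: str) -> bool:
    """Allow access only when both the role and the compute region
    satisfy the dataset's policy; unknown datasets are denied by default."""
    policy = POLICIES.get(dataset)
    if policy is None:
        return False
    return role in policy["allowed_roles"] and region in policy["allowed_regions"]

print(can_access("analyst", "eu-west-1", "customer_pii"))  # permitted
print(can_access("analyst", "us-east-1", "customer_pii"))  # residency violation
```

Because the rules live in a table rather than in code, changing a policy is a data update, not a redeploy; that is the property worth probing when assessing a platform’s native governance controls.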

Together, these three lenses offer a more realistic way for executives to think about their data platform landscape than any static feature comparison can.

About the Author: 

Keerthi Penmatsa is a Senior Platform Engineer and strategic advisor specializing in the architecture of large-scale data and analytics ecosystems. She partners with product and technology leaders to eliminate data silos and build high-efficiency cloud ecosystems.

With a focus on modernizing enterprise data lakes and streamlining fragmented pipelines, Penmatsa bridges the gap between technical execution and business outcomes. Her work is grounded in the “Triple Threat” of modern data leadership: operational efficiency, FinOps discipline, and rigorous enterprise governance. A passionate advocate for scalable innovation, she helps organizations transform technology investments into sustainable competitive advantages.

