Opinion & Analysis
Written by: Mihir Rajopadhye and Sagar Balan
Updated 11:19 AM UTC, Wed August 9, 2023
Business leaders know that there is a limited window of opportunity to get Artificial Intelligence (AI) right in their organizations. As technology accelerates the pace of market change, AI solutions that deliver better customer and employee experiences, improve decision-making, and streamline processes are becoming critical for competitiveness. According to a recent McKinsey report, AI adoption has more than doubled since 2017, with a greater number of applications having AI embedded in them.
However, despite this sense of urgency, industry adoption of AI solutions in production remains slow and uneven. A 2022 study by NewVantage Partners suggests that only 26% of enterprises have AI systems in widespread production.
Data management is usually cited as a top challenge companies face when integrating AI into existing workflows and processes. Respondents in IBM’s 2022 survey on AI adoption cited data complexity as one of the top five barriers to operationalizing AI at scale. As a result, enterprise teams are now focusing their data management efforts on generating and operationalizing insights, and on driving the adoption of those insights, to achieve a faster return on investment (ROI), instead of focusing narrowly on creating foundational data or AI capabilities.
Success with AI starts with data
Focusing on value-driven use cases that demonstrate the transformative power of AI-based solutions is a critical early step for organizations looking to scale their AI portfolio. Equally important (and often overlooked) is the approach companies take to modernizing their data ecosystem. The right approach can unlock faster progress, while choosing the wrong approach can result in additional roadblocks.
Enterprises are increasingly adopting medallion architectures, which organize data in zones of progressively better quality. By categorizing data into bronze, silver, and gold zones, teams improve data quality as it flows through each zone.
The medallion architecture offers an exceptional framework for curating rich data assets, which serve different data consumers at their desired quality and service levels. However, enterprise teams have to unravel decades of poor data hygiene to migrate data assets to this architecture.
To create faster momentum, progressive CDOs and CIOs should:
Understand data consumption personas to define silver and gold zones
Build ‘gold zone’ data assets that power multiple use cases, thereby optimizing cloud economics
Adopt a data migration and modernization roadmap that accelerates scaling of AI solutions
Developing data personas
When migrating and modernizing data assets, a key first step is to identify data consumers in the organization: who they are, what roles they have, and what their data needs are. This information should guide what data is included in silver and gold data zones.
As an example, some data scientists may want clean and enriched (silver) data for exploratory purposes, while other data scientists may need more accurate gold data to build and train AI and machine learning models. Similarly, some business users and development teams may be satisfied with silver data for analysis, while other users will require highly modeled and aggregated data for their reports and dashboards. Being able to make these distinctions is key to unlocking greater value, as well as controlling costs.
Designing the medallion zones based on an understanding of the organization’s level of data maturity, the various data consumers, and their data quality requirements and preferences helps establish a robust and scalable framework for data asset development.
Creating a fit-for-purpose definition of silver and gold zones
In the medallion architecture, data is categorized based on quality and usage. The bronze zone contains raw data, the silver zone typically has validated, de-duplicated, and enriched data, while the gold zone has aggregated and modeled data.
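As a rough illustration of how data might move through the three zones, here is a minimal Python sketch. The order records, the REGION_NAMES lookup, and the to_silver and to_gold helpers are all hypothetical and chosen only for this example. Real pipelines would typically run on a data platform rather than on in-memory lists.

```python
from collections import defaultdict

# Hypothetical raw records as they might land in the bronze zone.
bronze = [
    {"order_id": 1, "region": "east", "amount": "100.0"},
    {"order_id": 1, "region": "east", "amount": "100.0"},  # duplicate record
    {"order_id": 2, "region": "west", "amount": "250.5"},
    {"order_id": 3, "region": None, "amount": "75.0"},     # missing region
    {"order_id": 4, "region": "east", "amount": "50.0"},
]

# Assumed reference data used for enrichment.
REGION_NAMES = {"east": "Eastern US", "west": "Western US"}


def to_silver(rows):
    """Validate, de-duplicate, and enrich bronze rows."""
    seen_ids = set()
    silver_rows = []
    for row in rows:
        if row["region"] is None:       # validation: region is required
            continue
        if row["order_id"] in seen_ids:  # de-duplication on order_id
            continue
        seen_ids.add(row["order_id"])
        silver_rows.append({
            "order_id": row["order_id"],
            "region": row["region"],
            "region_name": REGION_NAMES[row["region"]],  # enrichment
            "amount": float(row["amount"]),              # type cleanup
        })
    return silver_rows


def to_gold(rows):
    """Aggregate silver rows into total order amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(len(bronze), len(silver), gold)
```

In this sketch the silver zone keeps one clean row per valid order, while the gold zone holds only the per-region totals that a report or dashboard would consume.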
To ensure proper data quality across the silver and gold zones, teams need to establish a governance process and integrate it throughout the data lifecycle. Getting data governance right requires a delicate balance across roles, tools, processes, metrics, and the definition of data observability. Investment in these organizational capabilities is essential for a successful data governance program, and the pay-off for this investment is better data quality for analytics and AI.
Onboarding data with a domain-based approach
Traditional methods for migrating data to the medallion architecture typically adopt a data-first approach, rebuilding the data foundation in the target environment before building the consumption zone data assets. These approaches are usually driven by enterprise teams seeking to retire their legacy data stack. They result in teams focusing on addressing technical complexities of ingestion pipelines and data structures rather than on the business value that can be delivered by enhancing use case delivery.
An alternative approach is to onboard data by domains, establishing metrics and key performance indicators (KPIs) that are associated with the operationalization of each domain.
The roadmap for developing domain-centric data assets uses analytic use cases as a primary driver of prioritization. Although this approach requires teams to invest upfront time developing domain-specific data assets that have enterprise relevance, they gain a head start transforming data into intelligence. Enterprise teams using a domain-centric data approach can expect to reduce data preparation times significantly compared to traditional methods.
A framework-driven approach to data onboarding can further expedite the development of high-quality data assets. Commonly used frameworks include:
Metadata-driven data ingestion: A configuration-driven ingestion framework supports both batch and real-time streaming constructs. This type of framework focuses on data source types rather than the number of sources. As a result, it enables enterprises to scale seamlessly with data growth, while preventing point-to-point data pipelines. Companies that use this approach can accelerate build time and reduce costs across every use case.
Integrated data quality: This framework incorporates technical data quality rules progressively throughout the ingestion process to improve data quality along the bronze, silver, and gold data zones. Applying quality rules to filter out substandard data promotes the development of high-quality data assets, creating user trust in data and preventing further downstream discrepancies.
Audit balance and control: Logging and monitoring events as data moves through the pipelines enables auditability and troubleshooting of issues as they occur. This framework also includes automated alerts that are sent to DataOps, data owners, and data consumers on data quality, data delivery service-level agreement (SLA) adherence, and other typical issues. Teams that receive alerts can resolve issues quickly, reducing data downtime.
Optimal cluster selection: Migrating data assets to a cloud platform provides exceptional scale and advanced tools that typically drive a greater level of exploration and experimentation. Using frameworks that implement rigorous checks at the beginning of the data initiative and throughout its lifespan ensures that the right cluster topology and processing capabilities are used, and avoids cost surges due to data problems that build up over time.
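The first three frameworks above can be sketched together in a few lines of Python. This is a simplified illustration under stated assumptions, not a reference implementation: the SOURCES metadata, the READERS registry, and the ingest function are hypothetical names, the payloads are inline strings rather than real files or streams, and the only quality rule shown is a required-field check.

```python
import csv
import io
import json

# Hypothetical source metadata. Adding a source means adding a config entry,
# not writing a new pipeline.
SOURCES = [
    {"name": "orders", "type": "csv",
     "payload": "id,qty\n1,5\n2,\n", "required": ["id", "qty"]},
    {"name": "stores", "type": "json",
     "payload": '[{"id": "A"}, {"id": null}]', "required": ["id"]},
]


def read_csv(payload):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(payload)))


def read_json(payload):
    """Parse a JSON array into a list of dict rows."""
    return json.loads(payload)


# One reader per source *type*, regardless of how many sources exist.
READERS = {"csv": read_csv, "json": read_json}


def passes_quality(row, required):
    """Technical quality rule: every required field must be populated."""
    return all(row.get(col) not in (None, "") for col in required)


def ingest(sources):
    """Load each configured source, filter bad rows, and record audit counts."""
    loaded, audit_log = {}, []
    for src in sources:
        rows = READERS[src["type"]](src["payload"])
        good = [r for r in rows if passes_quality(r, src["required"])]
        loaded[src["name"]] = good
        # Audit entry: rows_in should equal rows_out plus rows_rejected.
        audit_log.append({
            "source": src["name"],
            "rows_in": len(rows),
            "rows_out": len(good),
            "rows_rejected": len(rows) - len(good),
        })
    return loaded, audit_log


loaded, audit_log = ingest(SOURCES)
for entry in audit_log:
    print(entry)
```

Because the reader is selected by source type, onboarding another CSV or JSON source in this sketch only requires a new metadata entry. The audit entries could feed the kind of automated alerts described above, for example when the rejected-row count crosses an agreed threshold.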
To scale AI, focus on data
Enterprise data leaders can use their competitors’ delay in scaling AI to create leap-ahead capabilities with data and AI. The past several years have served as a global case study in the importance of creating data and AI capabilities. Organizations that could quickly create insights were able to respond to fast-moving market events, such as supply chain snarls and geopolitical developments, out-positioning those that could not.
The good news is that enterprises don’t have to figure this out by themselves. Vendor solutions and partner services can help teams solve their data challenges, removing barriers to operationalizing AI solutions across more use cases and scaling these solutions across the enterprise.
By adopting medallion architectures, organizing data in zones, creating data domains, and using onboarding frameworks, enterprises can create data processes that improve data quality and usability. These processes also enable teams to ingest data at scale for use in AI models and big-data analytics. This, in turn, expedites the evolution to the data-driven enterprise: using predictive capabilities to anticipate and proactively respond to events, unlocking new avenues of value across myriad business opportunities.
In a future article, we will explore how data accelerators can help enterprise teams achieve even faster gains by deploying advanced data frameworks and evolving data science capabilities.
About the Authors
Mihir Rajopadhye is the Chief Data and Analytics Officer (CDAO) at Rich Products. He is a business-focused transformation leader with demonstrated success applying technology to enable business strategies for Fortune 500 companies.
At Rich Products, Rajopadhye is responsible for establishing enterprise data capabilities and accelerating value realization through the deployment of analytics solutions. He joined Rich’s from Walmart, where he led the development of Walmart’s data strategy, managed a portfolio of internal and external data monetization initiatives, and established an internal customer success team to drive data literacy across the organization.
He is based out of Dallas, Texas. In his free time, he enjoys the outdoors, traveling and learning about different cultures.
Sagar Balan is a Chief Business Officer for the Consumer-Packaged Goods vertical at Tredence. He has led AI Transformation programs that have created real impact across numerous CPG firms. He focuses on working with senior leaders to drive growth through commercial and supply chain initiatives. He is passionate about breaking siloed operations in Fortune 500 CPGs through prescriptive decision systems and change management.
Balan is an entrepreneur at heart and passionate about creating high-performance teams. He drives simplicity in strategy formulation and execution. He has an MBA from XLRI, Jamshedpur, in India, and he enjoys traveling, trekking, and martial arts.