Opinion & Analysis
Written by: Maria C. Villar | Co-founder and Managing Partner, Business Data Leadership, Mike Alvarez | CTO and Head of Product, NeuZeit, Elizabeth (Beth) Hiatt | Head of Global Data Governance, PayPal, Christine Legner | Professor of Information Systems HEC, University of Lausanne
Updated 2:00 PM UTC, Thu August 28, 2025
This is the third article in a four-part series exploring the transformative role of AI Agents and their potential to address persistent challenges in Data Governance. It focuses on four critical aspects of data stewardship — data quality, metadata management, master data, and data retention — and reviews what AI Agents already can do, while providing an outlook on the future.
As data continues to grow in complexity and importance, organizations need smarter, more agile, and scalable approaches to manage it. The Digital Data Steward (DDS), powered by AI Agents, represents the next evolution in Data Management — combining the best of human expertise with the power of artificial intelligence. In this article, we deep-dive into four critical areas of data stewardship: data quality, metadata management, master data, and data retention.
These areas are critical for executing the domain data strategy (see article 2) and building upon its core input, specifically strategic themes and critical data elements (CDEs). Despite the advances in tooling, these four areas still demand considerable manual efforts and high domain knowledge from data stewards.
Today, we are already witnessing numerous examples of specialized AI agents and Large Language Models (LLMs) taking over tasks that were traditionally handled by data stewards. This article illustrates what is already possible today, and how these capabilities may be further extended with Agentic AI, bringing us closer to the vision of the Digital Data Steward (see Table 1).
We specifically propose four agentic AI systems for key data steward responsibilities:
Figure 1: Digital Data Steward Agent map (Zoom on Data Quality, Metadata, Master Data, and Data Retention)
Data quality is a critical aspect for any digitalization and AI initiative: without accurate, consistent, and reliable data, these initiatives will ultimately fail. AI agents can significantly automate and augment many of the highly manual tasks performed by data stewards, as well as enhance the curation of data assets for operational and analytical purposes.
Many data quality tools already use AI for foundational data quality tasks, such as:
Detection of (simple) data quality rules: Identifying basic rules, specifically for validity, completeness, uniqueness, or consistency, and straightforward business logic – as illustrated in Figure 2 for an HR dataset with rules that were detected by ChatGPT.
Figure 2: Data quality rules in an HR dataset detected by ChatGPT
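To make the four rule types named above concrete, here is a minimal sketch of what such checks can look like once a rule has been detected. The records, field names, and the simple e-mail pattern are all hypothetical and chosen for illustration; they are not taken from the dataset in Figure 2 or from any particular tool.

```python
import re
from datetime import date

# Hypothetical HR records; field names and values are illustrative only.
RECORDS = [
    {"employee_id": "E001", "email": "ana@example.com",
     "hire_date": date(2020, 3, 1), "termination_date": None},
    {"employee_id": "E002", "email": "bob[at]example.com",
     "hire_date": date(2021, 6, 15), "termination_date": None},
    {"employee_id": "E002", "email": "cy@example.com",
     "hire_date": date(2022, 1, 10), "termination_date": date(2021, 12, 31)},
    {"employee_id": "E004", "email": "",
     "hire_date": date(2019, 9, 1), "termination_date": None},
]

# Deliberately simple pattern (an assumption); real validation is stricter.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_records(records):
    """Return a list of (row_index, rule, message) violations."""
    violations = []
    first_seen = {}  # employee_id -> index of the first row that used it
    for i, rec in enumerate(records):
        # Completeness: required fields must be present and non-empty.
        for field in ("employee_id", "email", "hire_date"):
            if not rec.get(field):
                violations.append((i, "completeness", f"{field} is missing"))

        # Validity: a present e-mail must match the pattern.
        email = rec.get("email")
        if email and not EMAIL_PATTERN.match(email):
            violations.append((i, "validity", f"malformed email {email!r}"))

        # Uniqueness: employee_id must not repeat across rows.
        emp_id = rec.get("employee_id")
        if emp_id:
            if emp_id in first_seen:
                violations.append(
                    (i, "uniqueness", f"{emp_id} already used in row {first_seen[emp_id]}")
                )
            else:
                first_seen[emp_id] = i

        # Consistency: termination cannot precede hiring.
        hired, left = rec.get("hire_date"), rec.get("termination_date")
        if hired and left and left < hired:
            violations.append((i, "consistency", "termination_date precedes hire_date"))
    return violations


for row, rule, message in check_records(RECORDS):
    print(f"row {row}: {rule}: {message}")
```

In this sketch the rules are hard-coded; the point of AI-assisted rule detection is that a tool proposes such rules from the data, and a data steward confirms or rejects them before they are enforced.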
While AI agents are already powerful, they are mostly focused on simple and clearly defined repetitive tasks. What is needed for our vision of the Data Quality Agent — as a component of the Digital Data Steward — is the capability for cross-agent orchestration and feedback to predict, listen, alert, and of course, correct.
The Data Quality Agent requires orchestration and collaboration of agents that are trained to perform more complex tasks and truly augment human data stewards’ capacities (see Figure 3).
Figure 3: Data Quality Agent
Example scenario for the Data Quality Agent: Imagine a financial institution deploying the Data Quality Agent to augment its human data stewards’ abilities and improve the quality of its customer data. In this scenario, the Data Quality Agent could be responsible for: Scanning vast amounts of customer data from different sources — including structured data from CRM systems as well as unstructured data from e-mails or customer interactions — to identify patterns, relationships, and anomalies.
Metadata provides context and meaning to data, which is critical for users to discover, understand, and leverage information effectively. Similar to data quality tools, many of the metadata management tools leverage AI today in the following areas:
A more comprehensive Metadata Management Agent – within the Digital Data Steward framework – extends and orchestrates these specialized agents to assist human data stewards in creating and consistently maintaining the data dictionaries and metadata repositories for their domain.
To provide meaningful metadata, the Metadata Management Agent needs to be familiar with the internal vocabularies and glossaries, continuously learn glossary definitions and enrichment rules from one domain, and proactively suggest them in another. This is specifically important for the various unstructured data sources — such as documents, e-mails, and reports — that feed an increasing number of LLMs and GenAI applications.
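The idea of reusing glossary definitions from one domain in another can be illustrated with a very small sketch. It uses plain string similarity from Python's standard library as a stand-in for whatever matching a real agent would use (an LLM or embeddings, for instance); the glossary entries, column names, and the `suggest_definitions` helper are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical glossary curated in one domain (term -> definition).
GLOSSARY = {
    "customer_id": "Unique identifier assigned to a customer.",
    "order_date": "Calendar date on which an order was placed.",
}


def suggest_definitions(column_names, glossary, cutoff=0.6):
    """Map each column to (matched_term, definition) when a close match exists."""
    suggestions = {}
    terms = list(glossary)
    for column in column_names:
        # Keep only the single best match above the similarity cutoff.
        matches = get_close_matches(column.lower(), terms, n=1, cutoff=cutoff)
        if matches:
            term = matches[0]
            suggestions[column] = (term, glossary[term])
    return suggestions


# Columns found in another domain's dataset (also hypothetical).
proposals = suggest_definitions(["cust_id", "orderdate", "sku"], GLOSSARY)
for column, (term, definition) in proposals.items():
    print(f"{column}: suggest '{term}' ({definition}) for steward review")
```

Columns without a sufficiently close glossary term are left without a suggestion, so the human data steward sees only candidates worth reviewing.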
Example scenario for the Metadata Management Agent: Imagine a large e-commerce company with vast amounts of customer data. In this scenario, the Metadata Management Agent could be responsible for:
Being the most critical data objects used across the organization’s business, master data is the focus of most data steward work. Today’s MDM tools increasingly embed AI along the entire data life-cycle for:
A Master Data Management Agent within the Digital Data Steward framework aims to make the lifecycle management of Critical Data Elements (CDEs) more automated, efficient, and reliable. This includes the orchestration and collaboration of agents that manage the Create, Read, Update, and Delete processes for the domain's CDEs.
However, compliance checks are needed, specifically for sensitive and business-critical master data. Hence, the most business-critical and complex steps still rely on human expertise and contextual information, derived from the domain data strategy.
Data retention is becoming a critical responsibility of the data steward role, especially as organizations face increasingly complex legal, regulatory, and ethical obligations. To meet these challenges, many modern data management tools, including enterprise data catalogs (e.g., Collibra, Microsoft Purview, Informatica) and Master Data Management tools, now embed AI-driven features to automate and enhance compliance with data retention policies:
Within the Digital Data Steward framework, the Data Retention Agent collaborates closely with the Metadata and Master Data Management Agents, reads the metadata, and triggers specialized agents for deletion, anonymization, or archiving. It goes beyond enforcing existing data retention policies: it proactively optimizes those policies and the related procedures based on usage patterns.
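A rough sketch of the "read the metadata, then trigger an action" step might look like the following. Everything here is an assumption made for illustration: the metadata fields, the categories, the retention periods, and the idle threshold are invented and do not reflect any regulation or product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AssetMetadata:
    name: str
    category: str            # hypothetical classification label
    created: date
    last_accessed: date
    analytics_value: bool = False  # worth keeping in anonymized form
    legal_hold: bool = False


# Hypothetical retention periods in days; not taken from any regulation.
RETENTION_DAYS = {"customer_record": 7 * 365, "access_log": 365}
ARCHIVE_AFTER_IDLE_DAYS = 180


def decide_action(asset: AssetMetadata, today: date) -> str:
    """Return one of: retain, archive, anonymize, delete, review."""
    if asset.legal_hold:
        return "retain"  # a legal hold overrides every other rule
    limit = RETENTION_DAYS.get(asset.category)
    if limit is None:
        return "review"  # unknown category: escalate to a human steward
    if (today - asset.created).days > limit:
        # Past retention: keep only an anonymized copy if it has analytic value.
        return "anonymize" if asset.analytics_value else "delete"
    if (today - asset.last_accessed).days > ARCHIVE_AFTER_IDLE_DAYS:
        return "archive"  # within retention but idle (a usage-pattern signal)
    return "retain"


today = date(2025, 1, 1)
asset = AssetMetadata("crm_export", "customer_record",
                      created=date(2024, 1, 1), last_accessed=date(2024, 2, 1))
print(asset.name, "->", decide_action(asset, today))
```

The returned label would then be handed to the specialized deletion, anonymization, or archiving agent. The `review` outcome keeps a human in the loop whenever the metadata is insufficient to decide.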
Example scenario for the Data Retention Agent: In the healthcare industry, a Data Retention Agent can support human data stewards in data retention tasks to comply with regulations like HIPAA. The tasks include:
By automating these tasks, the AI agent can help healthcare organizations reduce the risk of non-compliance, improve data security, and free up valuable resources.
AI agents are already transforming key data management tasks, especially data quality, metadata management, master data workflows, and data retention. The value of the Digital Data Steward as an agentic system depends on how well it learns your data environment: how much information and training on your data ecosystem it receives, and how much it learns about the data management issues your organization has encountered, whether through your domain data strategy, your issue and incident management system, or Internal Audit reports.
Over time, the Digital Data Steward is able to understand where your company has risk and ensure that agents are deployed to protect and defend the company. Fully autonomous, general-purpose agents for complex, cross-organizational data management remain a work in progress, with most organizations adopting a “human-in-the-loop” approach for critical decisions.
The last part of our four-part series will explore what it really takes to move from a promising vision of agentic AI in data stewardship to measurable, enterprise-scale value. We’ll examine the evolving role of the Chief Data Officer and the rise of collective intelligence platforms.
We’ll also address the nuanced debate between augmentation and automation, spotlight innovation in the startup ecosystem, and provide a practical timeline and recommendations for CDOs ready to lead this transformation. If you’re wondering where to start — or how to scale — the augmented future of data stewardship, Part 4 will bring it all together.
About the Authors:
Maria C. Villar brings over 30 years of experience as a transformational technology executive, having served as Chief Data Officer in both the technology and financial sectors. Currently, she is Co-founder and Managing Partner of Business Data Leadership, a firm committed to enhancing effective data and AI management practices through training, writing, coaching, and consulting. Her expertise includes enterprise data strategy, data and AI governance, business value realization, organization and change management, and ESG and Sustainability.
Recognized as a leader in the data and AI industry, Villar is a frequent speaker and author. Her accomplishments include co-authoring the book “Managing Your Business Data from Chaos to Confidence” with Theresa Kushner, developing online master classes, e-learning modules, and webinars, contributing to “Latin Business Today” since 2010, and serving as the WLDA Ventures Program Manager for an accelerator program focused on data and AI startups.
Mike Alvarez is a data and AI transformation leader with over 20 years of experience driving innovation at the intersection of data science and commercial product development. He helps organizations unlock transformative value from their data, technology, and human resources. His career spans pioneering data leadership roles at Fortune 20 companies where he delivered hundreds of millions in business value through data/AI initiatives.
As CTO and Head of Product at NeuZeit, he is focused on accelerating the value and adoption of AI for organizations with acceleration frameworks. Alvarez is passionate about helping companies navigate their data and AI transformation journey by establishing robust data foundations, deploying scalable AI solutions, and creating platforms that democratize insights to drive competitive advantage. Mike is also a board member of the AI Freedom Alliance (https://aifalliance.org/) advocating for the fair and ethical use of Artificial Intelligence.
Elizabeth (Beth) Hiatt is Head of Global Data Governance at PayPal. She has close to 30 years of experience building and deploying enterprise-wide data management and governance programs. Beth has held various data management and governance roles across business and technology in financial services, telecommunications, and hospitality. She has implemented enterprise data management programs end-to-end, developing and enabling critical functions such as data governance, data quality, and master and metadata programs. She has deep technical expertise in enterprise data architecture, helping organizations “connect the dots” across the data lifecycle. Beth is a strong, results-driven leader with experience managing large, complex organizations, specifically focusing on growing a company’s data management maturity while changing the organization’s data culture. She has written articles including “Time to Level Up: The Evolving Role of the Chief Data Officer,” published by TDWI, has spoken at many conferences including the Women Data Leaders Global Summit and the Chief Data Officer and Information Quality (CDOIQ) Symposium, and was on CDO Magazine’s Global Data Power Women List.
Christine Legner is a Professor of Information Systems at the Faculty of Business and Economics (HEC), University of Lausanne, in Switzerland. Her research fields are data management, enterprise architecture, and business software. She is the co-founder and academic director of the Competence Center Corporate Data Quality (CC CDQ), an industry-funded research consortium and expert community dedicated to advancing the field of data management. In this role, Legner leads a research team that collaborates closely with industry experts from 20 Fortune 500 companies (BASF, Bayer, Bosch, Nestlé, Schaeffler, SAP, Siemens, and Tetra Pak, among others) to develop innovative concepts, tools, and methods for data management.
Together with Dr. Richard Wang, Legner also serves as the Co-Chair of the annual CDOIQ European Symposium, which brings together CDOs, CAOs, CAIOs, and senior leaders shaping the data, analytics, and AI landscape in Europe.