Opinion & Analysis

AI Can Automate AI and Data Governance — Here’s How

Despite the business interests, the potential of AI and generative AI to automate AI and data governance is often overlooked.

Written by: Chathuri Daluwatte

Updated 8:24 AM UTC, February 28, 2024

Advancements in AI, specifically generative AI, between 2022 and 2023 have organizations and data leaders aiming to capitalize on their data using AI products, especially generative AI.

The legal and compliance landscape around AI is changing fast in all industries, highlighting the need for data and AI governance. Given that data governance is already lagging in various industries, AI could be an accelerator for both AI and data governance.

The discussion around generative AI is largely concentrated on use cases with business impact. However, despite the business interests, the potential of AI and generative AI to automate AI and data governance is often overlooked.

The life science and healthcare industry went through its digital transformation under compliance and regulation. Other industries can study it to learn how to implement AI compliantly, especially how to use AI to automate AI and data governance.

Using AI for AI Governance

The 2019 book “Team Topologies” by Manuel Pais and Matthew Skelton promoted the idea of an “internal developer platform (IDP)” and a dedicated platform team that can enhance software delivery practices and overcome the limitations of DevOps. IDPs follow a Platform-as-a-Product approach: they deliver developer self-service and make the underlying tech accessible to developers via golden paths, without abstracting away context. With this approach, IDPs lower the cognitive load across the engineering organization, reducing expenses via ZeroOps.

Many tech organizations (e.g. Google, Meta, Airbnb, Netflix, Spotify) use IDPs in their engineering practice and the use of self-service platforms is cited as a common characteristic of high-performing teams.

IDPs are not only for software engineering. An IDP can be adopted for AI product development as well (referred to as MLOps). This is especially advantageous in regulated industries (e.g. life sciences and pharmaceutical manufacturing) and in organizations that are tackling “AI’s long tail problem” (i.e., solving a range of low-value problems that sum to a larger value).

An MLOps IDP also helps data scientists, who are often not trained in software development principles, to develop production-grade AI. In regulated industries like pharmaceutical manufacturing, the quality and regulatory process (i.e., good practice quality guidelines and regulations, GxP) can be integrated into the MLOps IDP. This enables efficient, effective, agile, and fast AI product development and deployment cycles that remain compliant, and it enforces a monitor-by-design product development culture in which the data and model drift of AI products in use is monitored.
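
As an illustration of what a monitor-by-design check for data drift might look like, here is a minimal sketch that compares a feature's live distribution against its training-time reference using the population stability index (PSI). The function names, the bin count, and the 0.2 threshold are assumptions made for this example; the article does not prescribe a specific drift metric.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float], current: Sequence[float], bins: int = 10
) -> float:
    """PSI between a reference sample and a current sample of one numeric feature."""
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 when all reference values are identical
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into the edge bins
            counts[idx] += 1
        # Small floor so that empty bins do not cause log(0) or division by zero
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb cut-off, to be tuned per product


def has_drifted(reference: Sequence[float], current: Sequence[float]) -> bool:
    """Flag the feature for review when its PSI exceeds the assumed threshold."""
    return population_stability_index(reference, current) > DRIFT_THRESHOLD
```

A platform could run a check like this on a schedule for every deployed model and raise an alert, or open a quality record, whenever `has_drifted` returns true.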

IDPs also keep all enterprise production code, AI models, and AI experiments in one place with discoverability and auditability, meeting enterprise governance standards alongside documentation on software and AI as well as regulatory and quality reports. An IDP also promotes collaboration and peer learning among data scientists and the reuse of code and AI models.

Years before generative AI reached its tipping point in 2023, MLOps IDPs were already using AI to automate AI governance. This includes automated code scanning before AI deployment to production to ensure code quality and security. Such AI-powered code scans can further enforce good coding practices and enterprise standards as well as documentation.

Further, feedback loops can be implemented in the integrated development environment (IDE) of the IDP to promptly highlight violations of enterprise governance standards during development, so that the developer can address them proactively. Novel generative AI and multimodal foundation models can make such checks and feedback loops more capable and effective.
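
The pre-deployment scans and IDE feedback loops described above can be pictured with a small sketch. The example below is deliberately rule-based, not AI-powered: it uses Python's standard `ast` module to flag two hypothetical enterprise rules (every function needs a docstring, and `eval`/`exec` are banned). The rule IDs, the `Violation` type, and the function names are assumptions for illustration; in an AI-enabled platform, a model would stand in for or extend the fixed rule set.

```python
import ast
from dataclasses import dataclass


@dataclass
class Violation:
    line: int
    rule: str
    message: str


BANNED_CALLS = {"eval", "exec"}  # hypothetical enterprise security rule


def scan_source(source: str) -> list[Violation]:
    """Return governance violations found in a Python source string, ordered by line."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Documentation rule: every function must carry a docstring
            if ast.get_docstring(node) is None:
                violations.append(
                    Violation(node.lineno, "DOC001", f"function '{node.name}' has no docstring")
                )
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Security rule: direct calls to banned built-ins are not allowed
            if node.func.id in BANNED_CALLS:
                violations.append(
                    Violation(node.lineno, "SEC001", f"call to '{node.func.id}' is not allowed")
                )
    return sorted(violations, key=lambda v: v.line)


def deployment_allowed(source: str) -> bool:
    """Gate for the deployment pipeline: pass only when no violations are found."""
    return not scan_source(source)
```

The same `scan_source` function could serve both purposes: an IDE plugin would surface each `Violation` at its line while the developer is typing, and the deployment pipeline would call `deployment_allowed` as a final gate.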

Using AI for Data Governance

A self-serve data platform combined with federated data governance is an approach promoted and adopted by modern enterprises. Self-serve data platforms follow a Data-as-a-Product approach, removing the bottleneck of a centralized data team via domain-oriented decentralization of analytical data.

AI can be used within self-serve data platforms during the DataOps processes (e.g. data cleaning, transformation, and ingestion) as part of the data platform's policy automation.

An example from the pharmaceutical industry comes from the customer relationship management (CRM) data area. The pharmaceutical industry's CRM process runs through two organizations that are separated for compliance reasons:

● The medical field force: responsible for CRM around the science, and prohibited from customer interactions around products and commercial aspects
● The commercial field force: responsible for CRM around sales, and prohibited from science-based customer interactions around products

Both field teams generate increasing amounts of natural language data (e.g. recorded calls, field interaction summaries, insights, customer complaints, and inquiries). This natural language data often stays locked in operational data systems due to the burden of compliance:

● Privacy: Presence of Personally Identifiable Information (PII) in the natural language data

● Compliance:

    ○ Medical field force discussing commercial topics
    ○ Commercial field force discussing medical topics
    ○ Prohibited topics such as racism

● Pharmacovigilance: Discussions about adverse events

To meet compliance commitments, the data in operational data systems receives little review (via random sampling) or none at all, because reviewing all of the data is not humanly possible.

An AI-enabled (AI or GenAI) compliance engine, developed using the large language model (LLM) Bidirectional Encoder Representations from Transformers (BERT):

● automatically removed PII from the natural language data
● increased compliance through objective review by AI (as opposed to subjective human interpretation of compliance rules)
● increased review frequency (from no review or quarterly review to weekly automated review)
● reduced human review time via review by exception
● increased review coverage by at least 90 percentage points (from 0% with no review, or ~10% with random sampling, to 100% via automation)
● enabled cost avoidance for the enterprise (fines/penalties)
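
To make the flow of such an engine concrete, here is a minimal sketch of PII redaction followed by review by exception. It is not the BERT-based engine described above: regular expressions stand in for a PII detection model, and hypothetical keyword lists stand in for a trained topic classifier. All names, patterns, and term lists are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Simplified patterns; they do not catch names, addresses, or other PII types
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Hypothetical keyword lists standing in for a trained topic classifier
COMMERCIAL_TERMS = {"price", "discount", "rebate", "contract"}
MEDICAL_TERMS = {"off-label", "clinical trial", "mechanism of action"}
ADVERSE_EVENT_TERMS = {"side effect", "adverse", "hospitalized", "reaction"}


@dataclass
class ReviewResult:
    redacted_text: str
    flags: list = field(default_factory=list)

    @property
    def needs_human_review(self) -> bool:
        # Review by exception: only flagged notes reach a human reviewer
        return bool(self.flags)


def redact_pii(text: str) -> str:
    """Replace e-mail addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def review_note(text: str, field_force: str) -> ReviewResult:
    """Redact PII, then flag compliance and pharmacovigilance topics."""
    redacted = redact_pii(text)
    lowered = redacted.lower()
    flags = []
    if field_force == "medical" and any(t in lowered for t in COMMERCIAL_TERMS):
        flags.append("commercial_topic_by_medical")
    if field_force == "commercial" and any(t in lowered for t in MEDICAL_TERMS):
        flags.append("medical_topic_by_commercial")
    if any(t in lowered for t in ADVERSE_EVENT_TERMS):
        flags.append("possible_adverse_event")
    return ReviewResult(redacted, flags)
```

Every note passes through `review_note`, which is how automated coverage reaches 100%; human reviewers see only the notes for which `needs_human_review` is true.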

The automatically compliance-checked data could be fed in real time to the enterprise data platform as well as an AI insights engine product, providing timely insights and data access to the enterprise beyond the operational teams.

Such AI-powered compliance engines can now achieve better quality, given the availability of LLMs more powerful than BERT and of multimodal generative AI.