AI Governance
Written by: Mansi Agarwal | Global Head of Analytics and AI at Carrier
Updated 12:58 PM UTC, May 19, 2026

For most of the last century, corporate governance has been treated as plumbing. Necessary, rarely admired, noticed only when it fails.
In the age of agentic AI, that posture is no longer tenable. Every enterprise is a trust engine. Governance is how that trust is safeguarded. In years gone by, authority was obviously vested in identifiable humans. Mistakes were attributable.
But now, as agentic AI breaks that assumption, focusing on setting the goals for AI governance success is one of the most important things an organization can do.
This means setting new metrics specifically for AI governance, to measure whether the strategy is actually working and fit for purpose.
AI governance pertains to system behavior, not the documents we’re used to. Policies in PDFs cannot govern actors operating at machine speed.
They must be encoded directly into systems: least-privilege access, continuous observability, lifecycle controls, escalation paths, and immutable logging. This is not bureaucracy. It is what makes deployment safe, and therefore scalable.
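As one illustration of what "encoded directly into systems" can look like, here is a minimal sketch of tamper-evident logging. It is an assumption-laden example, not a reference implementation: the `AuditLog` class and its field names are hypothetical, and a production system would also need durable, access-controlled storage. Each entry stores the hash of the previous entry, so editing or reordering history breaks the chain.

```python
import hashlib
import json


class AuditLog:
    """Hypothetical append-only log; each entry commits to the previous entry's hash."""

    _FIELDS = ("agent_id", "action", "detail", "prev")

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(body):
        # Serialize with sorted keys so the same body always hashes the same way.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, agent_id, action, detail):
        prev_hash = self._entries[-1]["hash"] if self._entries else "0" * 64
        body = {"agent_id": agent_id, "action": action, "detail": detail, "prev": prev_hash}
        entry = {**body, "hash": self._digest(body)}
        self._entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Return True only if no entry has been altered or reordered."""
        prev_hash = "0" * 64
        for entry in self._entries:
            body = {key: entry[key] for key in self._FIELDS}
            if entry["prev"] != prev_hash or entry["hash"] != self._digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

The design choice is that verification needs nothing beyond the log itself, which makes it cheap to run continuously rather than only during an audit.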
Laying the guardrails for any rapidly evolving system is an exercise in both control and flexibility. While AI use cases are limited only by imagination, tolerances need to be strict, especially with proprietary or sensitive financial data.
To get it right, first build a framework for success. This could mean building from the ground up, if adoption of these systems is nascent, or adapting current systems, as outlined in this guide to AI governance frameworks.
Before scaling anything, organizations need a clear view of what already exists. Every agent deployed in the enterprise should be registered as a Digital Agent on Record (DAR), an agent with a formal identity inside the organization’s governance systems.
The registry assigns each agent:
Version history tracks its evolution the way a personnel file tracks role changes. Decommission dates are recorded like offboarding. Visibility status (active, suspended, under review or retired) is maintained in real time.
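A registry entry of this kind might be sketched as follows. This is a hypothetical data model, not a prescribed schema: the class names, the `owner` field, and the choice to start new agents as "under review" are assumptions made for illustration. Only the four status values, the version history, and the decommission date come from the description above.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"
    UNDER_REVIEW = "under review"
    RETIRED = "retired"


@dataclass
class AgentRecord:
    agent_id: str
    owner: str  # assumed field: the accountable human for this agent
    version_history: list = field(default_factory=list)
    status: Status = Status.UNDER_REVIEW  # assumed default for new agents
    decommissioned_on: Optional[date] = None

    def release(self, version: str, released_on: date) -> None:
        # Each release is appended, never overwritten, like a personnel file.
        self.version_history.append((version, released_on))
        self.status = Status.ACTIVE

    def decommission(self, on: date) -> None:
        # Recorded like offboarding: the record stays, the status changes.
        self.decommissioned_on = on
        self.status = Status.RETIRED


class Registry:
    def __init__(self):
        self._records = {}

    def register(self, record: AgentRecord) -> None:
        if record.agent_id in self._records:
            raise ValueError(f"agent {record.agent_id} is already registered")
        self._records[record.agent_id] = record

    def with_status(self, status: Status) -> list:
        return [r.agent_id for r in self._records.values() if r.status is status]
```

A query such as `registry.with_status(Status.ACTIVE)` then answers, from one place, which agents are currently running.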
The registry becomes the single source of truth that answers foundational governance questions:
The pressure to scale AI fast leads organizations to deploy broadly before understanding where agents actually work. The opposite approach succeeds far more often.
Start with use cases where the workflow is well understood, the cost of failure is bounded, and partial automation still delivers meaningful value.
Evaluate agents not only on task performance but on behavior over time, edge-case handling, and interaction with real enterprise data.
Scale only after both performance and governance hold under real conditions.
Most agentic failures can be traced to the environment and not the model itself. Typical issues include:
Deploy agents in governed data sources, enforcing least-privilege access by default and maintaining traceability across every step.
The governance layer is what bridges the gap between demonstration and deployment.
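To make least-privilege by default concrete, the sketch below denies any access that has not been explicitly granted and records every attempt. The `GovernedDataSource` class and the `(resource, operation)` grant format are hypothetical; a real deployment would typically delegate this to an existing identity and access management system.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    pass


@dataclass
class GovernedDataSource:
    # Maps agent_id -> set of (resource, operation) pairs explicitly granted.
    # Anything not listed is denied: that is the least-privilege default.
    grants: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)

    def access(self, agent_id: str, resource: str, operation: str) -> None:
        allowed = (resource, operation) in self.grants.get(agent_id, set())
        # Record every attempt, allowed or not, so each step stays traceable.
        self.trace.append((agent_id, resource, operation, allowed))
        if not allowed:
            raise AccessDenied(f"{agent_id} may not {operation} {resource}")
```

Note that denied attempts are logged before the exception is raised, so the trace also shows what an agent tried to do, not only what it did.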
While agents execute decisions, accountability will always stay with people. The most important governance shift underway is explicitly mapping where human responsibility sits in every autonomous workflow.
Each agent needs:
This does not slow systems down but rather ensures that when something fails, the organization responds with clarity rather than confusion.
This is where we begin looking at AI governance success metrics, to track availability and performance. Firstly, remember that agentic systems need a lens of outcomes:
These outcome metrics must be tracked alongside governance signals of policy compliance rates, human intervention frequency, and behavioral drift indicators.
The combination determines whether an agent is genuinely delivering value or merely operating.
An agent that maintains perfect uptime but routinely escalates, violates policy boundaries, or produces inconsistent results is not reliable.
It is a source of hidden cost and is essentially an operational overhead disguised as automation.
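The point can be expressed as a simple check in which uptime alone never earns a clean result. This is a sketch under stated assumptions: the signal names and the default thresholds (10% escalation, 95% consistency) are illustrative placeholders, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class AgentSignals:
    uptime: float              # fraction of time the agent was available
    escalation_rate: float     # fraction of tasks handed to a human
    policy_violations: int     # count in the reporting period
    output_consistency: float  # fraction of repeated tasks with matching results


def reliability_issues(signals, max_escalation=0.10, min_consistency=0.95):
    """Return the governance problems found; uptime is deliberately not consulted."""
    issues = []
    if signals.escalation_rate > max_escalation:
        issues.append("escalates routinely")
    if signals.policy_violations > 0:
        issues.append("violates policy boundaries")
    if signals.output_consistency < min_consistency:
        issues.append("produces inconsistent results")
    return issues
```

An agent with `uptime=1.0` can still return all three issues, which is exactly the case described above.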
Legacy AI metrics – accuracy, precision, recall – were designed for stateless models producing single outputs. Agentic systems demand a fundamentally different measurement architecture.
The following AI governance metric framework organizes AI governance KPIs into five categories:
Not in a lab but in real workflows, with real data, under real conditions.
These are the first metrics, tracking:
Tracking these metrics alone is not enough. To do it effectively, organizations must implement a multi-layered observability strategy combining:
However, there is little need to build these capabilities in-house. There are mature agent observability platforms already available to track these metrics.
The thresholds for these metrics are also converging rapidly. Systems achieving goal accuracy above 85%, with hallucination rates low enough to be operationally negligible, are increasingly being deployed in customer-facing environments.
These thresholds should be defined collaboratively by business owners, AI engineering teams, and risk functions, based on the agent’s risk tier, domain sensitivity, and the potential cost of failure.
Organizations must establish structured operational boundaries.
High-risk processes, such as finance or healthcare, should require human-in-the-loop validation, while low-risk, well-defined tasks may operate autonomously under controlled conditions.
Importantly, the system should not receive the benefit of the doubt. Any signs of drift, unreliable behavior, or policy violations should automatically trigger human review and, where necessary, rollback procedures.
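These boundaries can be sketched as a small routing rule. The function name, the `Action` values, and the flag names are hypothetical; the two high-risk domains are simply the examples named above, and a real tiering scheme would be defined by the business, engineering, and risk functions together.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed autonomously"
    HUMAN_VALIDATION = "require human-in-the-loop validation"
    HUMAN_REVIEW = "trigger human review"


HIGH_RISK_DOMAINS = {"finance", "healthcare"}  # examples only, not a complete list


def route(domain, drift_detected=False, policy_violation=False, unreliable=False):
    # Warning signs take precedence over the domain's normal tier:
    # the system gets no benefit of the doubt.
    if drift_detected or policy_violation or unreliable:
        return Action.HUMAN_REVIEW
    if domain in HIGH_RISK_DOMAINS:
        return Action.HUMAN_VALIDATION
    return Action.PROCEED
```

Rollback is left out of the sketch because whether it is necessary is a decision for the human reviewer.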
The metrics to watch here all concern how quickly dependency on human-in-the-loop review declines.
Monitoring these will tell data leaders whether the system is maturing or quietly breaking.
Most organizations still measure usage. That is the wrong lens. What matters is how much work the agent can complete without intervention.
Over time, intervention should decline. If it rises, the organization’s guardrails are failing.
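A minimal way to watch this trend is to compute the intervention rate per period and compare recent periods with earlier ones. The function names and the half-versus-half comparison are assumptions chosen for simplicity; a regression slope or control chart would be a reasonable alternative.

```python
from statistics import mean


def intervention_rate(tasks_completed, tasks_with_intervention):
    """Fraction of completed tasks that needed a human to step in."""
    if tasks_completed == 0:
        raise ValueError("no completed tasks in this period")
    return tasks_with_intervention / tasks_completed


def intervention_is_rising(rates):
    """Compare the mean of the most recent half of periods with the earliest half."""
    if len(rates) < 2:
        return False  # not enough history to call a trend
    half = len(rates) // 2
    return mean(rates[-half:]) > mean(rates[:half])
```

For example, weekly rates of 0.30, 0.22, 0.15, and 0.09 are falling, so the check returns False; the same values in reverse order would return True.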
Next, data leaders need to know whether the system is staying within the organization’s boundaries, even if they’re not explicitly defined. These are the metrics that allow that to be policed:
These tell CDOs and data leaders whether the system is staying within the organization’s boundaries, whether explicitly defined or not.
In high-risk domains, the tolerance is zero. Elsewhere, thresholds can be tiered, but the signal remains the same.
Violations that persist across model updates are not model issues. They are structural governance gaps.
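One way to surface such structural gaps is to group violations by policy and count how many model versions each one spans. The event format of `(policy_id, model_version)` pairs and the policy names in the usage note are hypothetical.

```python
from collections import defaultdict


def persistent_violations(events, min_versions=2):
    """events: iterable of (policy_id, model_version) pairs, one per violation.

    Returns the policies violated under at least `min_versions` distinct
    model versions, sorted by policy id.
    """
    versions_by_policy = defaultdict(set)
    for policy_id, model_version in events:
        versions_by_policy[policy_id].add(model_version)
    return sorted(
        policy for policy, versions in versions_by_policy.items()
        if len(versions) >= min_versions
    )
```

Given violations of `"pii-export"` under both `"v1"` and `"v2"` and a single `"spend-limit"` violation under `"v2"`, only `"pii-export"` is flagged as persisting across updates.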
Agents are deployed to deliver value. That value has to be measured in terms the business understands:
But value must be paired with success rates. A low-cost agent that fails half the time is expensive in disguise. The governance signal here is straightforward: ROI without trust metrics is incomplete.
Value, trust, and adoption have to be tracked together, or the picture is distorted.
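The "expensive in disguise" effect can be shown with simple arithmetic. The sketch below assumes failed tasks are retried independently until one succeeds, and that each failure may carry an extra cleanup cost; both assumptions, and the function name, are illustrative rather than taken from any standard.

```python
def cost_per_successful_task(cost_per_attempt, success_rate, failure_cost=0.0):
    """Expected spend to obtain one successful outcome.

    Assumes independent retries: on average 1 / success_rate attempts are
    needed, and (1 - success_rate) / success_rate of them fail.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return (cost_per_attempt + (1 - success_rate) * failure_cost) / success_rate
```

At 0.40 per attempt and a 50% success rate, the effective cost is 0.80 per successful task; if each failure also costs 2.00 in rework, it rises to 2.80.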
Agents rarely fail in a single moment. They degrade over time, so monitoring this signal is key. Set up tracking for:
The right approach is not point-in-time evaluation, but baseline tracking over 30 to 60 day windows, looking for sustained deviation.
Building AI governance to detect patterns, not just incidents, will avoid costly issues down the line that data leaders could have spotted earlier.
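The difference between an incident and a pattern can be sketched as follows: a single bad day is ignored, while a run of consecutive days far from the baseline is flagged. The 30-day baseline matches the lower end of the window above; the 7-day run length and the 2-standard-deviation threshold are assumed placeholder values.

```python
from statistics import mean, stdev


def sustained_deviation(daily_values, baseline_days=30, run_length=7, z_threshold=2.0):
    """Return True if `run_length` consecutive days after the baseline window
    each sit more than `z_threshold` standard deviations from the baseline mean."""
    baseline = daily_values[:baseline_days]
    if len(baseline) < baseline_days:
        raise ValueError("not enough history to form a baseline")
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has no variance; use an absolute threshold instead")
    run = 0
    for value in daily_values[baseline_days:]:
        # A day inside the threshold resets the run: isolated spikes do not count.
        run = run + 1 if abs(value - mu) / sigma > z_threshold else 0
        if run >= run_length:
            return True
    return False
```

Fed a daily success rate, for instance, this would stay quiet through a one-day outage but fire once the rate has sat well below its baseline for a week.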
The following table outlines the core KPI categories, benchmark thresholds, governance signals, and ownership structures organizations can use to measure and manage autonomous AI systems at scale.

Figure: AI Agent Governance: KPIs, Signals, and Ownership
This bears repeating: the consolidated AI governance KPIs must sit with a single accountable entity. That means clearly defined AI governance roles, or a governance committee, chaired by a Chief AI Officer or equivalent executive with cross-functional authority.
Without this, each function optimizes its own metrics in isolation: engineering watches reliability, compliance tracks violations, finance measures ROI, and nobody sees the systemic picture.
Only a cross-cutting owner can detect divergence and intervene before governance debt compounds into organizational liability.
Additionally, every agent operating in the enterprise needs a named human counterpart who is accountable for its decisions.
Governance requires analog representations of digital actors: people who own what the agents do, the way managers own what their teams do.
If CDOs and data leaders want to make a big impact in their organization, focusing on setting the goals for AI governance success is one of the most important things they can do.
The paradox is that the most ignored discipline in corporate life has become the one on which competitive advantage will turn.
Organizations that rethink governance as the enabling infrastructure of autonomy will be the ones that actually scale.
It means:
The emerging role of the AI Governance Engineer reflects this shift. These practitioners are not simply writing policy. They are building the measurement systems that allow organizations to observe, evaluate, and control agentic behavior at scale.
Their mandate is to translate governance into operational signals: reliability thresholds, intervention triggers, policy compliance rates, drift detection, traceability, and audit-ready evidence that can be acted on in real time.
Agentic AI promises organizations that are faster, more adaptive, and capable of operating at scales human coordination alone cannot sustain.
But that promise depends on whether enterprises can measure trust as rigorously as they measure performance. In the end, governance metrics are not reporting artifacts.
They are the control layer that determines whether autonomous systems remain reliable, accountable, and safe as they scale.
About the author:
Mansi Agarwal is currently the Global Head of Analytics and AI at Carrier, where she drives digital transformation by translating advanced technologies and data into measurable business outcomes. A proven leader in building world-class teams and scaling AI solutions across enterprises, she brings a distinctly human-centered approach to AI leadership—one grounded in holistic transformation across technology, people, and culture. Her two decades of experience from Nike, REI, and Infosys demonstrate a consistent ability to reshape business functions through data-driven innovation while fostering organizational alignment and change. Named one of CDO Magazine’s Global Data Power Women 2026, Mansi is recognized as a thought leader and sought-after keynote speaker.