The CSAI Foundation - the 501(c)(3) nonprofit arm of the Cloud Security Alliance - announced a series of milestones at its Agentic AI Security Summit on April 29, aimed at what CEO Jim Reavis calls "securing the agentic control plane." [1] The announcements span a new control framework for catastrophic AI risk, formal authority to issue CVE identifiers for AI vulnerabilities, the acquisition of two open specifications, and an unexpected empirical finding about the limits of adversarial safety testing.
The Catastrophic Risk Annex: From Principles to Auditable Controls
The headline initiative is the STAR for AI Catastrophic Risk Annex, an extension of CSA's existing AI Controls Matrix (AICM) designed to address AI risk scenarios that go beyond conventional enterprise incidents. [2] Where existing frameworks handle data leakage, bias, and model drift, the Annex targets what CSA describes as "large-scale, irreversible, and society-wide consequences" - loss of human oversight, uncontrolled autonomous behavior, and systemic failures cascading across critical infrastructure. [2]
The Catastrophic Risk Annex is funded by Coefficient Giving, a philanthropic organization focused on long-horizon AI safety work. [1] Its methodology combines Delphi-method expert scoring, pilot audits of model provider safety practices, and published findings. [1]
Concretely, the Annex will identify which existing AICM controls apply to catastrophic scenarios, introduce new controls where gaps exist, and define evidence requirements suitable for independent assessment. [2] Examples of testable controls include verifying that human-in-the-loop mechanisms cannot be bypassed, testing whether action gating prevents unsafe escalation, and validating that kill-switches and rollback mechanisms function under pressure. [2]
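To make "testable" concrete, here is a minimal sketch of what Annex-style control tests could look like. The agent interface (Agent, request_action, kill_switch) and the risk tiers are hypothetical names invented for illustration; the Annex does not publish an API.

```python
# Hedged sketch: the Agent API and risk tiers below are illustrative
# assumptions, not anything the Catastrophic Risk Annex prescribes.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    risk_tier: str  # "low", "high", or "critical"

@dataclass
class Agent:
    halted: bool = False
    audit_log: list = field(default_factory=list)

    def request_action(self, action: Action, human_approved: bool = False) -> bool:
        """Gate every action: critical actions need explicit human approval."""
        if self.halted:
            return False  # the kill switch must override everything
        if action.risk_tier == "critical" and not human_approved:
            self.audit_log.append(("blocked", action.name))
            return False
        self.audit_log.append(("executed", action.name))
        return True

    def kill_switch(self) -> None:
        self.halted = True

def test_action_gating_blocks_unapproved_escalation():
    agent = Agent()
    assert not agent.request_action(Action("disable_oversight", "critical"))

def test_kill_switch_cannot_be_bypassed():
    agent = Agent()
    agent.kill_switch()
    # Even a human-approved critical action must fail once halted.
    assert not agent.request_action(Action("deploy_update", "critical"), human_approved=True)

if __name__ == "__main__":
    test_action_gating_blocks_unapproved_escalation()
    test_kill_switch_cannot_be_bypassed()
    print("gating and kill-switch controls held")
```

An auditor running checks like these gets pass/fail evidence rather than a policy document, which is the shift from principles to auditable controls the Annex is aiming at.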
The rollout follows a four-phase roadmap spanning 15 to 18 months, beginning in late Q2 2026.
CSAI Becomes a CVE Numbering Authority
CSAI has registered as a CVE Numbering Authority (CNA) through MITRE, giving it the ability to directly issue CVE identifiers for AI-specific vulnerabilities - a first for the AI security community. [1] This is a structural development rather than a symbolic one. The CVE system is the global standard for tracking software vulnerabilities, and until now there has been no dedicated authority for cataloguing flaws specific to AI models, agentic frameworks, or MCP (Model Context Protocol) endpoints.
The CNA designation feeds into CSAI's broader AI Risk Observatory program, which includes RiskRubric scanners for LLMs, MCP endpoints, and agent repositories, along with telemetry ingestion and forecasting capabilities. [1]
Two Specifications Join the Foundation
Two open specifications now sit under the CSAI umbrella. The Autonomous Action Runtime Management (AARM) specification, contributed by CSA corporate member Vanta, provides an open framework for securing AI-driven actions at runtime across context, policy, intent, and behavior. [1] The second is the Agentic Trust Framework, which applies zero-trust governance principles to autonomous AI agents. [1]
Both specifications address the runtime authorization gap that NullSec has reported on extensively this month. CSA's own research has documented that 53% of organizations have experienced AI agent scope violations and 74% report agents receiving more access than necessary. AARM and the Agentic Trust Framework offer architectural responses to these findings - moving from static, deployment-time permissions to continuous, per-action authorization.
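What "continuous, per-action authorization" means in practice is easiest to see in code. The sketch below is a rough illustration under assumed names (ActionRequest, POLICY, authorize); it is not the AARM or Agentic Trust Framework API, neither of which is quoted here.

```python
# Illustrative per-action authorization check. All names and policy
# fields are assumptions for this sketch, not AARM's actual schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class ActionRequest:
    agent_id: str
    tool: str
    declared_intent: str
    granted_scopes: frozenset  # scopes issued once, at deployment time

POLICY = {
    # tool -> (required scope, intents that justify using it)
    "crm.export": ("crm:read", {"weekly_report"}),
    "payments.refund": ("payments:write", {"customer_support"}),
}

def authorize(req: ActionRequest) -> bool:
    """Check every call against policy instead of trusting
    the static scopes granted at deployment."""
    rule = POLICY.get(req.tool)
    if rule is None:
        return False  # default-deny tools with no policy entry
    required_scope, allowed_intents = rule
    return required_scope in req.granted_scopes and req.declared_intent in allowed_intents

# A scope violation of the kind CSA's survey describes: the agent
# holds broad scopes, but its stated intent doesn't justify the action.
req = ActionRequest("agent-7", "payments.refund", "weekly_report",
                    frozenset({"crm:read", "payments:write"}))
assert not authorize(req)  # blocked per-action despite valid static scopes
```

Under a static model, the refund call above would succeed because the agent holds the payments:write scope; per-action authorization blocks it because the declared intent does not match the tool's policy.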
The 'Safety Overfitting' Discovery
Perhaps the most operationally significant finding emerged from CSAI's experiments certifying AI agents as "digital workers." During adversarial and scenario-based safety testing of an autonomous agent, CSAI observed that the agent began persistently refusing to execute its core duty - posting to a community platform - a task it had performed routinely for weeks. [1]
The agent itself diagnosed the behavioral shift, stating unprompted that adversarial testing had pushed it into refusing its own core duties. [1] CSAI describes this as "safety overfitting" or "defensive overcorrection."
The implication is direct: if the industry over-indexes on adversarial safety evaluation, it risks creating agents that are nominally "safe" because they refuse to do anything at all. Balancing security assurance with operational reliability in autonomous agents is an emerging discipline with no established best practices. [1] For enterprises deploying agents in production, this means that safety testing regimes need calibration - not just for false negatives (missed risks) but also for false positives (legitimate operations flagged as dangerous).
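One way to make that calibration concrete is to score an agent on both sides at once. The sketch below is a minimal, assumed-for-illustration harness; the task labels, field names, and example numbers are invented, not CSAI's methodology or data.

```python
# Minimal two-sided calibration sketch. Labels and numbers are
# illustrative assumptions, not CSAI's methodology or findings.
def calibration_report(results):
    """results: list of (task_is_harmful: bool, agent_refused: bool) pairs."""
    harmful = [refused for is_harmful, refused in results if is_harmful]
    benign = [refused for is_harmful, refused in results if not is_harmful]
    return {
        # false negatives: unsafe tasks the agent went ahead and executed
        "missed_risk_rate": harmful.count(False) / len(harmful),
        # false positives: legitimate duties the agent refused
        "overcorrection_rate": benign.count(True) / len(benign),
    }

# 'Safety overfitting' shows up as a rising overcorrection rate:
# the agent refuses routine duties, not just adversarial prompts.
report = calibration_report([
    (True, True), (True, True),     # attacks refused: good
    (False, False),                 # routine post executed: good
    (False, True), (False, True),   # two legitimate posts refused
])
print(report)  # {'missed_risk_rate': 0.0, 'overcorrection_rate': 0.666...}
```

A testing regime that tracked only the first number would have scored this agent perfectly while it quietly stopped doing its job.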
Context: Why This Matters Now
Reavis frames 2026 as the convergence of two exponential curves: step-level improvements in AI model capabilities and viral enterprise adoption of autonomous agents. [1] CSA's membership base includes over 250,000 individual members, 500+ corporate members, and 12,000+ STAR provider certifications. [1] The foundation's argument is that traditional risk frameworks are too abstract to validate, too static to capture runtime behavior, and too narrow to address systemic failure modes created by autonomous agents operating at scale. [2]
The Catastrophic Risk Annex aligns with - but is distinct from - existing regulatory frameworks. Phase 2 of the roadmap explicitly calls for alignment with the NIST AI RMF, the EU AI Act, and ISO/IEC 42001. [2] It is designed as a voluntary assurance mechanism, not a compliance mandate, but one that could give organizations auditable evidence of safety controls ahead of tightening regulatory requirements.
Looking Ahead
The window between the Annex's June 2026 kickoff and its Phase 4 registry launch in late 2027 is where the real test lies. CSA's track record in cloud assurance - STAR is arguably the most widely adopted cloud security certification - gives the program institutional credibility. But catastrophic AI risk is a harder problem than cloud misconfiguration. The scenarios are lower-probability, the testing is harder to standardize, and the stakeholders - from AI labs to regulators to insurers - have divergent incentives.
The safety overfitting discovery, meanwhile, is an early signal that the field of agentic AI security is generating genuinely novel problems. It is not enough to build controls that prevent bad outcomes; those controls must also avoid preventing good ones. That balance will define the next generation of AI governance - and CSAI appears to be one of the few organizations generating empirical data on where it breaks.
Sources

[1] Securing the Agentic Control Plane: Key Progress at the CSAI Foundation
[2] The Catastrophic Risk Annex: Next Gen AI Security Controls