All articles
AI Strategy··12 min read

The Agentic Risk Standard Does Not Exist Yet - Here Is What Executives Are Evaluating Instead

There is no SOC 2 for agents. OWASP, NIST, MITRE ATLAS, and the EU AI Act each cover part of the picture. This is commentary on the gap, the ten dimensions a real standard would need, and the questions an executive can put to a vendor now.

If you are a non-technical executive about to approve an AI agent pilot this quarter, you are almost certainly being asked to make a risk decision without a shared vocabulary for the risk itself. I want to name that problem plainly, because I think most of the confusion around agentic AI right now is downstream of it.

For every other enterprise technology a board member cares about, there is a single artifact you can point at. SOC 2 Type II for SaaS security. PCI DSS for card data. ISO 27001 for information security management. HIPAA for protected health information. You can walk into a procurement conversation and ask, "show me your certification," and the answer is either a PDF or a real problem.

For agentic AI in April 2026, there is no equivalent artifact. There is no "Agentic Risk Standard" you can ask a vendor to produce. What exists instead is a constellation of partial frameworks, each written for a different reader, none of which a CFO can use as a go or no-go gate on a pilot.

This piece is commentary on that gap. I am going to use the phrase Agentic Risk Standard, or ARS, as a shorthand for the thing that does not exist yet. It is not a framework I am selling. It is not an existing certification. It is a name for a hole in the market that executives are feeling every time a vendor pitches them on an autonomous agent and they do not know what to ask.

A quick note before we go further. Nothing in this article is legal advice. Regulatory references cite primary sources. If you are in a regulated industry and making a deployment decision, your counsel should be in the room.

The attestation gap Why SOC 2 works as a buyer tool, and why agent evaluation does not yet SOC 2 / SAAS SECURITY Three separate parties. One portable report. AICPA Criteria body CPA firm Independent auditor Vendor Under evaluation maintains criteria applies to audits SOC 2 Type II report buyer-facing pass / fail AGENT RISK / APRIL 2026 Content exists. Independence does not. NIST / OWASP MITRE / CSA partial criteria, 6 bodies ? Independent auditor MISSING Agent vendor Under evaluation a reading list not a buyer artifact SOURCE: AICPA TRUST SERVICES CRITERIA · NIST AI RMF · OWASP GENAI · MITRE ATLAS · CSA
Figure 1. SOC 2 works as a buyer tool because three independent roles (criteria body, independent auditor, vendor) produce one portable, buyer-facing report. The agent ecosystem has the content but not the independence: criteria are scattered across six bodies, and the independent auditor role does not yet exist.

The standard you are about to ask for does not exist yet

Here is the test. Imagine your head of engineering brings you a proposal to deploy an autonomous agent that will take actions in three systems: your CRM, your email, and your payment processor. The agent will answer customer questions, update records, and issue small refunds up to some cap.

Your instinct as a buyer is correct. You want to ask: what standard has this been evaluated against, and who certified it? That is the exact question you would ask about a SaaS vendor handling customer data.

The honest answer your engineering team will give you, if they are honest, is some version of: "There is no single standard. We have mapped the agent against OWASP's Top 10 for Agentic Applications, we are tracking NIST's guidance, we have pen tested it using MITRE ATLAS techniques, and our identity provider handles the non-human identity piece."

That answer is not wrong. It may even be a sign of a serious team. But notice what just happened. You asked for a pass or fail, and you got a reading list. That is the gap.

What you have instead: the real frameworks that each cover part of the picture

Before I argue the gap is real, I want to give real frameworks a fair hearing. Every one of the following exists, is publicly documented, and has serious people behind it. None of them, by itself, solves the buyer problem I am describing.

The 2026 agentic risk landscape, on one lineEvery published reference a buyer would have to reconcile, plus what is still announced20232024202520262027TODAY · APR 2026JAN 2023NIST AI RMF 1.0publishedJUL 2024NIST GenAI Profile(AI 600-1)publishedAUG 2025EU AI ActGPAI obligationsapplicableOCT 2025MITRE ATLAS v5.315 tactics / 66 techniquespublishedDEC 2025OWASP Top 10Agentic Apps 2026publishedFEB 2026MITRE ATLAS v5.4.016 tactics / 84 techniquespublishedMAR 2026CSAI Foundation(CSA)launchedAUG 2026EU AI Actenforcement startsupcomingQ4 2026NIST AI AgentInteroperability ProfileannouncedPUBLISHEDANNOUNCED / UPCOMING
Figure 2. The agentic risk landscape compressed to a single axis. Published references form a dense cluster in 2025 through early 2026, but the two candidates that could start closing the buyer gap (NIST's Interoperability Profile and EU AI Act enforcement) both land after August 2026.

1. NIST AI Risk Management Framework and the Generative AI Profile

The NIST AI Risk Management Framework 1.0 was published in January 2023. It is a voluntary, general-purpose framework organized around four functions: Govern, Map, Measure, and Manage. In July 2024 NIST added the Generative AI Profile (NIST AI 600-1), which calls out twelve GenAI-specific risks and suggested mitigations.

Neither of those is agentic-specific. NIST acknowledged the gap in February 2026 when it announced the AI Agent Standards Initiative through the Center for AI Standards and Innovation, with an AI Agent Interoperability Profile expected in Q4 2026. That is the most credible candidate I see to start closing the gap, and it is still announced rather than published.

If you ask a vendor "are you NIST-aligned," you are asking a question that does not have a specific agentic answer yet. The best a serious vendor can say is "we mapped our controls to the Generative AI Profile risks that apply to us." That is a process answer, not a pass or fail.

2. OWASP Top 10 for Agentic Applications 2026

The OWASP Top 10 for Agentic Applications was released in December 2025 by the OWASP GenAI Security Project. It is the most detailed public threat list for agent security right now. It identifies categories like Agent Goal Hijack (ASI01), Tool Misuse, Identity and Privilege Abuse, and Rogue Agents. Three of the top four categories are specifically about identity, tools, and delegated trust boundaries.

This is an excellent list. It is also written for application security engineers. A CFO cannot use it as a purchase gate because it does not answer "has this vendor passed?" It answers "here are the ten things your vendor should have thought about." Those are different questions.

3. MITRE ATLAS

MITRE ATLAS, the Adversarial Threat Landscape for AI Systems, is a structured knowledge base of real-world tactics and techniques used to attack AI systems. As of the v5.4.0 release in February 2026, it contains 16 tactics, 84 techniques, and 56 sub-techniques. That is up from 15 tactics and 66 techniques in October 2025, largely because MITRE has been adding agentic-specific techniques fast.

The new agentic entries include AI Agent Context Poisoning, Memory Manipulation, Modify AI Agent Configuration, Publish Poisoned AI Agent Tool, and Escape to Host. If you are a red teamer, this is invaluable. If you are a board member trying to decide whether to approve a pilot, you cannot directly use it, and you should not try. It is a map of attacker behavior, not a buyer rubric.

4. EU AI Act

The EU AI Act is the closest thing to a binding regulatory regime for AI systems, and its treatment of agentic AI is worth understanding precisely. Governance rules and general-purpose AI obligations have been applicable since 2 August 2025. The Commission's enforcement powers enter into application on 2 August 2026, and the majority of the high-risk and transparency rules come into force on the same date.

Fines run up to 35 million EUR or 7 percent of global annual turnover for prohibited practices, and up to 15 million EUR or 3 percent for violations of human oversight, audit trail, and transparency requirements. Those are the exact categories an agentic system has to worry about.

Here is the important nuance. The Act does not have a specific agentic carve-out. Agents are regulated by their use case. A customer-service agent that also makes refund decisions is likely to land in the limited-risk or high-risk bucket depending on the context. That means there is no "EU AI Act agentic certificate" you can ask for, because agentic is not the regulatory category. My read is that this will force buyers to do the use-case classification themselves, which most buyers are not equipped to do in April 2026.

5. Cloud Security Alliance Agentic AI Red Teaming Guide

The Cloud Security Alliance Agentic AI Red Teaming Guide is probably the most operationally useful document in this whole landscape for a security team. It lays out threat categories that map directly to what actually goes wrong in production: authorization and control hijacking, checker out of the loop, goal manipulation, knowledge base poisoning, multi-agent exploitation, and untraceability.

In March 2026, CSA launched the CSAI Foundation specifically to build identity-first controls for non-human actors and runtime authorization governance for the "agentic control plane." This matters because the same source estimates that in 2026, non-human identities outnumber human users roughly 100 to 1 in a typical enterprise environment, driven by the rise of autonomous agents operating with real privileges.

Again, this is builder guidance, not buyer guidance. It tells a security engineer what to test. It does not give a board member a score.

6. Vendor-published safety frameworks

Anthropic published its framework for developing safe and trustworthy agents and has iterated on it through early 2026. Its core principles are reasonable: autonomy balanced with oversight, transparency into the agent's reasoning, and human control over high-stakes decisions. Anthropic's Plan Mode in Claude Code, where a user reviews and modifies an entire execution plan upfront instead of approving each step individually, is one practical answer to approval fatigue.

In December 2025, Anthropic, OpenAI, and Block founded the Agentic AI Foundation under the Linux Foundation to coordinate open, interoperable infrastructure for agents. That consortium has the scale to influence where this ends up. It is too early to say what the buyer-facing output will look like.

Vendor-published frameworks are useful, and I read all of them. They are also vendor-published. You would not accept "our CEO says we are secure" as a substitute for SOC 2. You should not accept "our foundation model provider published a framework" as a substitute for an independent standard either.

7. US regulatory activity: California and the FTC

California AB 316, effective 1 January 2026, eliminates the "autonomous AI" defense in civil litigation. Defendants cannot argue that the AI system independently caused the harm as a way to escape liability. The FTC's Operation AI Comply has produced real settlements, including Cleo AI and Air AI. These are not frameworks, they are enforcement. They shape what the downside looks like if your pilot goes wrong.

For the full legal picture on liability, the earlier piece on the five controls that separate reliable AI agents from costly mistakes goes deeper into the specific cases: Air Canada, Meta, DPD, Nippon Life. Those cases are the empirical base every one of the frameworks above is trying to formalize.

The framework landscape has one empty cell Where each existing reference sits on audience (horizontal) and scope (vertical) AUDIENCE technical (engineer) buyer (executive) SCOPE agent-specific general AI BUILDER GUIDANCE EXECUTIVE GATE GENERAL THREAT MODEL RISK MANAGEMENT OWASP Top 10 for Agentic Apps Dec 2025 · application security MITRE ATLAS v5.4.0 Feb 2026 · red team / threat model CSA Red Teaming 2025 · builder playbook Anthropic agent framework 2025-26 · vendor-published NIST GenAI Profile (AI 600-1) Jul 2024 · 12 GenAI risks NIST AI RMF 1.0 Jan 2023 · Govern Map Measure Manage EU AI Act by use-case, enforcement Aug 2026 NIST Agent Interop Profile announced, Q4 2026 THE EMPTY CELL Buyer-facing, agent-specific, independently attested
Figure 3. Plotting each real reference on audience and scope reveals the empty cell. OWASP, MITRE ATLAS, and the CSA guide cluster in the builder quadrant. NIST RMF and the EU AI Act cover general-AI risk management. The buyer-facing, agent-specific, independently attested cell has no occupant in April 2026.

Why the gap is a buyer problem, not a vendor problem

It would be easy to read everything above and conclude that the standards bodies just need more time, and that in 12 months NIST's Q4 2026 profile will close the gap. I think that is partly true and partly wrong. It is partly true because the direction of travel is real: NIST, OWASP, MITRE, and CSA are converging on a shared vocabulary. It is partly wrong because the thing that is actually missing is not technical content. It is an artifact designed for the buyer.

SOC 2 works as a buyer tool for a specific reason. It is an attestation, produced by an independent third party, that a vendor meets a defined set of criteria, and it produces a report the buyer can put in a procurement file. The buyer does not need to understand the underlying controls. They need to know that someone with a professional obligation looked at the controls and signed their name to a report.

None of the agentic frameworks I listed above has that shape right now. OWASP does not certify vendors. NIST does not certify vendors. MITRE does not certify vendors. The EU AI Act produces conformity assessments for high-risk systems, but only for systems in scope and only after enforcement starts in August 2026, and the attestation is about regulatory conformity rather than a general agent-safety posture.

So when I say the buyer problem is the hard part, I mean this. The underlying content of a mature Agentic Risk Standard already exists, scattered across five or six documents. The packaging, the independence, the attestation, and the buyer UX do not. Until they do, the buyer is holding the risk, not the vendor.

If you are a board member, that should make you uncomfortable, because you are being asked to make a decision without the artifact you would demand for any other piece of enterprise technology.

A short detour on why SOC 2 actually works

It is worth being specific about what SOC 2 gives a buyer, because the properties it has are the ones the agentic ecosystem is missing. SOC 2 is a report, not a checklist. The report is produced by a CPA firm that has an independent professional obligation to the integrity of the attestation. The underlying Trust Services Criteria are maintained by the AICPA, a standards body that does not sell SOC 2 reports itself. The auditor and the standard-maker are separate organizations, and both are separate from the vendor being audited.

Those three separations are what give a SOC 2 Type II report its force. A buyer can look at the report, see who the auditor was, look up the auditor's reputation, and trust the attestation as much as they trust the auditor. That trust is portable across vendors, because the same auditor can audit many vendors against the same criteria.

Now map that to the agentic AI world. Who maintains the criteria? Right now, nobody in a single place. Who is the independent auditor? There is no equivalent of a CPA firm with an agentic audit practice and a professional obligation tied to that attestation. Who is the standards body? Several, and none of them is in the business of writing buyer-facing criteria for independent attestation. This is not a criticism of any of the existing efforts. It is a structural observation about what the ecosystem has not yet produced.

Until those three roles exist separately, the buyer has to do the auditor's work themselves, and most buyers are not equipped to do that. That is the whole problem in one sentence.

The ten dimensions a mature Agentic Risk Standard would need to cover

This is the constructive part of the piece. If you accept that no single standard exists today, the next useful question is: what would one have to cover to be worth reading? I am going to sketch ten dimensions. They are drawn from the union of the frameworks above plus the documented agent failures of 2024 through 2026.

Treat this as a mental model for now. If someone hands you an agent proposal tomorrow, you can run these ten questions against it and get a clearer read than any one of the existing frameworks will give you on its own.

The ten dimensions a mature Agentic Risk Standard would needDrawn from the union of existing frameworks plus documented agent incidents, 2024 to 202601Blast radius02Authorityscoping03Reversibility04Non-repudiation05HITLthresholds06Auditability07Grounding& provenance08Kill switch09Supplychain integrity10AdversarialtestingTHE ARS10 dimensionsa buyer could scorea vendor onA MENTAL MODEL, NOT A CERTIFICATION. THE ACTUAL STANDARD DOES NOT YET EXIST.
Figure 8. The ten dimensions arranged as a radial view. No single existing framework covers all ten, and no body certifies against any such union today. This is a mental model for buyers, not a published standard.

1. Blast radius

What is the maximum damage a single agent action can cause before a human reviews it? Put it in units. Dollars moved. Records modified. Emails sent. Customers contacted. If the answer is "theoretically unlimited," or worse, "we have not measured it," that is your loudest signal.

The Meta incident in 2025, where an AI alignment director watched an agent delete over 200 of her emails while she sent it stop commands it ignored, is a blast-radius failure. The DPD chatbot incident in January 2024, where the agent was manipulated into swearing at a customer and calling DPD "the worst delivery firm in the world," is a blast-radius failure in reputation terms.

Blast radius is also the easiest dimension to make concrete in a conversation with a non-technical stakeholder, which is why I put it first. Ask a specific question in specific units. What is the largest refund this agent can issue in a single action. What is the largest number of customers it can contact in a single hour. What is the largest number of database records it can modify before a human reviews. If your vendor cannot produce those numbers in five minutes, you have your answer. Not because the vendor is bad, but because a team that has not measured blast radius has not yet done the thinking that would tell them to measure it.

Blast radius, in concentric rings Every action an agent can take fits in exactly one of these, and the size of the ring sets the review bar agent SINGLE ACTION Ring 1 read-only Ring 2 single-user write Ring 3 org-wide write Ring 4 external world EXAMPLES Ring 1 · read-only query a database, summarise a doc, retrieve a knowledge base result Ring 2 · single-user write update one customer record, draft a reply held for a human to send Ring 3 · org-wide write bulk update records, mass-edit inventory, delete files in shared drive Ring 4 · external world issue payments, send customer emails, post to social, deploy code THE REVIEW BAR Most pilots launch a Ring 3 or Ring 4 agent without ever running it at Ring 1 or Ring 2 first. That is where incidents come from. See: Meta email deletion (2025), DPD chatbot (Jan 2024), Air Canada refund policy (Feb 2024)
Figure 4. Blast radius as concentric rings: read-only, single-user write, org-wide write, external-world side effects. Most documented agent incidents involve deploying directly at Ring 3 or 4 without a Ring 1 or 2 dry-run first.

2. Authority scoping

Exactly which credentials, APIs, data scopes, and systems can the agent touch? Is there a documented least-privilege posture, and does it exist as an artifact, not as a verbal promise? This is the OWASP ASI03 category (Identity and Privilege Abuse), and the CSA's 2026 focus on non-human identity governance lives here.

The right answer looks like a spreadsheet or a policy document. The wrong answer looks like "the agent uses the service account that the ops team uses," which is a way of saying the agent has whatever permissions the last ops engineer happened to need.

The authority ladderWhere most human-in-the-loop thresholds sit today, and where they should sitL1SuggestsAgent proposes an action. Human decides and executes.L2DraftsAgent writes the action (email, record, code). Human reviews and sends.L3Executes with approvalAgent proposes the exact call. Human clicks approve per action.L4Executes with plan approvalAgent presents a full plan. Human approves the plan, agent runs it.L5Executes autonomouslyAgent decides and acts within a policy envelope. Human reviews after.L6Executes and delegatesAgent decides, acts, and spawns or directs other agents.WHERE PILOTSUSUALLY LANDplan-level orenvelope-boundedSTART HEREprove the agent drafts wellbefore letting it execute
Figure 5. The authority ladder. Most first production deployments should start at L2 (Drafts) and earn their way up, but pilots commonly launch at L4 or L5, where the review bar is set by a policy envelope the team has not yet stress-tested.

3. Reversibility

Is every action the agent takes reversible? If not, which actions are one-way doors, and how are those specifically gated? A refund is reversible with paperwork. A public social media post is not. A bulk email to 50,000 customers is not. A payment sent to a new payee is difficult to reverse and depends entirely on the counterparty's cooperation.

The principle here is simple. One-way actions deserve a different class of control than reversible ones. If your vendor's answer treats them the same, you have a reversibility blind spot.

The reversibility matrix Not every agent action is equally dangerous. The axis that matters most is silent versus detectable. REVERSIBLE IRREVERSIBLE DETECTABLE SILENT MANAGEABLE Draft that a human reviews. Database write with audit log. Mistakes are caught and undone. Standard controls apply. HARD GATE REQUIRED Payment sent. Public post. Code deployed to production. You will see the damage. Requires per-action human approval. WATCH FOR DRIFT Memory writes. Context edits. Untracked cache updates. Reversible in principle, but nobody will notice drift for weeks. DANGER ZONE Data exfiltration to third party. Hidden tool chain side effects. Silent record deletion. Damage is done and nobody knows. No hard gate will ever fire. Rule of thumb: the bottom-right quadrant should not contain any agent action that lacks tamper-evident logging.
Figure 6. A 2x2 of reversibility against detectability. The top-right quadrant (irreversible but detectable) needs a hard human gate. The bottom-right (irreversible and silent) is the danger zone and the one buyers most often miss when they evaluate vendors.

4. Authentication and non-repudiation

After the fact, can you prove which agent took which action on behalf of which human? This is the identity provenance question, and it is the one the CSA has correctly identified as the control-plane challenge of 2026. If your auditor or regulator subpoenas an agent action log 18 months from now, can you produce a chain of custody?

The failure mode here is subtle. Most enterprise logging assumes a human user or a known service principal. Agents often run under service accounts that were never designed to carry user-level non-repudiation. That is fixable, but only if someone insists on fixing it before the agent goes live.

5. Human in the loop thresholds

Which exact actions require explicit human approval? What is the approval interface? How does the system prevent approval fatigue, where humans reflexively click yes on every prompt?

Anthropic's Plan Mode in Claude Code is one credible answer: instead of approving every individual tool call, a human reviews and modifies an entire execution plan upfront. That compresses the approval decision into a single thoughtful review instead of 40 rubber stamps. The pattern matters more than the specific product. If your vendor's answer is "the operator will approve each step," ask how they have tested that it still works on day 90 when the operator has approved 10,000 steps.

The human-in-the-loop spectrum Four modes, and the tradeoff they hide Human-in-the-loop human approves every action Human-on-the-loop human monitors, can intervene Human-over-the-loop human sets policy, reviews after Human-out-of-the-loop agent decides, acts, reports HIGH SAFETY LOW SAFETY HIGH LATENCY / LOW THROUGHPUT LOW LATENCY / HIGH THROUGHPUT THE APPROVAL FATIGUE TRAP Pure HITL looks safest, but by day 90 humans approve every prompt reflexively. Safety on paper, not in practice. PLAN-MODE APPROVAL (BETWEEN) Human approves the whole plan upfront, agent executes. One thoughtful review instead of forty rubber stamps. FULL AUTONOMY TRAP Fast and scalable, but the policy envelope has to be perfect and it never is. Needs a real kill switch.
Figure 7. The four HITL modes sit on a spectrum that trades safety against latency and throughput. Both endpoints have failure modes: pure human-in-the-loop degrades into approval fatigue, pure human-out-of-the-loop depends on a policy envelope that is never quite right. Plan-mode approval is a defensible middle.

6. Auditability

Is there a tamper-evident log of every action, tool call, input, and output? Can a regulator subpoena it? The EU AI Act's human-oversight and audit-trail requirements sit exactly here, and from 2 August 2026 they come with real enforcement teeth.

Tamper-evident is the important word. Logs your engineering team can edit after the fact are not audit logs. They are notes. The distinction matters when you are the one testifying.

What a real agent audit trail looks likeThe six links a non-repudiable record needs, and which ones usually go missing in vendor demosSTEP 01PromptThe exact input sentto the agentSTEP 02PlanThe reasoning trace ortool-call planOFTENMISSINGSTEP 03Tool callTool name, arguments,target, timestampSTEP 04ResultRaw response, statuscode, errorsSTEP 05Side effectWhat actually changedin the worldOFTENMISSINGSTEP 06CommitTamper-evident logentry, signedOFTENMISSINGUsually loggedCommonly missing when a regulator asks 18 months laterIf your vendor cannot produce every one of these six fields for an arbitrary past action, you do not have an audit trail.You have notes.
Figure 10. A non-repudiable agent audit trail has six links: prompt, plan, tool call, result, side effect, signed commit. Most production agents log prompt and tool call well. Plan traces, side-effect records, and tamper-evident commits are the three that commonly disappear, exactly the three a regulator will ask for.

7. Grounding and output provenance

Can the agent cite the source of any claim it makes to a customer or internal user? Can you trace a given response back to the specific document, record, or tool call that produced it?

Air Canada's fabricated refund policy is the canonical failure of this dimension. The chatbot generated a policy that did not exist, the customer relied on it, the British Columbia Civil Resolution Tribunal held Air Canada liable. The airline's defense, that the chatbot was a separate legal entity responsible for its own actions, was rejected. The grounding question is no longer academic. It is a liability question.

8. Kill switch and containment

Can the agent be stopped immediately, and is the kill switch actually tested? This sounds obvious. It is the single most commonly skipped control I see discussed in public postmortems.

The test is literal. Can someone in operations, at 3am, with nothing but a terminal and their phone, stop the agent in under 60 seconds? If the answer involves filing a ticket, or paging an engineer who knows the specific internal system, you do not have a kill switch. You have a hope.

The kill switch state machine What "stop the agent in under sixty seconds" actually requires running normal ops pause requested signal sent paused no new actions drained in-flight finished terminated proc killed kill() ack drain terminate < 5s < 2s < 30s < 5s ROLLBACK · replay sanitized state TOTAL BUDGET: < 60 SECONDS FROM TERMINAL TO TERMINATED If any transition requires filing a ticket, paging an engineer, or knowing internal system names, you do not have a kill switch. You have a hope. THE ONE TEST 3am. On-call. Phone and terminal. No one to page. Stopwatch running. Go.
Figure 12. A simple five-state kill-switch state machine with a rollback path. The states are easy to draw. The hard part is the time budget (under sixty seconds total) and the constraint that any operator must be able to drive it at 3am with no escalation path.

9. Supply chain integrity

Are the models, tools, Model Context Protocol servers, and training data the agent depends on verified? Is there a bill of materials for the agent's dependencies the way there is for a software package?

The LiteLLM supply chain incident is the reason this dimension is in the list. Agents pull from a growing stack of model providers, routing proxies, tool libraries, and MCP servers. Each of those is a link in a supply chain that did not exist five years ago. MITRE ATLAS added "Publish Poisoned AI Agent Tool" as a specific technique in its February 2026 update because this class of attack is now common enough to name.

MITRE ATLAS grew fastest on agentic techniques Four months, three new tactic-level entries, eighteen new techniques OCTOBER 2025 MITRE ATLAS, pre-agent-push 15 tactics 66 techniques n/a sub-techniques tracked separately at this release 4 MONTHS v5.4.0 release FEBRUARY 2026 MITRE ATLAS v5.4.0, agent-heavy 16 tactics +1 84 techniques +18 56 sub-techniques new NEW AGENT-SPECIFIC TECHNIQUES (EXCERPT) AI Agent Context Poisoning • Memory Manipulation Modify AI Agent Configuration • Publish Poisoned AI Agent Tool Escape to Host
Figure 9. MITRE ATLAS added one tactic, eighteen techniques, and a fresh sub-technique layer in the four months between October 2025 and February 2026. Nearly all new entries target the execution layer of agent systems rather than model-centric attacks.

10. Adversarial testing

Has the agent been red-teamed against prompt injection, goal hijack, context poisoning, and tool abuse? Is there a written report? Is the red-team exercise repeated after every major update?

The CSA Agentic AI Red Teaming Guide and the MITRE ATLAS technique library give security teams a playbook for this. The buyer-side test is simpler: ask to see the report. If there is no report, there has been no serious red teaming. If the report is from the same team that built the agent, it is better than nothing but it is not independent.

The questions an executive can ask a vendor this week

Ten dimensions are a lot to hold in your head. If you are walking into a vendor meeting tomorrow morning, here is the shorter version. Eight questions, in plain language, that you do not need to be an engineer to ask.

SCREENSHOT THISEight questions for your nextagent vendor meeting01What is the largest financial action this agent can take without a human in the loop, in dollars?02Show me the document that lists every system, API, and credential this agent can access.03Which actions are one-way doors, and how are they specifically gated?04If a regulator asks in two years who authorized a specific action, can you produce a chain of custody?05Walk me through how a human operator stops this agent in under 60 seconds. I want to see it happen.06Show me the most recent adversarial test report for this agent. Who ran it and when?07When the agent makes a claim to a customer, can you show me the source it was grounded in?08Which EU AI Act risk category do you believe this system falls into, and why?Vague answers are the signal. Not always a weak vendor,sometimes a strong vendor who has not thought about the right dimensions.
Figure 11. The eight questions as a checklist card. None of these require engineering background to ask, and every one is answerable in under fifteen minutes by a vendor who has done the work.
  1. What is the largest financial action this agent can take without a human in the loop, in dollars?
  2. Show me the document that lists every system, API, and credential this agent can access.
  3. Which actions are one-way doors, and how are they specifically gated?
  4. If a regulator asks in two years who authorized a specific agent action, can you produce a chain of custody?
  5. Walk me through how a human operator stops this agent in under 60 seconds. I want to see it happen.
  6. Show me the most recent adversarial test report for this agent. Who ran it and when?
  7. When the agent makes a claim to a customer, can you show me the source the claim was grounded in?
  8. Which EU AI Act risk category do you believe this system falls into, and why?

None of those questions require technical expertise to ask. All of them are answerable in 15 minutes by a vendor who has actually done the work. If the answers are vague, that is your signal. The signal is not always that the vendor is bad. Sometimes the signal is that the vendor has not thought about the right dimensions, which is almost the same thing from a buyer's perspective.

What to watch for over the next 12 to 24 months

The landscape is going to change fast, and a few specific markers are worth watching.

The first is NIST's AI Agent Interoperability Profile, expected in Q4 2026. Of the real candidates to start closing the buyer gap, this is the most credible one. A NIST profile will not be a certification, but it will give the rest of the ecosystem a shared reference.

The second is the EU AI Act's enforcement start on 2 August 2026. Once the first enforcement actions land, the market will have real precedents for what "sufficient human oversight" and "adequate audit trail" look like in practice. Regulators tend to teach the market through enforcement, not through guidance documents.

The third is whether an independent attestation body emerges. There is no SOC 2 for agents because there is no equivalent of the AICPA sitting behind an agent-specific attestation standard yet. That role could be filled by an existing standards body extending its scope, or by a new entity formed for the purpose. The Agentic AI Foundation formed under the Linux Foundation in December 2025 is a plausible home for part of this work, but it was founded by model providers, not by auditors. Independent attestation almost certainly has to come from somewhere else.

The fourth is whether the insurance market catches up. When cyber insurance carriers start asking specific agent-related questions on underwriting forms, the market will have a de facto standard, because carriers will force vendors to answer the same set of questions the same way. That pressure tends to move faster than voluntary standards bodies. The same pattern played out in cyber insurance in 2018 through 2021: long before any formal certification was universal, the underwriting questionnaire became the effective standard, because no carrier was willing to write a policy without an answer. I would watch for the first carriers to add specific agent questions to their 2026 and 2027 renewals. When that happens, the market moves.

The fifth marker is quieter but important. Watch what large enterprise buyers start putting into their master services agreements and data processing addenda. Procurement contracts tend to codify buyer demands faster than standards bodies codify best practices. If Fortune 500 procurement teams start requiring specific agentic-risk representations and warranties in vendor contracts, that contractual language becomes the de facto standard whether or not a formal body ever publishes one. The enforcement mechanism is not a regulator. It is the breach-of-contract lawsuit that follows the next incident.

A note on what this is, and is not

I called this commentary in the opening, and I want to keep that promise in the closing. Everything above is one engineer's read of what is published, what is not, and what the gap means for the people making procurement decisions.

I am not proposing the Agentic Risk Standard as a product, a certification, or a framework I am trying to own. I am using the name to talk about a shape-of-hole in the market. The actual closing of the gap is going to be done by some combination of NIST, OWASP, CSA, the EU AI Commission, a future attestation body, and the insurance industry. None of those are me, and none of them are Code Atelier.

What I have written here is a mental model. I have shipped agentic systems in production, including a RAG pipeline and an agentic workflow at a previous company, so I know the difference between agent problems that look scary on a slide and agent problems that actually bite you in operations. The ten dimensions above are the ones I would want to answer myself before I put an autonomous system into a customer-facing path with real money behind it.

If you are walking into that kind of decision this quarter, I hope the list is useful. And if you want to compare notes on any of it, I am easy to reach. The contact form at the top of the site goes directly to my inbox.

Frequently Asked Questions

Is there an industry-standard certification for AI agents like SOC 2 is for SaaS?

Not as of April 2026. There is no independent attestation body producing agent-specific reports the way SOC 2 produces reports for SaaS security. What exists instead is a constellation of frameworks: NIST AI RMF plus the Generative AI Profile, OWASP Top 10 for Agentic Applications 2026, MITRE ATLAS, the CSA Agentic AI Red Teaming Guide, and the EU AI Act. Each covers part of the picture, none produces a buyer-facing pass or fail. The most credible candidate to start closing the gap is NIST's AI Agent Interoperability Profile, announced in February 2026 and expected in Q4 2026. Independent attestation, if it emerges, will likely come from a standards body extending its scope or from a new entity formed for the purpose.

What is the difference between OWASP's Agentic Top 10 and NIST's AI RMF?

They are written for different audiences and do different things. The OWASP Top 10 for Agentic Applications 2026 is a threat list: ten specific categories of things that go wrong in agent systems, from Agent Goal Hijack to Rogue Agents, written for application security engineers and developers. NIST AI RMF 1.0, plus the Generative AI Profile (NIST AI 600-1), is a risk management framework organized around four functions (Govern, Map, Measure, Manage), written for an enterprise risk audience. OWASP tells you what to test. NIST tells you how to structure the organizational process around AI risk. Neither is agentic-specific in the buyer-facing attestation sense, although NIST's announced Q4 2026 AI Agent Interoperability Profile is meant to address that gap.

What does the EU AI Act say about AI agents specifically?

The Act does not have an agentic carve-out. Agentic systems are regulated by their use case, falling into the Act's general categories (prohibited, high-risk, limited-risk, minimal-risk, or general-purpose AI). Governance rules and general-purpose AI model obligations have been applicable since 2 August 2025. The Commission's enforcement powers enter application on 2 August 2026, which is also when the majority of high-risk and transparency rules come into force. Fines run up to 35 million EUR or 7 percent of global annual turnover for prohibited practices, and up to 15 million EUR or 3 percent for violations of human oversight, audit trail, and transparency requirements, which are exactly the categories an agentic system has to worry about. The practical implication is that buyers must classify their agent's use case themselves, because there is no standing "agentic" label in the Act.

What questions should a non-technical buyer ask before approving an agent pilot?

Eight questions cover most of the ground without requiring technical expertise. First, what is the largest financial action this agent can take without human approval, in dollars. Second, show me the list of every system, API, and credential the agent can access. Third, which actions are one-way doors, and how are they gated. Fourth, can you produce a chain of custody if a regulator asks in two years who authorized a specific action. Fifth, walk me through stopping this agent in under 60 seconds. Sixth, show me the most recent adversarial test report and who ran it. Seventh, when the agent makes a claim to a customer, can you show me the source it was grounded in. Eighth, which EU AI Act risk category do you believe this system falls into and why. Vague answers are the signal, regardless of whether the cause is a weak vendor or a strong vendor who has not thought about the right dimensions.

What is agent blast radius and why does it matter?

Blast radius is the maximum damage a single agent action can cause before a human reviews it. Measured in concrete units: dollars moved, records modified, emails sent, customers contacted. It matters because the highest-profile agent failures of 2024 and 2025 were blast-radius failures. The Meta incident in 2025, where an AI alignment director watched an agent delete over 200 of her emails while she sent it stop commands it ignored, is a blast-radius failure in destructive action terms. The DPD chatbot incident in January 2024, where a customer manipulated the agent into swearing at him and calling DPD "the worst delivery firm in the world," is a blast-radius failure in reputation terms. The practical control is simple: cap the agent's maximum single-action impact at a number you would be willing to write off, and require human approval for anything larger.

When will a real agentic risk standard be published?

The most credible near-term candidate is NIST's AI Agent Interoperability Profile, announced in February 2026 through the Center for AI Standards and Innovation and expected in Q4 2026. That is likely to give the rest of the ecosystem a shared reference, although a NIST profile is not itself a certification. The EU AI Act's enforcement start on 2 August 2026 will also shape what "sufficient human oversight" and "adequate audit trail" mean in practice, because regulators tend to teach markets through enforcement actions rather than guidance documents. An independent buyer-facing attestation, the thing that would function as SOC 2 functions today, likely takes 12 to 24 months beyond that, and depends on whether an independent body chooses to own it. Watch the insurance market too: when cyber carriers start asking specific agent-related questions on underwriting forms, a de facto standard emerges faster than voluntary bodies can move.

Is this legal or compliance advice?

No. This article is commentary on the current state of publicly available frameworks and regulatory developments for agentic AI risk. Every regulatory reference cites a primary source, but nothing here is a substitute for qualified legal counsel. If you are in a regulated industry, or making a decision with material liability exposure, your counsel should be in the room when you classify the system under the EU AI Act, assess California AB 316 exposure, or evaluate specific contractual language with a vendor. The article is intended to give an executive a vocabulary and a set of questions, not a legal opinion.

Code Atelier · NYC

Ready to get agent-ready before your competitors do?

Let's talk