EU AI Act Article 12 becomes binding on 2026-08-02 (in 101 days). What Article 12 requires →
MIT-licensed Python SDK

Your agents do the work.
We make sure they follow the rules.

Five lines to a pre-execution enforcement gate that writes to your existing Postgres. Enforcement gates for every action routed through the SDK — budget caps, scope checks, loop detection, human-in-the-loop approvals, agent presence monitoring, and a tamper-evident audit trail that compliance can verify themselves. See the threat model for what the SDK does and does not protect against.

Current version — v0.7.2

The v0.7 series adds an opt-in platform bridge that dual-writes audit events to the hosted Code Atelier Governance platform while keeping your local Postgres as the authoritative source of record. v0.7.1 extends the bridge to a write-and-poll HITL approvals inbox. v0.7.2 ships the codeatelier-governance recipe agt CLI scaffolder for Microsoft Agent Framework starters, plus HMAC-signed budget-alert webhooks. Upgrade: pip install --upgrade "code-atelier-governance[platform]>=0.7.2". No new migrations; every v0.7 addition is additive and opt-in. See the hosted-platform setup or the changelog for the full list.

MIT licensed. Free and open source.

quickstart.py (Python)
import os
from codeatelier_governance import GovernanceSDK, AuditEvent

async with GovernanceSDK(database_url=os.environ["DATABASE_URL"]) as sdk:
    await sdk.audit.log(AuditEvent(agent_id="my-agent", kind="hello"))
Full example with scope & budget enforcement
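import asyncio
import os
from codeatelier_governance import GovernanceSDK, ScopePolicy, BudgetPolicy

async def main() -> None:
    session_id = "session-123"  # placeholder; pass your own session identifier
    async with GovernanceSDK(database_url=os.environ["DATABASE_URL"]) as sdk:
        # Restrict which tools the agent may call
        sdk.scope.register(ScopePolicy(
            agent_id="billing-agent",
            allowed_tools=frozenset({"read_invoice", "send_email"}),
            hidden_tools=frozenset({"delete_all"}),  # hidden from LLM context
        ))

        # Cap spend and session duration
        sdk.cost.register(BudgetPolicy(
            agent_id="billing-agent",
            per_session_usd=0.50,
            per_session_seconds=300,  # 5-minute session time limit
            per_agent_usd_daily=10.00,
        ))

        # Auto-compute cost from model name; no manual usd= needed:
        await sdk.cost.track_usage("billing-agent", session_id,
            model="gpt-4o", input_tokens=1000, output_tokens=500)

asyncio.run(main())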
Sync wrapper for Flask / Django
import os
from codeatelier_governance import GovernanceSDKSync, AuditEvent

session_id = "session-123"  # placeholder; pass your own session identifier

with GovernanceSDKSync(database_url=os.environ["DATABASE_URL"]) as sdk:
    sdk.audit.log(AuditEvent(agent_id="my-agent", kind="tool.call"))
    sdk.scope.check("my-agent", tool="send_email")   # scope gate
    sdk.cost.check_or_raise("my-agent", session_id)  # budget gate
The problem

Most tools tell you what your agent did. After the damage is done.

LangSmith, Langfuse, and Helicone are observability platforms — they log what happened, after the fact. That does not stop a runaway agent from burning $600 overnight, calling a tool it should never touch, or modifying a patient record without approval. The Governance SDK is different: it gates decisions before the LLM call fires.

Agent frameworks — including Microsoft Agent Framework, which now ships DevUI, Aspire, and first-class HITL — solve the build-an-agent problem. Our differentiation is narrower and earlier: HMAC-chained tamper-evident audit + just-Postgres + scope, budget, and HITL gates in one primitive, callable from any framework, with evidence a regulator can verify.

What you get

Enforcement modules. One substrate.

Core Enforcement

Decision Audit Trail

shipped

Every agent action is an HMAC-chained, append-only row in your Postgres. Tamper with any past row and every subsequent row's verification fails. Chain fork detection catches parallel branch insertion. Compliance can verify the chain themselves.
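
To make the chaining idea concrete, here is a minimal standard-library sketch. It is not the SDK's implementation: the helper names `chain_append` and `verify_chain`, the JSON canonicalisation, the all-zero genesis value, and the in-memory list standing in for a Postgres table are all assumptions made for illustration.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder "previous MAC" for the first row


def _mac(key: bytes, prev_mac: str, payload: dict) -> str:
    # Bind each row to its predecessor by MACing prev_mac together with the payload.
    body = prev_mac + json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()


def chain_append(rows: list, key: bytes, payload: dict) -> None:
    """Append a row whose MAC covers the previous row's MAC (hypothetical helper)."""
    prev_mac = rows[-1]["mac"] if rows else GENESIS
    rows.append({"payload": payload, "prev_mac": prev_mac, "mac": _mac(key, prev_mac, payload)})


def verify_chain(rows: list, key: bytes):
    """Return the index of the first row that fails verification, or None if intact."""
    prev_mac = GENESIS
    for i, row in enumerate(rows):
        expected = _mac(key, prev_mac, row["payload"])
        if row["prev_mac"] != prev_mac or not hmac.compare_digest(row["mac"], expected):
            return i
        prev_mac = row["mac"]
    return None


key = b"demo-key"  # illustration only; a real key would come from a secret store
rows = []
for kind in ("hello", "tool.call", "tool.result"):
    chain_append(rows, key, {"agent_id": "my-agent", "kind": kind})

intact = verify_chain(rows, key)          # chain as written
rows[1]["payload"]["kind"] = "tampered"   # edit a past row in place
first_bad = verify_chain(rows, key)       # verification now breaks at that row
```

In this sketch verification stops at the first broken link, because nothing after that point can be trusted. An attacker who rewrites a row cannot repair the chain without the HMAC key.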

Action Scope Enforcement

shipped

Whitelist which tools and APIs each agent can call. Exact-match or explicit prefix. No regex, no eval. Violations are blocked and audit-logged automatically. Hidden tool policies remove tools from the LLM's context entirely.
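
The matching rules can be sketched in a few lines of plain Python. This is an illustrative model, not the SDK's `ScopePolicy`: the `ToolAllowlist` class, its field names, and the choice to deny hidden tools outright are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolAllowlist:
    """Hypothetical allowlist: exact names plus explicit prefixes, no regex, no eval."""
    exact: frozenset = frozenset()
    prefixes: tuple = ()            # explicit prefixes such as "crm."
    hidden: frozenset = frozenset()

    def is_allowed(self, tool: str) -> bool:
        if tool in self.hidden:     # assumption: hidden tools are never callable
            return False
        return tool in self.exact or any(tool.startswith(p) for p in self.prefixes)

    def visible_tools(self, offered: list) -> list:
        # Strip hidden tools before the tool list is placed in the model's context.
        return [t for t in offered if t not in self.hidden]


policy = ToolAllowlist(
    exact=frozenset({"read_invoice", "send_email"}),
    prefixes=("crm.",),
    hidden=frozenset({"delete_all"}),
)
```

Exact and prefix matching keep the decision easy to audit: `read_invoices` does not match `read_invoice`, and nothing is ever evaluated as a pattern.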

Spend Limits & Budget Gates

shipped

Token and USD caps per session and per agent/day, plus session time limits. Built-in pricing for 24 models auto-computes USD. The gate denies the call before the LLM fires if the budget is exceeded. Fail-closed by default.
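
A fail-closed budget gate can be pictured roughly as follows. The sketch is self-contained and makes several assumptions: the `PRICES` table uses made-up per-million-token prices for a made-up model (not the SDK's built-in pricing), and `estimate_usd`, `gate_call`, and `BudgetExceeded` are hypothetical names.

```python
class BudgetExceeded(Exception):
    """Raised when a call must be denied before it reaches the LLM."""


# Assumed example prices in USD per 1M tokens: (input, output). Illustrative only.
PRICES = {"example-model": (2.50, 10.00)}


def estimate_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def gate_call(spent_usd, next_call_usd: float, cap_usd: float) -> None:
    # Fail closed: if current spend is unknown (for example the lookup failed),
    # deny rather than allow.
    if spent_usd is None:
        raise BudgetExceeded("spend unknown; denying call")
    if spent_usd + next_call_usd > cap_usd:
        raise BudgetExceeded(f"cap {cap_usd:.2f} USD would be exceeded")


cost = estimate_usd("example-model", input_tokens=1000, output_tokens=500)

gate_call(spent_usd=0.10, next_call_usd=cost, cap_usd=0.50)  # under the cap: passes

try:
    gate_call(spent_usd=0.50, next_call_usd=cost, cap_usd=0.50)
    denied = False
except BudgetExceeded:
    denied = True

try:
    gate_call(spent_usd=None, next_call_usd=cost, cap_usd=0.50)
    denied_unknown = False
except BudgetExceeded:
    denied_unknown = True
```

The important property is the ordering: the check runs before the model call, so a denied call costs nothing.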

Human-in-the-Loop Gates

shipped

High-risk actions require a signed, single-use approval token from a human. Tokens are HMAC-bound to the specific action, time-limited, and replay-proof. Self-approval prevention blocks the operator who owns the agent from approving their own agent's requests (fail-closed).
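
One way to picture such a token is an HMAC over the action, the approver, and an expiry time, with the verifier remembering which signatures it has already accepted. The sketch below is an assumption-laden illustration, not the SDK's token format: `issue_token`, `ApprovalVerifier`, and the `|`-separated layout are hypothetical.

```python
import hashlib
import hmac
import time


def _sign(key: bytes, action: str, approver: str, expires_at: int) -> str:
    msg = f"{action}|{approver}|{expires_at}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def issue_token(key: bytes, action: str, approver: str, expires_at: int) -> str:
    """Hypothetical token: approver, expiry, and a MAC that also covers the action."""
    return f"{approver}|{expires_at}|{_sign(key, action, approver, expires_at)}"


class ApprovalVerifier:
    def __init__(self, key: bytes) -> None:
        self.key = key
        self.used = set()  # signatures already redeemed (single-use)

    def redeem(self, token: str, action: str, agent_owner: str, now=None) -> bool:
        now = time.time() if now is None else now
        try:
            approver, expires_raw, sig = token.split("|")
            expires_at = int(expires_raw)
        except ValueError:
            return False                       # malformed token: deny
        expected = _sign(self.key, action, approver, expires_at)
        if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
            return False                       # wrong key or different action
        if now > expires_at:
            return False                       # expired
        if approver == agent_owner:
            return False                       # self-approval is refused
        if sig in self.used:
            return False                       # replay
        self.used.add(sig)
        return True


key = b"demo-key"  # illustration only
verifier = ApprovalVerifier(key)
token = issue_token(key, "refund:invoice-42", approver="alice", expires_at=1000)

first = verifier.redeem(token, "refund:invoice-42", agent_owner="bob", now=900)
replay = verifier.redeem(token, "refund:invoice-42", agent_owner="bob", now=901)
other_action = ApprovalVerifier(key).redeem(token, "refund:invoice-43", agent_owner="bob", now=900)
expired = ApprovalVerifier(key).redeem(token, "refund:invoice-42", agent_owner="bob", now=2000)
self_approved = ApprovalVerifier(key).redeem(token, "refund:invoice-42", agent_owner="alice", now=900)
```

Every failure path returns `False`, so an unparseable or unverifiable token is treated the same as no approval at all.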

Extended Modules

Loop & Anomaly Detection

shipped

Detect when an agent calls the same tool repeatedly in a sliding window. Kill runaway loops automatically or log them for review. Configurable per-agent thresholds.
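
A sliding-window counter is enough to sketch the detection logic. The `LoopDetector` class and its parameters below are hypothetical and only illustrate the idea; timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from collections import deque


class LoopDetector:
    """Hypothetical detector: flags more than max_calls of one tool inside the window."""

    def __init__(self, max_calls: int, window_seconds: float) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = {}  # (agent_id, tool) -> deque of call timestamps

    def record(self, agent_id: str, tool: str, now: float) -> bool:
        """Record a call and return True if it pushes the agent over the threshold."""
        q = self._calls.setdefault((agent_id, tool), deque())
        q.append(now)
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] > self.window_seconds:
            q.popleft()
        return len(q) > self.max_calls


detector = LoopDetector(max_calls=3, window_seconds=10)
burst = [detector.record("my-agent", "search", t) for t in (0, 1, 2, 3)]
later = detector.record("my-agent", "search", 20)  # earlier calls have aged out
```

What happens on a `True` result is a policy choice: the caller can halt the agent or simply write an audit event for review.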

Agent Presence & Kill Switch

shipped

Real-time visibility into which agents are live, idle, or unresponsive. Operator-triggered kill switch (v0.5.4): clicking Halt in the console fail-closes the agent's next enforcement gate within 5 seconds. Heartbeat-based lifecycle with automatic stale detection and operator identity tracking. No background worker needed.
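
Heartbeat-based presence needs no background worker if status is derived at read time from the last heartbeat timestamp. The sketch below shows that idea only; the function names and the 30-second and 120-second thresholds are assumptions, not the SDK's defaults.

```python
def presence_status(last_heartbeat, now: float,
                    idle_after: float = 30.0, stale_after: float = 120.0) -> str:
    """Classify an agent from its last heartbeat (thresholds are assumed values)."""
    if last_heartbeat is None:
        return "unresponsive"
    age = now - last_heartbeat
    if age < idle_after:
        return "live"
    if age < stale_after:
        return "idle"
    return "unresponsive"


def gate_allows(halted: bool) -> bool:
    # A halt flag set by an operator makes the next enforcement gate deny.
    return not halted


statuses = [
    presence_status(100.0, now=110.0),
    presence_status(100.0, now=160.0),
    presence_status(100.0, now=400.0),
    presence_status(None, now=400.0),
]
```

Because the classification is a pure function of stored timestamps, any reader of the table (the console included) computes the same answer without a scheduler.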

Behavioral Contracts

shipped

Declarative pre/post conditions that wrap any tool call. Enforce budget, scope, and approval requirements in a single context manager.
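
The shape of such a contract can be sketched with `contextlib`. This is a generic illustration, not the SDK's contract API: `contract`, `ContractViolation`, and the dictionary-of-checks layout are hypothetical.

```python
from contextlib import contextmanager


class ContractViolation(Exception):
    pass


@contextmanager
def contract(pre: dict, post: dict, state: dict):
    """Run named precondition checks, yield to the tool call, then run postconditions."""
    for name, check in pre.items():
        if not check(state):
            raise ContractViolation(f"precondition failed: {name}")
    yield state
    for name, check in post.items():
        if not check(state):
            raise ContractViolation(f"postcondition failed: {name}")


pre = {
    "tool in scope": lambda s: s["tool"] in s["allowed_tools"],
    "budget available": lambda s: s["spent_usd"] < s["cap_usd"],
}
post = {"still within cap": lambda s: s["spent_usd"] <= s["cap_usd"]}

state = {"tool": "send_email", "allowed_tools": {"send_email"},
         "spent_usd": 0.10, "cap_usd": 0.50}
with contract(pre, post, state):
    state["spent_usd"] += 0.05          # stands in for the real tool call

body_ran = False
blocked = False
bad_state = dict(state, tool="delete_all")
try:
    with contract(pre, post, bad_state):
        body_ran = True                 # never reached: the precondition fails first
except ContractViolation:
    blocked = True
```

A failed precondition raises before the body runs, so the wrapped tool call never executes.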

EU AI Act Compliance

shipped

Automated Article 12 evidence reports from your audit data. Seven-section mapping with compliant/partial/non-compliant status per requirement.

Anthropic Adapter

shipped

Native wrap_anthropic() integration. Auto token tracking, USD cost estimation, and budget enforcement for Claude API calls.

Performance

Fast enough to gate every call.

Shared connection pool

Single SQLAlchemy engine with ~15 connections per SDK instance. No per-request connection overhead.

Concurrent audit writes

Pre-call audit events are backgrounded. Post-call audit, cost tracking, and presence updates run in parallel.
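
The concurrency pattern described here can be approximated with plain `asyncio`. In the sketch below, `write` stands in for a database write and `governed_call` is a hypothetical wrapper; neither is part of the SDK.

```python
import asyncio


async def governed_call(call, log: list):
    async def write(event: str) -> None:
        await asyncio.sleep(0)           # stands in for a database round-trip
        log.append(event)

    # Pre-call audit write is started in the background, not awaited up front.
    pre_task = asyncio.create_task(write("audit.pre"))
    result = await call()
    # Post-call bookkeeping runs concurrently instead of one write after another.
    await asyncio.gather(write("audit.post"), write("cost.track"), write("presence.touch"))
    await pre_task                       # make sure the background write finished
    return result


async def fake_llm_call() -> str:
    await asyncio.sleep(0)
    return "ok"


log = []
result = asyncio.run(governed_call(fake_llm_call, log))
```

The pre-call task is still awaited before returning, so no audit write is silently dropped in this sketch.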

Combined budget query

Session + daily counters checked in one DB round-trip. No separate queries per cap type.
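
Reading both counters in one round-trip is typically done with conditional aggregation. The example below uses an in-memory SQLite database so it runs anywhere; the `usage_events` table and its columns are assumptions for illustration and do not reflect the SDK's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_events (agent_id TEXT, session_id TEXT, day TEXT, usd REAL)"
)
conn.executemany(
    "INSERT INTO usage_events VALUES (?, ?, ?, ?)",
    [
        ("billing-agent", "s1", "2026-01-01", 0.25),
        ("billing-agent", "s2", "2026-01-01", 0.50),
        ("billing-agent", "s1", "2025-12-31", 1.00),
        ("other-agent", "s9", "2026-01-01", 4.00),
    ],
)


def spend_counters(conn, agent_id: str, session_id: str, day: str):
    """Return (session_usd, daily_usd) for one agent using a single query."""
    row = conn.execute(
        """
        SELECT
            COALESCE(SUM(CASE WHEN session_id = ? THEN usd END), 0),
            COALESCE(SUM(CASE WHEN day = ? THEN usd END), 0)
        FROM usage_events
        WHERE agent_id = ?
        """,
        (session_id, day, agent_id),
    ).fetchone()
    return row[0], row[1]


session_usd, daily_usd = spend_counters(conn, "billing-agent", "s1", "2026-01-01")
```

The same `SUM(CASE WHEN ...)` form works in Postgres, which also accepts the `FILTER (WHERE ...)` clause on aggregates as an alternative spelling.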

300+ tests passing. MIT licensed.

Just Postgres. Nothing else.

The only infrastructure dependency is a Postgres connection string. No ClickHouse, no Redis, no Kafka, no S3, no sidecar, no background worker. We write to the database your application already has. Even the optional console GUI reads from the same Postgres — zero new infrastructure, ever.

How we compare

Enforcement, not just tracing.

Feature comparison between Code Atelier Governance and observability / framework competitors: LangSmith, Langfuse, Helicone, and Microsoft Agent Framework. Each row marks whether the feature is supported by each product.
Feature | Code Atelier | LangSmith | Langfuse | Helicone | MS AGT
In-process enforcement (blocks before LLM call)
HMAC-chained tamper-evident audit
Just Postgres (no ClickHouse/Redis/Kafka)
Cost / budget caps with fail-closed gate
Pre-execution approval gates (block before action)
Tool / API scope enforcement
Self-hosted (open source)
Read-only governance console GUI
OpenAI / LangChain / Anthropic adapters
Framework-agnostic (decorator API)
Behavioral contracts (pre/post conditions)
EU AI Act Article 12 evidence reports
Sync wrapper (Flask / Django)
Self-approval prevention (fail-closed)

Or get started in five minutes

Install the SDK, apply the schema, log your first event.