Governing Autonomous Agents that Guard Us

Written by
Vlad Ligai
Published on
March 31, 2026

A few months ago, my biggest concern was alert fatigue. The EDR console was generating dozens of detections a week, the CSPM was surfacing cloud misconfigurations faster than we could prioritize them, AppSec tooling was flagging leaked credentials in repos that hadn't been touched in years, and our CI/CD pipeline was producing hundreds of SAST, DAST, and SCA findings per sprint that nobody had time to properly triage. The human loop was breaking on every front simultaneously.

So we built our way out.

This post covers two things: how we built VLAD**, the AI agent that now handles L1 triage across five security domains, and how we secured it before its growing credential footprint became a liability.

Part 1: Building VLAD

Meet VLAD 2.0

Meet VLAD 2.0, my personal L1 SecOps analyst, so I can focus on actual security work. It even takes the night shift.

Figure 1: Snapshot of the Admin console
Figure 1: Snapshot of the Admin console
Figure 2: CrowdStrike Detection Response
Figure 2: CrowdStrike Detection Response

VLAD 2.0 can:

  1. Listen to Slack and CI/CD pipeline events, pulls enrichment from security platforms, and uses an LLM to triage findings before a human ever sees them
  2. When an EDR detection fires, VLAD fetches the full detection context, enriches it, and posts a formatted verdict back to the alert thread within seconds.
  3. Pipeline SAST and DAST findings get the same treatment: VLAD pulls the finding detail, cross-references it against asset context and historical data, and delivers a verdict on whether it is a true positive worth blocking a release or noise that has been seen and assessed before.

It covers Detection and Response (EDR, SIEM), Cloud Security (CSPM), Application Security (AppSec, SAST, DAST, SCA), Threat Intelligence, and Security Operations.

The agent works. The numbers make the case plainly.

VLAD's Impact: What the Numbers Show

Before VLAD, each EDR or SIEM alert averaged anywhere between 10 to 40 minutes of analyst time: read the alert, pull enrichment from the console, assess context, document, close or escalate. SAST and DAST findings from CI/CD pipelines were worse. Most went unreviewed entirely because there was no capacity to triage hundreds of findings per sprint.

Today VLAD triages an EDR detection in under 30 seconds. A SIEM correlation alert, including cross-provider log retrieval and enrichment, completes in under a minute. Pipeline SAST and DAST findings get an inline verdict during the CI run itself. The Security Impact Assessment, which pulls an entire sprint's worth of stories, fetches every MR diff from GitLab, runs per-story security analysis, and synthesizes a written report, finishes in under five minutes for a typical sprint.

| Task | Before VLAD | With VLAD |
|------|-------------|-----------|
| EDR detection triage | 10 to 40 min | < 30 seconds |
| SIEM correlation alert | 10 to 40 min | < 1 minute |
| Pipeline SAST/DAST finding | Unreviewed | Inline verdict during CI |
| Sprint Security Impact Assessment | Not done | < 5 minutes |

The human team now reviews VLAD's verdicts rather than performing raw triage. Engineers spend their attention on the cases VLAD marks "Needs Investigation," the actual hard calls, not the noise.

The capacity headroom this created changed how we operate. We onboarded detection rules we had previously deprioritized because we lacked analyst bandwidth. We added CSPM, AppSec, and pipeline security integrations within weeks of the first skill going live, because the incremental cost of a new integration is engineering time, not analyst capacity.


Development Velocity Enabled by Claude Code

The other number worth stating clearly: VLAD's core agent took two weeks from first commit to production triage of live EDR alerts. That pace was only possible because we used Claude Code, the same Claude model that powers VLAD's triage, as the primary development tool throughout.

This matters because it directly shapes how quickly a security team can respond to new threat coverage requirements. With an LLM handling the mechanical parts of development, a new skill from spec to tested integration now takes one to three days. The Security Impact Assessment skill, which pulls sprint stories, fetches code diffs, runs per-story analysis, and synthesizes a full security report published to the team's knowledge base, went from proposal to production-ready in under a week. SAST, DAST, and SCA finding triage followed shortly after on the same pattern.

The velocity compounds. Each new skill benefits from the patterns established by previous ones. SIEM multi-provider abstractions, pipeline security integrations, cross-platform context correlation, structured prompt design, these were built and iterated at a pace that would have required a dedicated team and a multi-month roadmap not long ago.

That speed is the argument for AI-assisted development in security tooling. And, as a Security Engineer, the speed and power of Claude Code has also made it extremely clear why the governance problem is urgent.


Part 2: Securing VLAD or How to Govern Agents

VLAD's architecture follows a pattern that anyone building in this space will recognize:

Alert event  >  Parser  >  Enrichment  >  LLM triage  >  Reply

Each enrichment step is an outbound API call from a process running without human supervision. The EDR client uses OAuth2 tokens cached in memory. The SCM client holds an API token loaded at startup. The CSPM client manages its own OAuth refresh cycle.

All of these credentials live in the agent's runtime memory. The agent decides, autonomously, when to use them, which endpoints to call, and what data to pull. There is no human in the loop between "alert received" and "API called." That is the whole point, speed requires removing human latency from the critical path.

But here is the security implication: if the agent process is compromised, or if the LLM receives a prompt injection payload through a security alert, an attacker inherits API access to every platform the agent monitors. This is not a theoretical risk in SecOps automation; it is a practical one. The more capable the agent becomes, the more integrations it holds, the wider that blast radius grows.

We built VLAD with this threat model in mind:

  • Credentials are loaded from a secrets manager at runtime, never stored in code or configuration files
  • All API clients enforce TLS with certificate validation
  • Prompt templates separate system instructions from untrusted alert data to limit injection surface.
  • Each integration requests only the API scopes it needs.
  • Alert payloads are sanitized before logging or posting to Slack.

These are standard best practices and they reduce risk. But they do not solve the structural problem: the agent still possesses credentials it should never need to see.

This is the industry-wide problem, not a VLAD-specific one. Every security team moving toward AI-native operations is building agents that accumulate credentials, make autonomous decisions, and operate at a scale and speed that puts them well outside any traditional access governance model. The tooling to handle this does not yet exist natively in most environments.


The Root Problem: Autonomous Agents Have No Identity

Traditional access governance assumes a human is behind every privileged action. A person authenticates, a session is established, actions are logged against that identity. That model breaks completely when the actor is an autonomous agent running continuously, calling APIs on behalf of multiple users, and making decisions faster than any audit cycle can track.

What we are missing is a way to answer four questions about every action an autonomous agent takes:

  1. Who authorized this? Not which API key was used, but which human identity, on which device, under which policy, was responsible for this agent session.
  2. Is the device still trusted? Device posture at session start tells you nothing about posture at request time. A laptop that was compliant when the agent launched may not be compliant three hours later.
  3. Is the agent allowed to do this specific thing, right now? Not "does the API key have the scope" but "does this agent, in this context, with this user, have permission to make this call."
  4. Can we stop it immediately if something goes wrong? Not "rotate the credentials and redeploy" but actually sever the session in real time, without touching the agent code.

These are the questions that any serious AI agent governance framework has to answer. They are also exactly what Ceros was built to address.


Eating Our Own Dogfood: Securing VLAD with Ceros

Full disclosure: I helped build Ceros here at Beyond Identity. This gives me an architectural perspective in why and how Ceros is purpose built to secure autonomous agents.

Ceros binds AI agents to hardware-rooted identity and continuously authorizes them against policy. The core design goal: the agent never possesses the LLM credentials it depends on.

The architecture runs as a four-actor chain:

Ceros (launcher)  >  Agent  >  Authenticator (local proxy)  >  Cloud Proxy (Guardian)  >  Provider

The agent routes LLM traffic through a local proxy. The proxy signs every request using a hardware-backed session key that lives only in memory, with the root key confined to the device TEE and never exported. The Cloud Proxy validates that proof, evaluates policy, injects the LLM API key from an HSM, and forwards the request. The agent never contacts the LLM provider directly and never holds that credential.

If the agent process is compromised and an attacker dumps memory, they find an access token and a session private key that rotates every 8 hours. Neither is useful without the hardware credential, which cannot leave the TEE. The Cloud Proxy can revoke the compromised session immediately and require re-enrollment. The LLM provider API key never leaves the cloud infrastructure.

The credential hierarchy that makes this work:

Hardware-Bound Credential (permanent, TEE-confined)
   └── Session Private Key (memory-only, 8hr rotation)
        └── Access Token (256-bit random, DPoP-bound)

Each layer is only useful in combination with the layer it derives from. An intercepted access token is useless without the session key to generate a valid DPoP proof. The session key is useless without the hardware credential. The hardware credential cannot leave the TEE. This is what makes Ceros credentials non-extractable under normal threat conditions.

The request flow for a single agent API call:

Agent Process (no API key)
|
|  POST to localhost proxy
|  Authorization: Bearer <session-token>
|
v
Authenticator (local proxy)
|  Signs request with DPoP proof derived from TEE-backed session key
|  Forwards over mTLS
|
v
Cloud Proxy (Guardian)
|  Validates mTLS certificate
|  Verifies DPoP signature and token binding
|  Evaluates policy: RBAC, device posture, rate limits, token budgets
|  Injects LLM provider API key from HSM
|  Forwards to provider
|
v
Provider API


Policy as a Control Plane for Autonomous Agents

The credential model solves the "what gets stolen" problem. Policy solves the "what is the agent allowed to do" problem, and it does it continuously, not just at session start.

Ceros evaluates policy on every request. If a check fails mid-session, the session is severed. This is the answer to the "can we stop it immediately" question above, and it is the capability that matters most as agents become more capable.

For an agent like VLAD, the policy layer enables controls that simply do not exist today:

Scope enforcement at the network layer: As Ceros extends beyond LLM traffic to cover additional provider integrations, the same model applies to every API the agent touches. The SCM integration should only read diffs during impact assessment workflows. It should never push code, delete branches, or touch repositories outside its defined scope. Today that constraint lives in code review and developer discipline. Under Ceros it would be enforced at the network layer on every request, with a full audit trail.

  • Token budgets: LLM API calls should have token budgets. An agent that enters an enrichment loop can exhaust a month's API budget in an hour. A budget policy at the proxy layer stops this without any changes to the agent code.
  • Launch context validation: The agent should only run from a known launch context, on a device with current posture, tied to an enrolled user. A process started outside of that context has no path to the Cloud Proxy.
  • Instant revocation: And when something does go wrong, whether it is a prompt injection, an anomalous call pattern, or a policy violation, the session can be severed from the admin interface in seconds. No credential rotation scramble, no redeployment, no waiting.

The Velocity Problem Goes Both Ways

Here is the tension that sits at the center of AI-native security operations: the same velocity that makes AI agents valuable is the reason they outpace existing governance models.

VLAD added a new integration roughly every few days during active development. Each integration is a new credential, a new API surface, a new blast radius. The pace of capability addition outran the pace of access governance by design. That is not a failure of process; it is the natural consequence of building with AI tooling. The agent grows faster than traditional security review cycles can track.

The answer is not to slow down. The answer is governance infrastructure that scales with velocity. When a new skill is added to VLAD, the Ceros policy author adds the corresponding access rule. No credential rotation, no blast radius expansion from storing a new token. The policy layer is maintained independently of the agent code and takes effect immediately.

This is what it means to have a control plane for autonomous agents, not a set of controls baked into each individual agent, but a centralized enforcement layer that governs all of them consistently, regardless of how fast they evolve.

Where This Goes

VLAD 2.0 is in production, triaging findings across five security domains in seconds to minutes where analysts previously spent 10 to 40 minutes each. Claude Code is deployed across the engineering team. The same Ceros architecture that governs VLAD governs those coding agents too: every LLM call, every tool invocation, proxied, signed, policy-evaluated, and logged. The direction of travel in our industry is clear: autonomous AI agents are becoming operational infrastructure in security teams, not experiments.

The governance gap is real and it is growing. Every new agent integration, every new capability, every new deployment environment adds to an attack surface that current tooling is not designed to address. The traditional model of credential management, access control, and audit logging was built for human operators. It does not translate to autonomous agents operating at machine speed across multiple platforms simultaneously.

Ceros is the architecture that closes that gap: hardware-rooted identity, credential escrow, continuous authorization, per-request policy enforcement, and a full audit trail. Not as a bolt-on to existing agent code, but as infrastructure that governs agents regardless of how they are built.

The machines are already guarding us. The work now is making sure we can still govern the machines.

If you want to give Ceros a try, sign up for free here: https://agent.beyondidentity.com/.

Vlad Ligai