You Just Bought Claude Code for Your Team. Now What?

Written by Colton Chojnacki, Product Manager
Published on February 13, 2026

I've been using AI tools to do my work for years now, well before Claude Code existed. Different tools, different providers, whatever was good at the time. I'm not the CTO. I’m not the head of security. I don't have a clean title for what I do. I float between engineering, product, and marketing, and I spend a lot of time figuring out how new tools can make us faster. At some point I started showing people what was possible with AI, and it caught on.

Pretty soon the whole company was in this weird organic adoption phase. Everyone was using different agents, different accounts, different providers. People started submitting expense reports and I'd just put "AI credits" as the line item. Then other people started doing the same thing. Then more people. Eventually we had this sprawl of individual subscriptions and API keys and reimbursement requests, and someone said what everyone was thinking: we should probably just pick one provider and go all in.

We went with Claude Code. Got a Teams subscription. Real seats for the org. No more patchwork of personal accounts on company machines.

It felt like the grown-up move. Consolidate. Standardize. Everyone's productive, everyone's on the same platform.

Then something happened that I didn't expect.

The moment it clicked: the risk of tabbing blindly

We started rolling out Claude Code beyond the engineering team. Our go-to-market team, our operations people, anyone who wanted to use it. And here's the thing about developers using Claude Code: they generally understand what it's doing under the hood. When Claude asks for file access, a developer has an intuition for what that means. When it wants to run a shell command, they can read it and make a judgment call. Rolling it out to engineers felt fine.

Rolling it out to non-developers was a different experience entirely.

I was sitting with someone on our team, helping them set up Claude Code for a workflow they wanted to build. As we were working through it together, Claude started doing its thing. It needed API keys. "Can Claude have access to this folder?" It was writing code, installing dependencies, doing all the normal stuff that Claude Code does. At one point Claude pushed something up to git and I paused. "Wait, did Claude just check an API key into that repo?"

"Oh, I don't know. I just said yes to whatever Claude asked me."

"Do you know if this repo is public or private?"

"...no?"

Sitting right there, watching the screen, I felt fine. I could see everything Claude was doing. I could catch the moments where it was about to do something we'd want to think twice about. I was basically acting as a human security layer, reading every action, making judgment calls in real time.

And then my stomach sank a little. Because I realized I couldn't always be there to do that.

This person was going to keep using Claude Code tomorrow, and the next day, and they were going to get more comfortable and more autonomous with it. And they weren't going to have someone looking over their shoulder asking "wait, is that repo public?" They wouldn't know to ask that question themselves. Not because they're careless. Because that's not their world.

I wasn't second-guessing the decision to roll out Claude Code. The productivity gains were real. I just realized that we'd been thinking about adoption without thinking about what it looks like when adoption actually works. When everyone's using it, not just the people who instinctively know what a public repo is.

The questions nobody was asking

I started writing down everything I'd been watching for over that person's shoulder, and I realized I was running a mental checklist that existed nowhere but my own head.

When your company adopted Slack, you knew the security drill. SSO, data retention, access controls, DLP. When you rolled out GitHub, same deal. There's a playbook. AI agents don't fit that playbook because they're not applications your team uses. They're things that act on behalf of your team, with the full permissions of whoever launched them.

I'd been doing the security evaluation manually, in person, by watching a screen. That obviously doesn't scale to an entire org. So we sat down as a team and tried to figure out what questions we'd actually need to answer:

  1. Do we even know which agents are running? Not which licenses we bought. Which agents are actually running, right now, across all our machines. Including the ones nobody told us about.
  2. What can they access? Which tools are connected? Which MCP servers? What can those servers reach? Are there MCP servers attached that nobody approved?
  3. Who is running them? Can we tie each agent session back to a specific person on a specific device? Not a username in a log. A real, verified identity.
  4. Are those devices healthy? Is the machine running the agent encrypted? Is the firewall on? Is endpoint protection running? Not "was it healthy when they logged in this morning." Is it healthy right now.
  5. Can we prove what happened? If something goes sideways, can we reconstruct who did what, when, on which device, with which tools? And can we prove that record hasn't been tampered with?
  6. How fast can we answer these questions? Minutes? Days? "Let me get back to you after I talk to three teams and check four different logging systems"?

When we wrote these out, we realized something uncomfortable: we couldn't answer most of them. And we build security products.

The progression we figured out

After staring at these questions for a while, we realized they naturally group into stages. You can't jump to the end. Each one builds on the last.

First, you have to see what's happening.

Before you can write a single policy, before you can evaluate risk, before you can answer anyone's question about security, you need visibility. And not visibility at the network edge. You need to see what's happening on the actual machine where the agent runs.

That means knowing which agents are running across all your devices, including the shadow AI that people installed before the official rollout (and kept using after). It means seeing every tool call, every file read, every shell command, every MCP interaction in real time. It means knowing which MCP servers are connected, which ones are approved, and which ones somebody grabbed from GitHub last Thursday without telling anyone.
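
To make "every tool call, every shell command" concrete: Claude Code and most agent frameworks expose hooks that can invoke an external command around each tool call and pass the details as JSON on stdin. A minimal logging hook might look like the sketch below; the exact event field names vary by tool and version, so treat `tool_name`, `tool_input`, and `session_id` as assumptions to check against your own setup.

```python
#!/usr/bin/env python3
"""Append every agent tool call to a local JSONL audit log.

Meant to be wired into an agent hook that passes call details as JSON on
stdin. Field names ("tool_name", "tool_input", "session_id") are assumptions;
adjust them to whatever your agent actually emits.
"""
import json
import sys
import time
from pathlib import Path

LOG_PATH = Path.home() / ".agent-audit" / "tool-calls.jsonl"

def main() -> None:
    # The hook runner passes one JSON event describing the proposed tool call.
    event = json.load(sys.stdin)
    record = {
        "ts": time.time(),
        "session_id": event.get("session_id"),
        "tool_name": event.get("tool_name"),
        "tool_input": event.get("tool_input"),
    }
    # Append-only, one JSON object per line, per machine.
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    main()
```

It's nowhere near a control plane, but even one append-only file per machine starts to shrink the gap between what you assume is happening and what actually is.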

When we first got real visibility into our own environment, the biggest surprise wasn't anything scary. It was the gap between what we assumed was happening and what was actually happening. MCP servers we didn't know were connected. Agents running in contexts we didn't expect. Tool usage patterns that didn't match what we thought the workflow looked like.

That gap between assumption and reality? That's your actual risk surface. And you can't measure it until you can see it.

Then, you have to decide what's allowed.

Visibility without governance is just a dashcam. You'll have great footage of the accident, but you won't have prevented it.

This part was harder than we expected. The policy questions for AI agents are genuinely new, and some of them don't have obvious answers:

  • Do we want our SRE team giving Claude Code access to our production infrastructure? Maybe. Maybe not. But someone needs to make that call explicitly, not have it happen by default because a developer said "yes" to a permission prompt.
  • Which MCP servers should be allowed? This turned out to be the biggest one. MCP servers are how agents get access to external tools and services, and developers connect them without thinking about it from a security angle. Having an allowlist and blocking everything else by default was the single most impactful decision we made.
  • What can each agent actually do? Not just "Claude Code is allowed." Can it execute arbitrary shell commands? On which machines? Can it read files outside the project directory? Can it make network requests to unapproved endpoints? This level of scoping is where policy gets real.
  • What are the device requirements? Should agents run on unmanaged devices? On machines without disk encryption? If endpoint protection gets disabled mid-session, should the agent keep running?
  • How granular do the rules need to be? Can you evaluate not just which tool is being called, but what arguments are being passed? "Allow file reads in /src, block file reads in /etc or ~/.ssh/." That's the difference between a policy that's useful and a policy that's theater.

The key thing we landed on: these policies have to be enforced at runtime, automatically, deterministically. Not "communicated in a Slack message and hoped for." If an agent tries to do something that violates policy, the action gets blocked before it executes. Otherwise, you have guidelines, not governance.
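
To make "runtime, automatic, deterministic" concrete, here's a minimal sketch of a policy check that runs before a proposed action ever executes. The rule set, the action schema, and the `evaluate` helper are all illustrative, not any particular product's API; the point is the shape: an MCP allowlist, path-scoped file reads, and default-deny for everything else.

```python
"""Minimal runtime policy check for agent actions (illustrative only).

The rules below are examples; a real policy engine would load them from
managed configuration rather than hardcoding them.
"""
from pathlib import Path

MCP_ALLOWLIST = {"github", "postgres-readonly"}      # approved MCP servers
READ_ALLOWED = [Path("/src")]                        # file reads allowed here...
READ_BLOCKED = [Path("/etc"), Path.home() / ".ssh"]  # ...and never here

def _under(path: Path, roots: list[Path]) -> bool:
    """True if path equals or sits beneath any of the given roots."""
    path = path.resolve()
    return any(path == r or r in path.parents for r in roots)

def evaluate(action: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed agent action."""
    kind = action.get("kind")

    if kind == "mcp_call":
        server = action.get("server", "")
        if server not in MCP_ALLOWLIST:
            return False, f"MCP server '{server}' is not on the allowlist"
        return True, "approved MCP server"

    if kind == "file_read":
        target = Path(action.get("path", ""))
        if _under(target, READ_BLOCKED):
            return False, f"reads of {target} are blocked by policy"
        if _under(target, READ_ALLOWED):
            return True, "path is inside an allowed read scope"
        return False, "path is outside every allowed read scope"

    # Default-deny anything the policy doesn't explicitly understand.
    return False, f"no rule for action kind '{kind}'"

# The check runs before execution, so a denied action never happens.
print(evaluate({"kind": "file_read", "path": "/etc/passwd"}))
# -> (False, "reads of /etc/passwd are blocked by policy")
```

The default-deny at the bottom is the design choice that matters most: if the policy doesn't recognize an action, it doesn't run.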

Finally, you have to be able to prove it.

This is the part that matters when an auditor shows up, when leadership asks about AI risk, or when a board member wants to know what controls are in place.

Visibility tells you what's happening. Governance controls what's allowed. Proving it requires a cryptographic audit trail: every agent action signed with a hardware-bound key tied to a verified person on a verified device. Not logs that can be edited or deleted. Tamper-proof evidence that a specific person, on a specific machine, took a specific action, and that the policy engine evaluated it against a specific rule set.

This is what takes you from "I believe we're secure" to "here's the evidence." And it's what makes compliance (SOC 2, FedRAMP, PCI-DSS) tractable. If your system produces cryptographic audit trails by default, evidence collection isn't a quarterly scramble. It's an export.

When we thought about what we're working toward, this was it: the ability for someone to point at any agent action and see the full chain. Who did it. What device. What the device's security posture was at that exact moment. What tools were called. What arguments were passed. What the policy said. What happened. All signed, all verifiable. We're not there yet. But it's a clear target.
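
As a toy illustration of what "all signed, all verifiable" could look like: one record per agent action, signed with an Ed25519 key and verifiable by anyone holding the public key. This is a sketch under stated assumptions, not our implementation. In the real target the private key is hardware-bound (TPM or secure enclave) and the identity and posture fields come from the endpoint agent; here a software-generated key and hardcoded example values stand in.

```python
"""Toy signed audit record (illustrative; real keys would be hardware-bound).

Requires the 'cryptography' package: pip install cryptography
"""
import json
import time

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Stand-in for a per-device, hardware-bound key (TPM / secure enclave in practice).
device_key = Ed25519PrivateKey.generate()

def sign_record(action: dict) -> dict:
    """Build one audit record for an agent action and sign its canonical form."""
    record = {
        "ts": time.time(),
        "user": "colton@example.com",    # verified identity (hypothetical value)
        "device": "macbook-pro-1234",    # verified device (hypothetical value)
        "device_posture": {"disk_encrypted": True, "firewall": True},
        "action": action,                # tool called, arguments passed, etc.
        "policy_decision": "allow",
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = device_key.sign(payload).hex()
    return record

def verify_record(record: dict, public_key) -> bool:
    """Check that a record hasn't been altered since it was signed."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(record["signature"]), payload)
        return True
    except InvalidSignature:
        return False

rec = sign_record({"tool": "Bash", "command": "git push origin main"})
print(verify_record(rec, device_key.public_key()))   # True
rec["action"]["command"] = "something else"          # tamper with it...
print(verify_record(rec, device_key.public_key()))   # ...and verification fails: False
```

A per-record signature proves each action wasn't altered after the fact; in practice you'd also chain records together so deletions are detectable, but the shape of the evidence is the same.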

Where we ended up

We went through this whole progression. From "everyone's using AI, this is great" to "wait, what's actually happening" to "okay, here's how we think about securing this."

The framework we came up with is basically a maturity model:

Level 0: Blind. You don't know what's running. This is where we were. This is where most orgs are right now.

Level 1: Visible. You can see every agent, every tool call, every MCP connection, every device posture. You know what's happening.

Level 2: Governed. You've written policies and they're enforced at runtime. Unapproved MCP servers are blocked. Tool access is scoped. Device requirements are continuous.

Level 3: Provable. Every action has a cryptographic audit trail tied to a verified identity. You can prove what happened to anyone who asks.

Here's a quick way to figure out where you stand:

  • Can you list every AI agent running in your environment right now?
  • Do you know which MCP servers are connected to those agents?
  • Is every agent session tied to a verified human identity?
  • Are device posture requirements enforced continuously, not just at login?
  • Do you have runtime policies that block unauthorized tools and MCP servers?
  • Can you produce a cryptographic audit trail for any agent action?

This is why we're building what we're building

I'll be honest about the punchline.

Going through this process internally and realizing that these questions were unanswerable, and that the existing security tooling wasn't built to answer them, is a big part of why we're building what we're building.

We're a security company. We watched this adoption wave happen inside our own walls first. We lived the experience of championing AI tools and then having to reckon with what it means when adoption actually succeeds. And we realized that the answer has to start on the device, where the agent actually lives, not at the network edge where you only see what already left the building.

More on that soon.

But regardless of what tools you use to get there, the framework holds. The six questions are the right questions. The progression from blind to visible to governed to provable is the right path. And the starting point is the same for everyone: be honest about how many of those questions you can answer today.

If you just bought Claude Code for your org and you're the person everyone's looking at to figure out what comes next, start with the whiteboard. Write down the six questions.

That’s where we started too.

We decided to build the control plane we couldn’t find. If you want in before launch, here’s where to start: https://beyondidentity.ai

Follow me on X: x.com/coltonchojnacki

Colton Chojnacki, Product Manager