
How do you prepare your SOC for AI, and how do you stop talking about it and get moving? Anton provides a comprehensive framework of what to ask, what to do, and what you'll get built around five pillars of AI-ready SOC.
In the early 1900s, factory owners bolted the new electric dynamo onto their old, central-shaft-and-pulley systems. They thought they were modernizing, but they were just doing a "retrofit." The massive productivity boom didn't arrive until they completely re-architected the factory around the new unit-drive motor.
Retrofitting, as imagined by Gemini
Today's AI agent slapped onto a broken, 1990s-style SOC process stack is the same. Everyone is chasing the shiniest LLM or agentic system to "AI-enable" their existing, often sclerotic, processes. The result is an AI retrofit that instantly slams into deeper, systemic bottlenecks. As a result, you see a lot of this:
Bungled AI SOC transition
How do we do better? Let's see what we can actually do to make your SOC AI-ready.
We'll structure this around Five Pillars of an AI-ready SOC:
Now, the details:
This one is my absolute favorite and is at the center of most "AI in SOC" successes (if done well) and failures (if not done at all). Put simply, for the AI to work for you, it needs your data. If the context is wrong then even the most sophisticated model will arrive at inaccurate conclusions.
Security context (why context?) and data are available and can be queried by machines (API, MCP, etc.) in a scalable and reliable manner. Both matter! If access is unreliable, humans will need to fix it and the project dies. Scalable, fast and reliable access matters: agents can screen scrape well, but you probably won't use this to get a gig of mainframe logs via tn3270. Federated often also means "not scalable and reliable," BTW, because access to cheap/slow storage is, well, slow.
Of course, while availability and reliability are crucial, "AI ready SOC" also means data quality, structure, and governance. GIGO is still law! Scalability is necessary, but the quality of the ingested security context is the difference between this AI thing working … or not.
Conduct an "API or Die" data access audit to inventory critical data sources (telemetry and context) and stress-test their APIs (or other access methods) under load to ensure they can handle frequent queries from an AI agent. I'll be writing more on this in the future…
Establish or refine unified, intentional data pipelines for the data you need. This may be your SIEM, this may be a separate security pipeline tool, this may be magic for all I care … but it needs to exist. I met people who use AI to parse human analyst screen videos to understand how humans access legacy data sources, and this is very cool, but perhaps not what you want in prod.
Revamp case management to force structured data entry (e.g., categorized root causes, tagged MITRE ATT&CK techniques) instead of relying on garbled unstructured text descriptions, which provides clean training data for future AI learning. And, yes, if you have to ask: modern gen AI can understand your garbled stream of consciousness ticket description…. but what it makes of it, you will never know…
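As a minimal sketch of what "forced structured entry" could look like, here is a case closure record that rejects free-form root causes and malformed ATT&CK technique IDs. The class name, field names, and the root cause categories are all hypothetical; swap in your own taxonomy and case management schema.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List


class RootCause(Enum):
    # Hypothetical categories for illustration; use your own taxonomy.
    PHISHING = "phishing"
    MISCONFIGURATION = "misconfiguration"
    CREDENTIAL_ABUSE = "credential_abuse"
    FALSE_POSITIVE = "false_positive"


# ATT&CK technique IDs look like T1566 or, for sub-techniques, T1566.001.
ATTACK_ID = re.compile(r"^T\d{4}(\.\d{3})?$")


@dataclass
class CaseClosure:
    case_id: str
    root_cause: RootCause
    attack_techniques: List[str]
    analyst_notes: str = ""  # free text is welcome, but never the only record

    def __post_init__(self) -> None:
        # Refuse anything that is not a member of the agreed taxonomy.
        if not isinstance(self.root_cause, RootCause):
            raise ValueError("root_cause must be a RootCause category")
        # Refuse technique tags that do not look like ATT&CK IDs.
        bad = [t for t in self.attack_techniques if not ATTACK_ID.match(t)]
        if bad:
            raise ValueError(f"malformed ATT&CK technique IDs: {bad}")


closure = CaseClosure(
    case_id="CASE-1",
    root_cause=RootCause.PHISHING,
    attack_techniques=["T1566.001"],
    analyst_notes="User reported the email; mailbox rule removed.",
)
```

The point of the design is that "fixed it" can still live in the notes field, but the case cannot be closed without a category and valid tags that a machine can later learn from.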
Your AI component, AI-powered tool or AI agent can get the data it needs nearly every time. The cases where it cannot become visible and obvious immediately.
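One way to make "nearly every time" measurable during an "API or Die" audit is to probe each data source repeatedly and record the success rate and tail latency. The sketch below is an assumption-heavy illustration: `audit_source`, `AuditResult`, and the simulated flaky source are hypothetical names, the probe is sequential rather than a true concurrent load test, and in practice `query` would wrap a real call to your SIEM, CMDB, or other source.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class AuditResult:
    source: str
    calls: int
    failures: int
    p95_seconds: float

    @property
    def success_rate(self) -> float:
        return (self.calls - self.failures) / self.calls


def audit_source(name: str, query: Callable[[], object], calls: int = 100) -> AuditResult:
    """Call `query` repeatedly, counting failures and timing every attempt."""
    latencies = []
    failures = 0
    for _ in range(calls):
        start = time.perf_counter()
        try:
            query()
        except Exception:
            failures += 1  # a failed call is exactly what the audit must surface
        latencies.append(time.perf_counter() - start)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(latencies, n=20)[-1]
    return AuditResult(name, calls, failures, p95)


def make_flaky_source(fail_every: int) -> Callable[[], object]:
    """Stand-in for a real data source: every Nth call times out (simulated)."""
    counter = {"n": 0}

    def query() -> object:
        counter["n"] += 1
        if counter["n"] % fail_every == 0:
            raise TimeoutError("simulated timeout")
        return {"rows": 1}

    return query


result = audit_source("legacy_cmdb", make_flaky_source(10), calls=100)
print(f"{result.source}: success rate {result.success_rate:.0%}")
```

A source that fails one call in ten may look fine to a human who retries by hand, but an agent issuing hundreds of queries will hit that failure constantly, which is why the rate belongs in the audit inventory.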
"AI in SOC" must be built on machine executable workflows.
Common SOC workflows that do NOT rely on human-to-human communication are essential for AI success. ("Nobody knows what server4 does, let's see if John knows, well, he does not, but he suggested Joanna does, and, WIN!, she really does" workflows are not agent-friendly.) If your SOC has a lot of ad hoc activities, agents will (at least initially!) have trouble.
Worse news: weak process (this pillar #2) is very often a close friend of weak data access (pillar #1) so they "double-team" your agentic effort to oblivion. This has sunk plenty of SOAR projects in its time.
Ultimately, "If your teams don't know who owns what, neither will your Agents." Your SOC processes must be documented, validated, and capable of being scaled and learned from (see pillar #5 below). This includes a way to train AI on past work and SOC history.
Codify the "Tribal Knowledge" into APIs: Stop burying your detection logic in dusty PDFs or inside the heads of your senior analysts. You must document workflows in a structured, machine-readable format that an AI can actually query. If your context — like CMDB or asset inventory — isn't accessible via API (BTW MCP is not magic!), your AI is essentially flying blind.
Draw a Hard Line Between Agent and Human: Don't let the AI "guess" its level of authority. Explicitly delegate the high-volume drudgery (log summarization, initial enrichment, IP correlation) to the agent, while keeping high stakes "kill switches" (like shutting down production servers) firmly in human hands.
Implement a "Grading" System for Continuous Learning: AI shouldn't just execute tasks; it needs to go to school. Establish a feedback loop where humans actively "grade" the AI's triage logic based on historical resolution data. This transforms the system from a static script into a living "recipe" that refines itself over time.
Target Processes for AI-Driven Automation: Stop trying to "AI all the things." Identify specific investigation workflows that are candidates for automation and use your historical alert triage data as a training ground to ensure the agent actually learns what "good" looks like.
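The hard line between agent and human can be written down as an explicit policy table rather than left to the agent's judgment. The following is a minimal sketch under stated assumptions: the action names, the `Authority` enum, and the `execute` function are hypothetical, and the split simply mirrors the examples in the text (drudgery to the agent, production shutdowns to a human).

```python
from enum import Enum
from typing import Optional


class Authority(Enum):
    AGENT = "agent"           # the agent may act on its own
    HUMAN_APPROVAL = "human"  # the agent may only propose; a human decides


# Hypothetical action names for illustration.
POLICY = {
    "summarize_logs": Authority.AGENT,
    "enrich_alert": Authority.AGENT,
    "correlate_ip": Authority.AGENT,
    "shutdown_production_server": Authority.HUMAN_APPROVAL,
}


def authority_for(action: str) -> Authority:
    # Unknown actions default to human approval, so the agent never "guesses".
    return POLICY.get(action, Authority.HUMAN_APPROVAL)


def execute(action: str, approved_by: Optional[str] = None) -> str:
    """Run an action only if policy allows it or a named human approved it."""
    if authority_for(action) is Authority.HUMAN_APPROVAL and approved_by is None:
        return f"BLOCKED: {action} needs human approval"
    actor = approved_by or "agent"
    return f"RUN: {action} (authorized by {actor})"


print(execute("enrich_alert"))
print(execute("shutdown_production_server"))
print(execute("shutdown_production_server", approved_by="oncall-lead"))
```

Defaulting unlisted actions to human approval is the design choice that matters here: new capabilities start on the human side of the line and are delegated deliberately.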
The "tribal knowledge" that previously drove your SOC is recorded as machine-readable workflows. Explicit, structured handoff points are established for all Human-in-the-Loop processes, and the system uses human grading to continuously refine its logic and improve its 'recipe' over time. This does not mean that everything is rigid; "Visio diagram or death" SOC should stay in the 1990s. Recorded and explicit beats rigid and unchanging.
You say "fluffy management crap"? Well, I say "ignore this and your SOC is dead."
This pillar is about cultivating a culture of augmentation, redefining analyst roles, providing training for human-AI collaboration, and embracing a leadership mindset that accepts probabilistic outcomes. You really, really need executives who support an "augmented" AI SOC vision, not those who seek to "kill off" the humans.
Also, they should accept that machines will make mistakes, and that is OK. In fact, leaders must not just accept "probabilistic outcomes," but explicitly be comfortable with the machine resolving some alerts, even if it's sometimes wrong. This acceptance of necessary imperfection is a core readiness indicator. If they expect perfection, you will have AI SOC for a month. And then go back to printing logs and reviewing them with sad little human eyes :-)
Implement the "AI Error Budget": Stop pretending AI will be 100% accurate. You must secure formal CISO sign-off on a quantified "AI Error Budget" — a predefined threshold for acceptable mistakes. If an agent automates 1,000 hours of labor but has a 5% error rate, the leadership needs to acknowledge that trade-off upfront. It's better to define "allowable failure" now than to explain a hallucination during an incident post-mortem.
Pivot from "Robot Work" to Agent Shepherding: The traditional L1/L2 analyst role is effectively dead; long live the "Agent Supervisor." Instead of manually sifting through logs, work that is essentially "robot work" anyway, your team must be trained to review, grade, and edit AI-generated logic. They are no longer just consumers of alerts; they are the "Editors-in-Chief" of the SOC's intelligence.
Rebuild the SOC Org Chart and RACI: Adding AI isn't a "plug and play" software update; it's an organizational redesign. You need to redefine roles: Detection Engineers become AI Logic Editors, and analysts become Supervisors. Most importantly, your RACI must clearly answer the uncomfortable question: If the AI misses a breach, is the accountability with the person who trained the model or the person who supervised the output?
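The "AI Error Budget" becomes easier to sign off on, and to enforce, when it is a number that code can check. Below is a minimal sketch; the `ErrorBudget` class is hypothetical, the 5% threshold reuses the example rate from the text, and the decision and error counts are made-up illustrative inputs, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    max_error_rate: float  # the threshold leadership signed off on, e.g. 0.05

    def observed_rate(self, decisions: int, errors: int) -> float:
        return errors / decisions if decisions else 0.0

    def within_budget(self, decisions: int, errors: int) -> bool:
        """True while the agent's observed error rate stays inside the budget."""
        return self.observed_rate(decisions, errors) <= self.max_error_rate


budget = ErrorBudget(max_error_rate=0.05)

# Illustrative numbers only: 2,000 agent decisions reviewed in a period.
print(budget.within_budget(decisions=2000, errors=80))   # 4% observed
print(budget.within_budget(decisions=2000, errors=120))  # 6% observed
```

When the second case occurs, the agreed response (tighten human review, roll back a workflow, retrain) should already be written down, since that is the trade-off the sign-off was meant to settle upfront.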
Well, you arrive at a practical realization that you have "AI in SOC" (and not AI SOC). The tools augment people (and in some cases, do the work end to end too). No pro- ("AI SOC means all humans can go home") or contra-AI ("it makes mistakes and this means we cannot use it") crazies nearby.
If your tools lack APIs, take them and go back to the 1990s from whence you came! Destroy your time machine when you arrive, don't come back to 2026!
This pillar is about implementing integrated and interoperable technologies that support intelligent systems and embed AI into existing workflows. This one is the least critical of my pillar batch, but still it matters. Also, it is often a dependency for #1, so this matters as well.
The critical point is that a single "AI tool" is not the goal. The technology stack must ensure the entire security ecosystem is interoperable and flexible enough to support the other pillars. This means you can remediate, mitigate, etc.
Mandate "Detection-as-Code" (DaC): This is no longer optional. To make your stack machine-readable, you must implement version control (Git), CI/CD pipelines, and automated testing for all detections. If your detection logic isn't codified, your AI agent has nothing to interact with except a brittle GUI — and that is a recipe for failure.
Find Your "Interoperability Ceiling" via Stress Testing: Before you go live, simulate reality. Have an agent attempt to enrich 50 alerts simultaneously to see where the pipes burst. Does your SOAR tool hit a rate limit? Does your threat intel provider cut you off? You need to find the breaking point of your tech stack's interoperability before an actual incident does it for you.
Decouple "Native" from "Custom" Agents: Don't reinvent the wheel, but don't expect a vendor's "native" agent to understand your weird, proprietary legacy systems. Define a clear strategy: use native agents for standard tool-specific tasks, and reserve your engineering resources for custom agents designed to navigate your unique compliance requirements and internal "secret sauce."
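To make "Detection-as-Code" concrete, here is a minimal sketch of a detection expressed as versioned code with test events that a CI pipeline could run on every commit. Everything named here is an assumption for illustration: the `Detection` class, the rule ID, the event field names, and the threshold of 10 failures. Real deployments often use a rule format such as Sigma or a vendor's own language instead of plain Python.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping

Event = Mapping[str, object]


@dataclass(frozen=True)
class Detection:
    rule_id: str
    version: str           # bumped through Git like any other code change
    attack_technique: str  # MITRE ATT&CK technique ID
    matches: Callable[[Event], bool]


def _many_failed_logins(event: Event) -> bool:
    # Hypothetical field names and threshold.
    return event.get("event_type") == "auth_failure" and int(event.get("count", 0)) >= 10


BRUTE_FORCE = Detection("auth-bruteforce-001", "1.0.0", "T1110", _many_failed_logins)


def run_tests(detection: Detection, positives: List[Event], negatives: List[Event]) -> List[str]:
    """Return a list of failure messages; an empty list means the rule passes."""
    failures = []
    for event in positives:
        if not detection.matches(event):
            failures.append(f"{detection.rule_id}: missed positive {dict(event)}")
    for event in negatives:
        if detection.matches(event):
            failures.append(f"{detection.rule_id}: false positive on {dict(event)}")
    return failures


POSITIVES = [{"event_type": "auth_failure", "count": 25}]
NEGATIVES = [
    {"event_type": "auth_failure", "count": 2},
    {"event_type": "auth_success", "count": 50},
]

failures = run_tests(BRUTE_FORCE, POSITIVES, NEGATIVES)
print("PASS" if not failures else failures)
```

Once detections live in this shape, an agent can propose a change as a diff plus new test events, and the same pipeline that gates human edits gates the agent's edits too.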
This sounds like a perfect quote from Captain Obvious but you arrive at the SOC powered by tools that work with automation, and not with "human bridge" or "swivel chair."
You are ready for AI if you can, after adding AI, answer the "what got better?" question.
You need metrics and a feedback loop to get better. And to know you got better. If you "add AI" to a bad, old SOC, not only will you not get better, you won't even know you didn't get better.
Metrics are a must here. Without a defined way to measure value and feed the results back into the AI models and processes, the transformation risks stalling into, at best, a "retrofit," or into nothing, or even into a worse situation…
Establish the "Before" Baseline and Fix the Data Slop: You cannot claim victory if you don't know where the goalposts were to begin with. Measure your current MTTR and MTTD rigorously before the first agent is deployed. Simultaneously, force your analysts to stop treating case notes like a private diary. Standardize on structured data entry — categorized root causes and MITRE tags — so the machine has "clean fuel" to learn from rather than a collection of "fixed it" or "closed" comments.
Build an "AI Gym" Using Your "Golden Set": Do not throw your agents into the deep end of live production traffic on day one. Curate a "Golden Set" of your 50–100 most exemplary past incidents — the ones with flawless notes, clean data, and correct conclusions. This serves as your benchmark; if the AI can't solve these "solved" problems correctly, it has no business touching your live environment.
Adopt Agent-Specific KPIs for Performance Management: Traditional SOC metrics like "number of alerts closed" are insufficient for an AI-augmented team. You need to track Agent Accuracy Rate, Agent Time Savings, and Agent Uptime as religiously as you track patch latency. If your agent is hallucinating 5% of its summaries, that needs to be a visible red flag on your dashboard, not a surprise you discover during an incident post-mortem.
Close the Loop with Continuous Tuning: Ensure triage results aren't just filed away to die in an archive. Establish a feedback loop where the results of both human and AI investigations are automatically routed back to tune the underlying detection rules. This transforms your SOC from a static "filter" into a learning system that evolves with every alert.
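The "AI Gym" idea can be sketched as a simple gate: replay the Golden Set through the agent and refuse promotion to live traffic unless accuracy clears a threshold. This is an illustration under assumptions: `GoldenCase`, `toy_agent`, the four sample cases, and the 0.95 threshold are all hypothetical, and a real agent would be an LLM-backed triage function rather than a one-line rule.

```python
from typing import Callable, NamedTuple, Sequence


class GoldenCase(NamedTuple):
    case_id: str
    alert: dict
    expected_verdict: str  # the conclusion your analysts already validated


Agent = Callable[[dict], str]


def golden_set_accuracy(agent: Agent, cases: Sequence[GoldenCase]) -> float:
    """Fraction of golden cases where the agent reaches the known verdict."""
    correct = sum(1 for case in cases if agent(case.alert) == case.expected_verdict)
    return correct / len(cases)


def ready_for_live_traffic(agent: Agent, cases: Sequence[GoldenCase], threshold: float = 0.95) -> bool:
    # The threshold is an assumption; tie it to your agreed error budget.
    return golden_set_accuracy(agent, cases) >= threshold


def toy_agent(alert: dict) -> str:
    """Deliberately naive stand-in agent that only looks at one signal."""
    return "malicious" if alert.get("known_bad_ip") else "benign"


# Made-up sample cases; a real set would hold 50 to 100 curated incidents.
GOLDEN_SET = [
    GoldenCase("G-1", {"known_bad_ip": True}, "malicious"),
    GoldenCase("G-2", {"known_bad_ip": False}, "benign"),
    GoldenCase("G-3", {"known_bad_ip": False}, "benign"),
    GoldenCase("G-4", {"known_bad_ip": False, "insider": True}, "malicious"),
]

accuracy = golden_set_accuracy(toy_agent, GOLDEN_SET)
print(f"accuracy={accuracy:.2f}, ready={ready_for_live_traffic(toy_agent, GOLDEN_SET)}")
```

The naive agent misses the insider case, so it fails the gate. The same accuracy figure can feed the Agent Accuracy Rate KPI on the dashboard once the agent is in production.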
You have a fact-based visual that shows your SOC becoming better in ways important to your mission after you add AI. (In fact, your SOC will get better even before AI, once you do the prep work from this document.)
The pillars should be framed not just as prerequisites for AI adoption, but as the building blocks for a completely re-architected Security Operations Center. The transformation is about reimagining the whole way of doing things, not just accelerating one piece of an old process. As a result, we can hopefully get to this instead of where we started.
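The numbers behind such a visual can be as simple as a before/after comparison against the baseline you measured first. Here is a minimal sketch, assuming MTTR is read as mean time to resolve and that each incident is a (detected, resolved) timestamp pair. The function names are hypothetical and the timestamps are made-up sample data, not results.

```python
from datetime import datetime
from statistics import mean
from typing import Sequence, Tuple

Incident = Tuple[datetime, datetime]  # (detected_at, resolved_at)


def mttr_hours(incidents: Sequence[Incident]) -> float:
    """Mean time from detection to resolution, in hours."""
    seconds = [(resolved - detected).total_seconds() for detected, resolved in incidents]
    return mean(seconds) / 3600


def improvement_pct(before: float, after: float) -> float:
    """Positive when the 'after' value is lower (faster) than the baseline."""
    return (before - after) / before * 100


# Made-up sample data: baseline period vs. a period after the agent went live.
BEFORE = [
    (datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 13, 0)),  # 4 hours
    (datetime(2025, 1, 7, 9, 0), datetime(2025, 1, 7, 15, 0)),  # 6 hours
]
AFTER = [
    (datetime(2025, 3, 3, 9, 0), datetime(2025, 3, 3, 11, 0)),  # 2 hours
    (datetime(2025, 3, 4, 9, 0), datetime(2025, 3, 4, 12, 0)),  # 3 hours
]

baseline = mttr_hours(BEFORE)
current = mttr_hours(AFTER)
print(f"MTTR {baseline:.1f}h -> {current:.1f}h ({improvement_pct(baseline, current):.0f}% better)")
```

Without the baseline list there is nothing to compare against, which is the practical reason to measure before the first agent is deployed.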
The content of this chapter is taken from Anton's recent blogs "Simple to Ask: Is Your SOC AI Ready? Not Simple to Answer" (October 25, 2025) and "Beyond 'Is Your SOC AI Ready?' Plan the Journey!" (January 9, 2026) on Medium at https://medium.com/@anton.chuvakin.
Read the full ebook → Security for Winners: The Art of Using AI to Secure Your Company and Get Yourself Promoted