Igor Kozlov
Why AI Agents are the future of Security Operations
TL;DR: Learn how AI Agents fix the current shortcomings of Large Language Models and how they can continuously improve the Security Operations Center procedures in the future.
What is an Artificial Intelligence Agent?
We need to clarify certain terms to make further discussion more productive. It’s important to note upfront that any taxonomy is approximate and ever evolving; however, viewing it from an evolutionary perspective helps to see the bigger picture.
AI Agent
The distinction between a program and an agent brings us back to the discussion in the paper "Is it an Agent, or just a Program?", which has amassed over 5,000 citations, confirming that this seemingly intuitive question is far from simple.
A modern, well-regarded AI textbook defines an agent as "anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators." A simple, everyday example of an agent is a wall thermostat, which senses the environment by measuring the temperature and acts by turning on the heating when the temperature drops below a set value. Unlike programs, agents exhibit autonomy when interacting with their environment, using actions to achieve set goals.
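To make this concrete, here is a minimal sketch of the thermostat as a perceive-act loop. The `read_temperature` and `set_heating` functions are stand-ins for a real sensor and actuator (simulated here), and the setpoint is an arbitrary illustrative value:

```python
import random

SETPOINT_C = 20.0  # the goal: keep the room at this temperature

def read_temperature() -> float:
    """Stand-in for a real temperature sensor (simulated with random values here)."""
    return random.uniform(15.0, 25.0)

def set_heating(on: bool) -> None:
    """Stand-in for a real heater actuator."""
    print("heating ON" if on else "heating OFF")

def thermostat_step() -> None:
    """One perceive-act cycle: sense the environment, then act on it to pursue the goal."""
    temperature = read_temperature()        # perceive via the sensor
    set_heating(temperature < SETPOINT_C)   # act via the actuator

for _ in range(3):
    thermostat_step()
```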
Learning AI Agent
The terms agent, environment, and action are also central to another influential field of Machine Learning—Reinforcement Learning (RL)—which, together with Deep Learning, has led to breakthroughs like AlphaGo. AlphaGo defeated the human world champion in Go, a game with a substantially larger action space than chess, making brute force approaches infeasible. AlphaGo succeeded by using methods like Monte Carlo Tree Search (MCTS) to evaluate various sequences of actions and predict those most likely to win, without an exhaustive search.
By definition, RL agents have a distinct feature: they learn to optimize their actions in an environment to achieve a goal. Provided with a metric to optimize, RL agents can explore their environment and, in many cases, discover novel strategies that outperform human approaches, sometimes sacrificing short-term rewards for long-term gains.
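As a toy illustration of that learning loop, the sketch below runs tabular Q-learning on an invented one-dimensional environment (walk right to reach a goal state). The environment, reward, and hyperparameters are illustrative assumptions and have nothing to do with any particular security task:

```python
import random

N_STATES, ACTIONS = 6, [0, 1]            # toy chain world: 0 = move left, 1 = move right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.2    # learning rate, discount factor, exploration rate

def step(state: int, action: int):
    """Toy environment: reward of 1 only when the agent reaches the rightmost state."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

for _ in range(500):                     # episodes of interaction with the environment
    state, done = 0, False
    while not done:
        if random.random() < epsilon:
            action = random.choice(ACTIONS)                      # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])   # exploit current estimates
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted best future value
        q[(state, action)] += alpha * (reward + gamma * max(q[(nxt, a)] for a in ACTIONS) - q[(state, action)])
        state = nxt

print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)})  # learned policy
```

After enough episodes the learned policy picks "move right" in every state, even though the reward only appears at the very end, which is exactly the long-term optimization described above.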
Learning becomes an essential part of AI Agents and programs
The power of RL has not escaped Large Language Models (LLMs). This includes not only RLHF (Reinforcement Learning from Human Feedback), which helps align LLM outputs with human expectations, but also recent advances that use MCTS-like methods to explore multiple solutions to math and coding problems with verifiable answers.
Despite incorporating agentic elements during training, we generally consider (stand-alone) LLMs as programs. Certain LLM architectures can be made deterministic by adjusting parameters, and the variability in their outputs often results from deliberately injected randomness in sampling or from system factors like batch processing on distributed systems. LLMs lack true autonomy and learning, though one could argue they "learn" by being continuously retrained on new data and refined by user feedback. However, the point when LLMs are connected to digital environments—tracking user behavior and adjusting outputs to optimize metrics—still seems distant, though not unreachable.
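The sketch below is a toy illustration (not any specific LLM API) of this point: with greedy decoding the same next token comes out every time, while temperature sampling deliberately injects variability. The vocabulary and probabilities are invented for the example:

```python
import random

# Invented next-token distribution; in a real LLM these probabilities come from the model.
next_token_probs = {"alert": 0.5, "incident": 0.3, "model": 0.2}

def greedy_pick(probs: dict[str, float]) -> str:
    """Deterministic decoding: always take the most probable token."""
    return max(probs, key=probs.get)

def sample_pick(probs: dict[str, float], temperature: float = 1.0) -> str:
    """Stochastic decoding: higher temperature flattens the distribution and adds variability."""
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(list(probs), weights=weights, k=1)[0]

print(greedy_pick(next_token_probs))                    # identical on every run
print(sample_pick(next_token_probs, temperature=0.8))   # may differ between runs
```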
Now that we’ve clarified the terms, let's consider the practical gains offered by agentic systems compared to using LLMs alone.
How can AI Agents fix LLMs’ shortcomings?
We’ll now discuss three inherent features of LLMs that lead to shortcomings when the models are used in stand-alone mode. We’ll provide practical examples of how AI Agents can improve the performance of Generative AI applications in a Security Operations context.
Consequences of sequential generation
One of the main limitations of LLMs stems from the way they generate responses. When answering, LLMs do so word by word (or token by token), with each subsequent word influenced by previous ones. If an LLM begins to hallucinate, it continues to build on this incorrect information until the end of its response. This happens because LLMs don’t truly "understand" their outputs; they simply predict the most probable next word. Contextual confusion is inevitable, especially when terms have different meanings in different fields (e.g., "model" in ML vs. fashion). Unlike agents, LLMs cannot pause and correct themselves, making it beneficial to restart or edit earlier parts of a conversation rather than continuing a faulty dialogue.
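The following sketch mimics that mechanism with a trivial canned "model": each token is chosen given the full prefix, so whatever lands in the context, correct or not, shapes every later prediction. The `next_token` function and its tiny vocabulary are invented for illustration:

```python
def next_token(context: list[str]) -> str:
    """Stand-in for the LLM's next-token prediction, conditioned on everything generated so far."""
    canned = {"The": "attacker", "attacker": "used", "used": "phishing", "phishing": "<end>"}
    return canned.get(context[-1], "<end>")

def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    """Autoregressive decoding: the newest token immediately becomes part of the next step's input."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        token = next_token(tokens)   # conditioned on the full prefix, including any earlier mistake
        if token == "<end>":
            break
        tokens.append(token)
    return tokens

print(" ".join(generate(["The"])))   # -> "The attacker used phishing"
```

Nothing in this loop ever revisits an earlier token, which is why a faulty start tends to persist to the end of the response.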
In contrast, agents could work in pairs: one to generate an answer and another to validate its accuracy, prompting regeneration if hallucinations occur. While the same LLM could handle both roles, from a human perspective, it's easier to view this as a pair of specialized agents. Additionally, using different LLMs for various agents can yield performance, latency, and cost benefits. For instance, a security summarizer AI Agent could employ a simpler, more cost-effective architecture fine-tuned for the task.
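A minimal sketch of such a generate-validate loop is below. The `generator_llm` and `validator_llm` functions are hypothetical stand-ins for calls to (possibly different) models; the toy validation check only marks where a real hallucination check would go:

```python
def generator_llm(prompt: str) -> str:
    """Stand-in for the drafting LLM; a real system would call an actual model here."""
    return f"Summary of: {prompt}"

def validator_llm(question: str, draft: str) -> bool:
    """Stand-in for a second, possibly cheaper, LLM that reviews the draft for unsupported claims."""
    return question in draft   # toy check only

def answer_with_validation(question: str, max_attempts: int = 3) -> str:
    """Generator agent drafts, validator agent reviews; regenerate until a draft passes or attempts run out."""
    draft = generator_llm(question)
    for _ in range(max_attempts - 1):
        if validator_llm(question, draft):
            return draft
        # Feed the rejection back so the generator does not simply repeat the faulty draft.
        draft = generator_llm(f"{question}\nThe previous draft was rejected; try again.")
    return draft

print(answer_with_validation("suspicious PowerShell execution on host-42"))
```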
Limited reasoning capabilities
Another limitation of LLMs is their tendency to go straight to a solution, often skipping steps. Users typically want concise answers, and the cost per token encourages brevity.
However, this approach can reduce the quality of the output. Numerous studies show that asking LLMs to think step-by-step (Chain of Thought) significantly improves performance in answering logical questions. Some LLMs employ "scratch space" or memory tokens to track intermediate results or list pros and cons before arriving at a final answer. Still, how much control one has over LLM outputs depends significantly on how the agentic flow is implemented. An AI Agent might prompt the generation-validation team to explore multiple perspectives, record intermediate steps, and assess progress. This approach is particularly valuable when LLMs struggle with problems that require iterative exploration (such as a security investigation, which does not have a well-defined path), as agents can interact with the environment (using tools like a computer terminal or Vision-Language Models) and quickly test multiple strategies, mirroring RL’s approach to optimizing actions.
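As a sketch of what such a flow could look like, the loop below asks a stand-in model for one investigation step at a time, keeps the intermediate results in a scratchpad, and stops when the model reports it is done. The `llm` function and the stopping convention are assumptions for illustration, not a specific product’s API:

```python
def llm(prompt: str) -> str:
    """Stand-in for any LLM call; returns a canned reply here, a real client elsewhere."""
    return "Step: inspect the process tree of the alerting host\nDONE: no"

def investigate(task: str, max_steps: int = 5) -> list[str]:
    """Agentic loop: request one step at a time, record it, and re-plan with the scratchpad in context."""
    scratchpad: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Task: {task}\n"
            "Previous steps:\n" + "\n".join(scratchpad) + "\n"
            "Think step by step. Propose the single next step, then write 'DONE: yes' or 'DONE: no'."
        )
        reply = llm(prompt)
        scratchpad.append(reply)     # intermediate results stay visible to later steps
        if "DONE: yes" in reply:
            break
    return scratchpad

print(investigate("triage alert: outbound traffic to a rare domain"))
```

In a real flow the loop could also hand each proposed step to a tool (a terminal, a query API, a Vision-Language Model) and append the tool’s output to the scratchpad before re-planning.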
Lack of relevant knowledge and limited planning capabilities
Lastly, LLMs have a knowledge cutoff. They struggle to answer questions about data that was unavailable during their training. To address this, LLMs often rely on Retrieval-Augmented Generation (RAG), where external data is fetched and appended to the input question. Agents enhance RAG systems by validating and filtering relevant documents and providing multiple perspectives on the environment, such as through graph-based approaches. AI Agents have also been shown to be more flexible in constructing knowledge graphs from collections of unnormalized documents. The power of agent flows lies in our ability to simplify planning (notoriously challenging for LLMs) and break down complex tasks, leaving the smaller, more manageable subtasks to LLM-powered agents.
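To make the idea concrete, here is a minimal sketch of retrieval with an agent-style relevance filter in front of the answering model. The document list, the word-overlap "retriever", and the keyword-based "judge" are deliberately naive stand-ins for a vector store and an LLM-based filtering agent:

```python
# Toy knowledge base; a real system would use a vector store and embeddings.
documents = [
    "A known vulnerability affects the VPN appliance used at the branch office.",
    "Cafeteria menu for next week.",
    "A patch for the VPN appliance was scheduled but never applied.",
]

def retrieve(question: str, top_k: int = 3) -> list[str]:
    """Stand-in for similarity search: rank documents by naive word overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)[:top_k]

def relevance_agent(question: str, doc: str) -> bool:
    """Stand-in for an LLM judge that keeps only documents actually relevant to the question."""
    return any(word in doc.lower() for word in question.lower().split() if len(word) > 3)

def answer_llm(prompt: str) -> str:
    """Stand-in for the final generation call over the filtered context."""
    return f"Answer based on context:\n{prompt}"

question = "Is the VPN appliance exposed to a known vulnerability?"
context = [doc for doc in retrieve(question) if relevance_agent(question, doc)]
print(answer_llm("\n".join(context) + "\n\nQuestion: " + question))
```

The filtering step is where an agent adds value over plain RAG: irrelevant retrieved documents (the cafeteria menu above) are dropped before they can distract the answering model.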
The Future of Security Operations and AI Agents
We’ve covered a lot of ground on the distinctions and advantages of agents, but we’ve only scratched the surface of their potential applications. One exciting future scenario is autonomous blue-red team training. As LLM generation costs decrease and speed improves, we can envision network environments where blue and red agent teams continuously hone their skills. Blue team AI Agents could become ubiquitous, offering cost-effective defense against increasingly complex red team attacks. Agentic flows will continue to enhance red teaming, particularly in areas like jailbreaking, as regulations scrutinize offensive uses of LLMs. Security experts, armed with insights from blue team AI Agents across multiple environments, will be invaluable, especially as the proliferation of agentic tools risks diminishing individual knowledge due to ease of use.
Though our predictions about the future are speculative, we firmly believe that the best way to predict the future is to build it, and we invite you to do so with us.