The Agentic Kill Chain: A Five-Phase Framework for LLM Security

Community Article Published May 15, 2026

Aleph Beth — submitted to the HuggingFace community blog Reading time: ~15 minutes

Imagine an agent built with the stack most readers of this blog will recognize. A function-calling model. A handful of tools — a web fetcher, a file reader, perhaps a code interpreter. A retrieval layer over a vector store. A modest set of credentials, scoped to the team's workspace. The agent ingests a document. The document might be a Slack export, a webpage, a PDF a colleague forwarded. Somewhere in that document — line 412, embedded in a footnote, written in white text on white background, hidden in a Markdown comment — is an instruction.

What does compromise mean here? Where does it begin? Where does it end?

The agent-security literature has multiplied in the last two years, but it lacks a shared sequential vocabulary for that question. Taxonomies abound — risks, threats, classes of attack — but few of them order their content the way the attacker actually moves. The cyber-defense tradition has had an answer to this problem for fifteen years: the kill chain. This article proposes a kill chain for agentic LLM systems, adapted to the new threat model, and argues that one phase in it — privilege — does almost all the analytical work, and that one defensive principle — least privilege — does almost all the structural defense.

The article is a reading frame, not a playbook. It speaks to builders of agents who want to think clearly about the security of what they ship.

Why we need an agentic kill chain

Three reference frameworks already populate the space, and each does part of the work.

OWASP's Top 10 for LLM Applications gives a taxonomy of risks — Prompt Injection (LLM01), Excessive Agency (LLM06), System Prompt Leakage (LLM07), and the rest. It is invaluable as a checklist. But it is unordered. It lists; it does not sequence. A builder reading it learns what can go wrong, not in what order things go wrong.

MITRE's ATLAS (Adversarial Threat Landscape for AI Systems) mirrors the structure of MITRE ATT&CK, tactic by tactic, technique by technique, applied to ML systems. It is more granular than OWASP and more empirical — it catalogs known incidents and adversary behaviors. But its granularity is built for post-hoc analysis. It is the framework you use after an incident, to label what happened. It is less suited to the design-time question: given the agent I am about to build, what can go wrong and where?

NIST's AI Risk Management Framework sits one floor higher: governance, principles, organizational practices. It tells you that you should care. It does not tell you what shape the attack takes.

What is missing is the builder's view: a phased sequence that lets a designer ask, slide by slide of their architecture, what could go wrong at this step, and what defends it? A kill chain delivers exactly that. The original Lockheed Martin Cyber Kill Chain (2011) decomposed a network intrusion into seven phases — reconnaissance, weaponization, delivery, exploitation, installation, command and control, actions on objectives — each with its own defensive opportunity. The metaphor has been critiqued in classical cyber (it underrepresents lateral attacks, it suggests an over-linear flow), and we should borrow only its structural form: phased compromise, phase-specific defense. We are not committing to its linearity.

With that frame in place, here is the chain.

The five phases

Phase 1 — Reconnaissance of the agentic perimeter

Before any attempt at compromise, the attacker maps what the agent can do. This is the agentic equivalent of footprinting in classical cyber.

The reconnaissance surface is wider than it first appears. The agent itself talks: ask it what tools it has, and a meaningful proportion of deployed agents will tell you. Ask it about its system prompt and a smaller but non-trivial proportion will quote it. Send malformed inputs and watch the error messages — they often leak tool schemas, model identifiers, retry counts. Time the responses and the latency profile reveals when tool calls happen and when they don't.

Beyond the agent's own output, the attacker observes the connectors. Which authentication scheme is exposed by the file connector? Which workspace identifier appears in the URL of a generated link? Does the agent's behavior change when given inputs in a language other than the deployment language?

The defensive correspondence at Phase 1 is surface reduction. Every capability the agent does not need, it should not have. Every introspection the agent volunteers, it should be configured not to volunteer. This is the OWASP LLM07 problem in its design-time form. Whatever leaks at this phase is intelligence the attacker carries into every subsequent phase.

It is worth noting that reconnaissance is not benign in the agentic context the way it can sometimes seem in classical cyber. A capability inventory of a function-calling agent is not equivalent to an Nmap scan of a webserver — it is closer to reading the binary's symbol table while it is running. The cost of leakage is higher.

Phase 2 — Privilege evaluation (the pivot)

This is the decisive phase. Everything that follows depends on its answer.

The question is binary: does the agent, at the moment of attack, already hold a primitive of execution — a shell, a code interpreter, write access to a filesystem, the ability to send a transaction, the ability to make an authenticated call to a system that itself executes? Or does it not?

Two regimes follow.

In the first regime — call it Voie A, the direct-execution agent — the attacker effectively already has root via the agent. Whatever surface manipulation is required to get the agent to use its execution primitive is, by construction, smaller than the surface that would be required to manufacture one. The attack skips Phase 3 entirely and proceeds to lateralization. The agent is operationally equivalent to a shell with a natural-language frontend, and inherits the threat model of a shell.

In the second regime — Voie B, the constrained agent — the attacker must manufacture the execution primitive. This is the regime that current LLM-security literature has most extensively studied, because it is where the novel attack surface of LLMs becomes legible. The prompt-injection literature, the tool-abuse literature, the AgentDojo benchmark — all of it inhabits Voie B.

The distinction is under-emphasized in current writing. Treating an agent with shell access as "an agent with tool use" obscures that it is operationally a shell. A useful illustration: a tool that simplifies shell management for non-developers — and there are several emerging in 2025 — must inherit the security posture of a shell, not the posture of a chat interface. The conflation between "convenient agent" and "ordinary chatbot" is precisely the conflation Phase 2 forces us to refuse.

The defensive question at Phase 2 is architectural and is asked once: which regime is my agent in, and is that the regime I intended? The answer determines how much defensive effort to deploy at Phases 3, 4, and 5.

Phase 3 — Building the execution primitive

For agents in Voie B, the attacker's work is capability subversion: transform a legitimate capability into a vector. The literature has converged on two principal families.

Prompt injection. The instruction-following property of the model is turned against the operator. In its direct form, the user's input contains instructions intended for the system rather than data for it. In its indirect form — first named by Greshake et al. in 2023 (Not what you've signed up for) — the instruction is embedded in third-party content that the agent ingests through retrieval, web fetch, file connectors, or any other ingestion mechanism. Indirect prompt injection is the most architecturally consequential class of attack in this phase, because the attacker need not be the user. Any author of any document the agent might ever read becomes a potential injector.

Tool abuse. A function implemented for a legitimate purpose is invoked toward an effect its designer did not anticipate. The web-fetch tool is asked to fetch a URL that triggers an SSRF condition in an internal service. The file-read tool is invoked on a path the agent's principal should not be reading. The send-message tool is invoked with content the operator did not authorize. This is OWASP LLM06 (Excessive Agency) in its operational form. The AgentDojo benchmark (Debenedetti et al., 2024) is currently the most rigorous public attempt to measure how often, and under what conditions, agents fail this class of test.

The defensive correspondence at Phase 3 is isolation and validation. Sandbox each tool; treat its output as untrusted; validate inputs against expected schemas; segregate contexts so that retrieved content cannot occupy the instruction position; gate capabilities behind explicit per-step authorization where the blast radius is large.

What Phase 3 reveals is that the LLM is not a participant in security — it is the medium through which security must be defended. The attacker's reach passes through the model. Anything one would protect against the model, one protects upstream and downstream of it, never inside it.

Phase 4 — Lateralization and persistence

Once an execution primitive exists, the attack rejoins the classical cyber pattern with some novel agentic shapes.

The classical pattern is familiar: pivot through the agent's credentials to other connected systems, exfiltrate data, install persistence. The agentic shapes that deserve naming are these.

Persistence in memory. An agent with long-term memory is a system in which the attacker, having reached the agent once, can write instructions that re-trigger on future sessions. This is a persistence mechanism without a classical analogue. The agent does not need a backdoor binary; it needs a sentence in its own memory file.

Persistence in retrieval. An attacker who can write to any document that the agent — or its successor — will later ingest via retrieval has installed a persistent injection. The defensive concept of trust boundary must extend to every document corpus the retrieval layer touches.

Persistence in tool descriptions. Some agentic frameworks read tool descriptions or system prompts at startup from configurable sources. An attacker who modifies those at rest has effectively rewritten the agent's mandate. This is the agentic equivalent of modifying a service's systemd unit.

The defensive correspondence at Phase 4 is credential scoping and revocation. The agent's credentials should be narrow, time-bounded, and revocable. Its memory should be auditable. Its retrieval corpora should have ownership and integrity controls. The agent should not, structurally, be able to harm more than it was asked to help.

Phase 5 — Effects

The final phase corresponds to actions on objectives in the Lockheed chain. The attacker achieves what the attack was for: exfiltration of data, manipulation of decisions (an agent that recommends the attacker's chosen vendor, marks the attacker's invoice as paid, opens the attacker's pull request), integrity attacks on downstream systems, fraud.

The defensive opportunity at Phase 5 is mostly limited. By the time effects manifest, prevention has failed. What remains is detection, audit, and containment — verifying that the effect was attributable, recording the trail, limiting how far it propagated.

This is why the architecture of the previous four phases matters so much. Phase 5 is where the consequences of design decisions are paid. The defensive investment that yields the most return is made earliest in the chain.

The defensive primitive: least privilege as transverse rule

Each phase has its own counter-measures. Surface reduction at Phase 1. Architectural separation at Phase 2. Sandboxing and validation at Phase 3. Credential scoping at Phases 4 and 5. But one rule runs through all five: the agent receives only what is strictly necessary — capability, data, right, duration.

That sentence is not new. Least privilege has been a security principle since Saltzer and Schroeder formalized it in 1975. What is new is the claim that it functions, for agentic systems, as the primitive defense — the principle on which every other defensive measure rests.

The argument has four moves.

First, phase-by-phase, every defensive measure is a specialization of least privilege. Reducing surface (Phase 1) is least capability. Separating regimes (Phase 2) is least exposure of the agent's most powerful primitive. Sandboxing tools (Phase 3) is least blast radius per call. Scoping credentials (Phases 4-5) is least scope and least duration. There is no defensive practice in this chain that is not, under analysis, a form of least privilege. That alone justifies naming it the primitive.

Second, and this is where agentic least privilege diverges from classical least privilege: the distance between input and execution is shorter in an agentic system than in any conventional architecture. In a classical application, the user's input passes through parsing, validation, business logic, authorization checks, and only then reaches an action. The instruction-execution distance is high. In an agentic system, the model is itself the parsing-validation-business-logic stage, and attacker-controlled content can occupy instruction position directly. Classical least-privilege thinking dimensions the user and their session. Agentic least-privilege must dimension the input surface itself — because the input is the attacker's vector. This is the analytical key. The user-account is no longer the unit of privilege; the content is.

Third, the design questions that follow are concrete, even if the answers depend on the system. For each tool the agent can call: what is the blast radius of a malicious invocation, and is the agent's principal scoped to that blast radius or wider? For each data source ingested into context: is its content trusted enough to occupy instruction position, and if not, what segregation prevents it from doing so? For each session: how long-lived are the agent's privileges, and at what point are they re-checked against fresh authorization? For each output the agent produces: what action in the world does it authorize, and what audit trail does that authorization leave?

Fourth, the slogan. Least privilege for agents distills to a sentence that a non-specialist can carry into a design review: an agent should have the right to do what is expected of it — and nothing else. The phrase is portable. It can be applied to a function-calling chatbot, a customer-service automation, a coding assistant with shell access, a desktop agent that controls a browser. The form of the answer changes; the form of the question does not.

This is what we mean when we say least privilege is the primitive defense of agentic AI. Not that it is the most important among many — that it is the principle that organizes the others.

Conclusion

Three takeaways for the builder.

First, the kill chain is a reading frame, not a procedure. Used well, it lets a builder ask the right question at the right slide of their architecture. Used badly, it becomes another checklist. The structural insight is the phasing, not the labels.

Second, the pivot at Phase 2 is the design decision that matters. Whether the agent holds direct execution or has to manufacture one determines almost everything downstream — the threat model, the defensive investment, the operational class of the system. Decide it deliberately. Treat the conflation between "agent" and "agent with shell" as a category error.

Third, least privilege is not best-practice hygiene for agents; it is the primitive defense. Every other measure is application. As agentic systems move from chat assistants to operational components of organizations — booking, transacting, deploying, communicating — the cost of getting Phase 2 wrong grows. The kill-chain framework is a small intellectual investment that pays off the moment an architecture review begins.

The question, in the end, is not whether the model is intelligent. The question is what the agent is allowed to touch, and how far that reach is from the world.

References

MITRE ATLAS — Adversarial Threat Landscape for AI Systems. atlas.mitre.org
OWASP — Top 10 for Large Language Model Applications, current version.
NIST — AI Risk Management Framework (AI 100-1), 2023.
Hutchins, Cloppert, Amin — Intelligence-Driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains, Lockheed Martin, 2011.
Greshake, Abdelnabi, Mishra, Endres, Holz, Fritz — Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection, AISec 2023.
Debenedetti, Zhang, Balunović, Beurer-Kellner, Fischer, Tramèr — AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents, NeurIPS 2024 Datasets and Benchmarks.
Saltzer, Schroeder — The Protection of Information in Computer Systems, Proceedings of the IEEE, 1975.

Aleph Beth — cybersecurity for AI models, agents, and pipelines. Bilingual, French and English markets.

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote