Context Windows, Tools, and Termination

In Part 1, we covered the fundamental loop that drives AI agents: Input → Reason → Act → Observe → Repeat. That loop is the engine. But every engine runs within constraints.

This article explores three critical concepts that shape how agents operate:

The context window — The agent's working memory and its limits
Tools — How agents take action in the real world
Termination — How agents know when they're actually done

Understanding these mechanics helps you work within the system's constraints rather than fighting against them.

The Context Window: Working Memory

The context window is everything the agent can consider at once. Think of it as the agent's desk—only what's on the desk can influence the current decision.

┌─────────────────────────────────────────────────────────────────┐
│                    THE CONTEXT WINDOW                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │   SYSTEM PROMPT                                         │   │
│   │   Base instructions, personality, constraints           │   │
│   ├─────────────────────────────────────────────────────────┤   │
│   │   YOUR PROJECT RULES                                    │   │
│   │   Custom instructions for this codebase                 │   │
│   ├─────────────────────────────────────────────────────────┤   │
│   │   CONVERSATION HISTORY                                  │   │
│   │   Previous messages, tool calls, results                │   │
│   ├─────────────────────────────────────────────────────────┤   │
│   │   FILES AND DATA LOADED                                 │   │
│   │   Code, configs, logs you've pulled in                  │   │
│   ├─────────────────────────────────────────────────────────┤   │
│   │   CURRENT PROMPT                                        │   │
│   │   What you're asking right now                          │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│   LIMIT: ~200,000 tokens (roughly 500 pages of text)            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

That sounds like a lot—500 pages. But it fills up faster than you'd expect.

Why Context Limits Matter

Here's the uncomfortable truth: you're always in competition for context space.

Your project rules compete with conversation history. The files you've loaded compete with the system prompt. As conversations grow longer, earlier context gets pushed toward the margins.

For security practitioners, this has practical implications:

Loading entire log files is expensive. That 10,000-line auth log you just asked the agent to analyze? It's consuming context that could be used for reasoning. Sometimes that's necessary. Often, filtering first is smarter.

Long investigations accumulate baggage. By hour two of an incident investigation, your conversation history might be consuming half the available context. The agent is reasoning with one hand tied behind its back.

Your rules are always present. Whatever you've configured as standing instructions takes up space on every single interaction. Keep them focused.

The skill here is context curation—being strategic about what information you bring into the window and when.

Practical Context Management

Think of context management like managing an investigation workspace:

Situation	Strategy
Starting fresh task	Reset conversation to clear history
Loading large files	Filter or excerpt before loading
Long-running investigation	Periodically summarize and reset
Multiple parallel threads	Separate conversations per thread
Standing instructions	Keep them minimal and high-signal

The agent can't manage its own context—you have to do it. When you notice degraded performance in long conversations, context exhaustion is often the culprit.

Tools: How Agents Take Action

Agents aren't just text generators with opinions. They can take real actions in the real world through tools.

┌─────────────────────────────────────────────────────────────────┐
│                    AGENT TOOLS                                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   BUILT-IN TOOLS (typical coding agents):                       │
│   ├── Read       → Read file contents                           │
│   ├── Write      → Create or overwrite files                    │
│   ├── Edit       → Modify existing files                        │
│   ├── Bash       → Execute shell commands                       │
│   ├── Search     → Find files or content patterns               │
│   └── ...        → Many more depending on the agent             │
│                                                                 │
│   EXTENDED TOOLS (via MCP or plugins):                          │
│   ├── Database queries                                          │
│   ├── API integrations                                          │
│   ├── Custom security tools                                     │
│   └── Anything you configure                                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

When an agent decides to use a tool, here's what happens:

Agent reasons that it needs information or needs to take action
Agent specifies which tool to use and with what parameters
Tool executes outside the model (this is real execution, not simulation)
Results return to the agent
Agent incorporates results and continues reasoning

That third step is crucial: tool execution is real. When the agent writes a file, the file exists. When it runs a command, the command executes. This isn't roleplay.

Tools for Security Work

The extensibility of tools is where things get interesting for security practitioners. Out of the box, an agent can read files and run commands. But with proper configuration, you can extend this to:

Query your SIEM directly
Check indicators against threat intel APIs
Pull logs from specific timeframes
Execute forensic collection scripts
Interface with ticketing systems

The agent's reasoning capabilities combined with security-specific tools creates something more powerful than either alone. The agent can decide what to query based on its analysis; the tools give it the ability to query.

This is the foundation of agentic security workflows—but that's a topic for another article.

How Agents Know They're Done

The "decide if done" step in the agent loop deserves its own examination. How does an agent know when a task is complete?

There's no single answer. Different systems use different approaches:

┌─────────────────────────────────────────────────────────────────┐
│                 TERMINATION STRATEGIES                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   1. SELF-DETERMINED (LLM Judgment)                             │
│      Agent reasons about whether the goal is met                │
│      "I've analyzed the logs, identified the IOCs,              │
│       documented findings → task complete"                      │
│      ↳ Flexible, but can miss edge cases                        │
│                                                                 │
│   2. DETERMINISTIC VERIFICATION                                 │
│      External check with clear pass/fail                        │
│      "Run detection → alert fires → working"                    │
│      "Execute test suite → all pass → done"                     │
│      ↳ Unambiguous, but only for verifiable tasks               │
│                                                                 │
│   3. EVALUATOR AGENT                                            │
│      Separate model reviews the work                            │
│      Agent A investigates, Agent B validates findings           │
│      ↳ Catches errors, but adds latency and cost                │
│                                                                 │
│   4. HARD LIMITS                                                │
│      External constraints force stop                            │
│      Maximum iterations, token budget, timeout                  │
│      ↳ Safety net, not true completion detection                │
│                                                                 │
│   5. HYBRID APPROACHES                                          │
│      Combinations for robustness                                │
│      "Agent thinks done → run verification → confirm"           │
│      ↳ Most reliable for production workflows                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Termination in Practice

Most coding agents today use a hybrid of self-determination and deterministic verification:

For tasks with clear success criteria ("make the tests pass"), the agent reasons until it believes it's done, then verifies with actual test execution
For open-ended tasks ("review this code for security issues"), the agent relies primarily on its own judgment about completeness

This is why clear success criteria matter. When you give the agent a verifiable goal, it can use deterministic termination. When the goal is fuzzy, you're relying on the model's judgment—which is good but not perfect.

Compare:

Fuzzy: "Improve the detection coverage"
Agent decides it's done based on... vibes? It added some rules, seems like more coverage, probably done.

Clear: "Write detections for T1003.001 through T1003.004 and verify each fires against the test samples in /tests/credential-access/"
Agent has explicit completion criteria: four techniques covered, each verified against samples.

You don't always need perfectly crisp criteria—sometimes exploration is the point. But when precision matters, define what "done" looks like.

Putting It Together

These three concepts interact constantly:

Context limits force tool use. You can't load a million-line log file into context—so the agent queries it through tools, bringing back only relevant excerpts.

Tool results consume context. Every search result, every file read, every command output goes into the context window. Verbose tools fill context fast.

Termination depends on context. The agent can only reason about completion based on what's in its context window. If critical information got pushed out by a long conversation, the agent might terminate prematurely.

Understanding these interactions helps you debug problems. Agent gave a shallow answer? Maybe context is exhausted. Agent keeps running without finishing? Maybe success criteria are unclear. Agent missed something obvious? Maybe the relevant information never made it into context.

Practical Implications

Understanding	Changed Behavior
Context is finite	Curate information strategically
History accumulates	Reset when switching major tasks
Tools have real effects	Review before authorizing destructive actions
Termination needs criteria	Define what "done" looks like
Everything competes for context	Keep rules and prompts focused

What You Don't Need to Know

This article deliberately skipped:

Transformer architecture and attention mechanisms
Token prediction and sampling strategies
Training data composition
Specific API implementations

These matter if you're building models. For working effectively with agents, the conceptual model we've covered is sufficient. You don't need to understand internal combustion to drive a car well—but you do need to understand that the car needs fuel and has a turning radius.

Key Takeaways

Context window = working memory. Finite space that everything competes for. Manage it actively.

Tools = real capabilities. Agents can read, write, execute, query. These actions have real effects.

Termination varies. Self-determined, verified, evaluated, or hard-limited. Clear criteria help.

These concepts interact. Context limits drive tool use. Tool results consume context. It's all connected.

Where to Go From Here

With the agent loop (Part 1) and these operational concepts (Part 2), you have a working mental model of how AI coding agents function. You understand the cycle, the constraints, and the mechanics.

The next step is learning how to communicate effectively within this system. That means shifting from imperative instructions ("do X, then Y, then Z") to declarative goals ("achieve this outcome"). That shift—the declarative mindset—is where the real leverage comes from.

Context Windows, Tools, and Knowing When to Stop