Every GTM AI workflow runs on tokens. Every dollar you spend on AI is a dollar spent on tokens. And whether your AI agent produces useful output or expensive hallucination comes down, more than almost anything else, to what tokens you put in the context window and how you put them there.
This guide builds the mental model: the working intuition that lets you make good architectural decisions, avoid the most expensive mistakes, and understand why some AI outputs are sharp and others are garbage. Think of it as the reasoning layer beneath the API documentation.
What a Token Actually Is
A token is the fundamental unit of text that language models process. It is not a word and it is not a character. It sits somewhere in between.
The practical conversion:
- 1 token = approximately 4 characters of English text
- 1 token = approximately 0.75 words
- 750 words = approximately 1,000 tokens
- To estimate tokens from a word count: multiply by 1.3 to 1.4
For common words, one word is often one token. Less common, longer, or technical words may split into two or three tokens. Code and structured data tokenize even less efficiently, and non-English text is often worse still.
Why it matters: Every model has a context window measured in tokens. Every API call is priced per token. Once you internalize these conversion rates, you can estimate the size and cost of any workflow before you build it.
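Those conversion rates are easy to turn into back-of-envelope estimators. A minimal sketch (these are heuristics, not a tokenizer; for exact counts, use the tokenizer your model provider ships):

```python
# Rough token estimators based on the rules of thumb above.

def tokens_from_chars(text: str) -> int:
    """Estimate tokens as ~4 characters of English per token."""
    return max(1, round(len(text) / 4))

def tokens_from_words(word_count: int) -> int:
    """Estimate tokens as ~1.35 per word (midpoint of the 1.3-1.4 rule)."""
    return round(word_count * 1.35)

# 750 words lands near the 1,000-token rule of thumb.
print(tokens_from_words(750))
```

Good enough for sizing a workflow before you build it; switch to the real tokenizer when you need billing-grade precision.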
Token Counts for Common GTM Documents
Before you can reason about context, you need a feel for how large typical GTM documents actually are in token terms.
| Document type | Approximate token count |
|---|---|
| Outbound sales email (150-250 words) | 200-325 tokens |
| Well-personalized email with research hook | 300-500 tokens |
| 3-step cold email sequence (all three emails) | 600-1,000 tokens |
| LinkedIn profile (full) | 600-1,200 tokens |
| CRM contact record (basic fields + notes) | 300-500 tokens |
| CRM contact record (full, with activity history) | 1,000-3,000 tokens |
| 30-minute sales call transcript | 5,600-6,000 tokens |
| 60-minute discovery call transcript | 10,000-12,000 tokens |
| Light account brief (name, industry, size, key products) | 400-600 tokens |
| Full account brief (news, tech stack, contacts, open deals) | 2,000-6,000 tokens |
| Deep research document (analyst-grade) | 8,000-20,000 tokens |
| 10-K filing or annual report | 50,000-150,000 tokens |
A useful calibration: a complete, well-researched account brief for a single prospect sits around 4,000-6,000 tokens. A single AI SDR workflow that researches a prospect and generates a personalized email sequence consumes roughly 3,000-5,000 tokens per prospect, all in.
Context Windows: What the Numbers Mean
Every model comes with an advertised context window: the maximum number of tokens it can process in a single call. These numbers have grown dramatically. Models that topped out at 4,096 tokens in 2022 now commonly offer 128K, 200K, or 1 million tokens.
128K tokens
Approximately 96,000 words. Enough to hold the full text of a novel, or roughly 20 detailed account briefs simultaneously.
200K tokens
Approximately 150,000 words. Roughly 30 full account briefs, or a complete mid-size sales pipeline with full deal history.
1M tokens
Approximately 750,000 words. An entire company CRM for a small team, or a year of sales call transcripts.
10M tokens
Meta's Llama 4 Scout context window. The complete works of Shakespeare run roughly 900,000 words, about 1.2 million tokens, leaving nearly 9 million tokens to spare.
Large context windows are genuinely useful. But the number the model advertises and the context the model actually uses effectively are two different things.
The Gap Between Advertised and Effective Context
This is where most GTM AI builders go wrong.
Lost in the Middle
In 2023, a Stanford research team published "Lost in the Middle: How Language Models Use Long Contexts" in the Transactions of the Association for Computational Linguistics. The finding was counterintuitive and has since become one of the most referenced results in applied AI.
Models exhibit a U-shaped performance curve across their context window. Performance is highest when relevant information appears at the beginning or end of the context. Performance degrades significantly when relevant information is buried in the middle.
Performance can degrade by more than 30% when the same fact moves from the start or end of a context window to the middle. This holds even for models explicitly designed for long-context tasks.
What this means in practice: load an account brief, three past call transcripts, a competitor analysis, and a list of product features into a large context window. If the specific fact that should drive your email hook is buried in transcript number two, the model is substantially more likely to miss it, misremember it, or ignore it entirely.
The most critical information belongs at the top of your context, or at the very end. Always the edges.
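That edge-placement rule can be encoded directly into how you assemble a prompt. A sketch, assuming a simple sectioned layout (the section names are illustrative, not a required schema):

```python
# Assemble a prompt so the most decision-relevant material sits at the
# edges of the context, where recall is strongest, and bulk supporting
# material sits in the middle, where degradation is worst.

def assemble_context(critical_facts: list[str],
                     supporting_docs: list[str],
                     task_instruction: str) -> str:
    parts = []
    # Top of context: the facts that should drive the output.
    parts.append("KEY FACTS:\n" + "\n".join(f"- {f}" for f in critical_facts))
    # Middle: background documents.
    parts.append("BACKGROUND:\n" + "\n\n".join(supporting_docs))
    # Bottom edge: restate the task and the key facts.
    parts.append("TASK: " + task_instruction)
    parts.append("Remember the key facts above when responding:\n"
                 + "\n".join(f"- {f}" for f in critical_facts))
    return "\n\n".join(parts)
```

Restating the critical facts at the bottom costs a few dozen tokens and puts them on both high-recall edges of the window.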
Context Rot
In July 2025, Chroma published "Context Rot: How Increasing Input Tokens Impacts LLM Performance," a study covering 18 state-of-the-art models including GPT-4.1, Claude 4, Gemini 2.5, and Qwen3.
The central finding: every single model tested degraded in performance as input context length increased, even on simple tasks like retrieval and text replication.
Context Rot vs. Context Overflow
Context overflow is hitting the token limit and getting an API error. Context rot is different: it is the silent degradation that sets in well before the limit, as the model's attention spreads thinner with every additional token you add to the prompt. A model with a 200K token context window can show significant accuracy loss at 50K tokens. Overflow throws an error. Rot degrades your outputs without warning.
The mechanism is attention. Every token competes with every other token for the model's focus. As context grows, each token gets proportionally less of it. The model's working memory is finite. Adding tokens beyond what the task needs dilutes the signal.
In practice, most models effectively utilize only 10-20% of their theoretical context window for complex tasks. Models in the Chroma study "fell far short of their Maximum Context Window by as much as >99%" in real task performance.
Context Stuffing: The Anti-Pattern
With million-token context windows now available and prices falling, there is a tempting architectural pattern: dump everything into the context and let the model sort it out. Every CRM record, every call transcript, every piece of company research, every product document. All of it, in one prompt.
This is called context stuffing. Production AI research identifies it as the single largest source of wasted compute, hallucinated outputs, and unreliable agent behavior in GTM pipelines.
Why context stuffing fails:
- The Lost in the Middle effect. Critical information in the interior of a large context gets missed or misrepresented. The problem scales with context size.
- Context rot. Performance degrades as tokens accumulate, even before you approach the limit.
- Attention dilution. The model's attention budget is finite. More noise means less focus on the signal that actually matters.
- Hallucination amplification. More context means more potentially conflicting information. Conflicts and distractors increase hallucination rates.
- Cost and latency. A 200K token prompt for a task that actually needed 8K tokens costs 25 times more and takes substantially longer.
In production, these effects compound.
Research comparing context stuffing directly to retrieval-augmented generation (RAG) found that RAG achieved equivalent output quality with less than half the tokens and roughly half the latency.
Context Quality vs. Context Quantity
The most important insight in applied AI for GTM is this: the quality of your context matters more than the size of it.
A tightly selected 5,000-token context with the right account signals, the relevant call transcript, and a clear ICP definition will outperform a bloated 50,000-token context in which 40,000 tokens are loosely related noise.
This has direct implications for how you build GTM AI workflows:
Be selective about what goes in
For any given task (outreach generation, account briefing, lead scoring), define the minimum viable context set. What are the three to five pieces of information that most directly determine output quality? Put those in. Not everything that might be relevant.
Put critical information first
The Lost in the Middle research is consistent: primacy bias is real. The most important signals belong at the top of the context. If there is one piece of information that should drive the AI's output, it belongs in the first few hundred tokens.
Use retrieval rather than bulk loading
For workflows that draw on large knowledge bases (product documents, competitive intel, CRM history), retrieve the most relevant chunks rather than loading everything. Semantic search or structured filtering lets you feed a 4,000-token context with exactly the right material instead of a 100,000-token context with the right material diluted by 90,000 tokens of less relevant content.
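The retrieval step can be very simple and still capture the pattern. A minimal sketch, using word-overlap scoring as a stand-in for a real embedding-based semantic search (the knowledge-base snippets are made up for illustration):

```python
# Score candidate chunks against the task and keep only the top few,
# instead of loading the whole knowledge base into context.

def top_k_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    q_words = set(query.lower().split())

    def score(chunk: str) -> int:
        # Stand-in relevance score; swap in embedding similarity in production.
        return len(q_words & set(chunk.lower().split()))

    return sorted(chunks, key=score, reverse=True)[:k]

knowledge_base = [
    "Pricing tiers: starter, growth, enterprise.",
    "Competitor X pricing recently rose 20%.",
    "Our onboarding takes two weeks on average.",
]
print(top_k_chunks("competitor pricing comparison", knowledge_base, k=1))
```

The structure, score everything and inject only the winners, is the same whether the scorer is keyword overlap, a vector database, or structured CRM filters.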
Summarize before injecting
Long documents benefit from a summarization step before injection. A 12,000-token call transcript summarized down to a 1,000-token structured brief before being used as context for outreach generation produces better outreach than the raw transcript injected in full.
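The two-stage shape looks like this in code. A sketch, with `call_model` stubbed as a placeholder for whatever model client you use (the prompt wording is illustrative):

```python
# Stage 1 compresses the long transcript; stage 2 generates outreach
# from the compact brief, never from the raw transcript.

def call_model(prompt: str) -> str:
    # Stand-in for a real API call to your model provider.
    return "<model output for: " + prompt[:40] + "...>"

def summarize_transcript(transcript: str) -> str:
    prompt = ("Summarize this sales call into a structured brief with: "
              "pain points, objections, next steps, budget signals.\n\n"
              + transcript)
    return call_model(prompt)

def generate_outreach(brief: str, icp: str) -> str:
    # The ~1,000-token brief, not the ~12,000-token transcript, is injected.
    prompt = f"ICP: {icp}\n\nCALL BRIEF:\n{brief}\n\nWrite a follow-up email."
    return call_model(prompt)
```

The summarization call costs a few cents; the payoff is a generation step whose context is almost entirely signal.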
Use agent chains for complex tasks
Google Research's "Chain of Agents" work shows that breaking complex, document-heavy tasks across multiple agent calls outperforms feeding one massive context to a single model. Instead of one call with 50 documents in context, five sequential calls, each focusing on a subset of the material and building toward a final synthesis, produce sharper outputs with less context rot.
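The chain pattern reduces to a loop: each call sees one slice of the material plus the running findings, and a final call synthesizes. A sketch, with `agent_call` stubbed in place of a real model client:

```python
# Each agent reads one document plus the accumulated findings, keeping
# every individual context small; a final agent answers from the
# distilled findings alone.

def agent_call(prompt: str) -> str:
    # Stand-in for a real API call to your model provider.
    return "<synthesis of: " + prompt[:30] + "...>"

def chain_over_documents(documents: list[str], task: str) -> str:
    running_summary = ""
    for doc in documents:
        prompt = (f"TASK: {task}\n\nFINDINGS SO FAR:\n{running_summary}\n\n"
                  f"NEW DOCUMENT:\n{doc}\n\nUpdate the findings.")
        running_summary = agent_call(prompt)
    return agent_call(f"TASK: {task}\n\nFINDINGS:\n{running_summary}\n\nAnswer.")
```

No single call ever holds more than one document plus a summary, so each stays well inside the model's effective context.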
RAG vs. Long Context: What the Research Says
A January 2025 paper directly comparing long-context LLMs against retrieval-augmented generation, "Long Context vs. RAG for LLMs," produced findings worth understanding before you architect GTM AI pipelines.
Long context wins when...
The task genuinely requires synthesizing information across an entire long document: summarizing a lengthy contract, analyzing a full earnings call, reviewing a comprehensive competitive report. When breadth of coverage matters and the document is the unit of analysis, feeding the whole thing in context produces better results.
RAG wins when...
You are retrieving specific facts or signals from a large corpus: finding the right contact in a database, pulling the relevant section of a playbook, looking up previous deal history. When precision matters and the relevant information is a small fraction of the total available, RAG is both more accurate and significantly cheaper.
For most day-to-day GTM AI use cases (outreach generation, lead scoring, call summaries, account briefs), RAG or selective context injection outperforms bulk context loading. The exception is synthesis-heavy tasks (full document analysis, contract review, earnings call synthesis) where the model needs to reason across the entire scope. That's a narrow category.
Token Economics: What Things Actually Cost
The cost of AI has collapsed. As of March 2026, the price per token has declined roughly 280x since GPT-3.5's launch in November 2022. A workload that cost $10,000 per month in model API fees in 2023 likely costs $50-200 today.
Price ranges across models are wide. Routing decisions matter:
Per-call cost: 10,000 input tokens, 1,000 output tokens
| Model | Total cost |
|---|---|
| GPT-4o mini | $0.0021 |
| Gemini 2.5 Flash | $0.0055 |
| o4-mini | $0.0154 |
| Gemini 2.5 Pro | $0.020 |
| GPT-4.1 | $0.028 |
| Claude Sonnet 4.6 | $0.045 |
| Claude Opus 4.6 | $0.075 |
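Per-call cost is simple arithmetic over per-million-token rates. A sketch that reproduces the GPT-4o mini row above, using its published $0.15 input / $0.60 output per-million-token prices (check your provider's pricing page before relying on any specific rate):

```python
# Cost = input tokens at the input rate + output tokens at the output
# rate, with rates quoted per million tokens.

def call_cost(input_tokens: int, output_tokens: int,
              in_price_per_m: float, out_price_per_m: float) -> float:
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# GPT-4o mini row: 10K input, 1K output.
print(round(call_cost(10_000, 1_000, 0.15, 0.60), 4))  # 0.0021
```

Multiply by prospects per day and the table above turns directly into a monthly budget.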
Per-call cost: 100,000 input tokens, 5,000 output tokens
| Model | Total cost |
|---|---|
| GPT-4o mini | $0.018 |
| Gemini 2.5 Flash | $0.043 |
| GPT-4.1 | $0.240 |
| Gemini 2.5 Pro | $0.300 |
| GPT-5.4 | $0.325 |
| Claude Sonnet 4.6 | $0.375 |
| Claude Opus 4.6 | $0.625 |
At 100K input tokens, the spread between the cheapest and most expensive options is more than 30x. Context architecture is cost architecture.
This is another reason to avoid context stuffing. Every token you trim from the context window is a direct cost reduction, and output quality often improves at the same time, because a leaner context suffers less from context rot.
A Framework for GTM Token Budgets
Not every task needs the same context depth. A practical mental model:
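One way to make that mental model concrete is a per-task token budget. A sketch, with tiers inferred from the document sizes earlier in this piece; the specific numbers are illustrative assumptions to tune against your own quality measurements, not benchmarks:

```python
# Illustrative per-task context budgets, in tokens. Tier values are
# assumptions derived from typical GTM document sizes, not benchmarks.

TOKEN_BUDGETS = {
    "outreach_email": 3_000,       # key signals + ICP + one news item
    "lead_scoring": 2_000,         # ten predictive fields + indicators
    "account_brief": 8_000,        # research chunks + firmographics
    "call_summary": 13_000,        # full transcript + CRM schema
    "document_synthesis": 120_000, # the rare whole-document task
}

def within_budget(task: str, context_tokens: int) -> bool:
    """Flag contexts that have grown past the task's budget."""
    return context_tokens <= TOKEN_BUDGETS[task]

print(within_budget("outreach_email", 4_500))  # False: trim the context
```

A budget check like this in the pipeline catches context creep before it shows up as cost and degraded output.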
What This Means for GTM AI Pipelines
The research points toward a consistent architectural principle: structured, targeted context outperforms bulk context at every price point.
The GTM AI workflows that perform best in production are built by teams that think carefully about what the model needs for each task, retrieve exactly that, and structure it with the most important signals first.
The practical version of this principle looks like:
- An AI SDR tool that pulls a prospect's recent job change, one relevant news item, and the ICP definition for that segment beats one that dumps the full LinkedIn history, every news article from the past year, and all prior email history into a single prompt.
- A call summary workflow that feeds the transcript plus the CRM schema beats one that also injects the company profile, the contact's LinkedIn, and the rep's full account history.
- A lead scoring system that retrieves the ten most predictive firmographic signals plus one or two behavioral indicators beats one that loads every available data field.
The model is capable. The question is whether you are giving it the right raw material to work with. Fewer, better-selected tokens consistently outperform more tokens of loosely relevant context.
The single most important takeaway
Quality of context is the primary driver of AI output quality in GTM applications. Model choice is secondary. Context architecture is where the work is.
Research sources: Liu et al. (2023), "Lost in the Middle: How Language Models Use Long Contexts," Stanford / TACL. Chroma Research (July 2025), "Context Rot: How Increasing Input Tokens Impacts LLM Performance." "Long Context vs. RAG for LLMs: An Evaluation and Revisits" (January 2025). Google Research, "Chain of Agents: Large Language Models Collaborating on Long-Context Tasks" (NeurIPS 2024). Token pricing data from pricepertoken.com and official provider pricing pages, March 2026. Token economics via Stanford AI Index 2025.