your agent needs context, not just capability (part 1)
introduction
in last 1 year we've seen rapid development in ai capabilities and tooling. claude opus 4.6, codex, clawdbot, and china models. with this, we expect the tasks at hand to become easier for ai to solve given sufficient context and tools. when we talk about tools, they're relatively straightforward. 50-200 lines of code with if-else logic, api calls, and structured outputs. on the other hand context is the harder problem. have we solved it? i'd say no. but are we trying? yes !
what is context? context for an ai agent is the instructions and reference text you provide for a task. it can be a book, a blog post, a code file, or anything that helps the llm resolve the task at hand. context helps the llm set up the its precedents for what is to come.
despite significant progress in agent architectures, frontier model evolution, and tooling, context problem remains under-engineered. the model has limited context window and even with models with as large as 1M context window, the size is not optimal to its performance.
as you hit 200k+ tokens, you would find degradation in performance and the inability to follow the precise instructions. the llm has to trade-off between following your instructions, keeping the things right (reduce hallucinations), or trying to finish the task. there only a limited amount of text you can use to get the best output from the llm, aka your ai agent. (llm, ai agent, and agent in this blog would be referred to the same idea)
in this post, i'll walk through a few key aspects of context engineering that can help you improve your agent workflows and explain why they matter. you might already be using some of these, but it's worth revisiting them with more precision.
table of contents
- avoid json; use csv and markdown
- structure your documents early
- pre-index large documents
- correct for bias drift
- choose your semantic triggers carefully
- concluding thoughts
avoid json; use csv and markdown
when you work with ai agents, you want your agent to pick up files, scrape the internet, and read content as it moves through the task scope. however, when you provide an agent with a json file, it tends to write code to parse the file rather than running simple cat or head commands. this is acceptable in isolation, but after a few sequential steps, the behavior cascades. the agent begins writing code to read markdown and csv files as well, even when direct reading would suffice.
# what you want (simple, direct)
$ cat product_catalog.csv
product,category,units_sold
widget_a,electronics,1200
# what often happens with json in context
$ python -c "import json; print(json.load(open('catalog.json')))"
# then this cascades to other file types
$ python -c "import pandas; print(pandas.read_csv('catalog.csv'))"
the reason is tool definitions in the system prompt are already formatted in json. this creates a representational bias. the model associates json with structured processing logic and extends that assumption to all file interactions. csv and markdown, by contrast, are closer to natural language in structure and makes them simple for the agent to work.
key takeaway: avoid json in the context window for any important working files. restrict it to where it belongs: tool definitions and structured api responses in the system prompt.
structure your documents early
when you're parsing pdfs to text, a significant amount of structural information is lost. tables flatten into ambiguous strings, headers merge with body text, and numerical relationships become unrecoverable. this directly leads to incorrect data and misattributed details in agent outputs.
# pdf-to-text output (structure lost)
specifications weight 2500kg dimensions 3x2x1m power 480v compliance iso9001 warranty 24months
# pdf-to-markdown output (structure preserved)
## product specifications
- weight: 2,500 kg
- dimensions: 3m × 2m × 1m
- power: 480v
- compliance: iso 9001
- warranty: 24 months
the reason this matters at the token level is that without explicit delimiters and formatting, the model has to infer boundaries between data points using positional context alone. that inference is unreliable, especially in longer documents where attention weights dilute over distance. preserving structure through markdown gives the model explicit anchors to attend to.
key takeaway: prefer using a pdf-to-markdown converter with strong ocr models and parsers that handle complex charts and tables well.
pre-index large documents
while grep has been useful for searching large documents, it introduces its own problems. many documents don't follow straightforward structures. granular details in the documents can be spread across distant sections, and grep fails in such scenarios.
grep is a basic pattern search, it misses semantically related content that doesn't share exact keywords. it also tends to pollute the context window when dealing with large files. the tool output is often messy, incomplete, or empty, which breaks the agent's flow and forces recovery steps that consume additional tokens.
# grep misses context spread across sections
$ grep "thermal performance" product_manual.txt
# returns scattered line matches but misses the full analysis
# better: pre-indexed topic sections
sections/
├── overview.md
├── technical_specs.md
├── installation_guide.md
└── compliance_certifications.md
key takeaway: the better approach is to split documents into topic-level sections through a separate preprocessing pipeline before the agent begins its task. this way, you can point the agent to the specific section relevant to its current step, reducing noise and keeping the context window focused. this is particularly important for long-horizon tasks where context window bloat compounds with each step.
bias drift
frontier models are predominantly trained on western data. this creates a persistent reasoning bias towards western frameworks, standards, and conventions. when you're working on a use case that requires local knowledge and local reasoning, the model will default to its stronger priors unless explicitly corrected.
this becomes especially prominent during long-horizon tasks. as the context window fills and the model's attention spreads across more tokens, the influence of your corrective instructions weakens relative to the model's pretrained bias.
<!-- without bias correction -->
"the product meets fda standards and follows us manufacturing guidelines..."
<!-- with bias correction embedded early in system prompt -->
<context>
you are evaluating products for the indian market. consider:
- bis standards, not ansi/astm
- local supply chain and logistics constraints
- regional consumer behavior and preferences
- regulatory framework: bureau of indian standards, fssai where applicable
</context>
<!-- output now aligns with the correct context -->
"the product meets bis certification requirements for the indian market..."
key takeaway: start early. embed the corrective framing in your system prompts and skill definitions from the first step itself. you need to periodically reinforce the perspective throughout the workflow.
choose your semantic triggers carefully
when we write prompts, we often use llms to meta-prompt what we want to express. this works well enough in early iterations. but as you move towards accuracy and reliability over longer tasks, your word choices start to matter significantly.
there is a measurable difference in output when you ask your agent to do "market research" versus "market analysis" on the same topic. each phrase activates different clusters of associations. "research" biases the model toward breadth, surveys, and qualitative trends, while "analysis" steers it toward metrics, competitive positioning, and quantitative breakdowns.
# subtle but significant differences in output behavior
"conduct research on the ev battery supply chain":
→ broad information gathering, surveys, trend summaries
"conduct analysis on the ev battery supply chain":
→ metrics, competitive positioning, quantitative breakdown
"perform a quality audit on the manufacturing process":
→ verification-oriented, defect identification, compliance checks
"review the manufacturing process":
→ descriptive summary of existing procedures
key takeaway: this is not something you need to optimize from day one. but as your agent workflows mature and you begin evaluating output quality at a granular level, revisiting the semantics of your prompts becomes essential.
concluding thoughts
context engineering is not a solved problem. we have better models, better tools, and longer context windows. but the quality of what goes into that window remains the primary bottleneck for agent reliability.
the patterns above are not the exact theoretical suggestions. what works in once scenario would not work in another. different problems require different context engineering approach. often sometimes change in entire architecture to small iteration on data pipeline prompts! this side of engineering is always never sticking to old ideas and trying new approach to solve the problem however absurd the solution may seem.
notes and appreciation
this is part 1 of a two-part series. this post focused on context quality, how to prepare, format, and frame the information your agent consumes. part 2 will focus on context architecture, how to structure the flow of information across multi-step agent workflows to maintain coherence over long-horizon tasks.
thanks to @dejavucoder, @ambuj_2032, @soma_as_moon7, @_diginova, @weeyev, and @paneerchilli65 for the reading the initial draft and providing suggestions.
further reading
effective context engineering for ai agents
context rot: how increasing input tokens impacts llm performance
context engineering for ai agents: lessons from building manus