What is Agentic Context Engineering (ACE)?

ACE is a framework that treats an LLM's context as an evolving playbook of itemized bullets. Three roles (a Generator, a Reflector, and a Curator) accumulate, refine, and organize strategies in that playbook over time, applying incremental delta updates instead of full rewrites.

Does ACE work with the OpenAI Agents SDK?

Yes. ACEAgent wraps any agents.Agent as a drop-in self-improving memory: the playbook is injected into the agent's instructions and updated from each run's trajectory and feedback.

Does ACE need ground-truth labels?

No. ACE can adapt from natural execution feedback (test pass/fail, API errors, a reward function) via a feedback_fn hook, in addition to labeled supervision.

ACE — Agentic Context Engineering | Self-Improving LLM Agent Framework (Python)

Why ACE

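Prompt optimizers suffer from brevity bias: they drift toward short, generic prompts and drop domain-specific detail. Monolithic memory suffers from context collapse: repeated full rewrites erode what was learned. ACE is designed to address both.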

🧠

Evolving playbook

Context is a set of itemized bullets that accumulate and organize knowledge over time — not a fragile single prompt.

⚡

Incremental deltas

Localized ADD/UPDATE/REMOVE edits merged by deterministic non-LLM logic. No full rewrites, no collapse.

🔁

Generator · Reflector · Curator

A modular loop that experiments, reflects, and consolidates — mirroring how humans learn.

🏷️

Label-free

Learns from natural execution feedback (tests, API errors, rewards) — no ground-truth labels required.

🔌

OpenAI Agents SDK

Drop-in self-improving memory for any agents.Agent in a few lines.

🔍

Interpretable

Every bullet has an id and helpful/harmful counters — inspect, edit, and prune the agent's knowledge.
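The "Incremental deltas" and "Interpretable" points above can be illustrated with a small, self-contained sketch. It is not ACE's actual implementation: the `Bullet` and `Playbook` classes here, the delta dictionary shape, and the `b001`-style ids are all assumptions made for illustration. The sketch only shows the general idea of applying ADD/UPDATE/REMOVE edits with plain, deterministic code, so that no LLM call is involved in the merge.

```python
from dataclasses import dataclass, field


@dataclass
class Bullet:
    """One playbook entry with an id and usefulness counters (hypothetical shape)."""
    id: str
    text: str
    helpful: int = 0
    harmful: int = 0


@dataclass
class Playbook:
    """Illustrative playbook: an ordered mapping of bullet id -> Bullet."""
    bullets: dict[str, Bullet] = field(default_factory=dict)
    _next: int = 1  # counter used to mint new ids

    def apply(self, deltas: list[dict]) -> None:
        """Merge localized edits deterministically; unknown ids and ops are ignored."""
        for d in deltas:
            op = d["op"]
            if op == "ADD":
                bid = f"b{self._next:03d}"
                self._next += 1
                self.bullets[bid] = Bullet(bid, d["text"])
            elif op == "UPDATE" and d["id"] in self.bullets:
                b = self.bullets[d["id"]]
                b.text = d.get("text", b.text)      # optional rewording
                b.helpful += d.get("helpful", 0)    # counter increments
                b.harmful += d.get("harmful", 0)
            elif op == "REMOVE":
                self.bullets.pop(d["id"], None)

    def render(self) -> str:
        """Human-readable view: id, counters, text."""
        return "\n".join(
            f"[{b.id}] (+{b.helpful}/-{b.harmful}) {b.text}"
            for b in self.bullets.values()
        )


pb = Playbook()
pb.apply([
    {"op": "ADD", "text": "Verify identity before cancelling an order."},
    {"op": "ADD", "text": "Quote the order id back to the user."},
])
pb.apply([
    {"op": "UPDATE", "id": "b001", "helpful": 1},  # bullet b001 helped on a run
    {"op": "REMOVE", "id": "b002"},                # bullet b002 was judged unhelpful
])
print(pb.render())
# [b001] (+1/-0) Verify identity before cancelling an order.
```

Because each delta touches only the bullet it names, every other bullet passes through the merge unchanged, which is the property that a full rewrite cannot guarantee.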

Quickstart

Install, then run the headline comparison — no API key required.

pip install -e .            # core: numpy + rich only
ace demo --html report.html # Base vs ACE vs Monolithic (context collapse)
ace run                     # live terminal dashboard

…or in ~10 lines of Python:

from ace import ACE, SimulatedLLM, TeachingEnvironment, build_teaching_task
from ace.baselines import StaticAgent

env  = TeachingEnvironment()
task = build_teaching_task()
train, test = task.split()

base = StaticAgent(SimulatedLLM(env)).run(test)        # no learning
ace  = ACE(SimulatedLLM(env))
ace.adapt_offline(train)                               # build a playbook
result = ace.evaluate(test)                            # held-out eval

print(f"Base {base.accuracy:.0f}%  →  ACE {result.accuracy:.0f}%")
print(ace.playbook.render())                           # human-readable playbook

Results you can reproduce

Straight from the bundled, deterministic, key-free demos (examples/*.py).

+38.9 pts · quickstart (44→83%)

+46.6 pts · context-collapse demo

96.6% · offline warmup + online

−94.9% · adaptation token ingestion

Reported in the paper (DeepSeek-V3.1):

Benchmark	Baseline	+ ACE
AppWorld (agent, avg)	42.4% (ReAct)	59.5% (+17.1)
FiNER (financial NER)	70.7%	78.3%
Formula (financial reasoning)	67.5%	85.5%
Adaptation latency (offline)	—	−86.9%

How it works

Generate → Reflect → Curate → deterministic merge → grow-and-refine.

flowchart LR
    Q([Query]) --> G[Generator]
    PB[(Context Playbook)] -. injected .-> G
    G -->|trajectory + bullet usage| R[Reflector]
    FB([Feedback: labels or execution signal]) --> R
    R -->|insights, iterative refinement| C[Curator]
    C -->|delta items| M{{Deterministic Merge - non-LLM}}
    M --> PB
    M --> GR[Grow and Refine: dedupe / prune]
    GR --> PB
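The final "Grow and Refine: dedupe / prune" stage in the diagram can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the function name `grow_and_refine`, the dictionary shape of a bullet, the pruning rule (drop a bullet once `harmful` exceeds `helpful`), and the 0.85 threshold are all hypothetical. Near-duplicates are detected with the standard library's `difflib.SequenceMatcher` as a simple stand-in; a real system might compare bullets by some other similarity measure.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Character-level similarity check (a stand-in for any similarity measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def grow_and_refine(bullets: list[dict], threshold: float = 0.85) -> list[dict]:
    """Prune net-harmful bullets and fold near-duplicates into the first occurrence."""
    kept: list[dict] = []
    for b in bullets:
        if b["harmful"] > b["helpful"]:
            continue  # prune: more harmful than helpful (assumed rule)
        dup = next((k for k in kept if similar(k["text"], b["text"], threshold)), None)
        if dup is None:
            kept.append(dict(b))  # copy so the input list is not mutated
        else:
            # Dedupe: keep the earlier bullet and pool the counters.
            dup["helpful"] += b["helpful"]
            dup["harmful"] += b["harmful"]
    return kept


bullets = [
    {"id": "b001", "text": "Verify identity before cancelling an order.", "helpful": 3, "harmful": 0},
    {"id": "b002", "text": "Verify identity before cancelling an order", "helpful": 1, "harmful": 0},
    {"id": "b003", "text": "Always offer a refund immediately.", "helpful": 0, "harmful": 2},
    {"id": "b004", "text": "Quote the order id back to the user.", "helpful": 1, "harmful": 0},
]
refined = grow_and_refine(bullets)
print([b["id"] for b in refined])
```

Here `b002` is folded into `b001` (the texts differ only by a trailing period) and `b003` is pruned, so the playbook grows only when a bullet carries something new.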

📐 Read the full architecture — 14 diagrams

Use it with the OpenAI Agents SDK

ACE becomes a self-improving memory for any agent.

from agents import Agent
from ace import ACE, OpenAILLM
from ace.integrations.openai_agents import ACEAgent

base  = Agent(name="Support", instructions="You are a concise support agent.")
agent = ACEAgent(base, ace=ACE(OpenAILLM(model="gpt-4o-mini")))

out = agent.run_and_learn("Cancel order #C99",
                          signal="Policy: cancellation requires identity verification first.")
print(out.output)
print(agent.ace.playbook.render())   # the agent just wrote itself a rule
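In the example above the `signal` is a hand-written policy string. For label-free use, a signal could instead be derived from execution checks. The helper below is a hypothetical sketch: `feedback_from_checks` is not part of ACE or the OpenAI Agents SDK, and whether ACE expects feedback in exactly this text form is an assumption. It only shows how pass/fail checks might be turned into a natural-language signal.

```python
from typing import Callable


def feedback_from_checks(output: str, checks: dict[str, Callable[[str], bool]]) -> str:
    """Run named checks against an agent output and summarize failures as text.

    Hypothetical helper: the returned string is meant to be passed on as a
    feedback signal, in place of a ground-truth label.
    """
    failures = []
    for name, check in checks.items():
        try:
            ok = check(output)
        except Exception as exc:  # a crashing check counts as a failure
            ok = False
            name = f"{name} (raised {type(exc).__name__})"
        if not ok:
            failures.append(name)
    if not failures:
        return "All checks passed."
    return "Failed checks: " + "; ".join(failures)


# Illustrative checks for the cancellation scenario above.
checks = {
    "mentions identity verification": lambda out: "verify" in out.lower(),
    "does not confirm cancellation yet": lambda out: "has been cancelled" not in out.lower(),
}

signal = feedback_from_checks("Your order has been cancelled.", checks)
print(signal)
# Failed checks: mentions identity verification; does not confirm cancellation yet
```

The resulting string could then be supplied wherever the framework accepts execution feedback, for example as the `signal` argument shown earlier, assuming free-form text is accepted there.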
