Python 3.9+ · License: MIT · Tests · OpenAI Agents SDK · Paper
Agentic Context Engineering

Agents that write their own playbook

ACE is an open-source Python framework that turns an LLM's context into an evolving, self-improving playbook — accumulating strategies, domain knowledge, and pitfalls instead of re-prompting. A faithful implementation of the ICLR 2026 paper, with first-class OpenAI Agents SDK support.

Why ACE

Prompt optimizers suffer brevity bias; monolithic memory suffers context collapse. ACE fixes both.

🧠

Evolving playbook

Context is a set of itemized bullets that accumulate and organize knowledge over time — not a fragile single prompt.

Incremental deltas

Localized ADD/UPDATE/REMOVE edits merged by deterministic non-LLM logic. No full rewrites, no collapse.

🔁

Generator · Reflector · Curator

A modular loop that experiments, reflects, and consolidates — mirroring how humans learn.

🏷️

Label-free

Learns from natural execution feedback (tests, API errors, rewards) — no ground-truth labels required.

🔌

OpenAI Agents SDK

Drop-in self-improving memory for any agents.Agent in a few lines.

🔍

Interpretable

Every bullet has an id and helpful/harmful counters — inspect, edit, and prune the agent's knowledge.
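The bullet-and-delta design described above can be sketched in plain Python. This is an illustration only, not ACE's actual API: `Bullet`, `Delta`, `apply_deltas`, and `prune` are hypothetical names, and the pruning rule (drop a bullet once its harmful count exceeds its helpful count) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Bullet:
    """One playbook item with usage counters (hypothetical structure)."""
    id: str
    text: str
    helpful: int = 0
    harmful: int = 0


@dataclass
class Delta:
    """A localized edit; op is "ADD", "UPDATE", or "REMOVE"."""
    op: str
    bullet_id: str
    text: Optional[str] = None


def apply_deltas(playbook: Dict[str, Bullet], deltas: List[Delta]) -> Dict[str, Bullet]:
    """Merge deltas with plain deterministic logic; no LLM call is involved."""
    merged = dict(playbook)  # shallow copy so the input mapping is not mutated
    for d in deltas:
        if d.op == "ADD":
            # Adding an id that already exists is ignored rather than overwriting.
            if d.bullet_id not in merged:
                merged[d.bullet_id] = Bullet(d.bullet_id, d.text or "")
        elif d.op == "UPDATE":
            if d.bullet_id in merged:
                old = merged[d.bullet_id]
                # Replace only the text; the counters carry over.
                merged[d.bullet_id] = Bullet(old.id, d.text or old.text, old.helpful, old.harmful)
        elif d.op == "REMOVE":
            merged.pop(d.bullet_id, None)
        else:
            raise ValueError(f"unknown delta op: {d.op!r}")
    return merged


def prune(playbook: Dict[str, Bullet]) -> Dict[str, Bullet]:
    """Assumed rule: keep a bullet unless it has been harmful more often than helpful."""
    return {k: b for k, b in playbook.items() if b.harmful <= b.helpful}


playbook = {
    "b1": Bullet("b1", "Verify identity before cancelling an order.", helpful=3),
    "b2": Bullet("b2", "Always retry failed API calls immediately.", harmful=2),
}
deltas = [
    Delta("ADD", "b3", "Check the refund window before promising a refund."),
    Delta("UPDATE", "b1", "Verify identity before cancelling or modifying an order."),
]
playbook = prune(apply_deltas(playbook, deltas))
print(sorted(playbook))  # ['b1', 'b3']
```

Because each delta touches a single bullet by id, the rest of the playbook is never rewritten, and the counters stay attached to the bullet they describe.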

Quickstart

Install, then run the headline comparison — no API key required.

pip install -e .            # core: numpy + rich only
ace demo --html report.html # Base vs ACE vs Monolithic (context collapse)
ace run                     # live terminal dashboard

…or in ~10 lines of Python:

from ace import ACE, SimulatedLLM, TeachingEnvironment, build_teaching_task
from ace.baselines import StaticAgent

env  = TeachingEnvironment()
task = build_teaching_task()
train, test = task.split()

base = StaticAgent(SimulatedLLM(env)).run(test)        # no learning
ace  = ACE(SimulatedLLM(env))
ace.adapt_offline(train)                               # build a playbook
result = ace.evaluate(test)                            # held-out eval

print(f"Base {base.accuracy:.0f}%  →  ACE {result.accuracy:.0f}%")
print(ace.playbook.render())                           # human-readable playbook

Results you can reproduce

Straight from the bundled, deterministic, key-free demos (examples/*.py).

+38.9 pts · quickstart (44→83%)
+46.6 pts · context-collapse demo
96.6% · offline warmup + online
−94.9% · adaptation token ingestion

Reported in the paper (DeepSeek-V3.1):

Benchmark                       Baseline        + ACE
AppWorld (agent, avg)           42.4% (ReAct)   59.5% (+17.1)
FiNER (financial NER)           70.7%           78.3%
Formula (financial reasoning)   67.5%           85.5%
Adaptation latency (offline)                    −86.9%

How it works

Generate → Reflect → Curate → deterministic merge → grow-and-refine.

flowchart LR
    Q([Query]) --> G[Generator]
    PB[(Context Playbook)] -. injected .-> G
    G -->|trajectory + bullet usage| R[Reflector]
    FB([Feedback: labels or execution signal]) --> R
    R -->|insights, iterative refinement| C[Curator]
    C -->|delta items| M{{Deterministic Merge - non-LLM}}
    M --> PB
    M --> GR[Grow and Refine: dedupe / prune]
    GR --> PB
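One pass through the loop in the flowchart can be sketched with stub roles in place of LLM calls. Everything here is hypothetical and is not ACE's API: `ace_step`, `grow_and_refine`, and the three stub functions are illustrative names, the playbook is reduced to an id-to-text mapping, and exact matching on normalized text stands in for whatever similarity test a real deduplication step would use.

```python
from typing import Callable, Dict, List, Tuple

Playbook = Dict[str, str]  # bullet id -> bullet text (simplified for this sketch)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different bullets compare equal."""
    return " ".join(text.lower().split())


def grow_and_refine(playbook: Playbook) -> Playbook:
    """Drop any bullet whose normalized text duplicates an earlier bullet."""
    seen = set()
    refined: Playbook = {}
    for bullet_id, text in playbook.items():
        key = normalize(text)
        if key not in seen:
            seen.add(key)
            refined[bullet_id] = text
    return refined


def ace_step(
    query: str,
    feedback: str,
    playbook: Playbook,
    generator: Callable[[str, Playbook], str],
    reflector: Callable[[str, str], List[str]],
    curator: Callable[[List[str], Playbook], List[Tuple[str, str]]],
) -> Playbook:
    """Generate -> Reflect -> Curate -> deterministic merge -> grow-and-refine."""
    trajectory = generator(query, playbook)   # playbook is injected into generation
    insights = reflector(trajectory, feedback)
    deltas = curator(insights, playbook)      # (bullet_id, text) pairs to add
    merged = dict(playbook)
    for bullet_id, text in deltas:            # non-LLM merge: add only unseen ids
        merged.setdefault(bullet_id, text)
    return grow_and_refine(merged)


# Stub roles standing in for LLM-backed components.
def generator(query: str, playbook: Playbook) -> str:
    return f"answer to {query!r} using {len(playbook)} bullets"


def reflector(trajectory: str, feedback: str) -> List[str]:
    return [feedback] if feedback else []     # one insight per non-empty signal


def curator(insights: List[str], playbook: Playbook) -> List[Tuple[str, str]]:
    start = len(playbook) + 1
    return [(f"b{start + i}", insight) for i, insight in enumerate(insights)]


playbook: Playbook = {}
signals = [
    "Verify identity before cancelling.",
    "verify identity  before cancelling.",    # near-duplicate, removed by dedupe
    "",                                       # no feedback, playbook unchanged
]
for signal in signals:
    playbook = ace_step("Cancel order #C99", signal, playbook, generator, reflector, curator)
print(playbook)  # {'b1': 'Verify identity before cancelling.'}
```

Keeping the merge and the deduplication as ordinary functions means the playbook changes only in ways that can be replayed and inspected, regardless of what the three LLM-backed roles produce.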

📐 Read the full architecture — 14 diagrams

Use it with the OpenAI Agents SDK

ACE becomes a self-improving memory for any agent.

from agents import Agent
from ace import ACE, OpenAILLM
from ace.integrations.openai_agents import ACEAgent

base  = Agent(name="Support", instructions="You are a concise support agent.")
agent = ACEAgent(base, ace=ACE(OpenAILLM(model="gpt-4o-mini")))

out = agent.run_and_learn("Cancel order #C99",
                          signal="Policy: cancellation requires identity verification first.")
print(out.output)
print(agent.ace.playbook.render())   # the agent just wrote itself a rule