ACE is an open-source Python framework that turns an LLM's context into an evolving, self-improving playbook — accumulating strategies, domain knowledge, and pitfalls instead of re-prompting. A faithful implementation of the ICLR 2026 paper, with first-class OpenAI Agents SDK support.
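Prompt optimizers suffer from brevity bias, dropping domain detail in favor of short, generic instructions; monolithic memory suffers from context collapse, where repeated full rewrites erode accumulated detail. ACE is designed to address both.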
Context is a set of itemized bullets that accumulate and organize knowledge over time — not a fragile single prompt.
Localized ADD/UPDATE/REMOVE edits merged by deterministic non-LLM logic. No full rewrites, no collapse.
A modular loop that experiments, reflects, and consolidates — mirroring how humans learn.
Learns from natural execution feedback (tests, API errors, rewards) — no ground-truth labels required.
Drop-in self-improving memory for any agents.Agent in a few lines.
Every bullet has an id and helpful/harmful counters — inspect, edit, and prune the agent's knowledge.
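The bullet, delta, and counter ideas above can be sketched in plain Python. This is a hypothetical illustration, not ACE's actual classes: `Bullet`, `Delta`, `merge`, `record_feedback`, and `prune` are names invented here, and the pruning rule (drop a bullet once its harmful count exceeds its helpful count) is an assumed policy.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Bullet:
    """One playbook entry (hypothetical shape, not the library's real class)."""
    id: str
    text: str
    helpful: int = 0
    harmful: int = 0


@dataclass(frozen=True)
class Delta:
    """A localized edit; op is "ADD", "UPDATE", or "REMOVE"."""
    op: str
    id: str
    text: str = ""


def merge(playbook: Dict[str, Bullet], deltas: List[Delta]) -> Dict[str, Bullet]:
    """Apply deltas in order using plain dict operations (no LLM call)."""
    merged = dict(playbook)  # shallow copy: the input mapping is not mutated
    for d in deltas:
        if d.op == "ADD":
            # ADD never overwrites an existing bullet.
            merged.setdefault(d.id, Bullet(d.id, d.text))
        elif d.op == "UPDATE" and d.id in merged:
            old = merged[d.id]
            # Only the text changes; the counters carry over.
            merged[d.id] = Bullet(old.id, d.text, old.helpful, old.harmful)
        elif d.op == "REMOVE":
            merged.pop(d.id, None)  # unknown ids are ignored
    return merged


def record_feedback(playbook: Dict[str, Bullet], used_ids: Iterable[str], success: bool) -> None:
    """Credit or blame the bullets used in one attempt, based on its outcome."""
    for bid in used_ids:
        if bid in playbook:
            if success:
                playbook[bid].helpful += 1
            else:
                playbook[bid].harmful += 1


def prune(playbook: Dict[str, Bullet]) -> Dict[str, Bullet]:
    """Assumed policy: keep bullets that are not net-harmful."""
    return {k: b for k, b in playbook.items() if b.harmful <= b.helpful}


playbook = {"b1": Bullet("b1", "Verify identity before cancelling.", helpful=3)}
playbook = merge(playbook, [
    Delta("ADD", "b2", "Retry on HTTP 429 with backoff."),
    Delta("UPDATE", "b1", "Verify identity before cancelling an order."),
    Delta("REMOVE", "b9"),  # no such bullet: ignored
])
record_feedback(playbook, ["b2"], success=False)  # e.g. a failing test
pruned = prune(playbook)
```

Because the merge is ordinary dictionary manipulation, replaying the same deltas against the same playbook always yields the same result, and untouched bullets are never rewritten.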
Install, then run the headline comparison — no API key required.
pip install -e . # core: numpy + rich only
ace demo --html report.html # Base vs ACE vs Monolithic (context collapse)
ace run # live terminal dashboard
…or in ~10 lines of Python:
from ace import ACE, SimulatedLLM, TeachingEnvironment, build_teaching_task
from ace.baselines import StaticAgent
env = TeachingEnvironment()
task = build_teaching_task()
train, test = task.split()
base = StaticAgent(SimulatedLLM(env)).run(test) # no learning
ace = ACE(SimulatedLLM(env))
ace.adapt_offline(train) # build a playbook
result = ace.evaluate(test) # held-out eval
print(f"Base {base.accuracy:.0f}% → ACE {result.accuracy:.0f}%")
print(ace.playbook.render()) # human-readable playbook
Straight from the bundled, deterministic, key-free demos (examples/*.py).
Reported in the paper (DeepSeek-V3.1):
| Benchmark | Baseline | + ACE |
|---|---|---|
| AppWorld (agent, avg) | 42.4% (ReAct) | 59.5% (+17.1) |
| FiNER (financial NER) | 70.7% | 78.3% (+7.6) |
| Formula (financial reasoning) | 67.5% | 85.5% (+18.0) |
| Adaptation latency (offline) | — | −86.9% |
Generate → Reflect → Curate → deterministic merge → grow-and-refine.
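One way to picture this pipeline is as a single function that wires three pluggable roles together. The sketch below is an illustration under stated assumptions, not the library's implementation: `adapt_step`, the `generate`, `reflect`, and `curate` callables, and the stub roles are all hypothetical, and the refine step here uses exact-match de-duplication on normalized text as a stand-in for whatever similarity check a real system would use.

```python
from typing import Callable, Dict, List, Tuple

Playbook = Dict[str, str]       # bullet id -> bullet text
Delta = Tuple[str, str, str]    # (op, bullet id, text)


def adapt_step(
    playbook: Playbook,
    task: str,
    generate: Callable[[str, Playbook], str],
    reflect: Callable[[str, str], List[str]],
    curate: Callable[[List[str], Playbook], List[Delta]],
) -> Playbook:
    """Run one Generate -> Reflect -> Curate -> merge -> refine pass."""
    trajectory = generate(task, playbook)   # attempt the task with the playbook
    lessons = reflect(task, trajectory)     # extract lessons from the attempt
    deltas = curate(lessons, playbook)      # turn lessons into localized edits

    # Deterministic merge: plain dict operations, no model call.
    merged = dict(playbook)
    for op, bid, text in deltas:
        if op == "REMOVE":
            merged.pop(bid, None)
        elif op == "ADD" and bid not in merged:
            merged[bid] = text
        elif op == "UPDATE" and bid in merged:
            merged[bid] = text

    # Grow-and-refine (assumed form): keep the first bullet per normalized text.
    seen = set()
    refined: Playbook = {}
    for bid, text in merged.items():
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            refined[bid] = text
    return refined


# Stub roles so the sketch runs without any model or API key.
def generate(task: str, pb: Playbook) -> str:
    return f"tried {task!r} with {len(pb)} bullets; got HTTP 429"


def reflect(task: str, trajectory: str) -> List[str]:
    return ["Back off and retry on HTTP 429."] if "429" in trajectory else []


def curate(lessons: List[str], pb: Playbook) -> List[Delta]:
    return [("ADD", f"b{len(pb) + i + 1}", text) for i, text in enumerate(lessons)]


pb = {"b1": "Verify identity before cancelling."}
first = adapt_step(pb, "list orders", generate, reflect, curate)
second = adapt_step(first, "list orders", generate, reflect, curate)
```

In this sketch the first pass grows the playbook by one bullet, while the second pass proposes the same lesson again and the refine step discards it, so the playbook stops growing once a lesson is already recorded.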
ACE becomes a self-improving memory for any agent.
from agents import Agent
from ace import ACE, OpenAILLM
from ace.integrations.openai_agents import ACEAgent
base = Agent(name="Support", instructions="You are a concise support agent.")
agent = ACEAgent(base, ace=ACE(OpenAILLM(model="gpt-4o-mini")))
out = agent.run_and_learn(
    "Cancel order #C99",
    signal="Policy: cancellation requires identity verification first.",
)
print(out.output)
print(agent.ace.playbook.render()) # the agent just wrote itself a rule