What it does
Traditional AML scoring engines apply static weights to risk factors (PEP status, suspicious-activity flags, turnover, adverse media) and compare the weighted total against a fixed threshold. RL-Anti-Money-Laundry reframes weight tuning as a single-step contextual bandit. It trains an agent (PPO, A2C, or DQN) to adjust those weights case by case, learning either from ground-truth labels or from analysts' human-in-the-loop (HITL) feedback.
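For contrast, the static baseline described above can be sketched as a weighted sum compared against a fixed threshold. The factor names, weights, and threshold below are hypothetical placeholders, not the project's actual configuration. Each factor is assumed to be pre-normalised to [0, 1].

```python
from dataclasses import dataclass

# Hypothetical static weights for the four risk factors named above.
FIXED_WEIGHTS = {
    "pep": 0.30,
    "sar_flags": 0.30,
    "turnover": 0.20,
    "adverse_media": 0.20,
}
FIXED_THRESHOLD = 0.5  # hypothetical fixed cut-off


@dataclass
class Case:
    # Each factor is assumed to be pre-normalised to [0, 1].
    pep: float
    sar_flags: float
    turnover: float
    adverse_media: float


def fixed_score(case: Case, weights: dict[str, float] = FIXED_WEIGHTS) -> float:
    """Weighted sum of the normalised risk factors."""
    return sum(w * getattr(case, name) for name, w in weights.items())


def fixed_decision(case: Case, threshold: float = FIXED_THRESHOLD) -> bool:
    """True means 'escalate as suspicious'; the same weights apply to every case."""
    return fixed_score(case) >= threshold
```

With these placeholder numbers, `Case(pep=1.0, sar_flags=0.5, turnover=0.2, adverse_media=0.0)` scores about 0.49 and falls just under the cut-off. Borderline cases like this are where a per-case adjustment of weights or threshold could change the outcome.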
Features
🎯 Adaptive scoring
An RL policy nudges weights and the decision threshold per case, clamped to safe bounds.
🧠 Multiple algorithms
PPO, A2C, and DQN via Stable-Baselines3 on a custom Gymnasium environment.
📊 Live dashboard
React + TypeScript UI with real-time accuracy/reward curves, evaluation, and data tooling.
🔍 Hyperparameter tuning
Grid search, random search, and a TPE-inspired Bayesian optimizer, with parallel trials.
🔌 Drop-in integration
An adaptive decision node that replaces a fixed-weight LangGraph pipeline step.
⚖️ Paired evaluation
Fair RL-vs-fixed-weights comparison on an identical, re-seeded case sequence.
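The paired-evaluation idea can be sketched as follows. Both policies are scored on a case sequence regenerated from the same seed, so any accuracy gap comes from the policies rather than from different samples. The synthetic generator, its two features, and the toy policies are hypothetical stand-ins, not the project's `training.evaluate` code.

```python
import random
from typing import Callable

Case = dict[str, float]
Policy = Callable[[Case], bool]  # returns True to escalate the case


def generate_cases(n: int, seed: int) -> list[tuple[Case, bool]]:
    """Hypothetical synthetic generator returning (features, is_suspicious) pairs."""
    rng = random.Random(seed)  # private RNG: the same seed reproduces the same sequence
    cases = []
    for _ in range(n):
        features = {"risk": rng.random(), "noise": rng.random()}
        label = features["risk"] + 0.2 * features["noise"] > 0.6
        cases.append((features, label))
    return cases


def paired_evaluate(a: Policy, b: Policy, n: int = 200, seed: int = 0) -> dict[str, float]:
    # Regenerate the sequence from the same seed for each policy, so both
    # see identical cases in identical order.
    cases_a = generate_cases(n, seed)
    cases_b = generate_cases(n, seed)
    correct_a = [a(x) == y for x, y in cases_a]
    correct_b = [b(x) == y for x, y in cases_b]
    return {
        "acc_a": sum(correct_a) / n,
        "acc_b": sum(correct_b) / n,
        # Discordant pairs: the cases where exactly one policy is right.
        "only_a_right": sum(ca and not cb for ca, cb in zip(correct_a, correct_b)),
        "only_b_right": sum(cb and not ca for ca, cb in zip(correct_a, correct_b)),
    }


if __name__ == "__main__":
    def fixed(c: Case) -> bool:
        return c["risk"] > 0.5

    def oracle(c: Case) -> bool:
        return c["risk"] + 0.2 * c["noise"] > 0.6

    print(paired_evaluate(fixed, oracle))
```

Reusing the seed removes the variance that two independently sampled case sets would introduce. The discordant counts are the inputs a paired significance test such as McNemar's would use.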
The RL formulation
A one-step contextual bandit: observe a case, pick a weight adjustment, score it, receive a reward, terminate.
| Element | Definition |
|---|---|
| State | 24-dim vector: line-of-business (LOB) and reason-code one-hots, red-flag counts by severity, PEP/sanctions flags, normalised turnover, adverse media, transaction ratios, account age, prior alerts/SARs. |
| Action | 19 discrete weight/threshold nudges (single step, double-step "BIG" variants, or no-op). |
| Reward | +1.0 correct suspicious/escalated · +0.5 efficient non-suspicious · −0.5 missed escalation · −1.0 wrong; or human-in-the-loop (HITL) gate feedback when running online. |
| Episode | Single step — one action per AML case. |
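The bandit formulation in the table can be sketched as a one-step environment. The sketch is a dependency-free imitation of Gymnasium's `reset()`/`step()` return contract, not the project's actual environment, and several details are assumptions:

- The 19 actions are laid out hypothetically as ± single and ± BIG nudges for four factor weights (16), ± a single nudge for the threshold (2), and a no-op (1).
- The step sizes and clamp bounds are placeholders.
- The observation is a 4-value stand-in for the real 24-dim state.
- The reward reads "missed escalation" as a suspicious case that was cleared, and "wrong" as a false escalation.

```python
import random

FACTORS = ("pep", "sar_flags", "turnover", "adverse_media")
BASE_WEIGHTS = {"pep": 0.30, "sar_flags": 0.30, "turnover": 0.20, "adverse_media": 0.20}
BASE_THRESHOLD = 0.5
STEP, BIG_STEP = 0.05, 0.10  # hypothetical nudge sizes
W_MIN, W_MAX = 0.05, 0.60    # hypothetical safe bounds for each weight
T_MIN, T_MAX = 0.30, 0.80    # hypothetical safe bounds for the threshold


def build_actions():
    """Hypothetical 19-action layout: 16 weight nudges, 2 threshold nudges, 1 no-op."""
    actions = []
    for f in FACTORS:
        for delta in (STEP, -STEP, BIG_STEP, -BIG_STEP):
            actions.append(("weight", f, delta))
    actions.append(("threshold", None, STEP))
    actions.append(("threshold", None, -STEP))
    actions.append(("noop", None, 0.0))
    return actions


ACTIONS = build_actions()
assert len(ACTIONS) == 19


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def reward(escalate, suspicious):
    # Assumed reading of the reward row for a binary escalate/clear decision.
    if suspicious and escalate:
        return 1.0   # correct suspicious/escalated
    if not suspicious and not escalate:
        return 0.5   # efficient non-suspicious
    if suspicious and not escalate:
        return -0.5  # missed escalation
    return -1.0      # wrong (false escalation)


class OneStepAMLEnv:
    """Imitates Gymnasium's reset/step return shapes without importing it."""

    def __init__(self, cases, seed=None):
        self.cases = cases  # list of (factor dict, is_suspicious) pairs
        self.rng = random.Random(seed)
        self.current = None

    def _obs(self):
        factors, _ = self.current
        return [factors[f] for f in FACTORS]  # stand-in for the 24-dim state

    def reset(self, seed=None, options=None):
        if seed is not None:
            self.rng.seed(seed)
        self.current = self.rng.choice(self.cases)  # observe one case
        return self._obs(), {}

    def step(self, action):
        factors, suspicious = self.current
        kind, name, delta = ACTIONS[action]
        weights, threshold = dict(BASE_WEIGHTS), BASE_THRESHOLD
        if kind == "weight":
            weights[name] = clamp(weights[name] + delta, W_MIN, W_MAX)
        elif kind == "threshold":
            threshold = clamp(threshold + delta, T_MIN, T_MAX)
        score = sum(weights[f] * factors[f] for f in FACTORS)
        escalate = score >= threshold
        info = {"score": score, "threshold": threshold, "escalate": escalate}
        # Single-step episode: terminated=True, truncated=False after every action.
        return self._obs(), reward(escalate, suspicious), True, False, info
```

Because every episode ends after one action, the return equals the immediate reward and there is no credit assignment across time. That is what makes this a contextual bandit rather than a multi-step MDP.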
Quick start
One script creates a virtual environment, installs dependencies, and launches the backend (port 8200) and the dashboard (port 4003).
```bash
git clone https://github.com/rrahimi-uci/rl-anti-money-laundry.git
cd rl-anti-money-laundry
bash start.sh
```
Train & evaluate from the CLI
```bash
# Generate synthetic episodes (self-contained)
python -m data.generate_episodes --out data/episodes.jsonl --count 2000

# Train a PPO agent
python -m training.train --episodes data/episodes.jsonl --timesteps 50000

# Evaluate against the fixed-weight baseline (paired)
python -m training.evaluate --model models/aml_ppo --n-eval 200
```
Tech stack
Python · Stable-Baselines3 · Gymnasium · LangGraph · React + TypeScript
Disclaimer
⚠️ This is an educational proof of concept that demonstrates a technique. It is not a compliant, audited Anti-Money-Laundering control and must not be used to make real financial-crime decisions.