Policy to Knowledge
Enterprise compliance automation that transforms regulatory documents into structured, queryable knowledge graphs using a 10-agent AI pipeline.
At a glance
Policy to Knowledge takes a single rule set from raw compliance documents through a 10-agent extraction pipeline, into an explorable knowledge graph, and out to obligation registers, impact analysis, and graph comparison — all from one unified shell.
Goal & Use Case
Policy to Knowledge is a compliance knowledge-extraction suite. It addresses the challenge of navigating complex, interconnected regulatory rule sets where the relationships between rules are as critical as the rules themselves.
The Problem
Financial compliance means working with complex, interconnected rule sets in which rules reference and depend on other rules (transitive dependencies), relationships are first-class domain concepts, natural-language queries must resolve to precise requirements, and interpreting a rule requires following multi-hop relationships.
The Solution
Policy to Knowledge unifies document extraction, graph storage, semantic search, and an interactive AI-powered explorer into a single interface. It automatically ingests compliance documents (PDF, DOCX, Markdown, CSV, Excel), extracts business rules and entities using a multi-agent AI pipeline, builds a structured knowledge graph, and provides intelligent exploration through natural language chat, graph visualization, impact analysis, and obligation tracking.
Supported Domains
Mortgage Lending
Agency and investor guidelines, servicing policies, and lender-specific overlays
Anti-Money Laundering
Suspicious-activity, KYC/CDD, and sanctions-screening rules and entity relationships
Commercial Lending
Business lending compliance rules and regulatory frameworks
Healthcare
Healthcare regulatory compliance and policy enforcement
Key Features
10-Agent AI Pipeline
Automated extraction from documents through entities, rules, validation, optimization, and visualization.
AI Chat Assistant
GPT-4o powered conversational agent with 13 tools for graph traversal, semantic search, and analysis.
Interactive Graph Explorer
D3.js force-directed visualization with zoom, click-to-inspect, color-coded node types, and live search.
Knowledge Graph Merging
Compare graphs with intersection, union, differences, contradictions, and AI-powered semantic matching.
Graph Analytics
Rule type distribution, risk levels, dependency analysis, confidence scores, and entity coverage charts.
Impact Analysis
Upload old and new regulatory docs to identify affected rules, severity, and recommended actions.
Obligation Tracking
Map internal controls to compliance obligations, identify coverage gaps, and generate heatmaps.
Release Management
Create immutable graph snapshots with semantic versioning, lock graphs, and browse release history.
Semantic + Full-Text Search
OpenSearch k-NN with sentence-transformer embeddings plus JanusGraph mixed index full-text search.
System Architecture
Policy to Knowledge is composed of three tightly integrated applications orchestrated via Docker Compose, with a central navigation hub (Suite Shell) routing to all services.
High-Level Architecture
graph TD
Browser["🌐 Browser"]
subgraph FE["Frontend Layer"]
Shell["Suite Shell\nReact + Vite · :4000"]
PipeUI["Pipeline UI\nReact + Vite · :5173"]
ExplorerUI["KG Explorer\nD3.js · :5000"]
end
subgraph BE["Backend Layer"]
PipeAPI["Pipeline API\nFastAPI · :8000"]
AssistantAPI["Assistant\nFlask · :5000"]
end
subgraph AI["AI Inference"]
OpenAI["OpenAI GPT"]
end
subgraph Data["Data Layer"]
JanusGraph["JanusGraph · :8182\nGraph Database"]
Cassandra["Apache Cassandra · :9042\nStorage Backend"]
OpenSearch["OpenSearch · :9200\nFull-text + Vector k-NN"]
Redis["Redis · :6379\nQuery Cache LRU"]
end
Browser --> Shell
Shell -->|"iframe"| PipeUI
Shell -->|"iframe"| ExplorerUI
PipeUI --> PipeAPI
ExplorerUI --> AssistantAPI
PipeAPI --> OpenAI
AssistantAPI --> Redis
AssistantAPI --> JanusGraph
JanusGraph --> Cassandra
JanusGraph --> OpenSearch
Layered Service Architecture
graph TB
subgraph Presentation["Presentation Layer"]
UI["Web Client: D3.js + EventSource API\nVisualization · User Interaction · SSE"]
end
subgraph Application["Application Layer"]
API["REST API Gateway: Flask + CORS\nRequest Routing · Auth · Rate Limiting"]
ConnMgr["Connection Manager: Singleton Pools\nResource Lifecycle · Circuit Breaking"]
end
subgraph Services["Service Layer"]
AIOrch["AI Orchestration: GPT-4o + Tool Calling\nQuery Understanding · Tool Selection"]
GraphSvc["Graph Query Service: Gremlin\nTraversals · Pattern Matching"]
VectorSvc["Vector Search: Embeddings + k-NN\nSemantic Similarity · Ranking"]
end
subgraph DataLayer["Data Layer"]
GraphDB[("JanusGraph\nMulti-Graph")]
VectorDB[("OpenSearch\nk-NN Index")]
Cache[("Redis\nLRU Cache")]
end
UI --> API
API --> ConnMgr
API --> AIOrch
AIOrch --> GraphSvc
AIOrch --> VectorSvc
ConnMgr --> GraphDB
ConnMgr --> VectorDB
GraphSvc --> ConnMgr
VectorSvc --> ConnMgr
Cache -.-> GraphSvc
Cache -.-> VectorSvc
Startup Sequence
sequenceDiagram
participant S as start.sh
participant C as Cassandra :9042
participant O as OpenSearch :9200
participant R as Redis :6379
participant J as JanusGraph :8182
participant P as Pipeline API :8000
participant U as Pipeline UI :5173
participant K as Assistant :5000
participant F as Suite Shell :4000
S->>C: start + health check
S->>O: start + health check
S->>R: start + health check
S->>J: start (depends on C + O)
J-->>S: ready
S->>P: start FastAPI
S->>U: start Vite dev server
S->>K: load KGs into JanusGraph
K-->>S: graphs loaded
S->>K: start Flask server
S->>F: start Vite dev server
F-->>S: Suite ready at localhost:4000
AI Pipeline Architecture
Policy to Knowledge uses a 10-agent pipeline divided into two phases: extraction and merging.
Pipeline Flow
flowchart TB
Docs["📄 Compliance Documents\nPDF · DOCX · MD · CSV · XLSX"]
subgraph Extraction["Phase 1 — Extraction Pipeline"]
direction TB
A1["🗂️ Agent 1 — Document Organizer\nTOC-based hierarchical chunking\nSplits documents into structured sections"]
A1out["📁 Organized Chunks\nknowledge-files-organized/"]
A2["🔍 Agent 2 — Entity Extractor\nMeta-agent prompt optimization\nExtracts domain entities & relationships"]
A2out["📋 Entity Definitions\nentity_types_and_relationships.json"]
A3["⚖️ Agent 3 — Rules Extractor\nParallel batch processing\n10-category rule taxonomy"]
A3out["📜 Business Rules\ncompliance_rules_with_entities.json"]
A35["✅ Agent 3.5 — Rule Validator\nSource verification · Numeric consistency\nContradiction detection · Confidence scoring"]
A35out["📊 Validation Report\nvalidation_report.json"]
A4["🔗 Agent 4 — Rules + Entities Merger\nEnriches rules with entity context\nAssembles complete knowledge graph"]
A4out["🗃️ Complete Knowledge Graph\ncompliance_knowledge_graph.json"]
A5["⚡ Agent 5 — KG Optimizer\nRule deduplication · 7 dependency types\nImpact analysis & confidence scoring"]
A5out["💎 Optimized Knowledge Graph\noptimized_compliance_knowledge_graph.json"]
A6["🎨 Agent 6 — Visualization Generator\nvis.js network graphs\nSearchable rule tables & metrics"]
A6out["🌐 Interactive HTML Report\n{name}_knowledge_graph.html"]
A1 --> A1out --> A2
A2 --> A2out --> A3
A3 --> A3out --> A35
A35 --> A35out -.->|non-blocking| A4
A2out -->|entities| A4
A3out -->|rules| A4
A4 --> A4out --> A5
A5 --> A5out --> A6
A6 --> A6out
end
subgraph Merging["Phase 2 — Merge Pipeline (Multi-Graph Comparison)"]
direction TB
KG_A["📊 Knowledge Graph A"]
KG_B["📊 Knowledge Graph B"]
A7["📦 Agent 7 — Rule Type Clusterer\nGroups by behavior type\nformula · threshold · sequence\nmethod · mandate · prohibition\nclassification · timing"]
A7out["🏷️ Rule Clusters\nrule_clusters.json"]
A8["🧠 Agent 8 — Semantic Rule Matcher\nLLM-powered pairwise comparison\nBatch parallelism for scale\nIDENTICAL · EQUIVALENT\nCONTRADICTORY · UNRELATED"]
A8out["🔎 Match Results\nmatch_results.json"]
A9["🔀 Agent 9 — Set Operations\nComputes 5 set operations"]
A9i["∩ Intersection"]
A9u["∪ Union"]
A9d1["A \\ B Difference"]
A9d2["B \\ A Difference"]
A9c["⚠️ Contradictions"]
A10["📈 Agent 10 — Set Visualization\nVenn diagrams · Dashboard\nComparison HTML reports"]
A10out["🌐 6 HTML Reports\nindex · intersection · union\ndifferences · contradictions"]
KG_A --> A7
KG_B --> A7
A7 --> A7out --> A8
A8 --> A8out --> A9
A9 --> A9i
A9 --> A9u
A9 --> A9d1
A9 --> A9d2
A9 --> A9c
A9i --> A10
A9u --> A10
A9d1 --> A10
A9d2 --> A10
A9c --> A10
A10 --> A10out
end
subgraph Infra["AI Infrastructure"]
LLM["🤖 LLM Provider\nOpenAI GPT\nvia the official OpenAI SDK"]
end
Docs --> A1
A6out -->|"Per-document KG"| KG_A
A6out -->|"Per-document KG"| KG_B
A3 -.->|API calls| LLM
A2 -.->|API calls| LLM
A5 -.->|API calls| LLM
A8 -.->|API calls| LLM
Extraction Agents (1–6)
| Agent | Name | Input | Output | Key Techniques |
|---|---|---|---|---|
| 1 | Document Organizer | Raw compliance files (PDF, DOCX, MD, CSV, XLSX) | knowledge-files-organized/ (hierarchical text chunks) | TOC-based hierarchical splitting; supports multi-format parsing via PyPDF2, python-docx, pandas, openpyxl; OCR fallback via pytesseract |
| 2 | Entity Extractor | Organized document chunks | entity_types_and_relationships.json (entities & relationship definitions) | Meta-agent prompt optimization; iterative entity refinement loop; extracts domain entities (borrower, property, loan, etc.) with typed relationships |
| 3 | Rules Extractor | Organized chunks + entity definitions | compliance_rules_with_entities.json (structured business rules) | Parallel batch processing (10 rules/batch) for 3–10× speedup; 10-category rule taxonomy (eligibility, calculation, validation, threshold, prohibition, timing, method, mandate, classification, reporting) |
| 3.5 | Rule Validator | Extracted business rules | validation_report.json (quality scores & recommendations) | Non-blocking validation; source text verification; numeric consistency checks; contradiction detection; per-rule confidence scoring (0.0–1.0) |
| 4 | Rules + Entities Merger | Entity definitions (Agent 2) + Business rules (Agent 3) | compliance_knowledge_graph.json (complete KG) | Dual-input assembly; enriches every rule with entity context; links rules to governing entities; batch parallelism for large graphs |
| 5 | KG Optimizer | Complete knowledge graph | optimized_compliance_knowledge_graph.json (deduplicated KG with dependencies) | Conservative rule deduplication with rationale; 7 dependency types (prerequisite, sequential, conditional, complementary, contradictory, override, validation); impact analysis per dependency |
| 6 | Visualization Generator | Optimized knowledge graph | {name}_knowledge_graph.html (interactive report) | vis.js network graph with color-coded dependency edges; searchable rules table; 5 key metrics dashboard (rules, dependencies, confidence, low-confidence alerts, duplicates removed); responsive design |
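The parallel batch processing listed for Agent 3 can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the pipeline's actual code: `extract_rules_from_batch` is a hypothetical stand-in for the LLM call, `max_workers` is an illustrative value, and only the batch size of 10 comes from the table above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def make_batches(items: Sequence[T], batch_size: int = 10) -> List[List[T]]:
    """Split items into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def process_in_parallel(
    items: Sequence[T],
    handle_batch: Callable[[List[T]], List[R]],
    batch_size: int = 10,
    max_workers: int = 4,
) -> List[R]:
    """Run handle_batch on every batch concurrently and flatten the results.

    executor.map yields results in input order, so the output keeps batch order.
    """
    batches = make_batches(items, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        per_batch = list(executor.map(handle_batch, batches))
    return [result for batch in per_batch for result in batch]


def extract_rules_from_batch(chunks: List[str]) -> List[str]:
    """Hypothetical placeholder for the LLM call; it only labels each chunk."""
    return [f"rule-from:{chunk}" for chunk in chunks]


if __name__ == "__main__":
    chunks = [f"section-{n}" for n in range(25)]
    rules = process_in_parallel(chunks, extract_rules_from_batch)
    print(len(rules), rules[0], rules[-1])
```

Threads suit this pattern because each batch spends most of its time waiting on a network call; the real speedup depends on provider rate limits.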
Merge Agents (7–10)
| Agent | Name | Input | Output | Key Techniques |
|---|---|---|---|---|
| 7 | Rule Type Clusterer | Two knowledge graphs (KG A + KG B) | rule_clusters.json (rules grouped by behavior) | Groups rules into 8 behavior types: formula, classification, threshold, prohibition, timing, sequence, method, mandate; reduces comparison space for downstream matching |
| 8 | Semantic Rule Matcher | Clustered rules by behavior type | match_results.json (pairwise match classifications with confidence) | LLM-powered pairwise comparison within each cluster; batch parallelism for scale; classifies each pair as IDENTICAL, EQUIVALENT, CONTRADICTORY, or UNRELATED with confidence score |
| 9 | Set Operations | Semantic match results | 5 JSON files: intersection.json, union.json, g1_minus_g2.json, g2_minus_g1.json, contradictions.json | Computes intersection (rules in both graphs), union (all unique rules), A∖B and B∖A differences (exclusive rules), and contradictions (conflicting rule pairs) with AI-generated analysis |
| 10 | Set Visualization | Set operation JSON files | 6 HTML files: index.html (dashboard) + one per set operation | Generates Venn diagram encoding; summary dashboard with overlap percentages; dedicated pages for intersection, union, differences, and contradictions; confidence score display per matched pair |
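To make the five set operations concrete, here is a small sketch of how they could be derived from pairwise match labels. It is an illustration under assumptions, not Agent 9's implementation: the `Match` record and the result keys are hypothetical, and it assumes that only IDENTICAL and EQUIVALENT pairs count as "the same rule", so contradictory rules also appear in the differences.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

SAME_MEANING = {"IDENTICAL", "EQUIVALENT"}


@dataclass(frozen=True)
class Match:
    rule_a: str  # rule ID from graph A
    rule_b: str  # rule ID from graph B
    label: str   # IDENTICAL, EQUIVALENT, CONTRADICTORY, or UNRELATED


def set_operations(rules_a: Set[str], rules_b: Set[str], matches: List[Match]) -> Dict[str, list]:
    """Derive intersection, union, both differences, and contradictions."""
    intersection: List[Tuple[str, str]] = sorted(
        (m.rule_a, m.rule_b) for m in matches if m.label in SAME_MEANING
    )
    contradictions: List[Tuple[str, str]] = sorted(
        (m.rule_a, m.rule_b) for m in matches if m.label == "CONTRADICTORY"
    )
    matched_a = {a for a, _ in intersection}
    matched_b = {b for _, b in intersection}
    a_minus_b = sorted(rules_a - matched_a)
    b_minus_a = sorted(rules_b - matched_b)
    # Union: every rule from A, plus the rules from B with no equivalent in A.
    union = sorted(rules_a) + b_minus_a
    return {
        "intersection": intersection,
        "union": union,
        "a_minus_b": a_minus_b,
        "b_minus_a": b_minus_a,
        "contradictions": contradictions,
    }


if __name__ == "__main__":
    result = set_operations(
        {"A1", "A2", "A3"},
        {"B1", "B2", "B3"},
        [
            Match("A1", "B1", "IDENTICAL"),
            Match("A2", "B2", "CONTRADICTORY"),
            Match("A3", "B3", "UNRELATED"),
        ],
    )
    for name, value in result.items():
        print(name, value)
```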
Prompt Architecture
The pipeline uses a two-tier prompt system that separates domain-agnostic pipeline logic from domain-specific terminology, examples, and entity vocabularies. This allows the same 10-agent pipeline to operate across entirely different compliance domains without code changes.
Resolution Strategy
flowchart LR
A["Agent requests prompt\n(e.g. entity_extraction)"] --> B{"domain-prompts/\n{domain}/{prompt}.txt\nexists?"}
B -- Yes --> C["Load domain-specific\nprompt"]
B -- No --> D["Load shared fallback\nprompts/{prompt}.txt"]
C --> E["Substitute parameters\n(entity_context, batch_num, etc.)"]
D --> E
E --> F["Send to LLM\n(OpenAI GPT)"]
The PromptManager class resolves prompts with domain-first precedence: it checks domain-prompts/{active_domain}/ first, then falls back to the shared prompts/ directory. This means every domain can override any prompt while inheriting the rest.
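The domain-first lookup can be sketched in a few lines. This is not the PromptManager source: the function names are hypothetical, and the `$name` placeholder syntax (via `string.Template`) is an assumption chosen so that literal braces in a prompt are left alone. Only the two directory names and the `.txt` suffix come from the description above.

```python
from pathlib import Path
from string import Template


def resolve_prompt_path(root: Path, domain: str, prompt_name: str) -> Path:
    """Return the domain-specific prompt if it exists, else the shared fallback."""
    domain_path = root / "domain-prompts" / domain / f"{prompt_name}.txt"
    if domain_path.is_file():
        return domain_path
    shared_path = root / "prompts" / f"{prompt_name}.txt"
    if shared_path.is_file():
        return shared_path
    raise FileNotFoundError(f"no prompt named {prompt_name!r} for domain {domain!r}")


def load_prompt(root: Path, domain: str, prompt_name: str, **params: object) -> str:
    """Load the resolved template and substitute parameters such as batch_num.

    safe_substitute leaves unknown placeholders untouched instead of raising.
    """
    text = resolve_prompt_path(root, domain, prompt_name).read_text(encoding="utf-8")
    return Template(text).safe_substitute(**params)
```

With this layout, a domain only needs to ship the templates it wants to override; every other prompt name resolves to the shared copy.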
Directory Layout
| Path | Scope | Description |
|---|---|---|
| prompts/ | Shared (fallback) | Domain-agnostic prompt templates: the default baseline for all agents |
| domain-prompts/mortgage/ | Mortgage Lending | Mortgage-specific terminology, entity types (Borrower, Property, Loan), and agency/investor regulatory references |
| domain-prompts/aml/ | Anti-Money Laundering | AML/BSA compliance: SAR, CTR, CDD, KYC entity types; FinCEN regulatory references |
| domain-prompts/commercial_lending/ | Commercial Lending | Commercial loan origination: borrower financials, collateral valuation, covenant tracking |
| domain-prompts/healthcare/ | Healthcare | Healthcare compliance: HIPAA, patient entities, provider relationships, clinical protocols |
Prompt Inventory (11 templates × 4 domains)
Each domain contains a full set of 11 prompt templates — one for every agent stage. The domain-specific versions inject specialized terminology, entity vocabularies, rule taxonomies, and worked examples.
| Prompt Template | Agent | Lines | What Gets Domain-Specialized |
|---|---|---|---|
| document_structure_analysis.txt | 1 (Document Organizer) | ~94 | Section heading patterns, document structure conventions |
| entity_extraction.txt | 2 (Entity Extractor) | ~245 | Domain entity vocabulary (e.g., Borrower/Loan vs. SAR/CTR vs. Patient/Provider), relationship definitions, attribute schemas |
| entity_refinement.txt | 2 (Meta-Agent) | ~266 | 5-dimensional quality scoring criteria, domain-specific completeness benchmarks |
| entity_resolution.txt | 2 (Resolution) | - | Synonyms and canonical name mappings per domain |
| business_rules_extraction.txt | 3 (Rules Extractor) | ~343 | Rule type taxonomy (10 categories), batch size tuning, domain examples, confidence scoring factors |
| validation_report.txt | 3.5 (Rule Validator) | ~401 | Domain-specific validation criteria, expected numeric ranges, regulatory cross-references |
| rule_resolution.txt | 4 (Merger) | - | Entity-rule linkage heuristics per domain |
| dependency_analysis.txt | 5 (KG Optimizer) | ~401 | 7 dependency types with domain-specific examples and strength rating criteria |
| rule_deduplication.txt | 5 (Deduplication) | ~280 | Conservative merge criteria, domain-specific variation preservation rules |
| rule_matcher.txt | 8 (Semantic Matcher) | - | Match classification criteria (IDENTICAL / EQUIVALENT / CONTRADICTORY / UNRELATED) tuned per domain |
| rule_matcher_batch.txt | 8 (Batch Matcher) | - | Batch parallelism parameters and domain-specific pairwise comparison prompts |
Domain Adaptation Example
The same entity_extraction.txt prompt is specialized for each domain by changing the opening context, entity vocabulary, and relationship definitions:
| Domain | Opening Context | Example Entities | Example Relationships |
|---|---|---|---|
| Mortgage | "…specializing in domain modeling for compliance and regulatory systems… mortgage lending compliance knowledge graph" | Borrower, Loan, Property, Appraisal, Credit Report, Underwriting | BORROWER_APPLIES_FOR_LOAN, PROPERTY_SECURES_LOAN |
| AML | "…specializing in domain modeling for Anti-Money Laundering (AML) and Bank Secrecy Act (BSA) compliance" | Customer, Transaction, SAR, CTR, Beneficial Owner, CDD Profile | CUSTOMER_FILES_SAR, TRANSACTION_TRIGGERS_ALERT |
| Healthcare | "…specializing in domain modeling for healthcare compliance and regulatory systems" | Patient, Provider, Procedure, Diagnosis, Coverage, Authorization | PROVIDER_TREATS_PATIENT, PROCEDURE_REQUIRES_AUTH |
| Commercial Lending | "…specializing in domain modeling for commercial lending compliance" | Borrower, Facility, Collateral, Covenant, Financial Statement | FACILITY_SECURED_BY_COLLATERAL, BORROWER_COMPLIES_WITH_COVENANT |
LLM Provider Optimization
Prompt batch strategy (OpenAI):
| Parameter | OpenAI GPT |
|---|---|
| Rules per batch (Agent 3) | 10 |
| Rules per dependency batch (Agent 5) | 50 |
| Temperature | 0.7 |
| Total prompt engineering | ~2,030 lines across 11 shared templates |
Data Architecture
Graph Schema
erDiagram
business_rule ||--o{ business_rule : "depends_on"
business_rule }o--|| entity_category : "belongs_to_category"
business_rule ||--o{ business_rule : "relates_to"
business_rule {
string rule_id PK
string rule_name
string rule_type
string description
string conditions
string consequences
string exceptions
string reference
boolean mandatory
double confidence_score
boolean requires_review
string review_reason
string node_type
string vertex_uuid UK
string source_reference
string effective_date
string expiration_date
string superseded_by
string jurisdiction
string risk_level
string enforcement_action
string applicability_scope
string data_points_required
string audit_frequency
boolean reference_verified
string reference_verification_note
string confidence_breakdown
string deduplication_info
string related_rules
}
entity_category {
string name
string entity_type
string entity_or_relationship
string description
string extraction_notes
string content
string category
}
Vertex Labels
| Label | Description | Key Properties |
|---|---|---|
| business_rule | Compliance rule extracted from regulatory documents. Carries full rule semantics, conditions, exceptions, confidence scores, and v2 metadata (jurisdiction, risk, audit). | rule_id, rule_name, rule_type, confidence_score, mandatory |
| entity_category | Domain entity or relationship category (e.g., Borrower, Property, Loan Product). Groups related business rules. | name, entity_type, entity_or_relationship, category |
Edge Labels
| Label | Source → Target | Edge Properties | Description |
|---|---|---|---|
| depends_on | business_rule → business_rule | dependency_type, rationale, impact_if_fails, strength | Models 7 dependency types: prerequisite, sequential, conditional, complementary, contradictory, override, validation |
| belongs_to_category | business_rule → entity_category | - | Links a business rule to its governing entity category |
| relates_to | business_rule → business_rule | - | General semantic relationship between related rules |
Property Keys — Business Rule Vertex
| Property | Type | Category | Description |
|---|---|---|---|
| rule_id | String | Core | Unique rule identifier |
| rule_name | String | Core | Human-readable rule name |
| rule_type | String | Core | Taxonomy (10 types): eligibility, constraint, calculation, validation, process, compliance, documentation, prohibition, definition, exception |
| description | String | Core | Full rule description text |
| conditions | String | Core | When-conditions triggering the rule |
| consequences | String | Core | Then-outcomes when rule fires |
| exceptions | String | Core | Unless-exceptions to the rule |
| reference | String | Core | Source document section reference |
| mandatory | Boolean | Core | Whether the rule is mandatory vs. advisory |
| confidence_score | Double | Quality | AI extraction confidence (0.0–1.0) |
| requires_review | Boolean | Quality | Flagged for human review |
| review_reason | String | Quality | Reason the rule was flagged |
| confidence_breakdown | String (JSON) | Quality | Per-dimension confidence scores |
| reference_verified | Boolean | Quality | Source reference verified by validator |
| reference_verification_note | String | Quality | Verification outcome detail |
| deduplication_info | String (JSON) | Quality | Dedup merge rationale from optimizer |
| source_reference | String | v2 Metadata | Full source document citation |
| effective_date | String | v2 Metadata | Rule effective date (ISO 8601) |
| expiration_date | String | v2 Metadata | Rule expiration date |
| superseded_by | String | v2 Metadata | ID of the superseding rule |
| jurisdiction | String | v2 Metadata | Geographic or regulatory jurisdiction |
| risk_level | String | v2 Metadata | Associated risk level |
| enforcement_action | String | v2 Metadata | Enforcement consequence for non-compliance |
| applicability_scope | String | v2 Metadata | Scope of the rule's applicability |
| data_points_required | String | v2 Metadata | Data needed to evaluate the rule |
| audit_frequency | String | v2 Metadata | How often the rule should be audited |
| related_rules | String (JSON) | v2 Metadata | IDs of semantically related rules |
Edge Properties
| Property | Type | Used On | Description |
|---|---|---|---|
| dependency_type | String | depends_on | One of: prerequisite, sequential, conditional, complementary, contradictory, override, validation |
| rationale | String | depends_on | AI-generated explanation of why the dependency exists |
| impact_if_fails | String | depends_on | Downstream impact description if the dependency is broken |
| strength | Integer | depends_on | Dependency strength score |
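Because depends_on edges chain from rule to rule, finding everything a rule transitively depends on is a graph walk. The sketch below shows the idea on plain Python tuples rather than a live JanusGraph connection; the `Edge` shape, the function name, and the rule IDs are hypothetical, and the edge direction (source depends on target) follows the Edge Labels table.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (source rule_id, target rule_id, dependency_type): source depends on target.
Edge = Tuple[str, str, str]


def transitive_dependencies(
    start: str,
    edges: Iterable[Edge],
    allowed_types: Optional[Set[str]] = None,
) -> List[str]:
    """Breadth-first walk over depends_on edges, returning reachable rule IDs.

    allowed_types optionally restricts the walk to some dependency types.
    The seen set makes the walk safe on cyclic dependency graphs.
    """
    adjacency: Dict[str, List[str]] = {}
    for source, target, dep_type in edges:
        if allowed_types is None or dep_type in allowed_types:
            adjacency.setdefault(source, []).append(target)

    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


if __name__ == "__main__":
    demo_edges = [
        ("R1", "R2", "prerequisite"),
        ("R2", "R3", "conditional"),
        ("R3", "R1", "sequential"),  # cycle back to the start
        ("R1", "R4", "complementary"),
    ]
    print(transitive_dependencies("R1", demo_edges))
    print(transitive_dependencies("R1", demo_edges, {"prerequisite", "conditional"}))
```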
Index Strategy
The mixedContentIndex is an OpenSearch-backed mixed index covering 12 property keys across both vertex labels, enabling full-text search, exact-match filtering, and faceted queries.
| Index Name | Fields (Mapping) | Purpose | Backend |
|---|---|---|---|
| mixedContentIndex | content (TEXT), name (TEXTSTRING) | Full-text search + exact match | OpenSearch |
| mixedContentIndex | rule_type, node_type, rule_id, category (STRING) | Exact-match filtering | OpenSearch |
| mixedContentIndex | entity_or_relationship, vertex_uuid (STRING) | Entity lookups & dedup | OpenSearch |
| mixedContentIndex | jurisdiction, risk_level, effective_date, audit_frequency (STRING) | v2 metadata facets | OpenSearch |
| k-NN Vector | embedding (384-dim float[]) | Semantic similarity search | OpenSearch |
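For the k-NN row, a semantic search boils down to sending a query vector to OpenSearch. The sketch below only builds the request body in the shape used by the OpenSearch k-NN query DSL; it does not contact a cluster. The function name is hypothetical, and the field name `embedding` and the 384 dimensions are taken from the table above. In practice the vector would come from the sentence-transformer model.

```python
from typing import Any, Dict, List

EMBEDDING_DIM = 384  # dimension listed for the k-NN index above


def build_knn_query(vector: List[float], k: int = 5, field: str = "embedding") -> Dict[str, Any]:
    """Build an OpenSearch k-NN search body for the given query vector."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    if k < 1:
        raise ValueError("k must be at least 1")
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


if __name__ == "__main__":
    body = build_knn_query([0.0] * EMBEDDING_DIM, k=3)
    print(body["size"], len(body["query"]["knn"]["embedding"]["vector"]))
```

Checking the dimension before sending avoids a server-side error when a different embedding model is swapped in by mistake.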
Multi-Graph Isolation
Each compliance domain gets its own logically isolated graph within JanusGraph, backed by separate Cassandra keyspaces and OpenSearch indices. The manifest is driven by graphs.yaml.
| Graph | Domain |
|---|---|
| Sample Guidelines | Mortgage Lending |
| Example Policies | Mortgage Lending |
Technology Stack
| Layer | Technology |
|---|---|
| Frontend (Suite Shell) | React 19, TypeScript, Tailwind CSS, Vite |
| Frontend (Pipeline UI) | React 19, TypeScript, Tailwind CSS, Vite |
| Frontend (KG Explorer) | Vanilla JavaScript, D3.js v7, marked.js |
| Backend (Pipeline) | FastAPI, Uvicorn, SQLite, SQLAlchemy |
| Backend (Assistant) | Flask 3.1, Gremlin WebSocket, OpenSearch HTTP |
| AI Inference | OpenAI GPT (official OpenAI Python SDK) |
| Graph Database | JanusGraph 1.0 |
| Storage Backend | Apache Cassandra 4.1 |
| Search & Vectors | OpenSearch 2.17 (full-text + k-NN, 384-dim) |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) |
| Cache | Redis 7 (LRU eviction) |
| Document Parsing | PyPDF2, python-docx, pandas, openpyxl, pytesseract (OCR) |
| Containerization | Docker, Docker Compose |
| Cloud Deployment | Azure Container Apps, Bicep IaC |
| Testing | Playwright (E2E), pytest (backend) |
Step-by-Step Usage
- Launch the Suite & Open the Dashboard
  After running ./start.sh, open http://localhost:4000 in your browser. The Dashboard shows all loaded knowledge graphs, recent pipeline runs, service health, and quick action buttons.
- Upload Compliance Documents
  Navigate to Documents in the sidebar. Upload PDFs, DOCX, Markdown, CSV, or Excel files into organized folders. Each folder can be associated with a compliance domain (mortgage, AML, commercial lending, healthcare).
- Run the Extraction Pipeline
  Go to Pipeline, select your uploaded documents, then click Start Extraction. Monitor progress through the visual agent pipeline tracker showing all 6 extraction stages in real time.
- Review Run History
  Check Run History to see all past pipeline runs with their status (completed, running, failed), domain, source, duration, and timestamps. Expand any run for detailed agent-by-agent progress.
- Explore the Knowledge Graph
  Open Assistant to interact with the knowledge graph. The split-pane interface shows an interactive D3.js force-directed graph on the left and an AI chat panel on the right. Ask questions like "Show me the full graph", "Count rules", "Find prohibitions", or "Search appraisal rules".
- View Graph Analytics
  Navigate to Analytics for a comprehensive dashboard showing rule type distribution, risk level breakdown, dependency types, confidence score distribution, entity coverage, and most connected rules.
- Compare Knowledge Graphs
  Use Compare Knowledge Graphs to select two graphs (e.g., two regulatory rule sets). The system uses LLM-powered semantic matching to detect common rules, unique rules, and contradictions with AI-generated analysis.
- Run Impact Analysis
  Open Impact Analysis to upload old and new versions of a regulatory document. The system identifies affected rules, assesses severity, and provides recommended actions, which is useful for tracking regulatory changes.
- Track Obligations
  Go to Obligations to initialize an obligation register from graph rules. Map internal controls to compliance obligations, identify coverage gaps, and generate heatmaps for gap analysis.
- Edit, Review & Publish
  In the Knowledge Graph Explorer, click any node to inspect details, add comments, edit content, or mark as reviewed/approved. When ready, click Release to create an immutable version snapshot with semantic versioning.
Installation Guide
Prerequisites
Docker & Docker Compose
Required for running infrastructure services (JanusGraph, Cassandra, OpenSearch, Redis)
Python 3.11 or 3.12
Required for running agents and backends outside Docker
Node.js 20+
Required for frontend development and the Suite Shell
OpenAI API Key
Required — set OPENAI_API_KEY in your environment
Quick Start
# 1. Clone the repository
git clone <repository-url>
cd policy-to-knowledge
# 2. Configure environment
cp .env.example .env
# Open .env and set OPENAI_API_KEY at minimum
# 3. Start the full stack (~2-3 minutes on first run)
./start.sh
# 4. Open the suite
open http://localhost:4000
Environment Variables
# Required
OPENAI_API_KEY=sk-...
# Optional
OPENAI_CHAT_MODEL=gpt-4o-mini # Default chat model
OPENAI_REASONING_EFFORT=low # For reasoning models
MAX_TOOL_ROUNDS=3 # Max tool-call rounds per chat turn
# Port overrides (all optional)
SUITE_PORT=4000
KG_BACKEND_PORT=8000
KG_FRONTEND_PORT=5173
CA_PORT=5000
JANUSGRAPH_PORT=8182
CASSANDRA_PORT=9042
OPENSEARCH_PORT=9200
REDIS_PORT=6379
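A service written in Python might read these variables along the following lines. This is a hedged sketch, not the suite's configuration code: the `SuiteSettings` class and `load_settings` function are hypothetical, and it assumes the values shown in the listing above are the defaults used when a variable is unset.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SuiteSettings:
    openai_api_key: str
    chat_model: str
    max_tool_rounds: int
    suite_port: int
    kg_backend_port: int
    kg_frontend_port: int
    ca_port: int
    janusgraph_port: int
    cassandra_port: int
    opensearch_port: int
    redis_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> SuiteSettings:
    """Read settings from env, failing fast when the required key is missing."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")

    def port(name: str, default: int) -> int:
        return int(env.get(name, str(default)))

    return SuiteSettings(
        openai_api_key=api_key,
        chat_model=env.get("OPENAI_CHAT_MODEL", "gpt-4o-mini"),
        max_tool_rounds=int(env.get("MAX_TOOL_ROUNDS", "3")),
        suite_port=port("SUITE_PORT", 4000),
        kg_backend_port=port("KG_BACKEND_PORT", 8000),
        kg_frontend_port=port("KG_FRONTEND_PORT", 5173),
        ca_port=port("CA_PORT", 5000),
        janusgraph_port=port("JANUSGRAPH_PORT", 8182),
        cassandra_port=port("CASSANDRA_PORT", 9042),
        opensearch_port=port("OPENSEARCH_PORT", 9200),
        redis_port=port("REDIS_PORT", 6379),
    )
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.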
What start.sh Does
- Validates Docker, Python venv, and port availability
- Generates JanusGraph configuration from apps/explorer/conf/graphs.yaml
- Starts infrastructure services in dependency order (Cassandra → OpenSearch → Redis → JanusGraph)
- Waits for each service to pass its health check
- Starts the Pipeline API (FastAPI, port 8000) and Pipeline UI (Vite, port 5173)
- Loads knowledge graphs into JanusGraph (incremental by default)
- Starts Assistant (Flask, port 5000)
- Starts the Suite Shell (Vite, port 4000)
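The "wait for each service" step above can be approximated with a simple TCP readiness probe. The sketch below is an illustration only: start.sh is a shell script whose real health checks are not shown here, the function names are hypothetical, and an open port is a weaker signal than a service-level health check.

```python
import socket
import time
from typing import Iterable, Optional, Tuple


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection succeeds, False if timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def first_unavailable(
    services: Iterable[Tuple[str, int]],
    host: str = "localhost",
    timeout: float = 60.0,
) -> Optional[str]:
    """Check services in order; return the first one that never came up."""
    for name, port in services:
        if not wait_for_port(host, port, timeout=timeout):
            return name
    return None


# Infrastructure order and default ports as listed in this document.
STARTUP_ORDER = [
    ("Cassandra", 9042),
    ("OpenSearch", 9200),
    ("Redis", 6379),
    ("JanusGraph", 8182),
]

if __name__ == "__main__":
    missing = first_unavailable(STARTUP_ORDER, timeout=5.0)
    print("all services reachable" if missing is None else f"not reachable: {missing}")
```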
Stopping Services
# Stop application processes (Docker stays running for fast restart)
./stop.sh
# Stop everything including Docker infrastructure
./stop.sh --all
# Nuclear reset — wipe all volumes + DBs, then rebuild from scratch
./start.sh --clean
CLI Reference
Suite Management
| Command | Description |
|---|---|
| ./start.sh | Start the full stack (Explorer DB + Pipeline API/UI + suite shell) |
| ./stop.sh | Stop application processes (Docker infra stays running) |
| ./stop.sh --all | Stop application processes + Docker containers |
| cd apps/explorer && ./start.sh --fresh | Explorer only: rebuild graphs & re-index; keeps Docker volumes intact |
| cd apps/explorer && ./start.sh --clean | Explorer only: wipe all data stores (volumes + SQLite + Redis) and rebuild |
Extraction Pipeline
cli/extract.py — Document-to-KG Extraction
# Process all files with OpenAI
python3 cli/extract.py --provider openai
# Process a single document
python3 cli/extract.py --provider openai --file compliance-files/graphA.pdf
# Run a specific agent step (1-6)
python3 cli/extract.py --step 1 # Document organizer only
python3 cli/extract.py --step 6 # Visualization only
# Process in batches by subdirectory
python3 cli/extract.py --batch
# Process a specific domain subdirectory
python3 cli/extract.py --batch-dir healthcare
# Run merge phase with existing KGs
python3 cli/extract.py --merge
python3 cli/extract.py --merge-only --merge-strategy provenance
cli/compare.py — Knowledge Graph Merging
# List available knowledge graphs
python3 cli/compare.py --list
# Merge two graphs (computes all set operations)
python3 cli/compare.py --g1 graphA --g2 graphB --workers 15
# Merge with custom batch size
python3 cli/compare.py --g1 FM --g2 graphA --workers 20 --batch-size 15
Assistant — Knowledge Graph Management
# Full rebuild (clears and reloads all graphs)
python3 -m src.main setup
# Incremental load (only load what is missing)
python3 -m src.main setup-if-empty
# Nuclear reset (wipe all data and rebuild)
python3 -m src.main force-clean
# Start server only (infra must be running)
python3 -m src.main serve
Individual Service Start Commands
# Explorer (Flask + JanusGraph)
cd apps/explorer && SERVER_PORT=5050 .venv/bin/python3 -m src.server
# Suite Shell (React)
cd apps/shell && npm install && npm run dev
# Pipeline API + UI
cd apps/pipeline && ./start.sh
# Run backend tests
cd apps/pipeline && .venv/bin/python -m pytest tests/
# Run Playwright UI tests (services must be running)
cd apps/explorer/tests/e2e && npx playwright test
API Endpoints
Services & Ports
| Service | URL | Description |
|---|---|---|
| Suite Shell | http://localhost:4000 | Main navigation hub (React SPA) |
| Pipeline UI | http://localhost:5173 | Document upload & pipeline control |
| Pipeline API | http://localhost:8000 | FastAPI backend for extraction |
| Explorer | http://localhost:5000/app (or :5050) | Graph explorer & AI chat. Falls back to :5050 when :5000 is taken (Docker/AirPlay on macOS). |
| JanusGraph | localhost:8182 | Gremlin WebSocket endpoint |
| OpenSearch | http://localhost:9200 | Full-text & vector search |
| Cassandra | localhost:9042 | Graph storage backend |
| Redis | localhost:6379 | Query result cache |
KG Backend — FastAPI
| Method | Path | Description |
|---|---|---|
| GET | /api/health | Health check |
| GET | /api/documents | List document folders |
| POST | /api/documents/upload | Upload compliance documents |
| POST | /api/pipeline/start | Start extraction pipeline |
| GET | /api/pipeline/{run_id}/status | Get pipeline run status |
| GET | /api/graphs | List knowledge graphs |
| GET | /api/graphs/{name}/visualization | Get graph visualization data |
| POST | /api/compare | Compare two knowledge graphs |
| POST | /api/impact/analyze | Run impact analysis |
| POST | /api/obligations/{graph}/seed | Initialize obligations from graph |
| GET | /api/obligations/{graph}/heatmap | Get obligation heatmap |
| GET | /api/runs | List all pipeline runs |
| GET | /api/settings | Get pipeline settings |
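A client could drive the start and status endpoints with nothing but the standard library. The sketch below is an assumption-heavy illustration: the paths come from the table above, but the JSON payload fields, the JSON response format, and the helper names are hypothetical, since the request and response schemas are not documented here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000"  # default Pipeline API address


def start_pipeline_request(payload: Dict[str, Any], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request that starts an extraction run."""
    return urllib.request.Request(
        f"{base_url}/api/pipeline/start",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_request(run_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET request for a run's status; run_id is URL-escaped."""
    escaped = urllib.parse.quote(run_id, safe="")
    return urllib.request.Request(f"{base_url}/api/pipeline/{escaped}/status", method="GET")


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send a request and decode the body, assuming the API answers with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical payload; the real field names may differ.
    request = start_pipeline_request({"folder": "demo", "domain": "mortgage"})
    print(request.get_method(), request.full_url)
```

Building the requests separately from sending them makes the URL and payload logic checkable without a running backend.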
Assistant — Flask
| Method | Path | Description |
|---|---|---|
| GET | /api/graph | Get full graph data |
| GET | /api/vertex/{id} | Get vertex details |
| GET | /api/search/text | Full-text search |
| GET | /api/search/semantic | Semantic vector search |
| POST | /api/gremlin/execute | Execute custom Gremlin traversal |
| GET | /api/annotations/{node_id} | Get node annotations |
| PUT | /api/annotations/{node_id} | Update annotation |
| POST | /api/graph/release | Create immutable release |
| GET | /api/graph/releases | List all releases |
| GET | /api/graph/available | List available graphs |
| POST | /api/rewrite | AI rewrite content |
| POST | /api/suggest-rule-id | Auto-generate Rule ID |
Admin Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/admin/reset | Drop & re-load all graphs, rebuild indices |
| GET | /api/admin/consistency | Vertex/edge counts per graph (sanity check) |
| POST | /api/admin/rebuild-embeddings | Re-embed all vertices into OpenSearch k-NN |
| POST | /api/admin/rebuild-tasks | Regenerate task queue from graph state |
Azure Deployment
Policy to Knowledge can be deployed to Azure Container Apps. Build and push all 5 container images to an Azure Container Registry with a unique UTC timestamp tag per build, then roll each Container App to the freshly tagged image.
# Login to Azure
az login
# Build + push each image to ACR, then update the Container Apps
az acr build --registry p2kdemo --image p2k/<svc>:<tag> .
az containerapp update --name <app> --image p2kdemo.azurecr.io/p2k/<svc>:<tag>
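# A hedged Python sketch of the "unique UTC timestamp tag per build" idea.
# The tag format and helper names are assumptions for illustration; the
# registry name and the p2k/ repository prefix follow the az acr build
# line above, and only flags shown in the commands above are used.

```python
from datetime import datetime, timezone
from typing import List, Optional


def utc_build_tag(now: Optional[datetime] = None) -> str:
    """Return a sortable tag such as 20240102030405 from a UTC timestamp.

    Pass a timezone-aware datetime; the current UTC time is used by default.
    """
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")


def image_reference(registry: str, service: str, tag: str) -> str:
    """Full image name as pushed to the Azure Container Registry."""
    return f"{registry}.azurecr.io/p2k/{service}:{tag}"


def deploy_commands(registry: str, app: str, service: str, tag: str) -> List[List[str]]:
    """Argument lists for the build and update steps, ready for subprocess.run."""
    return [
        ["az", "acr", "build", "--registry", registry, "--image", f"p2k/{service}:{tag}", "."],
        ["az", "containerapp", "update", "--name", app,
         "--image", image_reference(registry, service, tag)],
    ]


if __name__ == "__main__":
    tag = utc_build_tag()
    for command in deploy_commands("p2kdemo", "kg-backend", "kg-backend", tag):
        print(" ".join(command))
```

# Using one tag for both commands keeps the built and deployed image in step.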
Container App Layout
| App | Ingress | Port | Public Mount |
|---|---|---|---|
| p2k-preview | External | 4000 | /app/ |
| kg-frontend | External | 5173 | /app/{documents,pipeline,...} |
| kg-backend | Internal | 8000 | /app/api/kg/ |
| assistant | External | 5000 | /app/api/ca/ |
| janusgraph | Internal | 8182 | (not exposed publicly) |