Policy to Knowledge

Enterprise compliance automation that transforms regulatory documents into structured, queryable knowledge graphs using a 10-agent AI pipeline.

10-Agent Pipeline · Knowledge Graphs · AI Chat Explorer · Multi-Domain · Semantic Search

▶️ At a glance

Policy to Knowledge takes a single rule set from raw compliance documents through a 10-agent extraction pipeline, into an explorable knowledge graph, and out to obligation registers, impact analysis, and graph comparison — all from one unified shell.

Explore the sections below, or jump straight to installation.

🎯 Goal & Use Case

Policy to Knowledge is a compliance knowledge-extraction suite. It addresses the challenge of navigating complex, interconnected regulatory rule sets where the relationships between rules are as critical as the rules themselves.

10 AI Agents in Pipeline

4 Built-in Compliance Domains

2 Pipeline Phases (extract · compare)

5 Graph Set Operations

The Problem

Financial compliance requires navigating complex, interconnected rule sets where rules reference and depend on other rules (transitive dependencies), relationships are first-class domain concepts, natural language queries must resolve to precise requirements, and multi-hop relationships must be understood for rule interpretation.

The Solution

Policy to Knowledge unifies document extraction, graph storage, semantic search, and an interactive AI-powered explorer into a single interface. It automatically ingests compliance documents (PDF, DOCX, Markdown, CSV, Excel), extracts business rules and entities using a multi-agent AI pipeline, builds a structured knowledge graph, and provides intelligent exploration through natural language chat, graph visualization, impact analysis, and obligation tracking.

Supported Domains

🏠

Mortgage Lending

Agency and investor guidelines, servicing policies, and lender-specific overlays

🏦

Anti-Money Laundering

Suspicious-activity, KYC/CDD, and sanctions-screening rules and entity relationships

💼

Commercial Lending

Business lending compliance rules and regulatory frameworks

🏥

Healthcare

Healthcare regulatory compliance and policy enforcement

✨ Key Features

🤖

10-Agent AI Pipeline

Automated extraction from documents through entities, rules, validation, optimization, and visualization.

💬

AI Chat Assistant

GPT-4o-powered conversational agent with 13 tools for graph traversal, semantic search, and analysis.

🕸️

Interactive Graph Explorer

D3.js force-directed visualization with zoom, click-to-inspect, color-coded node types, and live search.

🔀

Knowledge Graph Merging

Compare graphs with intersection, union, differences, contradictions, and AI-powered semantic matching.

📊

Graph Analytics

Rule type distribution, risk levels, dependency analysis, confidence scores, and entity coverage charts.

⚡

Impact Analysis

Upload old and new regulatory docs to identify affected rules, severity, and recommended actions.

📋

Obligation Tracking

Map internal controls to compliance obligations, identify coverage gaps, and generate heatmaps.

🔖

Release Management

Create immutable graph snapshots with semantic versioning, lock graphs, and browse release history.

🔍

Semantic + Full-Text Search

OpenSearch k-NN with sentence-transformer embeddings plus JanusGraph mixed index full-text search.

🏗️ System Architecture

Policy to Knowledge is composed of three tightly integrated applications orchestrated via Docker Compose, with a central navigation hub (Suite Shell) routing to all services.

High-Level Architecture

graph TD
    Browser["🌐 Browser"]

    subgraph FE["Frontend Layer"]
        Shell["Suite Shell<br/>React + Vite · :4000"]
        PipeUI["Pipeline UI<br/>React + Vite · :5173"]
        ExplorerUI["KG Explorer<br/>D3.js · :5000"]
    end

    subgraph BE["Backend Layer"]
        PipeAPI["Pipeline API<br/>FastAPI · :8000"]
        AssistantAPI["Assistant<br/>Flask · :5000"]
    end

    subgraph AI["AI Inference"]
        OpenAI["OpenAI GPT"]
    end

    subgraph Data["Data Layer"]
        JanusGraph["JanusGraph · :8182<br/>Graph Database"]
        Cassandra["Apache Cassandra · :9042<br/>Storage Backend"]
        OpenSearch["OpenSearch · :9200<br/>Full-text + Vector k-NN"]
        Redis["Redis · :6379<br/>Query Cache LRU"]
    end

    Browser --> Shell
    Shell -->|"iframe"| PipeUI
    Shell -->|"iframe"| ExplorerUI
    PipeUI --> PipeAPI
    ExplorerUI --> AssistantAPI
    PipeAPI --> OpenAI
    AssistantAPI --> Redis
    AssistantAPI --> JanusGraph
    JanusGraph --> Cassandra
    JanusGraph --> OpenSearch

Layered Service Architecture

graph TB
    subgraph Presentation["Presentation Layer"]
        UI["Web Client — D3.js + EventSource API<br/>Visualization · User Interaction · SSE"]
    end

    subgraph Application["Application Layer"]
        API["REST API Gateway — Flask + CORS<br/>Request Routing · Auth · Rate Limiting"]
        ConnMgr["Connection Manager — Singleton Pools<br/>Resource Lifecycle · Circuit Breaking"]
    end

    subgraph Services["Service Layer"]
        AIOrch["AI Orchestration — GPT-4o + Tool Calling<br/>Query Understanding · Tool Selection"]
        GraphSvc["Graph Query Service — Gremlin<br/>Traversals · Pattern Matching"]
        VectorSvc["Vector Search — Embeddings + k-NN<br/>Semantic Similarity · Ranking"]
    end

    subgraph DataLayer["Data Layer"]
        GraphDB[("JanusGraph<br/>Multi-Graph")]
        VectorDB[("OpenSearch<br/>k-NN Index")]
        Cache[("Redis<br/>LRU Cache")]
    end

    UI --> API
    API --> ConnMgr
    API --> AIOrch
    AIOrch --> GraphSvc
    AIOrch --> VectorSvc
    ConnMgr --> GraphDB
    ConnMgr --> VectorDB
    GraphSvc --> ConnMgr
    VectorSvc --> ConnMgr
    Cache -.-> GraphSvc
    Cache -.-> VectorSvc

Startup Sequence

sequenceDiagram
    participant S as start.sh
    participant C as Cassandra :9042
    participant O as OpenSearch :9200
    participant R as Redis :6379
    participant J as JanusGraph :8182
    participant P as Pipeline API :8000
    participant U as Pipeline UI :5173
    participant K as Assistant :5000
    participant F as Suite Shell :4000

    S->>C: start + health check
    S->>O: start + health check
    S->>R: start + health check
    S->>J: start (depends on C + O)
    J-->>S: ready
    S->>P: start FastAPI
    S->>U: start Vite dev server
    S->>K: load KGs into JanusGraph
    K-->>S: graphs loaded
    S->>K: start Flask server
    S->>F: start Vite dev server
    F-->>S: Suite ready at localhost:4000

🤖 AI Pipeline Architecture

Policy to Knowledge uses a 10-agent pipeline divided into two phases: extraction (Agents 1–6, plus the non-blocking Rule Validator, Agent 3.5) and merging (Agents 7–10).

Pipeline Flow

flowchart TB
    Docs["📄 Compliance Documents\nPDF · DOCX · MD · CSV · XLSX"]

    subgraph Extraction["Phase 1 — Extraction Pipeline"]
        direction TB

        A1["🗂️ Agent 1 — Document Organizer\nTOC-based hierarchical chunking\nSplits documents into structured sections"]
        A1out["📁 Organized Chunks\nknowledge-files-organized/"]

        A2["🔍 Agent 2 — Entity Extractor\nMeta-agent prompt optimization\nExtracts domain entities & relationships"]
        A2out["📋 Entity Definitions\nentity_types_and_relationships.json"]

        A3["⚖️ Agent 3 — Rules Extractor\nParallel batch processing\n10-category rule taxonomy"]
        A3out["📜 Business Rules\ncompliance_rules_with_entities.json"]

        A35["✅ Agent 3.5 — Rule Validator\nSource verification · Numeric consistency\nContradiction detection · Confidence scoring"]
        A35out["📊 Validation Report\nvalidation_report.json"]

        A4["🔗 Agent 4 — Rules + Entities Merger\nEnriches rules with entity context\nAssembles complete knowledge graph"]
        A4out["🗃️ Complete Knowledge Graph\ncompliance_knowledge_graph.json"]

        A5["⚡ Agent 5 — KG Optimizer\nRule deduplication · 7 dependency types\nImpact analysis & confidence scoring"]
        A5out["💎 Optimized Knowledge Graph\noptimized_compliance_knowledge_graph.json"]

        A6["🎨 Agent 6 — Visualization Generator\nvis.js network graphs\nSearchable rule tables & metrics"]
        A6out["🌐 Interactive HTML Report\n{name}_knowledge_graph.html"]

        A1 --> A1out --> A2
        A2 --> A2out --> A3
        A3 --> A3out --> A35
        A35 --> A35out -.->|non-blocking| A4
        A2out -->|entities| A4
        A3out -->|rules| A4
        A4 --> A4out --> A5
        A5 --> A5out --> A6
        A6 --> A6out
    end

    subgraph Merging["Phase 2 — Merge Pipeline (Multi-Graph Comparison)"]
        direction TB

        KG_A["📊 Knowledge Graph A"]
        KG_B["📊 Knowledge Graph B"]

        A7["📦 Agent 7 — Rule Type Clusterer\nGroups by behavior type\nformula · threshold · sequence\nmethod · mandate · prohibition\nclassification · timing"]
        A7out["🏷️ Rule Clusters\nrule_clusters.json"]

        A8["🧠 Agent 8 — Semantic Rule Matcher\nLLM-powered pairwise comparison\nBatch parallelism for scale\nIDENTICAL · EQUIVALENT\nCONTRADICTORY · UNRELATED"]
        A8out["🔎 Match Results\nmatch_results.json"]

        A9["🔀 Agent 9 — Set Operations\nComputes 5 set operations"]
        A9i["∩ Intersection"]
        A9u["∪ Union"]
        A9d1["A \\ B Difference"]
        A9d2["B \\ A Difference"]
        A9c["⚠️ Contradictions"]

        A10["📈 Agent 10 — Set Visualization\nVenn diagrams · Dashboard\nComparison HTML reports"]
        A10out["🌐 6 HTML Reports\nindex · intersection · union\ndifferences · contradictions"]

        KG_A --> A7
        KG_B --> A7
        A7 --> A7out --> A8
        A8 --> A8out --> A9
        A9 --> A9i
        A9 --> A9u
        A9 --> A9d1
        A9 --> A9d2
        A9 --> A9c
        A9i --> A10
        A9u --> A10
        A9d1 --> A10
        A9d2 --> A10
        A9c --> A10
        A10 --> A10out
    end

    subgraph Infra["AI Infrastructure"]
        LLM["🤖 LLM Provider\nOpenAI GPT\nvia the official OpenAI SDK"]
    end

    Docs --> A1
    A6out -->|"Per-document KG"| KG_A
    A6out -->|"Per-document KG"| KG_B
    A3 -.->|API calls| LLM
    A2 -.->|API calls| LLM
    A5 -.->|API calls| LLM
    A8 -.->|API calls| LLM

Extraction Agents (1–6)

Agent	Name	Input	Output	Key Techniques
1	Document Organizer	Raw compliance files (PDF, DOCX, MD, CSV, XLSX)	`knowledge-files-organized/` — hierarchical text chunks	TOC-based hierarchical splitting; supports multi-format parsing via PyPDF2, python-docx, pandas, openpyxl; OCR fallback via pytesseract
2	Entity Extractor	Organized document chunks	`entity_types_and_relationships.json` — entities & relationship definitions	Meta-agent prompt optimization; iterative entity refinement loop; extracts domain entities (borrower, property, loan, etc.) with typed relationships
3	Rules Extractor	Organized chunks + entity definitions	`compliance_rules_with_entities.json` — structured business rules	Parallel batch processing (10 rules/batch) for 3–10× speedup; 10-category rule taxonomy (eligibility, calculation, validation, threshold, prohibition, timing, method, mandate, classification, reporting)
3.5	Rule Validator	Extracted business rules	`validation_report.json` — quality scores & recommendations	Non-blocking validation; source text verification; numeric consistency checks; contradiction detection; per-rule confidence scoring (0.0–1.0)
4	Rules + Entities Merger	Entity definitions (Agent 2) + Business rules (Agent 3)	`compliance_knowledge_graph.json` — complete KG	Dual-input assembly; enriches every rule with entity context; links rules to governing entities; batch parallelism for large graphs
5	KG Optimizer	Complete knowledge graph	`optimized_compliance_knowledge_graph.json` — deduplicated KG with dependencies	Conservative rule deduplication with rationale; 7 dependency types (prerequisite, sequential, conditional, complementary, contradictory, override, validation); impact analysis per dependency
6	Visualization Generator	Optimized knowledge graph	`{name}_knowledge_graph.html` — interactive report	vis.js network graph with color-coded dependency edges; searchable rules table; 5 key metrics dashboard (rules, dependencies, confidence, low-confidence alerts, duplicates removed); responsive design
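
The parallel batch processing listed for Agent 3 can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the pipeline's actual code: `extract_rules_from_batch` is a hypothetical stand-in for the per-batch LLM call, and the worker count of 4 is an arbitrary choice. Only the batch size of 10 comes from the table above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

BATCH_SIZE = 10  # batch size quoted in the table above


def make_batches(items: Sequence[str], size: int = BATCH_SIZE) -> List[Sequence[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_in_parallel(
    items: Sequence[str],
    worker: Callable[[Sequence[str]], List[str]],
    size: int = BATCH_SIZE,
    max_workers: int = 4,  # assumption: not specified in the source
) -> List[str]:
    """Run `worker` on every batch concurrently and flatten the results."""
    batches = make_batches(items, size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in batch order, even if batches finish out of order.
        results = list(pool.map(worker, batches))
    return [rule for batch_result in results for rule in batch_result]


def extract_rules_from_batch(batch: Sequence[str]) -> List[str]:
    """Hypothetical placeholder for the LLM call made once per batch."""
    return [f"rule-from-{chunk}" for chunk in batch]
```

Threads suit this workload because each batch spends most of its time waiting on a network response, so the speedup depends mainly on how many requests the provider allows in flight.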

Merge Agents (7–10)

Agent	Name	Input	Output	Key Techniques
7	Rule Type Clusterer	Two knowledge graphs (KG A + KG B)	`rule_clusters.json` — rules grouped by behavior	Groups rules into 8 behavior types: formula, classification, threshold, prohibition, timing, sequence, method, mandate; reduces comparison space for downstream matching
8	Semantic Rule Matcher	Clustered rules by behavior type	`match_results.json` — pairwise match classifications with confidence	LLM-powered pairwise comparison within each cluster; batch parallelism for scale; classifies each pair as IDENTICAL, EQUIVALENT, CONTRADICTORY, or UNRELATED with confidence score
9	Set Operations	Semantic match results	5 JSON files: `intersection.json`, `union.json`, `g1_minus_g2.json`, `g2_minus_g1.json`, `contradictions.json`	Computes intersection (rules in both graphs), union (all unique rules), A∖B and B∖A differences (exclusive rules), and contradictions (conflicting rule pairs) with AI-generated analysis
10	Set Visualization	Set operation JSON files	6 HTML files: `index.html` (dashboard) + one per set operation	Generates Venn diagram encoding; summary dashboard with overlap percentages; dedicated pages for intersection, union, differences, and contradictions; confidence score display per matched pair
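
The five set operations computed by Agent 9 can be sketched from the matcher's classifications. The version below is illustrative only and rests on two assumptions that the source does not confirm: a pair counts as shared when it is labelled IDENTICAL or EQUIVALENT, and a rule involved only in a CONTRADICTORY pair still appears in the corresponding difference. The dictionary keys mirror the output file names in the table; the function name and tuple layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple

MATCHING_LABELS = {"IDENTICAL", "EQUIVALENT"}  # assumption: both count as "same rule"


def set_operations(
    rules_a: Set[str],
    rules_b: Set[str],
    matches: List[Tuple[str, str, str]],  # (rule id in A, rule id in B, label)
) -> Dict[str, list]:
    """Derive intersection, union, both differences, and contradictions."""
    intersection = sorted((a, b) for a, b, label in matches if label in MATCHING_LABELS)
    contradictions = sorted((a, b) for a, b, label in matches if label == "CONTRADICTORY")

    matched_a = {a for a, _ in intersection}
    matched_b = {b for _, b in intersection}

    a_minus_b = sorted(rules_a - matched_a)  # rules with no counterpart in B
    b_minus_a = sorted(rules_b - matched_b)  # rules with no counterpart in A

    # Union: every rule from A, plus the rules from B that A does not already cover.
    union = sorted(rules_a) + b_minus_a

    return {
        "intersection": intersection,
        "union": union,
        "g1_minus_g2": a_minus_b,
        "g2_minus_g1": b_minus_a,
        "contradictions": contradictions,
    }
```

Because matching is semantic rather than by identifier, the intersection is a list of pairs, not a plain set of shared IDs.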

📝 Prompt Architecture

The pipeline uses a two-tier prompt system that separates domain-agnostic pipeline logic from domain-specific terminology, examples, and entity vocabularies. This allows the same 10-agent pipeline to operate across entirely different compliance domains without code changes.

Resolution Strategy

flowchart LR
    A["Agent requests prompt\n(e.g. entity_extraction)"] --> B{"domain-prompts/\n{domain}/{prompt}.txt\nexists?"}
    B -- Yes --> C["Load domain-specific\nprompt"]
    B -- No --> D["Load shared fallback\nprompts/{prompt}.txt"]
    C --> E["Substitute parameters\n(entity_context, batch_num, etc.)"]
    D --> E
    E --> F["Send to LLM\n(OpenAI GPT)"]

The PromptManager class resolves prompts with domain-first precedence: it checks domain-prompts/{active_domain}/ first, then falls back to the shared prompts/ directory. This means every domain can override any prompt while inheriting the rest.
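
The domain-first lookup can be sketched in a few lines. This is not the actual `PromptManager` implementation: the function names are hypothetical, and the `$name` placeholder syntax (via `string.Template`) is an assumption, since the source does not say how parameters are marked inside the templates.

```python
from pathlib import Path
from string import Template


def resolve_prompt(root: Path, domain: str, name: str) -> Path:
    """Return the domain-specific template if present, else the shared fallback."""
    domain_path = root / "domain-prompts" / domain / f"{name}.txt"
    if domain_path.is_file():
        return domain_path
    shared_path = root / "prompts" / f"{name}.txt"
    if shared_path.is_file():
        return shared_path
    raise FileNotFoundError(f"no prompt named {name!r} for domain {domain!r}")


def render_prompt(root: Path, domain: str, name: str, **params: object) -> str:
    """Load the resolved template and substitute parameters.

    Assumption: placeholders use string.Template syntax such as $batch_num.
    safe_substitute leaves unknown placeholders untouched instead of raising.
    """
    text = resolve_prompt(root, domain, name).read_text(encoding="utf-8")
    return Template(text).safe_substitute(params)
```

With this layout, adding a new domain only requires creating a `domain-prompts/{domain}/` directory containing the templates that need to differ from the shared baseline.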

Directory Layout

Path	Scope	Description
`prompts/`	Shared (fallback)	Domain-agnostic prompt templates — the default baseline for all agents
`domain-prompts/mortgage/`	Mortgage Lending	Mortgage-specific terminology, entity types (Borrower, Property, Loan), and agency/investor regulatory references
`domain-prompts/aml/`	Anti-Money Laundering	AML/BSA compliance — SAR, CTR, CDD, KYC entity types; FinCEN regulatory references
`domain-prompts/commercial_lending/`	Commercial Lending	Commercial loan origination — borrower financials, collateral valuation, covenant tracking
`domain-prompts/healthcare/`	Healthcare	Healthcare compliance — HIPAA, patient entities, provider relationships, clinical protocols

Prompt Inventory (11 templates × 4 domains)

Each domain contains a full set of 11 prompt templates, covering every prompt-driven agent stage listed in the table below. The domain-specific versions inject specialized terminology, entity vocabularies, rule taxonomies, and worked examples.

Prompt Template	Agent	Lines	What Gets Domain-Specialized
`document_structure_analysis.txt`	1 — Document Organizer	~94	Section heading patterns, document structure conventions
`entity_extraction.txt`	2 — Entity Extractor	~245	Domain entity vocabulary (e.g., Borrower/Loan vs. SAR/CTR vs. Patient/Provider), relationship definitions, attribute schemas
`entity_refinement.txt`	2 — Meta-Agent	~266	5-dimensional quality scoring criteria, domain-specific completeness benchmarks
`entity_resolution.txt`	2 — Resolution	—	Synonyms and canonical name mappings per domain
`business_rules_extraction.txt`	3 — Rules Extractor	~343	Rule type taxonomy (10 categories), batch size tuning, domain examples, confidence scoring factors
`validation_report.txt`	3.5 — Rule Validator	~401	Domain-specific validation criteria, expected numeric ranges, regulatory cross-references
`rule_resolution.txt`	4 — Merger	—	Entity-rule linkage heuristics per domain
`dependency_analysis.txt`	5 — KG Optimizer	~401	7 dependency types with domain-specific examples and strength rating criteria
`rule_deduplication.txt`	5 — Deduplication	~280	Conservative merge criteria, domain-specific variation preservation rules
`rule_matcher.txt`	8 — Semantic Matcher	—	Match classification criteria (IDENTICAL / EQUIVALENT / CONTRADICTORY / UNRELATED) tuned per domain
`rule_matcher_batch.txt`	8 — Batch Matcher	—	Batch parallelism parameters and domain-specific pairwise comparison prompts

Domain Adaptation Example

The same entity_extraction.txt prompt is specialized for each domain by changing the opening context, entity vocabulary, and relationship definitions:

Domain	Opening Context	Example Entities	Example Relationships
Mortgage	"…specializing in domain modeling for compliance and regulatory systems… mortgage lending compliance knowledge graph"	Borrower, Loan, Property, Appraisal, Credit Report, Underwriting	BORROWER_APPLIES_FOR_LOAN, PROPERTY_SECURES_LOAN
AML	"…specializing in domain modeling for Anti-Money Laundering (AML) and Bank Secrecy Act (BSA) compliance"	Customer, Transaction, SAR, CTR, Beneficial Owner, CDD Profile	CUSTOMER_FILES_SAR, TRANSACTION_TRIGGERS_ALERT
Healthcare	"…specializing in domain modeling for healthcare compliance and regulatory systems"	Patient, Provider, Procedure, Diagnosis, Coverage, Authorization	PROVIDER_TREATS_PATIENT, PROCEDURE_REQUIRES_AUTH
Commercial Lending	"…specializing in domain modeling for commercial lending compliance"	Borrower, Facility, Collateral, Covenant, Financial Statement	FACILITY_SECURED_BY_COLLATERAL, BORROWER_COMPLIES_WITH_COVENANT

LLM Provider Optimization

Prompt batch strategy (OpenAI):

Parameter	OpenAI GPT
Rules per batch (Agent 3)	10
Rules per dependency batch (Agent 5)	50
Temperature	0.7
Total prompt engineering	~2,030 lines across 11 shared templates

🗄️ Data Architecture

Graph Schema

erDiagram
    business_rule ||--o{ business_rule : "depends_on"
    business_rule }o--|| entity_category : "belongs_to_category"
    business_rule ||--o{ business_rule : "relates_to"

    business_rule {
        string rule_id PK
        string rule_name
        string rule_type
        string description
        string conditions
        string consequences
        string exceptions
        string reference
        boolean mandatory
        double confidence_score
        boolean requires_review
        string review_reason
        string node_type
        string vertex_uuid UK
        string source_reference
        string effective_date
        string expiration_date
        string superseded_by
        string jurisdiction
        string risk_level
        string enforcement_action
        string applicability_scope
        string data_points_required
        string audit_frequency
        boolean reference_verified
        string reference_verification_note
        string confidence_breakdown
        string deduplication_info
        string related_rules
    }

    entity_category {
        string name
        string entity_type
        string entity_or_relationship
        string description
        string extraction_notes
        string content
        string category
    }

Vertex Labels

Label	Description	Key Properties
`business_rule`	Compliance rule extracted from regulatory documents. Carries full rule semantics, conditions, exceptions, confidence scores, and v2 metadata (jurisdiction, risk, audit).	`rule_id`, `rule_name`, `rule_type`, `confidence_score`, `mandatory`
`entity_category`	Domain entity or relationship category (e.g., Borrower, Property, Loan Product). Groups related business rules.	`name`, `entity_type`, `entity_or_relationship`, `category`

Edge Labels

Label	Source → Target	Edge Properties	Description
`depends_on`	`business_rule` → `business_rule`	`dependency_type`, `rationale`, `impact_if_fails`, `strength`	Models 7 dependency types: prerequisite, sequential, conditional, complementary, contradictory, override, validation
`belongs_to_category`	`business_rule` → `entity_category`	—	Links a business rule to its governing entity category
`relates_to`	`business_rule` → `business_rule`	—	General semantic relationship between related rules

Property Keys — Business Rule Vertex

Property	Type	Category	Description
`rule_id`	String	Core	Unique rule identifier
`rule_name`	String	Core	Human-readable rule name
`rule_type`	String	Core	Taxonomy (10 types): eligibility, constraint, calculation, validation, process, compliance, documentation, prohibition, definition, exception
`description`	String	Core	Full rule description text
`conditions`	String	Core	When-conditions triggering the rule
`consequences`	String	Core	Then-outcomes when rule fires
`exceptions`	String	Core	Unless-exceptions to the rule
`reference`	String	Core	Source document section reference
`mandatory`	Boolean	Core	Whether the rule is mandatory vs. advisory
`confidence_score`	Double	Quality	AI extraction confidence (0.0–1.0)
`requires_review`	Boolean	Quality	Flagged for human review
`review_reason`	String	Quality	Reason the rule was flagged
`confidence_breakdown`	String (JSON)	Quality	Per-dimension confidence scores
`reference_verified`	Boolean	Quality	Source reference verified by validator
`reference_verification_note`	String	Quality	Verification outcome detail
`deduplication_info`	String (JSON)	Quality	Dedup merge rationale from optimizer
`source_reference`	String	v2 Metadata	Full source document citation
`effective_date`	String	v2 Metadata	Rule effective date (ISO 8601)
`expiration_date`	String	v2 Metadata	Rule expiration date
`superseded_by`	String	v2 Metadata	ID of the superseding rule
`jurisdiction`	String	v2 Metadata	Geographic or regulatory jurisdiction
`risk_level`	String	v2 Metadata	Associated risk level
`enforcement_action`	String	v2 Metadata	Enforcement consequence for non-compliance
`applicability_scope`	String	v2 Metadata	Scope of the rule's applicability
`data_points_required`	String	v2 Metadata	Data needed to evaluate the rule
`audit_frequency`	String	v2 Metadata	How often the rule should be audited
`related_rules`	String (JSON)	v2 Metadata	IDs of semantically related rules
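
Properties typed `String (JSON)` hold serialized JSON, so a client reading a vertex has to decode them before use. The helper below is a small sketch of that step. The function name is hypothetical, and the choice to leave malformed values untouched is an assumption, not documented behavior.

```python
import json
from typing import Any, Dict

# Property keys listed above with type "String (JSON)".
JSON_PROPERTIES = ("confidence_breakdown", "deduplication_info", "related_rules")


def decode_rule_properties(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a business_rule property map with JSON strings decoded."""
    decoded = dict(raw)  # shallow copy so the caller's dict is not modified
    for key in JSON_PROPERTIES:
        value = decoded.get(key)
        if isinstance(value, str) and value.strip():
            try:
                decoded[key] = json.loads(value)
            except json.JSONDecodeError:
                # Assumption: keep the raw string rather than fail the whole read.
                pass
    return decoded
```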

Edge Properties

Property	Type	Used On	Description
`dependency_type`	String	`depends_on`	One of: prerequisite, sequential, conditional, complementary, contradictory, override, validation
`rationale`	String	`depends_on`	AI-generated explanation of why the dependency exists
`impact_if_fails`	String	`depends_on`	Downstream impact description if the dependency is broken
`strength`	Integer	`depends_on`	Dependency strength score
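
Following `depends_on` edges across several hops is what turns a single rule lookup into a dependency analysis. The sketch below does this in memory with a breadth-first search. It assumes an edge `(source, target)` means "source depends on target", and it works on a plain edge list rather than on JanusGraph; the function name is hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def transitive_dependencies(edges: Iterable[Tuple[str, str]], start: str) -> Set[str]:
    """Return every rule reachable from `start` via depends_on edges."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)

    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependency in graph[node]:
            # Skipping visited rules (and the start rule) keeps cycles from looping forever.
            if dependency != start and dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen
```

Reversing each edge before the call gives the opposite view: every rule that would be affected if `start` changed.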

Index Strategy

The mixedContentIndex is an OpenSearch-backed mixed index covering 12 property keys across both vertex labels, enabling full-text search, exact-match filtering, and faceted queries.

Index Name	Fields (Mapping)	Purpose	Backend
`mixedContentIndex`	`content` (TEXT), `name` (TEXTSTRING)	Full-text search + exact match	OpenSearch
	`rule_type`, `node_type`, `rule_id`, `category` (STRING)	Exact-match filtering	OpenSearch
	`entity_or_relationship`, `vertex_uuid` (STRING)	Entity lookups & dedup	OpenSearch
	`jurisdiction`, `risk_level`, `effective_date`, `audit_frequency` (STRING)	v2 metadata facets	OpenSearch
k-NN Vector	`embedding` (384-dim float[])	Semantic similarity search	OpenSearch
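
A semantic search against the k-NN field amounts to sending a query vector of the right dimension. The sketch below builds a request body in the OpenSearch k-NN plugin's query format for the `embedding` field named above. The function name is hypothetical, and producing the vector itself (for example with the all-MiniLM-L6-v2 model) is left out so the example stays self-contained.

```python
from typing import Any, Dict, Sequence

EMBEDDING_DIM = 384  # dimension of the `embedding` field in the table above


def build_knn_query(vector: Sequence[float], k: int = 5) -> Dict[str, Any]:
    """Build an OpenSearch k-NN search body for the `embedding` field."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    if k <= 0:
        raise ValueError("k must be positive")
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {
                    "vector": list(vector),
                    "k": k,
                }
            }
        },
    }
```

Checking the dimension on the client side gives a clearer error than the one the index returns when a vector of the wrong length reaches it.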

Multi-Graph Isolation

Each compliance domain gets its own logically isolated graph within JanusGraph, backed by separate Cassandra keyspaces and OpenSearch indices. The set of graphs is declared in graphs.yaml (apps/explorer/conf/graphs.yaml).

Graph	Domain
Sample Guidelines	Mortgage Lending
Example Policies	Mortgage Lending

⚡ Technology Stack

Layer	Technology
Frontend (Suite Shell)	React 19, TypeScript, Tailwind CSS, Vite
Frontend (Pipeline UI)	React 19, TypeScript, Tailwind CSS, Vite
Frontend (KG Explorer)	Vanilla JavaScript, D3.js v7, marked.js
Backend (Pipeline)	FastAPI, Uvicorn, SQLite, SQLAlchemy
Backend (Assistant)	Flask 3.1, Gremlin WebSocket, OpenSearch HTTP
AI Inference	OpenAI GPT (official OpenAI Python SDK)
Graph Database	JanusGraph 1.0
Storage Backend	Apache Cassandra 4.1
Search & Vectors	OpenSearch 2.17 (full-text + k-NN, 384-dim)
Embeddings	sentence-transformers (all-MiniLM-L6-v2)
Cache	Redis 7 (LRU eviction)
Document Parsing	PyPDF2, python-docx, pandas, openpyxl, pytesseract (OCR)
Containerization	Docker, Docker Compose
Cloud Deployment	Azure Container Apps, Bicep IaC
Testing	Playwright (E2E), pytest (backend)

📖 Step-by-Step Usage

Launch the Suite & Open the Dashboard

After running ./start.sh, open http://localhost:4000 in your browser. The Dashboard shows all loaded knowledge graphs, recent pipeline runs, service health, and quick action buttons.

Upload Compliance Documents

Navigate to Documents in the sidebar. Upload PDFs, DOCX, Markdown, CSV, or Excel files into organized folders. Each folder can be associated with a compliance domain (mortgage, AML, commercial lending, healthcare).

Run the Extraction Pipeline

Go to Pipeline, select your uploaded documents, then click Start Extraction. Monitor progress through the visual agent pipeline tracker showing all 6 extraction stages in real time.

Review Run History

Check Run History to see all past pipeline runs with their status (completed, running, failed), domain, source, duration, and timestamps. Expand any run for detailed agent-by-agent progress.

Explore the Knowledge Graph

Open Assistant to interact with the knowledge graph. The split-pane interface shows an interactive D3.js force-directed graph on the left and an AI chat panel on the right. Ask questions like "Show me the full graph", "Count rules", "Find prohibitions", or "Search appraisal rules".

View Graph Analytics

Navigate to Analytics for a comprehensive dashboard showing rule type distribution, risk level breakdown, dependency types, confidence score distribution, entity coverage, and most connected rules.

Compare Knowledge Graphs

Use Compare Knowledge Graphs to select two graphs (e.g., two regulatory rule sets). The system uses LLM-powered semantic matching to detect common rules, unique rules, and contradictions with AI-generated analysis.

Run Impact Analysis

Open Impact Analysis to upload old and new versions of a regulatory document. The system identifies affected rules, assesses severity, and provides recommended actions, which is useful for tracking regulatory changes.

Track Obligations

Go to Obligations to initialize an obligation register from graph rules. Map internal controls to compliance obligations, identify coverage gaps, and generate heatmaps for gap analysis.

Edit, Review & Publish

In the Knowledge Graph Explorer, click any node to inspect details, add comments, edit content, or mark as reviewed/approved. When ready, click Release to create an immutable version snapshot with semantic versioning.

⬇️ Installation Guide

Prerequisites

🐳

Docker & Docker Compose

Required for running infrastructure services (JanusGraph, Cassandra, OpenSearch, Redis)

🐍

Python 3.11 or 3.12

Required for running agents and backends outside Docker

🟢

Node.js 20+

Required for frontend development and the Suite Shell

🔑

OpenAI API Key

Required — set OPENAI_API_KEY in your environment

Quick Start

Terminal — Quick Start

# 1. Clone the repository
git clone <repository-url>
cd policy-to-knowledge

# 2. Configure environment
cp .env.example .env
# Open .env and set OPENAI_API_KEY at minimum

# 3. Start the full stack (~2-3 minutes on first run)
./start.sh

# 4. Open the suite (`open` is macOS-only; on Linux use `xdg-open`)
open http://localhost:4000

Environment Variables

.env — Configuration

# Required
OPENAI_API_KEY=sk-...

# Optional
OPENAI_CHAT_MODEL=gpt-4o-mini       # Default chat model
OPENAI_REASONING_EFFORT=low          # For reasoning models
MAX_TOOL_ROUNDS=3                    # Max tool-call rounds per chat turn

# Port overrides (all optional)
SUITE_PORT=4000
KG_BACKEND_PORT=8000
KG_FRONTEND_PORT=5173
CA_PORT=5000
JANUSGRAPH_PORT=8182
CASSANDRA_PORT=9042
OPENSEARCH_PORT=9200
REDIS_PORT=6379
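
For a Python service, reading these variables could look like the sketch below. It is an illustration, not the project's configuration code: the `Settings` class and `load_settings` function are hypothetical, and treating the port values shown above as fallbacks when a variable is unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Port variables with the values shown in the .env listing above.
DEFAULT_PORTS = {
    "SUITE_PORT": 4000,
    "KG_BACKEND_PORT": 8000,
    "KG_FRONTEND_PORT": 5173,
    "CA_PORT": 5000,
    "JANUSGRAPH_PORT": 8182,
    "CASSANDRA_PORT": 9042,
    "OPENSEARCH_PORT": 9200,
    "REDIS_PORT": 6379,
}


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    chat_model: str
    max_tool_rounds: int
    ports: Mapping[str, int]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from the environment, failing fast if the API key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY must be set")
    ports = {name: int(env.get(name, default)) for name, default in DEFAULT_PORTS.items()}
    return Settings(
        openai_api_key=api_key,
        chat_model=env.get("OPENAI_CHAT_MODEL", "gpt-4o-mini"),
        max_tool_rounds=int(env.get("MAX_TOOL_ROUNDS", "3")),
        ports=ports,
    )
```

Passing a plain dictionary as `env` makes the loader easy to exercise in tests without touching the real process environment.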

What `start.sh` Does

Validates Docker, Python venv, and port availability
Generates JanusGraph configuration from apps/explorer/conf/graphs.yaml
Starts infrastructure services in dependency order (Cassandra → OpenSearch → Redis → JanusGraph)
Waits for each service to pass its health check
Starts the Pipeline API (FastAPI, port 8000) and Pipeline UI (Vite, port 5173)
Loads knowledge graphs into JanusGraph (incremental by default)
Starts Assistant (Flask, port 5000)
Starts the Suite Shell (Vite, port 4000)
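
The "wait for each service" step can be approximated with a simple TCP probe. The sketch below is written in Python for illustration and is not taken from `start.sh`; the real health checks may well be stricter, since an open port does not guarantee the service is ready to answer queries. The function name, timeout, and polling interval are assumptions.

```python
import socket
import time

# Infrastructure services in the dependency order listed above, with default ports.
INFRA_SERVICES = [
    ("Cassandra", 9042),
    ("OpenSearch", 9200),
    ("Redis", 6379),
    ("JanusGraph", 8182),
]


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller would loop over `INFRA_SERVICES` in order and stop at the first service for which `wait_for_port("localhost", port)` returns `False`.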

Stopping Services

Terminal — Stop Commands

# Stop application processes (Docker stays running for fast restart)
./stop.sh

# Stop everything including Docker infrastructure
./stop.sh --all

# Nuclear reset — wipe all volumes + DBs, then rebuild from scratch
./start.sh --clean

💻 CLI Reference

Suite Management

Command	Description
`./start.sh`	Start the full stack (Explorer DB + Pipeline API/UI + suite shell)
`./stop.sh`	Stop application processes (Docker infra stays running)
`./stop.sh --all`	Stop application processes + Docker containers
`cd apps/explorer && ./start.sh --fresh`	Explorer only: rebuild graphs & re-index; keeps Docker volumes intact
`cd apps/explorer && ./start.sh --clean`	Explorer only: wipe all data stores (volumes + SQLite + Redis) and rebuild

Extraction Pipeline

cli/extract.py — Document-to-KG Extraction

# Run from apps/pipeline/

# Process all files with OpenAI
python3 cli/extract.py --provider openai

# Process a single document
python3 cli/extract.py --provider openai --file compliance-files/graphA.pdf

# Run a specific agent step (1-6)
python3 cli/extract.py --step 1   # Document organizer only
python3 cli/extract.py --step 6   # Visualization only

# Process in batches by subdirectory
python3 cli/extract.py --batch

# Process a specific domain subdirectory
python3 cli/extract.py --batch-dir healthcare

# Run merge phase with existing KGs
python3 cli/extract.py --merge
python3 cli/extract.py --merge-only --merge-strategy provenance

cli/compare.py — Knowledge Graph Merging