Policy to Knowledge

Enterprise compliance automation that transforms regulatory documents into structured, queryable knowledge graphs using a 10-agent AI pipeline.

10-Agent Pipeline · Knowledge Graphs · AI Chat Explorer · Multi-Domain · Semantic Search

▶️ At a glance

Policy to Knowledge takes a single rule set from raw compliance documents through a 10-agent extraction pipeline, into an explorable knowledge graph, and out to obligation registers, impact analysis, and graph comparison — all from one unified shell.

Explore the sections below, or jump straight to installation.

🎯 Goal & Use Case

Policy to Knowledge is a compliance knowledge-extraction suite. It addresses the challenge of navigating complex, interconnected regulatory rule sets where the relationships between rules are as critical as the rules themselves.

10 AI Agents in Pipeline
4 Built-in Compliance Domains
2 Pipeline Phases (extract · compare)
5 Graph Set Operations

The Problem

Financial compliance requires navigating complex, interconnected rule sets in which rules reference and depend on other rules (transitive dependencies) and relationships are first-class domain concepts. Natural-language queries must resolve to precise requirements, and interpreting a single rule often means following multi-hop relationships to the rules it depends on.

The Solution

Policy to Knowledge unifies document extraction, graph storage, semantic search, and an interactive AI-powered explorer into a single interface. It automatically ingests compliance documents (PDF, DOCX, Markdown, CSV, Excel), extracts business rules and entities using a multi-agent AI pipeline, builds a structured knowledge graph, and provides intelligent exploration through natural language chat, graph visualization, impact analysis, and obligation tracking.

Supported Domains

🏠

Mortgage Lending

Agency and investor guidelines, servicing policies, and lender-specific overlays

🏦

Anti-Money Laundering

Suspicious-activity, KYC/CDD, and sanctions-screening rules and entity relationships

💼

Commercial Lending

Business lending compliance rules and regulatory frameworks

🏥

Healthcare

Healthcare regulatory compliance and policy enforcement

Key Features

🤖

10-Agent AI Pipeline

Automated extraction from documents through entities, rules, validation, optimization, and visualization.

💬

AI Chat Assistant

GPT-4o powered conversational agent with 13 tools for graph traversal, semantic search, and analysis.

🕸️

Interactive Graph Explorer

D3.js force-directed visualization with zoom, click-to-inspect, color-coded node types, and live search.

🔀

Knowledge Graph Merging

Compare graphs with intersection, union, differences, contradictions, and AI-powered semantic matching.

📊

Graph Analytics

Rule type distribution, risk levels, dependency analysis, confidence scores, and entity coverage charts.

Impact Analysis

Upload old and new regulatory docs to identify affected rules, severity, and recommended actions.

📋

Obligation Tracking

Map internal controls to compliance obligations, identify coverage gaps, and generate heatmaps.

🔖

Release Management

Create immutable graph snapshots with semantic versioning, lock graphs, and browse release history.

🔍

Semantic + Full-Text Search

OpenSearch k-NN with sentence-transformer embeddings plus JanusGraph mixed index full-text search.

🏗️ System Architecture

Policy to Knowledge is composed of three tightly integrated applications orchestrated via Docker Compose, with a central navigation hub (Suite Shell) routing to all services.

High-Level Architecture

graph TD
    Browser["🌐 Browser"]

    subgraph FE["Frontend Layer"]
        Shell["Suite Shell<br/>React + Vite · :4000"]
        PipeUI["Pipeline UI<br/>React + Vite · :5173"]
        ExplorerUI["KG Explorer<br/>D3.js · :5000"]
    end

    subgraph BE["Backend Layer"]
        PipeAPI["Pipeline API<br/>FastAPI · :8000"]
        AssistantAPI["Assistant<br/>Flask · :5000"]
    end

    subgraph AI["AI Inference"]
        OpenAI["OpenAI GPT"]
    end

    subgraph Data["Data Layer"]
        JanusGraph["JanusGraph · :8182<br/>Graph Database"]
        Cassandra["Apache Cassandra · :9042<br/>Storage Backend"]
        OpenSearch["OpenSearch · :9200<br/>Full-text + Vector k-NN"]
        Redis["Redis · :6379<br/>Query Cache LRU"]
    end

    Browser --> Shell
    Shell -->|"iframe"| PipeUI
    Shell -->|"iframe"| ExplorerUI
    PipeUI --> PipeAPI
    ExplorerUI --> AssistantAPI
    PipeAPI --> OpenAI
    AssistantAPI --> Redis
    AssistantAPI --> JanusGraph
    JanusGraph --> Cassandra
    JanusGraph --> OpenSearch

Layered Service Architecture

graph TB
    subgraph Presentation["Presentation Layer"]
        UI["Web Client - D3.js + EventSource API<br/>Visualization · User Interaction · SSE"]
    end

    subgraph Application["Application Layer"]
        API["REST API Gateway - Flask + CORS<br/>Request Routing · Auth · Rate Limiting"]
        ConnMgr["Connection Manager - Singleton Pools<br/>Resource Lifecycle · Circuit Breaking"]
    end

    subgraph Services["Service Layer"]
        AIOrch["AI Orchestration - GPT-4o + Tool Calling<br/>Query Understanding · Tool Selection"]
        GraphSvc["Graph Query Service - Gremlin<br/>Traversals · Pattern Matching"]
        VectorSvc["Vector Search - Embeddings + k-NN<br/>Semantic Similarity · Ranking"]
    end

    subgraph DataLayer["Data Layer"]
        GraphDB[("JanusGraph<br/>Multi-Graph")]
        VectorDB[("OpenSearch<br/>k-NN Index")]
        Cache[("Redis<br/>LRU Cache")]
    end

    UI --> API
    API --> ConnMgr
    API --> AIOrch
    AIOrch --> GraphSvc
    AIOrch --> VectorSvc
    ConnMgr --> GraphDB
    ConnMgr --> VectorDB
    GraphSvc --> ConnMgr
    VectorSvc --> ConnMgr
    Cache -.-> GraphSvc
    Cache -.-> VectorSvc

Startup Sequence

sequenceDiagram
    participant S as start.sh
    participant C as Cassandra :9042
    participant O as OpenSearch :9200
    participant R as Redis :6379
    participant J as JanusGraph :8182
    participant P as Pipeline API :8000
    participant U as Pipeline UI :5173
    participant K as Assistant :5000
    participant F as Suite Shell :4000

    S->>C: start + health check
    S->>O: start + health check
    S->>R: start + health check
    S->>J: start (depends on C + O)
    J-->>S: ready
    S->>P: start FastAPI
    S->>U: start Vite dev server
    S->>K: load KGs into JanusGraph
    K-->>S: graphs loaded
    S->>K: start Flask server
    S->>F: start Vite dev server
    F-->>S: Suite ready at localhost:4000
    

🤖 AI Pipeline Architecture

Policy to Knowledge uses a 10-agent pipeline divided into two phases: extraction and merging.

Pipeline Flow

flowchart TB
    Docs["📄 Compliance Documents\nPDF · DOCX · MD · CSV · XLSX"]

    subgraph Extraction["Phase 1 — Extraction Pipeline"]
        direction TB

        A1["🗂️ Agent 1 — Document Organizer\nTOC-based hierarchical chunking\nSplits documents into structured sections"]
        A1out["📁 Organized Chunks\nknowledge-files-organized/"]

        A2["🔍 Agent 2 — Entity Extractor\nMeta-agent prompt optimization\nExtracts domain entities & relationships"]
        A2out["📋 Entity Definitions\nentity_types_and_relationships.json"]

        A3["⚖️ Agent 3 — Rules Extractor\nParallel batch processing\n10-category rule taxonomy"]
        A3out["📜 Business Rules\ncompliance_rules_with_entities.json"]

        A35["✅ Agent 3.5 — Rule Validator\nSource verification · Numeric consistency\nContradiction detection · Confidence scoring"]
        A35out["📊 Validation Report\nvalidation_report.json"]

        A4["🔗 Agent 4 — Rules + Entities Merger\nEnriches rules with entity context\nAssembles complete knowledge graph"]
        A4out["🗃️ Complete Knowledge Graph\ncompliance_knowledge_graph.json"]

        A5["⚡ Agent 5 — KG Optimizer\nRule deduplication · 7 dependency types\nImpact analysis & confidence scoring"]
        A5out["💎 Optimized Knowledge Graph\noptimized_compliance_knowledge_graph.json"]

        A6["🎨 Agent 6 — Visualization Generator\nvis.js network graphs\nSearchable rule tables & metrics"]
        A6out["🌐 Interactive HTML Report\n{name}_knowledge_graph.html"]

        A1 --> A1out --> A2
        A2 --> A2out --> A3
        A3 --> A3out --> A35
        A35 --> A35out -.->|non-blocking| A4
        A2out -->|entities| A4
        A3out -->|rules| A4
        A4 --> A4out --> A5
        A5 --> A5out --> A6
        A6 --> A6out
    end

    subgraph Merging["Phase 2 — Merge Pipeline (Multi-Graph Comparison)"]
        direction TB

        KG_A["📊 Knowledge Graph A"]
        KG_B["📊 Knowledge Graph B"]

        A7["📦 Agent 7 — Rule Type Clusterer\nGroups by behavior type\nformula · threshold · sequence\nmethod · mandate · prohibition\nclassification · timing"]
        A7out["🏷️ Rule Clusters\nrule_clusters.json"]

        A8["🧠 Agent 8 — Semantic Rule Matcher\nLLM-powered pairwise comparison\nBatch parallelism for scale\nIDENTICAL · EQUIVALENT\nCONTRADICTORY · UNRELATED"]
        A8out["🔎 Match Results\nmatch_results.json"]

        A9["🔀 Agent 9 — Set Operations\nComputes 5 set operations"]
        A9i["∩ Intersection"]
        A9u["∪ Union"]
        A9d1["A \\ B Difference"]
        A9d2["B \\ A Difference"]
        A9c["⚠️ Contradictions"]

        A10["📈 Agent 10 — Set Visualization\nVenn diagrams · Dashboard\nComparison HTML reports"]
        A10out["🌐 6 HTML Reports\nindex · intersection · union\ndifferences · contradictions"]

        KG_A --> A7
        KG_B --> A7
        A7 --> A7out --> A8
        A8 --> A8out --> A9
        A9 --> A9i
        A9 --> A9u
        A9 --> A9d1
        A9 --> A9d2
        A9 --> A9c
        A9i --> A10
        A9u --> A10
        A9d1 --> A10
        A9d2 --> A10
        A9c --> A10
        A10 --> A10out
    end

    subgraph Infra["AI Infrastructure"]
        LLM["🤖 LLM Provider\nOpenAI GPT\nvia the official OpenAI SDK"]
    end

    Docs --> A1
    A6out -->|"Per-document KG"| KG_A
    A6out -->|"Per-document KG"| KG_B
    A3 -.->|API calls| LLM
    A2 -.->|API calls| LLM
    A5 -.->|API calls| LLM
    A8 -.->|API calls| LLM
    

Extraction Agents (1–6)

| Agent | Name | Input | Output | Key Techniques |
|---|---|---|---|---|
| 1 | Document Organizer | Raw compliance files (PDF, DOCX, MD, CSV, XLSX) | knowledge-files-organized/ (hierarchical text chunks) | TOC-based hierarchical splitting; supports multi-format parsing via PyPDF2, python-docx, pandas, openpyxl; OCR fallback via pytesseract |
| 2 | Entity Extractor | Organized document chunks | entity_types_and_relationships.json (entities & relationship definitions) | Meta-agent prompt optimization; iterative entity refinement loop; extracts domain entities (borrower, property, loan, etc.) with typed relationships |
| 3 | Rules Extractor | Organized chunks + entity definitions | compliance_rules_with_entities.json (structured business rules) | Parallel batch processing (10 rules/batch) for 3–10× speedup; 10-category rule taxonomy (eligibility, calculation, validation, threshold, prohibition, timing, method, mandate, classification, reporting) |
| 3.5 | Rule Validator | Extracted business rules | validation_report.json (quality scores & recommendations) | Non-blocking validation; source text verification; numeric consistency checks; contradiction detection; per-rule confidence scoring (0.0–1.0) |
| 4 | Rules + Entities Merger | Entity definitions (Agent 2) + Business rules (Agent 3) | compliance_knowledge_graph.json (complete KG) | Dual-input assembly; enriches every rule with entity context; links rules to governing entities; batch parallelism for large graphs |
| 5 | KG Optimizer | Complete knowledge graph | optimized_compliance_knowledge_graph.json (deduplicated KG with dependencies) | Conservative rule deduplication with rationale; 7 dependency types (prerequisite, sequential, conditional, complementary, contradictory, override, validation); impact analysis per dependency |
| 6 | Visualization Generator | Optimized knowledge graph | {name}_knowledge_graph.html (interactive report) | vis.js network graph with color-coded dependency edges; searchable rules table; 5 key metrics dashboard (rules, dependencies, confidence, low-confidence alerts, duplicates removed); responsive design |
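
The parallel batch processing described for the Rules Extractor can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `chunked`, `process_in_batches`, and the `handler` callback (a stand-in for one LLM call per batch) are hypothetical names, and only the batch size of 10 comes from the table above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_in_batches(
    items: Sequence[T],
    handler: Callable[[Sequence[T]], List[R]],
    batch_size: int = 10,   # matches the 10-per-batch figure quoted above
    max_workers: int = 4,   # assumed value; tune to the provider's rate limits
) -> List[R]:
    """Run `handler` on every batch concurrently and flatten the results.

    `pool.map` yields results in submission order, so the output order
    follows the input order even though batches finish out of order.
    """
    batches = chunked(items, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_batch = list(pool.map(handler, batches))
    return [result for batch_result in per_batch for result in batch_result]
```

Threads are a reasonable fit here because each batch spends most of its time waiting on a network response, so the speedup depends mainly on how many concurrent requests the provider allows.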

Merge Agents (7–10)

| Agent | Name | Input | Output | Key Techniques |
|---|---|---|---|---|
| 7 | Rule Type Clusterer | Two knowledge graphs (KG A + KG B) | rule_clusters.json (rules grouped by behavior) | Groups rules into 8 behavior types: formula, classification, threshold, prohibition, timing, sequence, method, mandate; reduces comparison space for downstream matching |
| 8 | Semantic Rule Matcher | Clustered rules by behavior type | match_results.json (pairwise match classifications with confidence) | LLM-powered pairwise comparison within each cluster; batch parallelism for scale; classifies each pair as IDENTICAL, EQUIVALENT, CONTRADICTORY, or UNRELATED with confidence score |
| 9 | Set Operations | Semantic match results | 5 JSON files: intersection.json, union.json, g1_minus_g2.json, g2_minus_g1.json, contradictions.json | Computes intersection (rules in both graphs), union (all unique rules), A∖B and B∖A differences (exclusive rules), and contradictions (conflicting rule pairs) with AI-generated analysis |
| 10 | Set Visualization | Set operation JSON files | 6 HTML files: index.html (dashboard) + one per set operation | Generates Venn diagram encoding; summary dashboard with overlap percentages; dedicated pages for intersection, union, differences, and contradictions; confidence score display per matched pair |
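
To make the five set operations concrete, here is a small sketch of how they could be derived from pairwise match labels. The `Match` record and `set_operations` function are hypothetical, and the sketch assumes that only IDENTICAL and EQUIVALENT pairs count as "the same rule"; a rule whose only match is CONTRADICTORY therefore stays in the difference sets as well as in the contradictions list. The real Agent 9 may treat these cases differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

SAME_RULE_LABELS = {"IDENTICAL", "EQUIVALENT"}


@dataclass(frozen=True)
class Match:
    rule_a: str   # rule_id from graph A
    rule_b: str   # rule_id from graph B
    label: str    # IDENTICAL, EQUIVALENT, CONTRADICTORY, or UNRELATED


def set_operations(
    rules_a: Sequence[str], rules_b: Sequence[str], matches: Sequence[Match]
) -> Dict[str, list]:
    """Compute the five result sets, keyed like the JSON file names above."""
    intersection = [(m.rule_a, m.rule_b) for m in matches if m.label in SAME_RULE_LABELS]
    contradictions = [(m.rule_a, m.rule_b) for m in matches if m.label == "CONTRADICTORY"]

    matched_a = {a for a, _ in intersection}
    matched_b = {b for _, b in intersection}
    g1_minus_g2: List[str] = [r for r in rules_a if r not in matched_a]
    g2_minus_g1: List[str] = [r for r in rules_b if r not in matched_b]

    # Union: every rule from A, plus the B rules that have no counterpart in A.
    union = list(rules_a) + g2_minus_g1

    return {
        "intersection": intersection,
        "union": union,
        "g1_minus_g2": g1_minus_g2,
        "g2_minus_g1": g2_minus_g1,
        "contradictions": contradictions,
    }
```

With three rules per graph and one IDENTICAL, one CONTRADICTORY, and one UNRELATED pair, the union contains five rules: the matched pair is counted once and every other rule is kept.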

📝 Prompt Architecture

The pipeline uses a two-tier prompt system that separates domain-agnostic pipeline logic from domain-specific terminology, examples, and entity vocabularies. This allows the same 10-agent pipeline to operate across entirely different compliance domains without code changes.

Resolution Strategy

flowchart LR
    A["Agent requests prompt\n(e.g. entity_extraction)"] --> B{"domain-prompts/\n{domain}/{prompt}.txt\nexists?"}
    B -- Yes --> C["Load domain-specific\nprompt"]
    B -- No --> D["Load shared fallback\nprompts/{prompt}.txt"]
    C --> E["Substitute parameters\n(entity_context, batch_num, etc.)"]
    D --> E
    E --> F["Send to LLM\n(OpenAI GPT)"]
    

The PromptManager class resolves prompts with domain-first precedence: it checks domain-prompts/{active_domain}/ first, then falls back to the shared prompts/ directory. This means every domain can override any prompt while inheriting the rest.
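
The domain-first lookup can be illustrated with a short sketch. It is not the project's PromptManager: `resolve_prompt` and `render_prompt` are hypothetical helpers, and the `$name` placeholder syntax (via `string.Template`) is an assumption chosen because prompt files often contain literal braces.

```python
from pathlib import Path
from string import Template


def resolve_prompt(base_dir: Path, domain: str, prompt_name: str) -> Path:
    """Return the domain-specific template if it exists, else the shared one."""
    domain_path = base_dir / "domain-prompts" / domain / f"{prompt_name}.txt"
    if domain_path.is_file():
        return domain_path
    shared_path = base_dir / "prompts" / f"{prompt_name}.txt"
    if shared_path.is_file():
        return shared_path
    raise FileNotFoundError(f"no prompt template named {prompt_name!r}")


def render_prompt(base_dir: Path, domain: str, prompt_name: str, **params: str) -> str:
    """Load the resolved template and substitute $placeholders.

    safe_substitute leaves unknown placeholders untouched instead of raising,
    which keeps a partially parameterized template usable.
    """
    template_text = resolve_prompt(base_dir, domain, prompt_name).read_text(encoding="utf-8")
    return Template(template_text).safe_substitute(**params)
```

Under this scheme a new domain only needs files for the prompts it wants to specialize; every other prompt name falls through to the shared directory.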

Directory Layout

| Path | Scope | Description |
|---|---|---|
| prompts/ | Shared (fallback) | Domain-agnostic prompt templates; the default baseline for all agents |
| domain-prompts/mortgage/ | Mortgage Lending | Mortgage-specific terminology, entity types (Borrower, Property, Loan), and agency/investor regulatory references |
| domain-prompts/aml/ | Anti-Money Laundering | AML/BSA compliance: SAR, CTR, CDD, KYC entity types; FinCEN regulatory references |
| domain-prompts/commercial_lending/ | Commercial Lending | Commercial loan origination: borrower financials, collateral valuation, covenant tracking |
| domain-prompts/healthcare/ | Healthcare | Healthcare compliance: HIPAA, patient entities, provider relationships, clinical protocols |

Prompt Inventory (11 templates × 4 domains)

Each domain contains a full set of 11 prompt templates — one for every agent stage. The domain-specific versions inject specialized terminology, entity vocabularies, rule taxonomies, and worked examples.

| Prompt Template | Agent | Lines | What Gets Domain-Specialized |
|---|---|---|---|
| document_structure_analysis.txt | 1 (Document Organizer) | ~94 | Section heading patterns, document structure conventions |
| entity_extraction.txt | 2 (Entity Extractor) | ~245 | Domain entity vocabulary (e.g., Borrower/Loan vs. SAR/CTR vs. Patient/Provider), relationship definitions, attribute schemas |
| entity_refinement.txt | 2 (Meta-Agent) | ~266 | 5-dimensional quality scoring criteria, domain-specific completeness benchmarks |
| entity_resolution.txt | 2 (Resolution) | | Synonyms and canonical name mappings per domain |
| business_rules_extraction.txt | 3 (Rules Extractor) | ~343 | Rule type taxonomy (10 categories), batch size tuning, domain examples, confidence scoring factors |
| validation_report.txt | 3.5 (Rule Validator) | ~401 | Domain-specific validation criteria, expected numeric ranges, regulatory cross-references |
| rule_resolution.txt | 4 (Merger) | | Entity-rule linkage heuristics per domain |
| dependency_analysis.txt | 5 (KG Optimizer) | ~401 | 7 dependency types with domain-specific examples and strength rating criteria |
| rule_deduplication.txt | 5 (Deduplication) | ~280 | Conservative merge criteria, domain-specific variation preservation rules |
| rule_matcher.txt | 8 (Semantic Matcher) | | Match classification criteria (IDENTICAL / EQUIVALENT / CONTRADICTORY / UNRELATED) tuned per domain |
| rule_matcher_batch.txt | 8 (Batch Matcher) | | Batch parallelism parameters and domain-specific pairwise comparison prompts |

Domain Adaptation Example

The same entity_extraction.txt prompt is specialized for each domain by changing the opening context, entity vocabulary, and relationship definitions:

| Domain | Opening Context | Example Entities | Example Relationships |
|---|---|---|---|
| Mortgage | "…specializing in domain modeling for compliance and regulatory systems… mortgage lending compliance knowledge graph" | Borrower, Loan, Property, Appraisal, Credit Report, Underwriting | BORROWER_APPLIES_FOR_LOAN, PROPERTY_SECURES_LOAN |
| AML | "…specializing in domain modeling for Anti-Money Laundering (AML) and Bank Secrecy Act (BSA) compliance" | Customer, Transaction, SAR, CTR, Beneficial Owner, CDD Profile | CUSTOMER_FILES_SAR, TRANSACTION_TRIGGERS_ALERT |
| Healthcare | "…specializing in domain modeling for healthcare compliance and regulatory systems" | Patient, Provider, Procedure, Diagnosis, Coverage, Authorization | PROVIDER_TREATS_PATIENT, PROCEDURE_REQUIRES_AUTH |
| Commercial Lending | "…specializing in domain modeling for commercial lending compliance" | Borrower, Facility, Collateral, Covenant, Financial Statement | FACILITY_SECURED_BY_COLLATERAL, BORROWER_COMPLIES_WITH_COVENANT |

LLM Provider Optimization

Prompt batch strategy (OpenAI):

| Parameter | OpenAI GPT |
|---|---|
| Rules per batch (Agent 3) | 10 |
| Rules per dependency batch (Agent 5) | 50 |
| Temperature | 0.7 |
| Total prompt engineering | ~2,030 lines across 11 shared templates |

🗄️ Data Architecture

Graph Schema

erDiagram
    business_rule ||--o{ business_rule : "depends_on"
    business_rule }o--|| entity_category : "belongs_to_category"
    business_rule ||--o{ business_rule : "relates_to"

    business_rule {
        string rule_id PK
        string rule_name
        string rule_type
        string description
        string conditions
        string consequences
        string exceptions
        string reference
        boolean mandatory
        double confidence_score
        boolean requires_review
        string review_reason
        string node_type
        string vertex_uuid UK
        string source_reference
        string effective_date
        string expiration_date
        string superseded_by
        string jurisdiction
        string risk_level
        string enforcement_action
        string applicability_scope
        string data_points_required
        string audit_frequency
        boolean reference_verified
        string reference_verification_note
        string confidence_breakdown
        string deduplication_info
        string related_rules
    }

    entity_category {
        string name
        string entity_type
        string entity_or_relationship
        string description
        string extraction_notes
        string content
        string category
    }
    

Vertex Labels

| Label | Description | Key Properties |
|---|---|---|
| business_rule | Compliance rule extracted from regulatory documents. Carries full rule semantics, conditions, exceptions, confidence scores, and v2 metadata (jurisdiction, risk, audit). | rule_id, rule_name, rule_type, confidence_score, mandatory |
| entity_category | Domain entity or relationship category (e.g., Borrower, Property, Loan Product). Groups related business rules. | name, entity_type, entity_or_relationship, category |

Edge Labels

| Label | Source → Target | Edge Properties | Description |
|---|---|---|---|
| depends_on | business_rule → business_rule | dependency_type, rationale, impact_if_fails, strength | Models 7 dependency types: prerequisite, sequential, conditional, complementary, contradictory, override, validation |
| belongs_to_category | business_rule → entity_category | | Links a business rule to its governing entity category |
| relates_to | business_rule → business_rule | | General semantic relationship between related rules |
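
Because depends_on edges connect rules to other rules, answering "what does this rule ultimately depend on?" is a multi-hop traversal. The sketch below shows the idea on an in-memory edge list with a breadth-first search; the rule IDs and the `transitive_dependencies` function are hypothetical, and the real system would run an equivalent traversal against JanusGraph rather than a Python list.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample data: (source rule_id, target rule_id, dependency_type)
Edge = Tuple[str, str, str]
EDGES: List[Edge] = [
    ("R1", "R2", "prerequisite"),
    ("R2", "R3", "conditional"),
    ("R3", "R1", "validation"),  # closes a cycle; the visited set prevents looping
    ("R4", "R2", "override"),
]


def transitive_dependencies(edges: Iterable[Edge], start: str) -> Dict[str, int]:
    """Map every rule reachable from `start` via depends_on to its hop count."""
    adjacency: Dict[str, List[str]] = {}
    for source, target, _dependency_type in edges:
        adjacency.setdefault(source, []).append(target)

    hops: Dict[str, int] = {}
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        rule_id, depth = queue.popleft()
        for neighbor in adjacency.get(rule_id, []):
            if neighbor not in visited:
                visited.add(neighbor)
                hops[neighbor] = depth + 1
                queue.append((neighbor, depth + 1))
    return hops
```

Breadth-first order means each hop count is the shortest dependency distance, which is a convenient proxy for how directly one rule relies on another. Reversing the edge direction in `adjacency` would answer the opposite question: which rules are affected if this one changes.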

Property Keys — Business Rule Vertex

| Property | Type | Category | Description |
|---|---|---|---|
| rule_id | String | Core | Unique rule identifier |
| rule_name | String | Core | Human-readable rule name |
| rule_type | String | Core | Taxonomy (10 types): eligibility, constraint, calculation, validation, process, compliance, documentation, prohibition, definition, exception |
| description | String | Core | Full rule description text |
| conditions | String | Core | When-conditions triggering the rule |
| consequences | String | Core | Then-outcomes when rule fires |
| exceptions | String | Core | Unless-exceptions to the rule |
| reference | String | Core | Source document section reference |
| mandatory | Boolean | Core | Whether the rule is mandatory vs. advisory |
| confidence_score | Double | Quality | AI extraction confidence (0.0–1.0) |
| requires_review | Boolean | Quality | Flagged for human review |
| review_reason | String | Quality | Reason the rule was flagged |
| confidence_breakdown | String (JSON) | Quality | Per-dimension confidence scores |
| reference_verified | Boolean | Quality | Source reference verified by validator |
| reference_verification_note | String | Quality | Verification outcome detail |
| deduplication_info | String (JSON) | Quality | Dedup merge rationale from optimizer |
| source_reference | String | v2 Metadata | Full source document citation |
| effective_date | String | v2 Metadata | Rule effective date (ISO 8601) |
| expiration_date | String | v2 Metadata | Rule expiration date |
| superseded_by | String | v2 Metadata | ID of the superseding rule |
| jurisdiction | String | v2 Metadata | Geographic or regulatory jurisdiction |
| risk_level | String | v2 Metadata | Associated risk level |
| enforcement_action | String | v2 Metadata | Enforcement consequence for non-compliance |
| applicability_scope | String | v2 Metadata | Scope of the rule's applicability |
| data_points_required | String | v2 Metadata | Data needed to evaluate the rule |
| audit_frequency | String | v2 Metadata | How often the rule should be audited |
| related_rules | String (JSON) | v2 Metadata | IDs of semantically related rules |

Edge Properties

| Property | Type | Used On | Description |
|---|---|---|---|
| dependency_type | String | depends_on | One of: prerequisite, sequential, conditional, complementary, contradictory, override, validation |
| rationale | String | depends_on | AI-generated explanation of why the dependency exists |
| impact_if_fails | String | depends_on | Downstream impact description if the dependency is broken |
| strength | Integer | depends_on | Dependency strength score |

Index Strategy

The mixedContentIndex is an OpenSearch-backed mixed index covering 12 property keys across both vertex labels, enabling full-text search, exact-match filtering, and faceted queries.

| Index Name | Fields (Mapping) | Purpose | Backend |
|---|---|---|---|
| mixedContentIndex | content (TEXT), name (TEXTSTRING) | Full-text search + exact match | OpenSearch |
| mixedContentIndex | rule_type, node_type, rule_id, category (STRING) | Exact-match filtering | OpenSearch |
| mixedContentIndex | entity_or_relationship, vertex_uuid (STRING) | Entity lookups & dedup | OpenSearch |
| mixedContentIndex | jurisdiction, risk_level, effective_date, audit_frequency (STRING) | v2 metadata facets | OpenSearch |
| k-NN Vector | embedding (384-dim float[]) | Semantic similarity search | OpenSearch |
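
For the k-NN row, a semantic search boils down to sending a query vector to OpenSearch. The sketch below builds a request body in the shape the OpenSearch k-NN plugin accepts; `build_knn_query` is a hypothetical helper, the `embedding` field name and 384-dimension check come from the table above, and the index name and the step that turns text into a vector are left out because the source does not specify them.

```python
from typing import Any, Dict, Sequence


def build_knn_query(
    query_vector: Sequence[float],
    k: int = 5,
    field: str = "embedding",   # vector field name from the index table
    expected_dim: int = 384,    # all-MiniLM-L6-v2 produces 384-dim vectors
) -> Dict[str, Any]:
    """Build an OpenSearch k-NN search body for the given query vector."""
    if len(query_vector) != expected_dim:
        raise ValueError(
            f"expected a {expected_dim}-dimensional vector, got {len(query_vector)}"
        )
    if k <= 0:
        raise ValueError("k must be positive")
    return {
        "size": k,  # number of hits to return
        "query": {
            "knn": {
                field: {
                    "vector": list(query_vector),
                    "k": k,  # number of nearest neighbors to retrieve
                }
            }
        },
    }
```

The dimension check is worth keeping even in a sketch: a query embedded with a different model than the one used at index time would otherwise fail inside OpenSearch with a less obvious error.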

Multi-Graph Isolation

Each compliance domain gets its own logically isolated graph within JanusGraph, backed by separate Cassandra keyspaces and OpenSearch indices. The list of graphs to load is declared in the graphs.yaml manifest.

| Graph | Domain |
|---|---|
| Sample Guidelines | Mortgage Lending |
| Example Policies | Mortgage Lending |

Technology Stack

| Layer | Technology |
|---|---|
| Frontend (Suite Shell) | React 19, TypeScript, Tailwind CSS, Vite |
| Frontend (Pipeline UI) | React 19, TypeScript, Tailwind CSS, Vite |
| Frontend (KG Explorer) | Vanilla JavaScript, D3.js v7, marked.js |
| Backend (Pipeline) | FastAPI, Uvicorn, SQLite, SQLAlchemy |
| Backend (Assistant) | Flask 3.1, Gremlin WebSocket, OpenSearch HTTP |
| AI Inference | OpenAI GPT (official OpenAI Python SDK) |
| Graph Database | JanusGraph 1.0 |
| Storage Backend | Apache Cassandra 4.1 |
| Search & Vectors | OpenSearch 2.17 (full-text + k-NN, 384-dim) |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) |
| Cache | Redis 7 (LRU eviction) |
| Document Parsing | PyPDF2, python-docx, pandas, openpyxl, pytesseract (OCR) |
| Containerization | Docker, Docker Compose |
| Cloud Deployment | Azure Container Apps, Bicep IaC |
| Testing | Playwright (E2E), pytest (backend) |

📖 Step-by-Step Usage

  1. Launch the Suite & Open the Dashboard

    After running ./start.sh, open http://localhost:4000 in your browser. The Dashboard shows all loaded knowledge graphs, recent pipeline runs, service health, and quick action buttons.

  2. Upload Compliance Documents

    Navigate to Documents in the sidebar. Upload PDFs, DOCX, Markdown, CSV, or Excel files into organized folders. Each folder can be associated with a compliance domain (mortgage, AML, commercial lending, healthcare).

  3. Run the Extraction Pipeline

    Go to Pipeline, select your uploaded documents, then click Start Extraction. Monitor progress through the visual agent pipeline tracker showing all 6 extraction stages in real time.

  4. Review Run History

    Check Run History to see all past pipeline runs with their status (completed, running, failed), domain, source, duration, and timestamps. Expand any run for detailed agent-by-agent progress.

  5. Explore the Knowledge Graph

    Open Assistant to interact with the knowledge graph. The split-pane interface shows an interactive D3.js force-directed graph on the left and an AI chat panel on the right. Ask questions like "Show me the full graph", "Count rules", "Find prohibitions", or "Search appraisal rules".

  6. View Graph Analytics

    Navigate to Analytics for a comprehensive dashboard showing rule type distribution, risk level breakdown, dependency types, confidence score distribution, entity coverage, and most connected rules.

  7. Compare Knowledge Graphs

    Use Compare Knowledge Graphs to select two graphs (e.g., two regulatory rule sets). The system uses LLM-powered semantic matching to detect common rules, unique rules, and contradictions with AI-generated analysis.

  8. Run Impact Analysis

    Open Impact Analysis to upload old and new versions of a regulatory document. The system identifies affected rules, assesses severity, and provides recommended actions — useful for tracking regulatory changes.

  9. Track Obligations

    Go to Obligations to initialize an obligation register from graph rules. Map internal controls to compliance obligations, identify coverage gaps, and generate heatmaps for gap analysis.

  10. Edit, Review & Publish

    In the Knowledge Graph Explorer, click any node to inspect details, add comments, edit content, or mark as reviewed/approved. When ready, click Release to create an immutable version snapshot with semantic versioning.

⬇️ Installation Guide

Prerequisites

🐳

Docker & Docker Compose

Required for running infrastructure services (JanusGraph, Cassandra, OpenSearch, Redis)

🐍

Python 3.11 or 3.12

Required for running agents and backends outside Docker

🟢

Node.js 20+

Required for frontend development and the Suite Shell

🔑

OpenAI API Key

Required — set OPENAI_API_KEY in your environment

Quick Start

Terminal — Quick Start
# 1. Clone the repository
git clone <repository-url>
cd policy-to-knowledge

# 2. Configure environment
cp .env.example .env
# Open .env and set OPENAI_API_KEY at minimum

# 3. Start the full stack (~2-3 minutes on first run)
./start.sh

# 4. Open the suite (the open command is macOS-specific; on Linux use xdg-open)
open http://localhost:4000

Environment Variables

.env — Configuration
# Required
OPENAI_API_KEY=sk-...

# Optional
OPENAI_CHAT_MODEL=gpt-4o-mini       # Default chat model
OPENAI_REASONING_EFFORT=low          # For reasoning models
MAX_TOOL_ROUNDS=3                    # Max tool-call rounds per chat turn

# Port overrides (all optional)
SUITE_PORT=4000
KG_BACKEND_PORT=8000
KG_FRONTEND_PORT=5173
CA_PORT=5000
JANUSGRAPH_PORT=8182
CASSANDRA_PORT=9042
OPENSEARCH_PORT=9200
REDIS_PORT=6379
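
A Python service might read these variables as shown below. This is a sketch under stated assumptions: the `Settings` dataclass and `load_settings` function are hypothetical, the defaults simply mirror the values listed above, and the empty default for OPENAI_REASONING_EFFORT is a guess since the source only shows an example value.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional

# Default ports, copied from the .env listing above.
DEFAULT_PORTS: Dict[str, int] = {
    "SUITE_PORT": 4000,
    "KG_BACKEND_PORT": 8000,
    "KG_FRONTEND_PORT": 5173,
    "CA_PORT": 5000,
    "JANUSGRAPH_PORT": 8182,
    "CASSANDRA_PORT": 9042,
    "OPENSEARCH_PORT": 9200,
    "REDIS_PORT": 6379,
}


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    chat_model: str
    reasoning_effort: str
    max_tool_rounds: int
    ports: Dict[str, int]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from `env` (defaults to the process environment)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # The only required variable; fail early with a clear message.
        raise RuntimeError("OPENAI_API_KEY must be set")
    ports = {name: int(env.get(name, default)) for name, default in DEFAULT_PORTS.items()}
    return Settings(
        openai_api_key=api_key,
        chat_model=env.get("OPENAI_CHAT_MODEL", "gpt-4o-mini"),
        reasoning_effort=env.get("OPENAI_REASONING_EFFORT", ""),  # assumed default
        max_tool_rounds=int(env.get("MAX_TOOL_ROUNDS", "3")),
        ports=ports,
    )
```

Passing a plain dict as `env` makes the loader easy to exercise in tests without touching the real environment.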

What start.sh Does

  1. Validates Docker, Python venv, and port availability
  2. Generates JanusGraph configuration from apps/explorer/conf/graphs.yaml
  3. Starts infrastructure services in dependency order (Cassandra → OpenSearch → Redis → JanusGraph)
  4. Waits for each service to pass its health check
  5. Starts the Pipeline API (FastAPI, port 8000) and Pipeline UI (Vite, port 5173)
  6. Loads knowledge graphs into JanusGraph (incremental by default)
  7. Starts Assistant (Flask, port 5000)
  8. Starts the Suite Shell (Vite, port 4000)
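
The "start in dependency order and wait for health" steps can be approximated in Python as follows. This is only a sketch, not what start.sh actually runs: `wait_for_port` and `wait_for_infrastructure` are hypothetical, and a successful TCP connection is a weaker signal than a real health check, since a service can accept connections before it is ready to serve queries.

```python
import socket
import time
from typing import List, Sequence, Tuple

# Infrastructure services in the start order listed above, with default ports.
INFRA_SERVICES: List[Tuple[str, int]] = [
    ("Cassandra", 9042),
    ("OpenSearch", 9200),
    ("Redis", 6379),
    ("JanusGraph", 8182),
]


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def wait_for_infrastructure(
    services: Sequence[Tuple[str, int]] = INFRA_SERVICES, host: str = "localhost"
) -> None:
    """Check each service in order and stop at the first one that never comes up."""
    for name, port in services:
        if not wait_for_port(host, port):
            raise RuntimeError(f"{name} did not become reachable on port {port}")
```

Checking services sequentially mirrors the dependency chain: there is little point probing JanusGraph before Cassandra and OpenSearch are reachable.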

Stopping Services

Terminal — Stop Commands
# Stop application processes (Docker stays running for fast restart)
./stop.sh

# Stop everything including Docker infrastructure
./stop.sh --all

# Nuclear reset — wipe all volumes + DBs, then rebuild from scratch
./start.sh --clean

💻 CLI Reference

Suite Management

| Command | Description |
|---|---|
| ./start.sh | Start the full stack (Explorer DB + Pipeline API/UI + suite shell) |
| ./stop.sh | Stop application processes (Docker infra stays running) |
| ./stop.sh --all | Stop application processes + Docker containers |
| cd apps/explorer && ./start.sh --fresh | Explorer only: rebuild graphs & re-index; keeps Docker volumes intact |
| cd apps/explorer && ./start.sh --clean | Explorer only: wipe all data stores (volumes + SQLite + Redis) and rebuild |

Extraction Pipeline

cli/extract.py — Document-to-KG Extraction
apps/pipeline/
# Process all files with OpenAI
python3 cli/extract.py --provider openai

# Process a single document
python3 cli/extract.py --provider openai --file compliance-files/graphA.pdf

# Run a specific agent step (1-6)
python3 cli/extract.py --step 1   # Document organizer only
python3 cli/extract.py --step 6   # Visualization only

# Process in batches by subdirectory
python3 cli/extract.py --batch

# Process a specific domain subdirectory
python3 cli/extract.py --batch-dir healthcare

# Run merge phase with existing KGs
python3 cli/extract.py --merge
python3 cli/extract.py --merge-only --merge-strategy provenance

cli/compare.py — Knowledge Graph Merging
apps/pipeline/
# List available knowledge graphs
python3 cli/compare.py --list

# Merge two graphs (computes all set operations)
python3 cli/compare.py --g1 graphA --g2 graphB --workers 15

# Merge with custom batch size
python3 cli/compare.py --g1 FM --g2 graphA --workers 20 --batch-size 15

Assistant — Knowledge Graph Management
apps/explorer/
# Full rebuild (clears and reloads all graphs)
python3 -m src.main setup

# Incremental load (only load what is missing)
python3 -m src.main setup-if-empty

# Nuclear reset (wipe all data and rebuild)
python3 -m src.main force-clean

# Start server only (infra must be running)
python3 -m src.main serve

Individual Service Start Commands

Development — Run Individual Services
# Explorer (Flask + JanusGraph)
cd apps/explorer && SERVER_PORT=5050 .venv/bin/python3 -m src.server

# Suite Shell (React)
cd apps/shell && npm install && npm run dev

# Pipeline API + UI
cd apps/pipeline && ./start.sh

# Run backend tests
cd apps/pipeline && .venv/bin/python -m pytest tests/

# Run Playwright UI tests (services must be running)
cd apps/explorer/tests/e2e && npx playwright test

🔌 API Endpoints

Services & Ports

| Service | URL | Description |
|---|---|---|
| Suite Shell | http://localhost:4000 | Main navigation hub (React SPA) |
| Pipeline UI | http://localhost:5173 | Document upload & pipeline control |
| Pipeline API | http://localhost:8000 | FastAPI backend for extraction |
| Explorer | http://localhost:5000/app (or :5050) | Graph explorer & AI chat. Falls back to :5050 when :5000 is taken (Docker/AirPlay on macOS). |
| JanusGraph | localhost:8182 | Gremlin WebSocket endpoint |
| OpenSearch | http://localhost:9200 | Full-text & vector search |
| Cassandra | localhost:9042 | Graph storage backend |
| Redis | localhost:6379 | Query result cache |

KG Backend — FastAPI

| Method | Path | Description |
|---|---|---|
| GET | /api/health | Health check |
| GET | /api/documents | List document folders |
| POST | /api/documents/upload | Upload compliance documents |
| POST | /api/pipeline/start | Start extraction pipeline |
| GET | /api/pipeline/{run_id}/status | Get pipeline run status |
| GET | /api/graphs | List knowledge graphs |
| GET | /api/graphs/{name}/visualization | Get graph visualization data |
| POST | /api/compare | Compare two knowledge graphs |
| POST | /api/impact/analyze | Run impact analysis |
| POST | /api/obligations/{graph}/seed | Initialize obligations from graph |
| GET | /api/obligations/{graph}/heatmap | Get obligation heatmap |
| GET | /api/runs | List all pipeline runs |
| GET | /api/settings | Get pipeline settings |
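
A minimal client for the GET endpoints above might look like this. It is a sketch using only the standard library: `api_url`, `run_status_path`, and `get_json` are hypothetical helpers, the base URL is the default Pipeline API address, and the sketch assumes responses are JSON without assuming any particular fields, since response bodies are not documented here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8000"  # default Pipeline API address


def api_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def run_status_path(run_id: str) -> str:
    """Build the status path for a pipeline run, escaping the run ID."""
    return f"/api/pipeline/{urllib.parse.quote(run_id, safe='')}/status"


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0) -> Any:
    """Send a GET request and parse the body as JSON (assumed response format)."""
    with urllib.request.urlopen(api_url(path, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the Pipeline API to be running locally.
    print(get_json("/api/health"))
    print(get_json("/api/runs"))
```

Polling `get_json(run_status_path(run_id))` at an interval is one way to follow a run from a script instead of the visual tracker.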

Assistant — Flask

| Method | Path | Description |
|---|---|---|
| GET | /api/graph | Get full graph data |
| GET | /api/vertex/{id} | Get vertex details |
| GET | /api/search/text | Full-text search |
| GET | /api/search/semantic | Semantic vector search |
| POST | /api/gremlin/execute | Execute custom Gremlin traversal |
| GET | /api/annotations/{node_id} | Get node annotations |
| PUT | /api/annotations/{node_id} | Update annotation |
| POST | /api/graph/release | Create immutable release |
| GET | /api/graph/releases | List all releases |
| GET | /api/graph/available | List available graphs |
| POST | /api/rewrite | AI rewrite content |
| POST | /api/suggest-rule-id | Auto-generate Rule ID |

Admin Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /api/admin/reset | Drop & re-load all graphs, rebuild indices |
| GET | /api/admin/consistency | Vertex/edge counts per graph (sanity check) |
| POST | /api/admin/rebuild-embeddings | Re-embed all vertices into OpenSearch k-NN |
| POST | /api/admin/rebuild-tasks | Regenerate task queue from graph state |

☁️ Azure Deployment

Policy to Knowledge can be deployed to Azure Container Apps. Build and push all 5 container images to an Azure Container Registry with a unique UTC timestamp tag per build, then roll each Container App to the freshly tagged image.

Azure Deployment
# Login to Azure
az login

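# Build + push each image to ACR, then update the Container Apps
az acr build --registry p2kdemo --image p2k/<svc>:<tag> .
# The image path must match the repository used in the build step above;
# <resource-group> is a placeholder for the resource group that owns the app
az containerapp update --name <app> --resource-group <resource-group> --image p2kdemo.azurecr.io/p2k/<svc>:<tag>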

Container App Layout

| App | Ingress | Port | Public Mount |
|---|---|---|---|
| p2k-preview | External | 4000 | /app/ |
| kg-frontend | External | 5173 | /app/{documents,pipeline,...} |
| kg-backend | Internal | 8000 | /app/api/kg/ |
| assistant | External | 5000 | /app/api/ca/ |
| janusgraph | Internal | 8182 | (not exposed publicly) |