Policy to Knowledge — Architecture

How compliance documents become a queryable, traceable knowledge graph.

1. System context

Three apps in one monorepo turn source policy documents into an explorable knowledge graph.

flowchart TB
    user["Compliance analyst"]
    docs["Source documents<br/>(PDF / DOCX / MD)"]
    subgraph P2K["Policy to Knowledge"]
      shell["Suite Shell (React)"]
      pipeline["Pipeline (FastAPI + 10 agents)"]
      explorer["Explorer (Flask + D3)"]
    end
    openai["OpenAI API"]
    data[("JanusGraph · Cassandra<br/>OpenSearch · Redis")]
    user --> shell
    docs --> pipeline
    shell --> pipeline
    shell --> explorer
    pipeline --> openai
    explorer --> openai
    pipeline --> data
    explorer --> data

2. Containers & ports

| Container | Tech | Port | Role |
| --- | --- | --- | --- |
| suite-shell | React + Vite (nginx) | 4000 | Navigation hub; embeds the apps via iframes |
| kg-frontend | React + Vite | 5173 | Pipeline UI: upload, run, compare, explore |
| kg-backend | FastAPI | 8000 | Pipeline API + WebSocket run streaming |
| explorer | Flask + vanilla JS/D3 | 5000 | Graph explorer + AI chat |
| janusgraph | JanusGraph 1.0 | 8182 | Graph database (Gremlin) |
| cassandra | Cassandra 4.1 | 9042 | JanusGraph storage backend |
| opensearch | OpenSearch 2.17 | 9200 | Full-text + k-NN vector index |
| redis | Redis 7 | 6379 | Query cache (LRU) |

flowchart LR
    browser["Browser :4000"] --> shell["suite-shell"]
    shell -->|iframe| kgfe["kg-frontend :5173"]
    shell -->|iframe| exp["explorer :5000"]
    kgfe --> kgbe["kg-backend :8000"]
    kgbe --> oai["OpenAI"]
    exp --> jg["janusgraph :8182"]
    exp --> os["opensearch :9200"]
    exp --> redis["redis :6379"]
    jg --> cass["cassandra :9042"]
    jg --> os

3. Extraction pipeline (10 agents)

Each document flows through six numbered per-document agents (1–6), with a rule-validation step (Agent 3.5) between rule extraction and merging; multi-graph merges add agents 7–10.

flowchart TD
    A0["Source PDF/DOCX"] --> A1["Agent 1 · Document Organizer<br/>chunk + normalize"]
    A1 --> A2["Agent 2 · Entity Extractor"]
    A2 --> A3["Agent 3 · Rules Extractor<br/>(word-balanced batches)"]
    A3 --> A35["Agent 3.5 · Rule Validator"]
    A35 --> A4["Agent 4 · Rules + Entities Merger"]
    A4 --> A5["Agent 5 · KG Optimizer<br/>dedupe + dependencies"]
    A5 --> A6["Agent 6 · Visualization & Report"]
    A5 --> KG["optimized_compliance_knowledge_graph.json"]
    subgraph Merge["Multi-graph merge (compare)"]
      A7["Agent 7 · Rule Clusterer"] --> A8["Agent 8 · Semantic Matcher"]
      A8 --> A9["Agent 9 · Set Operations"]
      A9 --> A10["Agent 10 · Set Visualization"]
    end
    KG --> A7

Outputs land under pipeline-output/<source>/agent-N-.../ (provider-flat). The optimizer emits the canonical KG JSON the Explorer loads.

4. Graph comparison

flowchart LR
    g1["KG: graph A"] --> M["Agent 7-9<br/>cluster · match · set-ops"]
    g2["KG: graph B"] --> M
    M --> U["Union"]
    M --> I["Intersection"]
    M --> D["A − B / B − A"]
    M --> C["Contradictions"]
    U & I & D & C --> V["Agent 10 · interactive Venn/HTML"]
    V --> out["pipeline-output/_merged/<a_b>/"]
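
The union, intersection, and difference outputs in the diagram can be illustrated with a minimal Python sketch. It assumes the matching stage has already mapped equivalent rules in both graphs to a shared key; the `compare_graphs` function, the `Comparison` type, and the example keys are hypothetical and not taken from the pipeline's code. Contradiction detection needs to compare rule content, so it is left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    union: frozenset
    intersection: frozenset
    only_a: frozenset
    only_b: frozenset


def compare_graphs(keys_a, keys_b):
    """Compute set operations over matched rule keys from two graphs."""
    a, b = frozenset(keys_a), frozenset(keys_b)
    return Comparison(
        union=a | b,         # rules present in either graph
        intersection=a & b,  # rules present in both graphs
        only_a=a - b,        # A − B
        only_b=b - a,        # B − A
    )


# Hypothetical rule keys for two policy graphs.
result = compare_graphs(
    {"retention", "encryption", "mfa"},
    {"encryption", "mfa", "logging"},
)
print(sorted(result.intersection))
print(sorted(result.only_a))
```

Using a shared key per matched cluster keeps the set operations trivial; the hard part, deciding which rules are semantically the same, stays in the matching stage.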

5. Explorer runtime

sequenceDiagram
    participant U as User
    participant E as Explorer (Flask)
    participant R as Redis
    participant J as JanusGraph
    participant O as OpenSearch
    participant L as OpenAI
    U->>E: Ask question (SSE)
    E->>R: cache lookup
    alt cache miss
      E->>L: plan tool calls
      L-->>E: tool (gremlin / semantic_search / …)
      E->>J: Gremlin traversal (read-only guard)
      E->>O: k-NN semantic search
      E->>R: cache result
    end
    E-->>U: streamed answer + graph events
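
The cache-miss branch in the sequence above follows a cache-aside pattern, which might look roughly like the sketch below. A plain dict stands in for Redis, and `cache_key`, `answer_question`, and the `compute` callback are hypothetical names used only for illustration; the Explorer's actual key scheme and payload format are not described in this document.

```python
import hashlib
import json


def cache_key(question, graph):
    """Derive a stable key from the normalized question and graph name."""
    payload = json.dumps({"q": question.strip().lower(), "g": graph}, sort_keys=True)
    return "answer:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def answer_question(question, graph, cache, compute):
    """Return (answer, cache_hit). `compute` runs only on a cache miss."""
    key = cache_key(question, graph)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached), True
    # Stands in for LLM planning plus the Gremlin / k-NN tool calls.
    answer = compute(question, graph)
    cache[key] = json.dumps(answer)
    return answer, False


calls = []


def fake_compute(question, graph):
    calls.append(question)
    return {"text": "stub answer", "nodes": []}


store = {}  # dict used in place of a Redis client for this sketch
print(answer_question("Which rules cover MFA?", "policy-a", store, fake_compute))
print(answer_question("which rules cover mfa? ", "policy-a", store, fake_compute))
print(len(calls))
```

Normalizing the question before hashing lets trivially different phrasings share one cache entry, at the cost of treating case as insignificant.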

The raw Gremlin endpoint is read-only by default (mutating steps require GREMLIN_ALLOW_MUTATIONS=true).
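
A read-only guard of this kind could be sketched as follows. Only the `GREMLIN_ALLOW_MUTATIONS` variable comes from this document; the `check_query` function, the step list, and the regex approach are assumptions for illustration. A pattern match is deliberately conservative (it may also reject a harmless query that mentions a mutating step inside a string literal), and the real guard may work differently.

```python
import os
import re

# Gremlin steps that write to the graph. Illustrative, not exhaustive.
MUTATING_STEPS = ("addV", "addE", "property", "drop", "mergeV", "mergeE")
_MUTATION_RE = re.compile(r"\.\s*(" + "|".join(MUTATING_STEPS) + r")\s*\(")


def mutations_allowed(env=None):
    """True only when GREMLIN_ALLOW_MUTATIONS is set to 'true'."""
    env = os.environ if env is None else env
    return env.get("GREMLIN_ALLOW_MUTATIONS", "").strip().lower() == "true"


def check_query(query, env=None):
    """Raise PermissionError if the query mutates and mutations are disabled."""
    match = _MUTATION_RE.search(query)
    if match and not mutations_allowed(env):
        raise PermissionError(f"mutating step '{match.group(1)}' is blocked")
    return query


# A read-only traversal passes through unchanged.
print(check_query("g.V().hasLabel('business_rule').limit(5)", env={}))

# A mutating traversal is rejected unless the flag is set.
try:
    check_query("g.V().drop()", env={})
except PermissionError as exc:
    print(exc)
```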

6. Data layer

flowchart TB
    subgraph JG["JanusGraph"]
      V["business_rule / entity_category vertices"]
      Edg["depends_on / belongs_to_category edges"]
    end
    JG --> CASS[("Cassandra<br/>adjacency storage")]
    JG --> OSF[("OpenSearch<br/>mixed full-text index")]
    OSK[("OpenSearch k-NN<br/>384-dim embeddings")]
    SQL[("SQLite app.db<br/>annotations / review state")]
    V --> OSK
    explorer["Explorer"] --> JG
    explorer --> OSK
    explorer --> SQL
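
For the k-NN index, a semantic search request could be built roughly as in the sketch below. The query shape follows the OpenSearch k-NN query DSL, and the 384-dimension check comes from the diagram; the `build_knn_query` helper and the `embedding` field name are assumptions, since the actual index mapping is not shown here.

```python
EMBEDDING_DIM = 384  # dimension stated in the data-layer diagram


def build_knn_query(vector, k=5, field="embedding"):
    """Build an OpenSearch k-NN query body for a single query vector.

    `field` is a hypothetical name for the knn_vector field in the index.
    """
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


# A zero vector is used only to show the request shape.
body = build_knn_query([0.0] * EMBEDDING_DIM, k=3)
print(body["size"], len(body["query"]["knn"]["embedding"]["vector"]))
```

The resulting dict can be passed as the request body of a search call against whichever index holds the rule embeddings.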

7. Document-to-graph traceability

Every rule carries a structured source_reference back to the exact document chunk and word range.

flowchart LR
    chunk["kbs/<graph>/...md<br/>(document chunk)"] --> rule["business_rule vertex<br/>source_reference {chunk, words}"]
    rule --> node["Graph node in Explorer"]
    node -->|click reference| resolve["/api/reference/resolve"]
    resolve --> chunk
    resolve --> hl["highlighted source span"]
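
Resolving a reference to a highlighted span can be pictured with the sketch below. It assumes `words` is a half-open `[start, end)` pair of word indices into the chunk text; that shape, the `highlight_span` helper, the marker, and the example chunk path are all hypothetical, since the exact `source_reference` schema is not spelled out here.

```python
def highlight_span(chunk_text, start_word, end_word, marker="**"):
    """Wrap words [start_word, end_word) of the chunk in a marker."""
    words = chunk_text.split()
    if not 0 <= start_word < end_word <= len(words):
        raise ValueError("word range is outside the chunk")
    before = words[:start_word]
    span = words[start_word:end_word]
    after = words[end_word:]
    highlighted = marker + " ".join(span) + marker
    return " ".join(before + [highlighted] + after)


# Hypothetical reference; the chunk path is a placeholder, not a real file.
reference = {"chunk": "kbs/example/section-1.md", "words": [3, 6]}
text = "Access to production systems requires multi-factor authentication for all staff."
print(highlight_span(text, *reference["words"]))
```

Splitting on whitespace collapses the original spacing, so a real resolver would more likely map word indices back to character offsets in the stored chunk.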

8. Suite shell & iframe bridge

flowchart TB
    shell["Suite Shell (React Router)"]
    shell -->|postMessage theme-change| kgfe["Pipeline UI iframe"]
    shell -->|postMessage theme-change| exp["Explorer iframe"]
    kgfe -->|navigate / status| shell
    exp -->|navigate / status| shell
    note["Origin-checked bridge:<br/>messages validated against an allowlist"]

9. CI & testing

flowchart LR
    push["push / PR → main"] --> ci{"GitHub Actions"}
    ci --> p1["pipeline · pytest + cov"]
    ci --> p2["explorer · pytest + cov"]
    ci --> p3["shell · vitest + cov + build"]
    ci --> p4["pipeline-ui · vitest + cov + build"]
    p1 & p2 & p3 & p4 --> allure["Allure · merged report artifact"]

Every suite emits Allure results and coverage; CI publishes a merged Allure report artifact.