How compliance documents become a queryable, traceable knowledge graph.
Three apps in one monorepo turn source policy documents into an explorable knowledge graph.
flowchart TB
user["Compliance analyst"]
docs["Source documents
(PDF / DOCX / MD)"]
subgraph P2K["Policy to Knowledge"]
shell["Suite Shell (React)"]
pipeline["Pipeline (FastAPI + 10 agents)"]
explorer["Explorer (Flask + D3)"]
end
openai["OpenAI API"]
data[("JanusGraph · Cassandra
OpenSearch · Redis")]
user --> shell
docs --> pipeline
shell --> pipeline
shell --> explorer
pipeline --> openai
explorer --> openai
pipeline --> data
explorer --> data
| Container | Tech | Port | Role |
|---|---|---|---|
| suite-shell | React + Vite (nginx) | 4000 | Navigation hub; embeds the apps via iframes |
| kg-frontend | React + Vite | 5173 | Pipeline UI: upload, run, compare, explore |
| kg-backend | FastAPI | 8000 | Pipeline API + WebSocket run streaming |
| explorer | Flask + vanilla JS/D3 | 5000 | Graph explorer + AI chat |
| janusgraph | JanusGraph 1.0 | 8182 | Graph database (Gremlin) |
| cassandra | Cassandra 4.1 | 9042 | JanusGraph storage backend |
| opensearch | OpenSearch 2.17 | 9200 | Full-text + k-NN vector index |
| redis | Redis 7 | 6379 | Query cache (LRU) |
flowchart LR
browser["Browser :4000"] --> shell["suite-shell"]
shell -->|iframe| kgfe["kg-frontend :5173"]
shell -->|iframe| exp["explorer :5000"]
kgfe --> kgbe["kg-backend :8000"]
kgbe --> oai["OpenAI"]
exp --> jg["janusgraph :8182"]
exp --> os["opensearch :9200"]
exp --> redis["redis :6379"]
jg --> cass["cassandra :9042"]
jg --> os
Each document flows through six numbered per-document agents plus an intermediate rule validator (Agent 3.5); multi-graph merges add agents 7–10.
flowchart TD
A0["Source PDF/DOCX"] --> A1["Agent 1 · Document Organizer
chunk + normalize"]
A1 --> A2["Agent 2 · Entity Extractor"]
A2 --> A3["Agent 3 · Rules Extractor
(word-balanced batches)"]
A3 --> A35["Agent 3.5 · Rule Validator"]
A35 --> A4["Agent 4 · Rules + Entities Merger"]
A4 --> A5["Agent 5 · KG Optimizer
dedupe + dependencies"]
A5 --> A6["Agent 6 · Visualization & Report"]
A5 --> KG["optimized_compliance_knowledge_graph.json"]
subgraph Merge["Multi-graph merge (compare)"]
A7["Agent 7 · Rule Clusterer"] --> A8["Agent 8 · Semantic Matcher"]
A8 --> A9["Agent 9 · Set Operations"]
A9 --> A10["Agent 10 · Set Visualization"]
end
KG --> A7
Outputs land under pipeline-output/<source>/agent-N-.../ (provider-flat). The optimizer (Agent 5) emits the canonical KG JSON, optimized_compliance_knowledge_graph.json, which the Explorer loads.
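As a minimal sketch of how a consumer might locate the canonical graph under that layout, the Python helper below searches one source's output directory for the optimizer's JSON file. The function name `find_canonical_kg` and its arguments are hypothetical; only the directory pattern and the filename come from the description above, and the exact agent folder names are deliberately not assumed.

```python
from pathlib import Path
from typing import Optional

# Filename taken from the pipeline diagram above.
KG_FILENAME = "optimized_compliance_knowledge_graph.json"


def find_canonical_kg(output_root: Path, source: str) -> Optional[Path]:
    """Return the optimizer's KG JSON for one source, or None if absent.

    Assumes the layout pipeline-output/<source>/agent-N-.../ and searches
    every agent-* directory rather than guessing the exact folder name.
    """
    source_dir = output_root / source
    if not source_dir.is_dir():
        return None
    # "**" also matches zero directories, so a file sitting directly
    # inside an agent-* folder is found as well.
    matches = sorted(source_dir.glob(f"agent-*/**/{KG_FILENAME}"))
    return matches[0] if matches else None
```

Searching by filename keeps the helper independent of how the agent folders are actually named.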
flowchart LR
g1["KG: graph A"] --> M["Agents 7–9
cluster · match · set-ops"]
g2["KG: graph B"] --> M
M --> U["Union"]
M --> I["Intersection"]
M --> D["A − B / B − A"]
M --> C["Contradictions"]
U & I & D & C --> V["Agent 10 · interactive Venn/HTML"]
V --> out["pipeline-output/_merged/<a_b>/"]
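The set operations in the merge diagram can be illustrated with a short Python sketch. It assumes the clustering and matching steps have already given equivalent rules in both graphs a shared key (called a match key here, a hypothetical name), so the comparison reduces to plain set algebra. Contradiction detection is left out because it depends on the matcher's semantic judgment, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class SetOpsResult:
    union: Set[str]         # rules present in either graph
    intersection: Set[str]  # rules present in both graphs
    only_a: Set[str]        # A − B
    only_b: Set[str]        # B − A


def rule_set_operations(a_keys: Set[str], b_keys: Set[str]) -> SetOpsResult:
    """Compare two graphs by the match keys of their rules.

    Assumes equivalent rules share a key (hypothetical; the real matcher
    may represent matches differently).
    """
    return SetOpsResult(
        union=a_keys | b_keys,
        intersection=a_keys & b_keys,
        only_a=a_keys - b_keys,
        only_b=b_keys - a_keys,
    )
```

For example, `rule_set_operations({"k1", "k2"}, {"k2", "k3"})` yields an intersection of `{"k2"}` and differences of `{"k1"}` and `{"k3"}`.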
sequenceDiagram
participant U as User
participant E as Explorer (Flask)
participant R as Redis
participant J as JanusGraph
participant O as OpenSearch
participant L as OpenAI
U->>E: Ask question (SSE)
E->>R: cache lookup
alt cache miss
E->>L: plan tool calls
L-->>E: tool (gremlin / semantic_search / …)
E->>J: Gremlin traversal (read-only guard)
E->>O: k-NN semantic search
E->>R: cache result
end
E-->>U: streamed answer + graph events
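The cache lookup and cache result steps in the sequence above follow a cache-aside pattern, sketched below in Python. Everything named here (`cache_key`, `answer_with_cache`, the key format) is hypothetical, and a dict-backed class stands in for Redis so the example runs without a server; the real Explorer may key and serialize its cache differently.

```python
import hashlib
import json
from typing import Callable, Dict, Optional, Protocol


class Cache(Protocol):
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class InMemoryCache:
    """Dict-backed stand-in for Redis, used only to keep the sketch runnable."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def cache_key(graph: str, question: str) -> str:
    # Normalize case and whitespace so trivially different phrasings share a key.
    normalized = " ".join(question.lower().split())
    digest = hashlib.sha256(f"{graph}\n{normalized}".encode("utf-8")).hexdigest()
    return f"answer:{digest}"


def answer_with_cache(
    cache: Cache, graph: str, question: str, compute: Callable[[str], dict]
) -> dict:
    """Return a cached answer, or compute, store, and return a fresh one."""
    key = cache_key(graph, question)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    result = compute(question)  # the expensive planning and query path
    cache.set(key, json.dumps(result))
    return result
```

Including the graph name in the key keeps answers for different graphs from colliding.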
The raw Gremlin endpoint is read-only by default (mutating steps require GREMLIN_ALLOW_MUTATIONS=true).
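A read-only guard of this kind could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the Explorer's actual check: the list of mutating steps and the function names are chosen for the example, and only the `GREMLIN_ALLOW_MUTATIONS` variable comes from the description above.

```python
import os
import re

# Gremlin steps that write to the graph. This list is an assumption for
# the sketch; the real guard may cover a different set.
MUTATING_STEPS = ("addV", "addE", "drop", "property", "mergeV", "mergeE")
_MUTATION_RE = re.compile(r"\.\s*(" + "|".join(MUTATING_STEPS) + r")\s*\(")


def mutations_allowed() -> bool:
    """True only when GREMLIN_ALLOW_MUTATIONS is set to 'true'."""
    return os.environ.get("GREMLIN_ALLOW_MUTATIONS", "").strip().lower() == "true"


def check_query(query: str) -> None:
    """Raise PermissionError if the query mutates and mutations are disabled."""
    match = _MUTATION_RE.search(query)
    if match and not mutations_allowed():
        raise PermissionError(
            f"mutating step '{match.group(1)}' blocked: endpoint is read-only"
        )
```

A textual check like this is conservative: it can reject a harmless query whose string literal happens to contain `.drop(`, which is usually an acceptable trade-off for a default-deny guard.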
flowchart TB
subgraph JG["JanusGraph"]
V["business_rule / entity_category vertices"]
Edg["depends_on / belongs_to_category edges"]
end
JG --> CASS[("Cassandra
adjacency storage")]
JG --> OSF[("OpenSearch
mixed full-text index")]
OSK[("OpenSearch k-NN
384-dim embeddings")]
SQL[("SQLite app.db
annotations / review state")]
V --> OSK
explorer["Explorer"] --> JG
explorer --> OSK
explorer --> SQL
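To make the k-NN side of this storage layout concrete, the Python sketch below builds a request body in the shape OpenSearch's `knn` query expects. The field name `embedding` and the helper name are hypothetical; the 384-dimension check reflects the embedding size shown in the diagram.

```python
from typing import Any, Dict, Sequence

EMBEDDING_DIM = 384  # matches the 384-dim embeddings in the diagram


def build_knn_query(
    vector: Sequence[float], k: int = 5, field: str = "embedding"
) -> Dict[str, Any]:
    """Build an OpenSearch k-NN search body for one query vector.

    The field name is an assumption; use whatever knn_vector field the
    index mapping actually defines.
    """
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    if k < 1:
        raise ValueError("k must be at least 1")
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }
```

Validating the dimension before sending avoids a round trip that the index would reject anyway.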
Every rule carries a structured source_reference back to the exact document chunk and word range.
flowchart LR
chunk["kbs/<graph>/...md
(document chunk)"] --> rule["business_rule vertex
source_reference {chunk, words}"]
rule --> node["Graph node in Explorer"]
node -->|click reference| resolve["/api/reference/resolve"]
resolve --> chunk
resolve --> hl["highlighted source span"]
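The resolve step can be sketched in Python as follows. The diagram only says a source_reference carries a chunk and a word range, so the concrete shape here is an assumption: `words` is treated as a zero-based, end-exclusive pair of word indices over the whitespace-split chunk text, and `highlight_span` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SourceReference:
    chunk: str              # chunk file path, following kbs/<graph>/...md
    words: Tuple[int, int]  # assumed: zero-based [start, end) word indices


def highlight_span(chunk_text: str, ref: SourceReference, marker: str = "**") -> str:
    """Return the chunk text with the referenced word range wrapped in markers.

    Whitespace is normalized to single spaces as a side effect of splitting.
    """
    tokens = chunk_text.split()
    start, end = ref.words
    if not (0 <= start < end <= len(tokens)):
        raise ValueError("word range falls outside the chunk")
    span = " ".join(tokens[start:end])
    parts = tokens[:start] + [f"{marker}{span}{marker}"] + tokens[end:]
    return " ".join(parts)
```

Rejecting out-of-range references up front means a stale reference surfaces as an error instead of highlighting the wrong text.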
flowchart TB
shell["Suite Shell (React Router)"]
shell -->|postMessage theme-change| kgfe["Pipeline UI iframe"]
shell -->|postMessage theme-change| exp["Explorer iframe"]
kgfe -->|navigate / status| shell
exp -->|navigate / status| shell
note["Origin-checked bridge:
messages validated against an allowlist"]
flowchart LR
push["push / PR → main"] --> ci{"GitHub Actions"}
ci --> p1["pipeline · pytest + cov"]
ci --> p2["explorer · pytest + cov"]
ci --> p3["shell · vitest + cov + build"]
ci --> p4["pipeline-ui · vitest + cov + build"]
p1 & p2 & p3 & p4 --> allure["Allure · merged report artifact"]
Every suite emits Allure results and coverage; CI publishes a merged Allure report artifact.