Skip to content

Knowledge Graph Architecture

⚠️ Status: Under Development - Basic structure implemented, full functionality pending.

Current Implementation

The omnivore-core::graph module provides foundational data structures for graph representation:

Data Structures

// Basic node representation
pub struct Node {
    pub id: String,
    pub node_type: NodeType,
    pub properties: HashMap<String, Value>,
}

// Basic edge representation  
pub struct Edge {
    pub source: String,
    pub target: String,
    pub edge_type: EdgeType,
    pub properties: HashMap<String, Value>,
}

Implemented Components

  1. Graph Structure (omnivore-core/src/graph/mod.rs)
  2. Basic graph container using petgraph
  3. Node and edge data structures
  4. Property storage using HashMaps

  5. Schema Module (omnivore-core/src/graph/schema.rs)

  6. Basic type definitions for nodes and edges
  7. Entity and relationship type enums

  8. Builder Module (omnivore-core/src/graph/builder.rs)

  9. Basic structure for graph construction
  10. Methods for adding nodes and edges

Not Yet Implemented

Graph Database Integration

The graph_db module exists but contains only stub implementations: - No actual database connections - No persistence layer - No query execution

Advanced Features

  • Entity recognition and extraction
  • Relationship inference
  • Graph algorithms (PageRank, community detection)
  • Graph visualization exports
  • SPARQL or Cypher query support

Planned Architecture

┌─────────────────────────────────────┐
│         Crawled Content             │
└────────────┬────────────────────────┘
┌─────────────────────────────────────┐
│      Content Extraction             │
│   (Metadata, Text, Structure)       │
└────────────┬────────────────────────┘
┌─────────────────────────────────────┐
│      Entity Recognition             │ ◀── Not Implemented
│    (NER, Entity Linking)            │
└────────────┬────────────────────────┘
┌─────────────────────────────────────┐
│    Relationship Extraction          │ ◀── Not Implemented
│   (Patterns, Co-occurrence)         │
└────────────┬────────────────────────┘
┌─────────────────────────────────────┐
│       Graph Construction            │ ◀── Basic Structure Only
│    (Nodes, Edges, Properties)       │
└────────────┬────────────────────────┘
┌─────────────────────────────────────┐
│      Graph Persistence              │ ◀── Not Implemented
│   (Neo4j, ArangoDB, or Custom)      │
└─────────────────────────────────────┘

Usage (Current State)

Currently, the graph module can only be used programmatically for basic graph operations:

use omnivore_core::graph::{Graph, Node, Edge};

// Create a graph
let mut graph = Graph::new();

// Add nodes (basic implementation)
let node = Node {
    id: "page1".to_string(),
    node_type: NodeType::WebPage,
    properties: HashMap::new(),
};
graph.add_node(node);

// Add edges (basic implementation)
let edge = Edge {
    source: "page1".to_string(),
    target: "page2".to_string(),
    edge_type: EdgeType::LinksTo,
    properties: HashMap::new(),
};
graph.add_edge(edge);

Limitations

  1. No Persistence: Graphs exist only in memory
  2. No Entity Extraction: Must manually define nodes
  3. No Relationship Inference: Must manually define edges
  4. No Query Language: Only programmatic access
  5. No Visualization: No export to graph formats
  6. No Graph Algorithms: Basic structure only

Future Development

Phase 1: Storage Layer

  • Implement graph database adapter interface
  • Add Neo4j or ArangoDB integration
  • Create persistence layer

Phase 2: Entity Extraction

  • Integrate NLP libraries for NER
  • Add entity linking capabilities
  • Implement entity resolution

Phase 3: Relationship Extraction

  • Pattern-based extraction
  • Co-occurrence analysis
  • Link prediction

Phase 4: Query and Analysis

  • Graph query language support
  • Graph algorithms (centrality, clustering)
  • Export to standard formats (GraphML, GEXF)

Alternative Approaches

For immediate graph needs, consider:

  1. Export to External Tools
  2. Export crawled data as JSON
  3. Import into Neo4j or other graph databases
  4. Use external tools for visualization

  5. Custom Processing

  6. Use the parser to extract structured data
  7. Process with external NLP tools
  8. Build graphs using dedicated graph libraries

  9. Wait for Updates

  10. Monitor the GitHub repository
  11. Contribute to development
  12. Use current crawler features until graph support matures