Initial commit: Phase 1 Dual Manifold Cognitive Architecture
- Complete dual manifold memory system (episodic, semantic, persona layers)
- Braiding engine with structural gates for optimal learning suggestions
- FastAPI backend with comprehensive REST endpoints
- Domain directory templates and configuration system
- Comprehensive planning documentation for 5-phase development
- Integrated UI design specifications for Phase 2
- Fixed linting issues and code quality standards
This commit is contained in: commit 5ede9e2e7e

README.md (Normal file, 93 lines)
@@ -0,0 +1,93 @@

# Think Bigger - Advanced Second Brain PKM System

A comprehensive Personal Knowledge Management system with AI-powered agents, knowledge graphs, and intelligent content processing.

## Features

- **Multi-Domain Knowledge Organization**: Structured knowledge domains with consistent templates
- **AI Agent Integration**: Dana language runtime for custom automation
- **Knowledge Graphs**: Neo4j-powered relationship mapping
- **Intelligent Search**: Semantic search with embeddings
- **File System Integration**: Real-time monitoring and processing
- **Extensible Architecture**: Plugin system for custom integrations

## Quick Start

1. Install dependencies:

```bash
uv sync
```

2. Set up configuration:

```bash
cp .config/think_bigger/config.json config/local.json
# Edit config/local.json with your settings
```

3. Run the development server:

```bash
uv run uvicorn src.api.main:app --reload
```

## Project Structure

```
think_bigger/
├── src/
│   ├── core/           # Core system components
│   ├── agents/         # AI agent management
│   ├── processing/     # Content processing pipelines
│   ├── api/            # FastAPI application
│   └── ui/             # Frontend components (future)
├── tests/              # Test suite
├── docs/               # Documentation
├── scripts/            # Utility scripts
└── config/             # Configuration files
```

## Development

### Setup

```bash
# Install dependencies
uv sync

# Install pre-commit hooks
uv run pre-commit install

# Run tests
uv run pytest

# Format code
uv run black src/
uv run isort src/
```

### Key Components

- **File Monitor**: Cross-platform file system watching
- **Content Processor**: Document parsing and chunking
- **Embedding Service**: Vector generation for semantic search
- **Knowledge Graph**: Relationship storage and querying
- **Agent Runtime**: Dana language execution environment

## Configuration

The system uses a hierarchical configuration system:

1. `.config/think_bigger/config.json` - Global defaults
2. `config/local.json` - Local overrides
3. Domain-specific configs in each domain's `_meta/` directory

## Contributing

This project follows subagent-driven development principles. Use the appropriate agent for your task:

- `explore`: Codebase analysis and discovery
- `code-task-executor`: Isolated coding tasks
- `testing-agent`: Automated testing
- `documentation-updater`: Documentation maintenance

## License

MIT License - see LICENSE file for details.
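As a rough illustration of the configuration hierarchy described in the README above (a sketch, not part of this commit): `config/local.json` only needs the keys you want to override, and the `ConfigManager` added later in this commit (`src/think_bigger/core/config.py`) deep-merges them over the defaults. The directory and key values below are purely illustrative.

```python
# Sketch: writing a minimal local override; only fields defined on
# ThinkBiggerConfig (see config.py later in this commit) take effect.
import json
from pathlib import Path

local_override = {
    "system": {"data_directory": "~/pkm_data", "log_level": "DEBUG"},
    "ui": {"theme": "light"},
}
Path("config").mkdir(exist_ok=True)
Path("config/local.json").write_text(json.dumps(local_override, indent=2))

# With this file in place, config_manager.config.system.log_level == "DEBUG",
# while unspecified fields keep their defaults (e.g. processing.chunk_size == 512).
```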
pyproject.toml (Normal file, 43 lines)
@@ -0,0 +1,43 @@
[project]
name = "think-bigger"
version = "0.1.0"
description = "Advanced Second Brain PKM System"
readme = "README.md"
requires-python = ">=3.9"
dependencies = [
    "fastapi>=0.104.0",
    "uvicorn>=0.24.0",
    "pydantic>=2.5.0",
    "watchdog>=3.0.0",
    "sentence-transformers>=2.2.0",
    "neo4j>=5.0.0",
    "python-multipart>=0.0.6",
    "aiofiles>=23.0.0",
    "loguru>=0.7.0",
    "pyyaml>=6.0.0",
    "faiss-cpu>=1.7.0",
    "rank-bm25>=0.2.0",
    "networkx>=3.0.0",
    "openai>=1.0.0",
    "numpy>=1.24.0",
    "scipy>=1.11.0",
    "pandas>=2.0.0",
    "tiktoken>=0.5.0",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src"]

[tool.uv]
dev-dependencies = [
    "pytest>=7.4.0",
    "pytest-asyncio>=0.21.0",
    "black>=23.0.0",
    "isort>=5.12.0",
    "mypy>=1.7.0",
    "ruff>=0.1.0",
]
src/think_bigger/api/__pycache__/main.cpython-312.pyc (Normal file, binary file not shown)
src/think_bigger/api/__pycache__/routes.cpython-312.pyc (Normal file, binary file not shown)
src/think_bigger/api/endpoints/__pycache__/files.cpython-312.pyc (Normal file, binary file not shown)
src/think_bigger/api/endpoints/agents.py (Normal file, 26 lines)
@@ -0,0 +1,26 @@
"""AI agent endpoints."""

from fastapi import APIRouter

router = APIRouter()


@router.get("/")
async def list_agents():
    """List all available agents."""
    # TODO: Implement agent listing
    return {"agents": []}


@router.post("/{agent_id}/execute")
async def execute_agent(agent_id: str, input_data: dict):
    """Execute an AI agent."""
    # TODO: Implement agent execution
    return {"agent_id": agent_id, "status": "executing", "input": input_data}


@router.get("/dana/status")
async def dana_runtime_status():
    """Get Dana runtime status."""
    # TODO: Implement Dana status checking
    return {"status": "ready", "version": "1.0.0"}
src/think_bigger/api/endpoints/files.py (Normal file, 26 lines)
@@ -0,0 +1,26 @@
"""File system endpoints."""

from fastapi import APIRouter, UploadFile, File

router = APIRouter()


@router.get("/domains")
async def list_domains():
    """List all knowledge domains."""
    # TODO: Implement domain listing
    return {"domains": []}


@router.post("/upload")
async def upload_file(file: UploadFile = File(...)):
    """Upload a file to the system."""
    # TODO: Implement file upload
    return {"filename": file.filename, "status": "uploaded"}


@router.get("/status")
async def file_system_status():
    """Get file system monitoring status."""
    # TODO: Implement status checking
    return {"status": "active", "watched_paths": []}
src/think_bigger/api/endpoints/knowledge.py (Normal file, 317 lines)
@@ -0,0 +1,317 @@
"""Knowledge graph endpoints with dual manifold cognitive architecture."""

from typing import List, Optional, Dict, Any
from fastapi import APIRouter, HTTPException, BackgroundTasks
from pydantic import BaseModel

from think_bigger.core.memory.episodic.episodic_memory import EpisodicMemory
from think_bigger.core.memory.semantic.semantic_distiller import SemanticDistiller
from think_bigger.core.memory.persona.persona_graph import PersonaGraph
from think_bigger.core.memory.manifolds.dual_manifold import DualManifoldEngine
from think_bigger.core.memory.braiding.braiding_engine import BraidingEngine, GateType
from think_bigger.core.config import config_manager

router = APIRouter()

# Global instances (in production, these would be dependency injected)
episodic_memory = None
semantic_distiller = None
persona_graph = None
dual_manifold_engine = None
braiding_engine = None


def get_memory_instances():
    """Get or create memory system instances."""
    global episodic_memory, semantic_distiller, persona_graph, dual_manifold_engine, braiding_engine

    if episodic_memory is None:
        config = config_manager.config
        data_dir = config.system.data_directory

        # Initialize embedding model
        from sentence_transformers import SentenceTransformer

        embedding_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

        # Initialize memory systems
        episodic_memory = EpisodicMemory(f"{data_dir}/episodic")
        semantic_distiller = SemanticDistiller(
            f"{data_dir}/semantic", "your-openai-api-key-here"
        )  # TODO: Get from config
        persona_graph = PersonaGraph(f"{data_dir}/persona")
        dual_manifold_engine = DualManifoldEngine(
            f"{data_dir}/manifolds",
            embedding_model,
            "your-openai-api-key-here",  # TODO: Get from config
        )
        braiding_engine = BraidingEngine(
            f"{data_dir}/braiding",
            dual_manifold_engine,
            "your-openai-api-key-here",  # TODO: Get from config
        )

    return (
        episodic_memory,
        semantic_distiller,
        persona_graph,
        dual_manifold_engine,
        braiding_engine,
    )


# Pydantic models for API
class MemoryEntryModel(BaseModel):
    content: str
    source_file: str
    metadata: Optional[Dict[str, Any]] = None


class SearchQuery(BaseModel):
    query: str
    manifold_type: Optional[str] = "individual"
    top_k: Optional[int] = 10
    gate_types: Optional[List[str]] = None


class ConceptExtractionRequest(BaseModel):
    entries: List[str]  # Entry IDs to process


@router.get("/graph")
async def get_knowledge_graph():
    """Get the knowledge graph structure."""
    try:
        _, _, persona_graph, _, _ = get_memory_instances()

        nodes = []
        for node in persona_graph.nodes.values():
            nodes.append(
                {
                    "id": node.id,
                    "type": node.type,
                    "label": node.label,
                    "properties": node.properties,
                }
            )

        edges = []
        for edge in persona_graph.edges:
            edges.append(
                {
                    "source": edge.source_id,
                    "target": edge.target_id,
                    "relationship": edge.relationship,
                    "weight": edge.weight,
                }
            )

        return {"nodes": nodes, "edges": edges}

    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error retrieving graph: {str(e)}")


@router.post("/search")
async def search_knowledge(search_query: SearchQuery):
    """Search the knowledge base using braided search."""
    try:
        _, _, _, _, braiding_engine = get_memory_instances()

        # Convert gate type strings to enum
        gate_types = None
        if search_query.gate_types:
            gate_types = [
                GateType(gt.lower())
                for gt in search_query.gate_types
                if gt.lower() in [e.value for e in GateType]
            ]

        # Perform braided search
        results = await braiding_engine.braid_search(
            search_query.query,
            search_query.manifold_type,
            search_query.top_k,
            gate_types,
        )

        # Convert results to dict format
        formatted_results = []
        for result in results:
            formatted_results.append(
                {
                    "content_id": result.content_id,
                    "content": result.content,
                    "alpha_score": result.alpha_score,
                    "beta_score": result.beta_score,
                    "combined_score": result.combined_score,
                    "gate_type": result.gate_type.value,
                    "confidence": result.confidence,
                    "hallucination_risk": result.hallucination_risk,
                    "metadata": result.metadata,
                }
            )

        return {
            "query": search_query.query,
            "results": formatted_results,
            "total_results": len(formatted_results),
        }

    except Exception as e:
        raise HTTPException(
            status_code=500, detail=f"Error searching knowledge: {str(e)}"
        )


@router.post("/entries")
async def add_memory_entry(entry: MemoryEntryModel):
    """Add a new memory entry to the system."""
    try:
        episodic_memory, _, _, dual_manifold_engine, _ = get_memory_instances()

        # Add to episodic memory
        entry_id = episodic_memory.add_entry(
            entry.content, entry.source_file, entry.metadata
        )

        # Process through dual manifold
        memory_entry = episodic_memory.get_entry_by_id(entry_id)
        if memory_entry:
            processing_results = dual_manifold_engine.process_memory_entries(
                [memory_entry]
            )

        return {
            "entry_id": entry_id,
            "status": "added",
            "processing_results": processing_results,
        }

    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error adding entry: {str(e)}")


@router.post("/concepts/extract")
async def extract_concepts(
    request: ConceptExtractionRequest, background_tasks: BackgroundTasks
):
    """Extract semantic concepts from memory entries."""
    try:
        episodic_memory, semantic_distiller, _, _, _ = get_memory_instances()

        # Get the requested entries
        entries = []
        for entry_id in request.entries:
            entry = episodic_memory.get_entry_by_id(entry_id)
            if entry:
                entries.append(entry)

        if not entries:
            raise HTTPException(status_code=404, detail="No valid entries found")

        # Extract concepts (run in background for heavy processing)
        background_tasks.add_task(semantic_distiller.distill_entries, entries)

        return {
            "status": "processing",
            "entries_processed": len(entries),
            "message": "Concept extraction started in background",
        }

    except Exception as e:
        raise HTTPException(
            status_code=500, detail=f"Error extracting concepts: {str(e)}"
        )


@router.get("/concepts")
async def get_concepts():
    """Get all extracted semantic concepts."""
    try:
        _, semantic_distiller, _, _, _ = get_memory_instances()

        concepts = []
        for concept in semantic_distiller.concepts.values():
            concepts.append(
                {
                    "id": concept.id,
                    "name": concept.name,
                    "description": concept.description,
                    "confidence": concept.confidence,
                    "related_entries": concept.related_entries,
                }
            )

        return {"concepts": concepts, "total": len(concepts)}

    except Exception as e:
        raise HTTPException(
            status_code=500, detail=f"Error retrieving concepts: {str(e)}"
        )


@router.get("/manifolds/stats")
async def get_manifold_stats():
    """Get statistics for the dual manifold system."""
    try:
        _, _, _, dual_manifold_engine, braiding_engine = get_memory_instances()

        return {
            "dual_manifold": dual_manifold_engine.get_manifold_stats(),
            "braiding_engine": braiding_engine.get_gate_stats(),
        }

    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error retrieving stats: {str(e)}")


@router.get("/embeddings/status")
async def embedding_status():
    """Get embedding service status."""
    try:
        episodic_memory, _, _, _, _ = get_memory_instances()

        stats = episodic_memory.get_stats()
        return {
            "status": "ready",
            "model": "sentence-transformers/all-MiniLM-L6-v2",
            "episodic_memory": stats,
        }

    except Exception as e:
        return {
            "status": "error",
            "error": str(e),
            "model": "sentence-transformers/all-MiniLM-L6-v2",
        }


@router.get("/health")
async def memory_system_health():
    """Check the health of the memory system."""
    try:
        instances = get_memory_instances()

        return {
            "status": "healthy",
            "components": {
                "episodic_memory": "initialized" if instances[0] else "failed",
                "semantic_distiller": "initialized" if instances[1] else "failed",
                "persona_graph": "initialized" if instances[2] else "failed",
                "dual_manifold": "initialized" if instances[3] else "failed",
                "braiding_engine": "initialized" if instances[4] else "failed",
            },
        }

    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}
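A hedged usage sketch for the search endpoint above, assuming the API is running locally on port 8000 and api_router is mounted under /api/v1 (see main.py below); the query text and gate names are illustrative, and the gate strings must match GateType values ("semantic", "temporal", "structural", "contextual").

```python
# Sketch: calling POST /api/v1/knowledge/search with a SearchQuery payload.
import requests

payload = {
    "query": "dual manifold memory",
    "manifold_type": "individual",
    "top_k": 5,
    "gate_types": ["semantic", "temporal"],  # lowercase GateType values
}
resp = requests.post("http://localhost:8000/api/v1/knowledge/search", json=payload)
resp.raise_for_status()
for hit in resp.json()["results"]:
    print(hit["combined_score"], hit["gate_type"], hit["content"][:60])
```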
src/think_bigger/api/main.py (Normal file, 52 lines)
@@ -0,0 +1,52 @@
"""Main FastAPI application for Think Bigger."""

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from think_bigger.core.config import config_manager
from think_bigger.api.routes import api_router

# Create FastAPI app
app = FastAPI(
    title="Think Bigger API",
    description="Advanced Second Brain PKM System API",
    version="0.1.0",
)

# Configure CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:3000",
        "http://localhost:5173",
    ],  # React dev servers
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Include API routes
app.include_router(api_router, prefix="/api/v1")


@app.get("/")
async def root():
    """Root endpoint."""
    return {"message": "Think Bigger API", "version": "0.1.0"}


@app.get("/health")
async def health_check():
    """Health check endpoint."""
    config = config_manager.config
    return {
        "status": "healthy",
        "version": config.version,
        "data_directory": config.system.data_directory,
    }


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
src/think_bigger/api/routes.py (Normal file, 14 lines)
@@ -0,0 +1,14 @@
"""API routes for Think Bigger."""

from fastapi import APIRouter

from think_bigger.api.endpoints import files, knowledge, agents

api_router = APIRouter()

# Include endpoint routers
api_router.include_router(files.router, prefix="/files", tags=["files"])

api_router.include_router(knowledge.router, prefix="/knowledge", tags=["knowledge"])

api_router.include_router(agents.router, prefix="/agents", tags=["agents"])
src/think_bigger/core/__pycache__/config.cpython-312.pyc (Normal file, binary file not shown)
src/think_bigger/core/config.py (Normal file, 131 lines)
@@ -0,0 +1,131 @@
"""Configuration management for Think Bigger."""

import json
import os
from pathlib import Path
from typing import Any, Dict, Optional

from pydantic import BaseModel, Field


class SystemConfig(BaseModel):
    """System-level configuration."""

    data_directory: str = "~/think_bigger_data"
    backup_directory: str = "~/think_bigger_backups"
    log_level: str = "INFO"
    auto_backup: bool = True
    backup_frequency: str = "daily"


class ProcessingConfig(BaseModel):
    """Content processing configuration."""

    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"
    chunk_size: int = 512
    overlap: int = 50
    max_file_size: str = "100MB"
    supported_formats: list[str] = ["pdf", "md", "txt", "html", "docx"]


class UIConfig(BaseModel):
    """UI configuration."""

    theme: str = "dark"
    font_size: str = "medium"
    sidebar_width: int = 300
    graph_layout: str = "force"
    default_view: str = "graph"


class AgentConfig(BaseModel):
    """AI agent configuration."""

    enabled: bool = True
    max_concurrent: int = 3
    timeout: int = 300
    sandbox: bool = True


class IntegrationConfig(BaseModel):
    """External integration configuration."""

    notion: Dict[str, Any] = Field(default_factory=dict)
    obsidian: Dict[str, Any] = Field(default_factory=dict)


class ThinkBiggerConfig(BaseModel):
    """Main configuration model."""

    version: str = "1.0.0"
    system: SystemConfig = Field(default_factory=SystemConfig)
    processing: ProcessingConfig = Field(default_factory=ProcessingConfig)
    ui: UIConfig = Field(default_factory=UIConfig)
    agents: AgentConfig = Field(default_factory=AgentConfig)
    integrations: IntegrationConfig = Field(default_factory=IntegrationConfig)


class ConfigManager:
    """Configuration manager with hierarchical loading."""

    def __init__(self):
        self.config_dir = Path.home() / ".config" / "think_bigger"
        self.project_config_dir = Path("config")
        self._config: Optional[ThinkBiggerConfig] = None

    def load_config(self) -> ThinkBiggerConfig:
        """Load configuration from multiple sources."""
        if self._config is not None:
            return self._config

        # Start with defaults
        config = ThinkBiggerConfig()

        # Load global config
        global_config_path = self.config_dir / "config.json"
        if global_config_path.exists():
            global_data = json.loads(global_config_path.read_text())
            config = ThinkBiggerConfig(**global_data)

        # Load local overrides
        local_config_path = self.project_config_dir / "local.json"
        if local_config_path.exists():
            local_data = json.loads(local_config_path.read_text())
            # Deep merge local config over global
            config = self._merge_configs(config, local_data)

        # Expand paths
        config.system.data_directory = os.path.expanduser(config.system.data_directory)
        config.system.backup_directory = os.path.expanduser(
            config.system.backup_directory
        )

        self._config = config
        return config

    def _merge_configs(
        self, base: ThinkBiggerConfig, override: Dict[str, Any]
    ) -> ThinkBiggerConfig:
        """Deep merge override config into base config."""
        base_dict = base.model_dump()
        self._deep_merge(base_dict, override)
        return ThinkBiggerConfig(**base_dict)

    def _deep_merge(self, base: Dict[str, Any], override: Dict[str, Any]) -> None:
        """Recursively merge override dict into base dict."""
        for key, value in override.items():
            if key in base and isinstance(base[key], dict) and isinstance(value, dict):
                self._deep_merge(base[key], value)
            else:
                base[key] = value

    @property
    def config(self) -> ThinkBiggerConfig:
        """Get the loaded configuration."""
        if self._config is None:
            return self.load_config()
        return self._config


# Global config instance
config_manager = ConfigManager()
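A small sketch of the deep-merge behavior implemented by ConfigManager above, using plain dicts to mirror what load_config does with config/local.json; it assumes the package is importable as think_bigger, and the values are illustrative.

```python
# Sketch: _deep_merge overrides only the keys present in the override dict.
from think_bigger.core.config import ConfigManager

manager = ConfigManager()
base = {"system": {"log_level": "INFO", "auto_backup": True}, "ui": {"theme": "dark"}}
override = {"system": {"log_level": "DEBUG"}}
manager._deep_merge(base, override)

assert base["system"] == {"log_level": "DEBUG", "auto_backup": True}
assert base["ui"] == {"theme": "dark"}  # untouched sections are preserved
```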
src/think_bigger/core/memory/braiding/braiding_engine.py (Normal file, 501 lines)
@@ -0,0 +1,501 @@
"""Braiding engine with structural gates and hallucination filtering."""

import json
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional, Any, Tuple
from dataclasses import dataclass, asdict
from enum import Enum

from loguru import logger

from think_bigger.core.memory.manifolds.dual_manifold import DualManifoldEngine


class GateType(Enum):
    """Types of structural gates in the braiding engine."""

    SEMANTIC = "semantic"
    TEMPORAL = "temporal"
    STRUCTURAL = "structural"
    CONTEXTUAL = "contextual"


@dataclass
class StructuralGate:
    """Represents a structural gate in the braiding engine."""

    id: str
    gate_type: GateType
    threshold: float
    alpha_weight: float  # Weight for vector similarity
    beta_weight: float  # Weight for graph centrality
    filters: List[str]  # Active filters
    created_at: datetime
    metadata: Dict[str, Any]


@dataclass
class BraidedResult:
    """Result from the braiding engine with scoring."""

    content_id: str
    content: str
    alpha_score: float  # Vector similarity score
    beta_score: float  # Graph centrality score
    combined_score: float
    gate_type: GateType
    confidence: float
    hallucination_risk: float
    metadata: Dict[str, Any]


class BraidingEngine:
    """Braiding engine with structural gates for optimal knowledge suggestions."""

    def __init__(
        self,
        data_dir: str,
        dual_manifold_engine: DualManifoldEngine,
        openai_api_key: str,
    ):
        self.data_dir = Path(data_dir)
        self.data_dir.mkdir(parents=True, exist_ok=True)

        self.dual_manifold_engine = dual_manifold_engine
        self.openai_api_key = openai_api_key

        # Structural gates
        self.gates: Dict[str, StructuralGate] = {}

        # Braiding results cache
        self.results_cache: Dict[str, List[Dict[str, Any]]] = {}

        # Persistence paths
        self.gates_path = self.data_dir / "structural_gates.json"
        self.cache_path = self.data_dir / "braiding_cache.json"

        # Load existing data
        self._load_persistent_data()

        # Initialize default gates
        self._initialize_default_gates()

    def _load_persistent_data(self):
        """Load persisted braiding data."""
        try:
            # Load gates
            if self.gates_path.exists():
                with open(self.gates_path, "r") as f:
                    gates_data = json.load(f)
                    for gate_dict in gates_data:
                        gate = StructuralGate(**gate_dict)
                        gate.gate_type = GateType(gate.gate_type)
                        gate.created_at = datetime.fromisoformat(
                            gate_dict["created_at"]
                        )
                        self.gates[gate.id] = gate

            # Load cache
            if self.cache_path.exists():
                with open(self.cache_path, "r") as f:
                    self.results_cache = json.load(f)

            logger.info(f"Loaded braiding engine with {len(self.gates)} gates")

        except Exception as e:
            logger.error(f"Error loading braiding data: {e}")

    def _save_persistent_data(self):
        """Save braiding data to disk."""
        try:
            # Save gates
            gates_data = []
            for gate in self.gates.values():
                gate_dict = asdict(gate)
                gate_dict["gate_type"] = gate.gate_type.value
                gate_dict["created_at"] = gate.created_at.isoformat()
                gates_data.append(gate_dict)

            with open(self.gates_path, "w") as f:
                json.dump(gates_data, f, indent=2)

            # Save cache
            with open(self.cache_path, "w") as f:
                json.dump(self.results_cache, f, indent=2)

        except Exception as e:
            logger.error(f"Error saving braiding data: {e}")

    def _initialize_default_gates(self):
        """Initialize default structural gates."""
        default_gates = [
            {
                "id": "semantic_gate",
                "gate_type": GateType.SEMANTIC,
                "threshold": 0.7,
                "alpha_weight": 0.6,
                "beta_weight": 0.4,
                "filters": ["hallucination_check", "relevance_filter"],
                "metadata": {
                    "description": "Semantic similarity with concept validation"
                },
            },
            {
                "id": "temporal_gate",
                "gate_type": GateType.TEMPORAL,
                "threshold": 0.5,
                "alpha_weight": 0.3,
                "beta_weight": 0.7,
                "filters": ["recency_boost", "temporal_coherence"],
                "metadata": {
                    "description": "Temporal relevance with recency weighting"
                },
            },
            {
                "id": "structural_gate",
                "gate_type": GateType.STRUCTURAL,
                "threshold": 0.8,
                "alpha_weight": 0.4,
                "beta_weight": 0.6,
                "filters": ["graph_centrality", "connectivity_filter"],
                "metadata": {"description": "Graph structure with centrality measures"},
            },
            {
                "id": "contextual_gate",
                "gate_type": GateType.CONTEXTUAL,
                "threshold": 0.6,
                "alpha_weight": 0.5,
                "beta_weight": 0.5,
                "filters": ["context_awareness", "domain_relevance"],
                "metadata": {
                    "description": "Context-aware filtering with domain knowledge"
                },
            },
        ]

        for gate_data in default_gates:
            if gate_data["id"] not in self.gates:
                gate = StructuralGate(
                    id=gate_data["id"],
                    gate_type=gate_data["gate_type"],
                    threshold=gate_data["threshold"],
                    alpha_weight=gate_data["alpha_weight"],
                    beta_weight=gate_data["beta_weight"],
                    filters=gate_data["filters"],
                    created_at=datetime.now(),
                    metadata=gate_data["metadata"],
                )
                self.gates[gate.id] = gate

        self._save_persistent_data()

    async def braid_search(
        self,
        query: str,
        manifold_type: str = "individual",
        top_k: int = 10,
        gate_types: Optional[List[GateType]] = None,
    ) -> List[BraidedResult]:
        """Perform braided search using structural gates."""

        # Use cache if available
        cache_key = f"{query}_{manifold_type}_{top_k}"
        if cache_key in self.results_cache:
            return [BraidedResult(**r) for r in self.results_cache[cache_key]]

        # Get base search results from dual manifold
        search_results = self.dual_manifold_engine.search_dual_manifold(
            query,
            manifold_type,
            top_k * 2,  # Get more results for filtering
        )

        # Apply structural gates
        if gate_types is None:
            gate_types = list(GateType)

        braided_results = []
        active_gates = [g for g in self.gates.values() if g.gate_type in gate_types]

        for gate in active_gates:
            gate_results = await self._apply_gate(query, search_results, gate)
            braided_results.extend(gate_results)

        # Remove duplicates and sort by combined score
        seen_ids = set()
        unique_results = []
        for result in braided_results:
            if result.content_id not in seen_ids:
                seen_ids.add(result.content_id)
                unique_results.append(result)

        unique_results.sort(key=lambda x: x.combined_score, reverse=True)
        final_results = unique_results[:top_k]

        # Cache results
        self.results_cache[cache_key] = [asdict(r) for r in final_results]
        self._save_persistent_data()

        return final_results

    async def _apply_gate(
        self, query: str, search_results: Dict[str, Any], gate: StructuralGate
    ) -> List[BraidedResult]:
        """Apply a structural gate to search results."""

        results = []

        for manifold_type, manifold_results in search_results.items():
            for result in manifold_results:
                # Calculate alpha score (vector similarity)
                alpha_score = self._calculate_alpha_score(result, gate)

                # Calculate beta score (graph/structural measures)
                beta_score = self._calculate_beta_score(result, gate, manifold_type)

                # Apply filters
                filtered_scores = self._apply_filters(
                    result, gate, alpha_score, beta_score
                )

                if filtered_scores is None:
                    continue  # Filtered out

                alpha_score, beta_score = filtered_scores

                # Calculate combined score
                combined_score = (
                    gate.alpha_weight * alpha_score + gate.beta_weight * beta_score
                )

                # Check threshold
                if combined_score < gate.threshold:
                    continue

                # Calculate hallucination risk
                hallucination_risk = await self._assess_hallucination_risk(
                    query, result
                )

                # Calculate confidence
                confidence = min(1.0, combined_score * (1 - hallucination_risk))

                braided_result = BraidedResult(
                    content_id=result.get("point_id", result.get("entry_id", "unknown")),
                    content=result.get("content", ""),
                    alpha_score=alpha_score,
                    beta_score=beta_score,
                    combined_score=combined_score,
                    gate_type=gate.gate_type,
                    confidence=confidence,
                    hallucination_risk=hallucination_risk,
                    metadata={
                        "gate_id": gate.id,
                        "manifold_type": manifold_type,
                        "original_result": result,
                    },
                )

                results.append(braided_result)

        return results

    def _calculate_alpha_score(
        self, result: Dict[str, Any], gate: StructuralGate
    ) -> float:
        """Calculate alpha score (vector similarity)."""
        if gate.gate_type == GateType.SEMANTIC:
            return result.get("similarity", 0.5)
        elif gate.gate_type == GateType.TEMPORAL:
            # Consider recency
            timestamp_str = result.get("timestamp", "")
            if timestamp_str:
                try:
                    timestamp = datetime.fromisoformat(
                        timestamp_str.replace("Z", "+00:00")
                    )
                    days_old = (datetime.now() - timestamp).days
                    recency_score = max(
                        0.1, 1.0 / (1.0 + days_old / 30)
                    )  # Decay over months
                    return recency_score
                except (ValueError, TypeError, AttributeError):
                    pass
            return 0.5
        else:
            return result.get("similarity", 0.5)

    def _calculate_beta_score(
        self, result: Dict[str, Any], gate: StructuralGate, manifold_type: str
    ) -> float:
        """Calculate beta score (graph/structural measures)."""
        if gate.gate_type == GateType.STRUCTURAL:
            # Use centrality measures if available
            centrality = result.get("centrality", 0.5)
            connectivity = result.get("connectivity", 0.5)
            return (centrality + connectivity) / 2
        elif gate.gate_type == GateType.CONTEXTUAL:
            # Consider metadata richness and relevance
            metadata_score = min(1.0, len(result.get("metadata", {})) / 10)
            return metadata_score
        else:
            return 0.5

    def _apply_filters(
        self,
        result: Dict[str, Any],
        gate: StructuralGate,
        alpha_score: float,
        beta_score: float,
    ) -> Optional[Tuple[float, float]]:
        """Apply active filters to scores."""

        for filter_name in gate.filters:
            if filter_name == "hallucination_check":
                # Basic hallucination check based on content coherence
                content = result.get("content", "")
                if len(content.split()) < 3:  # Too short
                    return None
                if content.count("?") > content.count("."):  # Too many questions
                    alpha_score *= 0.5

            elif filter_name == "relevance_filter":
                # Check if content actually relates to the query structure
                if not self._check_relevance(result):
                    return None

            elif filter_name == "recency_boost":
                # Already handled in alpha score calculation
                pass

            elif filter_name == "temporal_coherence":
                # Check temporal consistency
                if not self._check_temporal_coherence(result):
                    beta_score *= 0.7

            elif filter_name == "graph_centrality":
                # Boost based on graph position
                centrality = result.get("centrality", 0.5)
                beta_score = (beta_score + centrality) / 2

            elif filter_name == "connectivity_filter":
                # Filter based on connectivity
                connections = result.get("connections", 0)
                if connections < 1:
                    beta_score *= 0.8

        return alpha_score, beta_score

    def _check_relevance(self, result: Dict[str, Any]) -> bool:
        """Check if result is relevant."""
        content = result.get("content", "").lower()

        # Basic relevance checks
        if len(content) < 10:  # Too short
            return False

        if content.count("lorem ipsum") > 0:  # Placeholder content
            return False

        return True

    def _check_temporal_coherence(self, result: Dict[str, Any]) -> bool:
        """Check temporal coherence of the result."""
        timestamp_str = result.get("timestamp", "")
        if not timestamp_str:
            return True  # No timestamp to check

        try:
            timestamp = datetime.fromisoformat(timestamp_str.replace("Z", "+00:00"))
            now = datetime.now()
            # Check if timestamp is not in the future and not too far in the past
            return (
                timestamp <= now and (now - timestamp).days < 365 * 10
            )  # Within 10 years
        except (ValueError, TypeError, AttributeError):
            return False

    async def _assess_hallucination_risk(
        self, query: str, result: Dict[str, Any]
    ) -> float:
        """Assess hallucination risk using LLM analysis."""

        content = result.get("content", "")
        if len(content) < 50:  # Too short for meaningful analysis
            return 0.5

        # Simple heuristic-based assessment (could be enhanced with LLM)
        risk_factors = 0

        # Check for generic/placeholder content
        generic_phrases = ["lorem ipsum", "example text", "placeholder", "todo"]
        for phrase in generic_phrases:
            if phrase in content.lower():
                risk_factors += 0.3

        # Check for repetitive patterns
        words = content.split()
        if len(words) > 10:
            unique_ratio = len(set(words)) / len(words)
            if unique_ratio < 0.3:  # Very repetitive
                risk_factors += 0.4

        # Check for nonsensical content
        if content.count("!") > 5 or content.count("?") > 5:  # Excessive punctuation
            risk_factors += 0.2

        return min(1.0, risk_factors)

    def add_custom_gate(
        self,
        gate_id: str,
        gate_type: GateType,
        threshold: float,
        alpha_weight: float,
        beta_weight: float,
        filters: List[str],
        metadata: Optional[Dict[str, Any]] = None,
    ) -> str:
        """Add a custom structural gate."""

        gate = StructuralGate(
            id=gate_id,
            gate_type=gate_type,
            threshold=threshold,
            alpha_weight=alpha_weight,
            beta_weight=beta_weight,
            filters=filters,
            created_at=datetime.now(),
            metadata=metadata or {},
        )

        self.gates[gate_id] = gate
        self._save_persistent_data()

        logger.info(f"Added custom gate: {gate_id}")
        return gate_id

    def get_gate_stats(self) -> Dict[str, Any]:
        """Get statistics about structural gates."""
        gate_types = {}
        for gate in self.gates.values():
            gate_type = gate.gate_type.value
            if gate_type not in gate_types:
                gate_types[gate_type] = 0
            gate_types[gate_type] += 1

        return {
            "total_gates": len(self.gates),
            "gate_types": gate_types,
            "cache_size": len(self.results_cache),
            "data_directory": str(self.data_dir),
        }

    def clear_cache(self):
        """Clear the braiding results cache."""
        self.results_cache = {}
        self._save_persistent_data()
        logger.info("Cleared braiding cache")
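The scoring arithmetic a gate applies in _apply_gate, worked through as a standalone sketch with the default semantic_gate parameters (alpha_weight 0.6, beta_weight 0.4, threshold 0.7); the input numbers are illustrative, not from a real search.

```python
# Sketch: one candidate result passing the semantic gate.
alpha_score = 0.85          # vector similarity for the candidate
beta_score = 0.60           # structural/centrality score
hallucination_risk = 0.20   # heuristic from _assess_hallucination_risk

combined = 0.6 * alpha_score + 0.4 * beta_score              # 0.75
passes_gate = combined >= 0.7                                # True: clears the threshold
confidence = min(1.0, combined * (1 - hallucination_risk))   # 0.60

print(combined, passes_gate, round(confidence, 2))
```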
src/think_bigger/core/memory/episodic/episodic_memory.py (Normal file, 269 lines)
@@ -0,0 +1,269 @@
"""Episodic memory layer with hybrid indexing (dense vectors + BM25)."""

import json
import pickle
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional, Any
from dataclasses import dataclass, asdict

import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
from loguru import logger


@dataclass
class MemoryEntry:
    """Represents a single episodic memory entry."""

    id: str
    content: str
    timestamp: datetime
    source_file: str
    metadata: Dict[str, Any]
    embedding: Optional[np.ndarray] = None
    tokenized_content: Optional[List[str]] = None


@dataclass
class SearchResult:
    """Search result with hybrid scoring."""

    entry_id: str
    content: str
    source_file: str
    timestamp: datetime
    vector_score: float
    bm25_score: float
    combined_score: float
    metadata: Dict[str, Any]


class EpisodicMemory:
    """Episodic memory layer with hybrid FAISS + BM25 indexing."""

    def __init__(
        self,
        data_dir: str,
        embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2",
    ):
        self.data_dir = Path(data_dir)
        self.data_dir.mkdir(parents=True, exist_ok=True)

        # Initialize embedding model
        self.embedding_model = SentenceTransformer(embedding_model)
        self.embedding_dim = self.embedding_model.get_sentence_embedding_dimension()

        # Initialize FAISS index
        self.vector_index = faiss.IndexFlatIP(
            self.embedding_dim
        )  # Inner product for cosine similarity

        # BM25 components
        self.bm25_index: Optional[BM25Okapi] = None
        self.corpus: List[List[str]] = []

        # Memory storage
        self.entries: Dict[str, MemoryEntry] = {}
        self.id_to_idx: Dict[str, int] = {}  # Maps entry ID to FAISS index

        # Persistence paths
        self.entries_path = self.data_dir / "episodic_entries.json"
        self.vector_index_path = self.data_dir / "vector_index.faiss"
        self.bm25_corpus_path = self.data_dir / "bm25_corpus.pkl"

        # Load existing data
        self._load_persistent_data()

    def _load_persistent_data(self):
        """Load persisted memory data."""
        try:
            # Load entries
            if self.entries_path.exists():
                with open(self.entries_path, "r") as f:
                    entries_data = json.load(f)
                    for entry_dict in entries_data:
                        entry = MemoryEntry(**entry_dict)
                        entry.timestamp = datetime.fromisoformat(
                            entry_dict["timestamp"]
                        )
                        if entry.embedding is not None:
                            entry.embedding = np.array(entry.embedding)
                        self.entries[entry.id] = entry

            # Load FAISS index
            if self.vector_index_path.exists():
                self.vector_index = faiss.read_index(str(self.vector_index_path))

            # Load BM25 corpus
            if self.bm25_corpus_path.exists():
                with open(self.bm25_corpus_path, "rb") as f:
                    self.corpus = pickle.load(f)
                self.bm25_index = BM25Okapi(self.corpus)

            # Rebuild ID to index mapping
            self.id_to_idx = {
                entry_id: idx for idx, entry_id in enumerate(self.entries.keys())
            }

            logger.info(f"Loaded {len(self.entries)} episodic memory entries")

        except Exception as e:
            logger.error(f"Error loading persistent data: {e}")
            # Reset to empty state
            self.entries = {}
            self.id_to_idx = {}
            self.vector_index = faiss.IndexFlatIP(self.embedding_dim)

    def _save_persistent_data(self):
        """Save memory data to disk."""
        try:
            # Save entries
            entries_data = []
            for entry in self.entries.values():
                entry_dict = asdict(entry)
                entry_dict["timestamp"] = entry.timestamp.isoformat()
                if entry.embedding is not None:
                    entry_dict["embedding"] = entry.embedding.tolist()
                entries_data.append(entry_dict)

            with open(self.entries_path, "w") as f:
                json.dump(entries_data, f, indent=2)

            # Save FAISS index
            faiss.write_index(self.vector_index, str(self.vector_index_path))

            # Save BM25 corpus
            with open(self.bm25_corpus_path, "wb") as f:
                pickle.dump(self.corpus, f)

        except Exception as e:
            logger.error(f"Error saving persistent data: {e}")

    def add_entry(
        self, content: str, source_file: str, metadata: Optional[Dict[str, Any]] = None
    ) -> str:
        """Add a new memory entry."""
        entry_id = f"{source_file}_{datetime.now().isoformat()}"

        # Create embedding
        embedding = self.embedding_model.encode(content, convert_to_numpy=True)
        embedding = embedding / np.linalg.norm(
            embedding
        )  # Normalize for cosine similarity

        # Tokenize for BM25
        tokenized = content.lower().split()  # Simple tokenization

        # Create entry
        entry = MemoryEntry(
            id=entry_id,
            content=content,
            timestamp=datetime.now(),
            source_file=source_file,
            metadata=metadata or {},
            embedding=embedding,
            tokenized_content=tokenized,
        )

        # Add to storage
        self.entries[entry_id] = entry
        self.id_to_idx[entry_id] = len(self.entries) - 1

        # Update vector index
        self.vector_index.add(embedding.reshape(1, -1))

        # Update BM25 index
        self.corpus.append(tokenized)
        self.bm25_index = BM25Okapi(self.corpus)

        # Persist changes
        self._save_persistent_data()

        logger.info(f"Added episodic memory entry: {entry_id}")
        return entry_id

    def hybrid_search(
        self, query: str, top_k: int = 10, alpha: float = 0.5
    ) -> List[SearchResult]:
        """Perform hybrid search combining vector and BM25 scores."""
        if not self.entries:
            return []

        # Vector search
        query_embedding = self.embedding_model.encode(query, convert_to_numpy=True)
        query_embedding = query_embedding / np.linalg.norm(query_embedding)

        vector_scores, vector_indices = self.vector_index.search(
            query_embedding.reshape(1, -1), min(top_k * 2, len(self.entries))
        )

        # BM25 search
        query_tokens = query.lower().split()
        bm25_scores = (
            self.bm25_index.get_scores(query_tokens)
            if self.bm25_index
            else np.zeros(len(self.entries))
        )

        # Combine results
        results = []
        entry_ids = list(self.entries.keys())

        for idx, (vector_score, bm25_score) in enumerate(
            zip(vector_scores[0], bm25_scores)
        ):
            if idx >= len(entry_ids):
                break

            entry_id = entry_ids[idx]
            entry = self.entries[entry_id]

            # Normalize scores (vector scores are already in [0,1] for cosine, BM25 can be > 1)
            normalized_bm25 = (
                bm25_score / max(bm25_scores) if max(bm25_scores) > 0 else 0
            )

            # Combined score with alpha weighting
            combined_score = alpha * vector_score + (1 - alpha) * normalized_bm25

            results.append(
                SearchResult(
                    entry_id=entry_id,
                    content=entry.content,
                    source_file=entry.source_file,
                    timestamp=entry.timestamp,
                    vector_score=float(vector_score),
                    bm25_score=float(normalized_bm25),
                    combined_score=float(combined_score),
                    metadata=entry.metadata,
                )
            )

        # Sort by combined score and return top_k
        results.sort(key=lambda x: x.combined_score, reverse=True)
        return results[:top_k]

    def get_recent_entries(self, limit: int = 50) -> List[MemoryEntry]:
        """Get most recent memory entries."""
        sorted_entries = sorted(
            self.entries.values(), key=lambda x: x.timestamp, reverse=True
        )
        return sorted_entries[:limit]

    def get_entry_by_id(self, entry_id: str) -> Optional[MemoryEntry]:
        """Get a specific memory entry by ID."""
        return self.entries.get(entry_id)

    def get_stats(self) -> Dict[str, Any]:
        """Get memory statistics."""
        return {
            "total_entries": len(self.entries),
            "vector_dimension": self.embedding_dim,
            "data_directory": str(self.data_dir),
            "last_updated": max(
                (e.timestamp for e in self.entries.values()), default=None
            ),
        }
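A minimal usage sketch for the hybrid index above, assuming the package is importable as think_bigger and its dependencies (faiss-cpu, rank-bm25, sentence-transformers) are installed; the MiniLM model is downloaded on first run, and the data directory and note contents are illustrative.

```python
# Sketch: add two entries, then run a hybrid (vector + BM25) search.
from think_bigger.core.memory.episodic.episodic_memory import EpisodicMemory

memory = EpisodicMemory("/tmp/think_bigger_demo/episodic")
memory.add_entry("Braiding gates combine vector and graph scores.", "notes/braiding.md")
memory.add_entry("FAISS stores normalized embeddings for cosine search.", "notes/faiss.md")

# alpha weights the dense score; 1 - alpha weights the normalized BM25 score.
for hit in memory.hybrid_search("cosine similarity search", top_k=2, alpha=0.5):
    print(round(hit.combined_score, 3), hit.source_file)
```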
src/think_bigger/core/memory/manifolds/dual_manifold.py (Normal file, 607 lines)
@ -0,0 +1,607 @@
"""Dual manifold construction for individual and collective knowledge representation."""

import json
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional, Any, Tuple
from dataclasses import dataclass, asdict
from collections import defaultdict

import numpy as np
import networkx as nx
from loguru import logger

from think_bigger.core.memory.episodic.episodic_memory import MemoryEntry
from think_bigger.core.memory.semantic.semantic_distiller import (
    SemanticConcept,
)


@dataclass
class ManifoldPoint:
    """Represents a point in the knowledge manifold."""

    id: str
    coordinates: np.ndarray  # High-dimensional coordinates
    content: str
    metadata: Dict[str, Any]
    created_at: datetime


@dataclass
class ManifoldConnection:
    """Represents a connection between manifold points."""

    source_id: str
    target_id: str
    strength: float
    connection_type: str  # 'semantic', 'temporal', 'structural'
    metadata: Dict[str, Any]


@dataclass
class GravityWell:
    """Represents a gravity well in the manifold."""

    center_point_id: str
    radius: float
    mass: float
    influenced_points: List[str]
    well_type: str  # 'individual', 'collective'
    created_at: datetime


class IndividualManifold:
    """Individual knowledge manifold for personal knowledge representation."""

    def __init__(self, data_dir: str, embedding_model):
        self.data_dir = Path(data_dir) / "individual"
        self.data_dir.mkdir(parents=True, exist_ok=True)

        self.embedding_model = embedding_model

        # Manifold components
        self.points: Dict[str, ManifoldPoint] = {}
        self.connections: List[ManifoldConnection] = []
        self.gravity_wells: Dict[str, GravityWell] = {}

        # NetworkX graph for manifold structure
        self.manifold_graph: nx.Graph = nx.Graph()

        # Persistence paths
        self.points_path = self.data_dir / "manifold_points.json"
        self.connections_path = self.data_dir / "manifold_connections.json"
        self.wells_path = self.data_dir / "gravity_wells.json"

        # Load existing data
        self._load_persistent_data()

    def _load_persistent_data(self):
        """Load persisted manifold data."""
        try:
            # Load points
            if self.points_path.exists():
                with open(self.points_path, "r") as f:
                    points_data = json.load(f)
                for point_dict in points_data:
                    point = ManifoldPoint(**point_dict)
                    point.coordinates = np.array(point.coordinates)
                    point.created_at = datetime.fromisoformat(
                        point_dict["created_at"]
                    )
                    self.points[point.id] = point
                    self.manifold_graph.add_node(point.id, **asdict(point))

            # Load connections
            if self.connections_path.exists():
                with open(self.connections_path, "r") as f:
                    connections_data = json.load(f)
                for conn_dict in connections_data:
                    connection = ManifoldConnection(**conn_dict)
                    self.connections.append(connection)
                    self.manifold_graph.add_edge(
                        connection.source_id,
                        connection.target_id,
                        weight=connection.strength,
                        type=connection.connection_type,
                        **connection.metadata,
                    )

            # Load gravity wells
            if self.wells_path.exists():
                with open(self.wells_path, "r") as f:
                    wells_data = json.load(f)
                for well_dict in wells_data:
                    well = GravityWell(**well_dict)
                    well.created_at = datetime.fromisoformat(
                        well_dict["created_at"]
                    )
                    self.gravity_wells[well.center_point_id] = well

            logger.info(
                f"Loaded individual manifold with {len(self.points)} points, {len(self.connections)} connections"
            )

        except Exception as e:
            logger.error(f"Error loading manifold data: {e}")

    def _save_persistent_data(self):
        """Save manifold data to disk."""
        try:
            # Save points
            points_data = []
            for point in self.points.values():
                point_dict = asdict(point)
                point_dict["coordinates"] = point.coordinates.tolist()
                point_dict["created_at"] = point.created_at.isoformat()
                points_data.append(point_dict)

            with open(self.points_path, "w") as f:
                json.dump(points_data, f, indent=2)

            # Save connections
            connections_data = [asdict(conn) for conn in self.connections]
            with open(self.connections_path, "w") as f:
                json.dump(connections_data, f, indent=2)

            # Save gravity wells
            wells_data = []
            for well in self.gravity_wells.values():
                well_dict = asdict(well)
                well_dict["created_at"] = well.created_at.isoformat()
                wells_data.append(well_dict)

            with open(self.wells_path, "w") as f:
                json.dump(wells_data, f, indent=2)

        except Exception as e:
            logger.error(f"Error saving manifold data: {e}")

    def add_memory_entry(self, entry: MemoryEntry) -> str:
        """Add a memory entry as a point in the manifold."""
        # Create embedding for the entry
        coordinates = self.embedding_model.encode(entry.content, convert_to_numpy=True)

        point_id = f"point_{entry.id}"

        point = ManifoldPoint(
            id=point_id,
            coordinates=coordinates,
            content=entry.content,
            metadata={
                "entry_id": entry.id,
                "source_file": entry.source_file,
                "timestamp": entry.timestamp.isoformat(),
                "type": "memory_entry",
            },
            created_at=datetime.now(),
        )

        self.points[point_id] = point
        self.manifold_graph.add_node(point_id, **asdict(point))

        self._save_persistent_data()
        return point_id

    def add_semantic_concept(self, concept: SemanticConcept) -> str:
        """Add a semantic concept as a point in the manifold."""
        # Create embedding for the concept
        coordinates = self.embedding_model.encode(
            f"{concept.name}: {concept.description}", convert_to_numpy=True
        )

        point_id = f"concept_{concept.id}"

        point = ManifoldPoint(
            id=point_id,
            coordinates=coordinates,
            content=f"{concept.name}: {concept.description}",
            metadata={
                "concept_id": concept.id,
                "confidence": concept.confidence,
                "related_entries": concept.related_entries,
                "type": "semantic_concept",
            },
            created_at=datetime.now(),
        )

        self.points[point_id] = point
        self.manifold_graph.add_node(point_id, **asdict(point))

        self._save_persistent_data()
        return point_id

    def create_connections(
        self, entries: List[MemoryEntry], concepts: List[SemanticConcept]
    ) -> None:
        """Create connections between points in the manifold."""
        # Connect entries to their related concepts
        for concept in concepts:
            concept_point_id = f"concept_{concept.id}"
            if concept_point_id not in self.points:
                continue

            for entry_id in concept.related_entries:
                entry_point_id = f"point_{entry_id}"
                if entry_point_id in self.points:
                    # Calculate semantic similarity
                    concept_coords = self.points[concept_point_id].coordinates
                    entry_coords = self.points[entry_point_id].coordinates

                    similarity = np.dot(concept_coords, entry_coords) / (
                        np.linalg.norm(concept_coords) * np.linalg.norm(entry_coords)
                    )

                    connection = ManifoldConnection(
                        source_id=concept_point_id,
                        target_id=entry_point_id,
                        strength=float(similarity),
                        connection_type="semantic",
                        metadata={"confidence": concept.confidence},
                    )

                    self.connections.append(connection)
                    self.manifold_graph.add_edge(
                        concept_point_id,
                        entry_point_id,
                        weight=connection.strength,
                        type=connection.connection_type,
                        **connection.metadata,
                    )

        # Create temporal connections between entries
        sorted_entries = sorted(entries, key=lambda x: x.timestamp)
        for i in range(len(sorted_entries) - 1):
            current_entry = sorted_entries[i]
            next_entry = sorted_entries[i + 1]

            current_point_id = f"point_{current_entry.id}"
            next_point_id = f"point_{next_entry.id}"

            if current_point_id in self.points and next_point_id in self.points:
                # Temporal proximity weight (inverse of time difference)
                time_diff = (
                    next_entry.timestamp - current_entry.timestamp
                ).total_seconds()
                temporal_weight = max(
                    0.1, 1.0 / (1.0 + time_diff / 86400)
                )  # Decay over days

                connection = ManifoldConnection(
                    source_id=current_point_id,
                    target_id=next_point_id,
                    strength=temporal_weight,
                    connection_type="temporal",
                    metadata={"time_diff_seconds": time_diff},
                )

                self.connections.append(connection)
                self.manifold_graph.add_edge(
                    current_point_id,
                    next_point_id,
                    weight=connection.strength,
                    type=connection.connection_type,
                    **connection.metadata,
                )

        self._save_persistent_data()

    def compute_gravity_wells(self) -> List[GravityWell]:
        """Compute gravity wells based on manifold structure."""
        if not self.points:
            return []

        # Use graph centrality to identify potential centers
        try:
            centrality = nx.eigenvector_centrality_numpy(
                self.manifold_graph, weight="weight"
            )
        except (nx.PowerIterationFailedConvergence, ValueError):
            centrality = nx.degree_centrality(self.manifold_graph)

        # Sort points by centrality
        sorted_points = sorted(centrality.items(), key=lambda x: x[1], reverse=True)

        gravity_wells = []
        for point_id, centrality_score in sorted_points[:5]:  # Top 5 centers
            if centrality_score > 0.1:  # Minimum threshold
                # Find influenced points within radius
                influenced_points = self._find_nearby_points(point_id, radius=0.8)

                # Calculate mass based on centrality and connections
                mass = centrality_score * (1 + len(influenced_points) * 0.1)

                well = GravityWell(
                    center_point_id=point_id,
                    radius=0.8,
                    mass=mass,
                    influenced_points=influenced_points,
                    well_type="individual",
                    created_at=datetime.now(),
                )

                gravity_wells.append(well)
                self.gravity_wells[point_id] = well

        self._save_persistent_data()
        return gravity_wells

    def _find_nearby_points(self, center_point_id: str, radius: float) -> List[str]:
        """Find points within a certain distance in the manifold."""
        if center_point_id not in self.points:
            return []

        center_coords = self.points[center_point_id].coordinates
        nearby_points = []

        for point_id, point in self.points.items():
            if point_id == center_point_id:
                continue

            distance = np.linalg.norm(point.coordinates - center_coords)
            if distance <= radius:
                nearby_points.append(point_id)

        return nearby_points

    def manifold_search(self, query: str, top_k: int = 10) -> List[Tuple[str, float]]:
        """Search the manifold for similar content."""
        if not self.points:
            return []

        query_coords = self.embedding_model.encode(query, convert_to_numpy=True)
        similarities = []

        for point_id, point in self.points.items():
            similarity = np.dot(query_coords, point.coordinates) / (
                np.linalg.norm(query_coords) * np.linalg.norm(point.coordinates)
            )
            similarities.append((point_id, float(similarity)))

        # Sort by similarity
        similarities.sort(key=lambda x: x[1], reverse=True)
        return similarities[:top_k]

    def get_stats(self) -> Dict[str, Any]:
        """Get manifold statistics."""
        return {
            "total_points": len(self.points),
            "total_connections": len(self.connections),
            "total_gravity_wells": len(self.gravity_wells),
            "connection_types": self._count_connection_types(),
            "point_types": self._count_point_types(),
            "is_connected": nx.is_connected(self.manifold_graph)
            if self.points
            else False,
            "average_degree": 0.0,  # Placeholder - would need proper calculation
        }

    def _count_connection_types(self) -> Dict[str, int]:
        """Count connections by type."""
        type_counts = defaultdict(int)
        for conn in self.connections:
            type_counts[conn.connection_type] += 1
        return dict(type_counts)

    def _count_point_types(self) -> Dict[str, int]:
        """Count points by type."""
        type_counts = defaultdict(int)
        for point in self.points.values():
            point_type = point.metadata.get("type", "unknown")
            type_counts[point_type] += 1
        return dict(type_counts)


class CollectiveManifold:
    """Collective knowledge manifold for shared knowledge representation."""

    def __init__(self, data_dir: str, openalex_api_key: Optional[str] = None):
        self.data_dir = Path(data_dir) / "collective"
        self.data_dir.mkdir(parents=True, exist_ok=True)

        self.openalex_api_key = openalex_api_key

        # Collective manifold components
        self.points: Dict[str, ManifoldPoint] = {}
        self.connections: List[ManifoldConnection] = []
        self.gravity_wells: Dict[str, GravityWell] = {}

        # NetworkX graph for collective structure
        self.manifold_graph: nx.Graph = nx.Graph()

        # External knowledge sources
        self.external_sources = {
            "openalex": self._integrate_openalex,
            "wikipedia": self._integrate_wikipedia,
            "arxiv": self._integrate_arxiv,
        }

    def integrate_external_knowledge(
        self, topic: str, source: str = "openalex"
    ) -> List[str]:
        """Integrate knowledge from external sources into the collective manifold."""
        if source not in self.external_sources:
            logger.warning(f"Unknown external source: {source}")
            return []

        integrator = self.external_sources[source]
        return integrator(topic)

    def _integrate_openalex(self, topic: str) -> List[str]:
        """Integrate knowledge from OpenAlex API."""
        # Placeholder for OpenAlex integration
        # This would make API calls to OpenAlex to get research papers, concepts, etc.
        logger.info(f"Integrating OpenAlex knowledge for topic: {topic}")

        # For now, return empty list - would need actual API implementation
        return []

    def _integrate_wikipedia(self, topic: str) -> List[str]:
        """Integrate knowledge from Wikipedia."""
        # Placeholder for Wikipedia integration
        logger.info(f"Integrating Wikipedia knowledge for topic: {topic}")
        return []

    def _integrate_arxiv(self, topic: str) -> List[str]:
        """Integrate knowledge from arXiv."""
        # Placeholder for arXiv integration
        logger.info(f"Integrating arXiv knowledge for topic: {topic}")
        return []

    def merge_individual_manifold(
        self, individual_manifold: IndividualManifold, merge_threshold: float = 0.8
    ) -> None:
        """Merge an individual manifold into the collective manifold."""
        # Add points that don't exist or are significantly different
        for point_id, point in individual_manifold.points.items():
            if point_id not in self.points:
                self.points[point_id] = point
                self.manifold_graph.add_node(point_id, **asdict(point))
            else:
                # Check if points are similar enough to merge
                existing_coords = self.points[point_id].coordinates
                similarity = np.dot(point.coordinates, existing_coords) / (
                    np.linalg.norm(point.coordinates) * np.linalg.norm(existing_coords)
                )

                if similarity < merge_threshold:
                    # Create new point with combined information
                    combined_coords = (point.coordinates + existing_coords) / 2
                    combined_point = ManifoldPoint(
                        id=f"{point_id}_merged",
                        coordinates=combined_coords,
                        content=f"{point.content} | {self.points[point_id].content}",
                        metadata={**point.metadata, **self.points[point_id].metadata},
                        created_at=datetime.now(),
                    )
                    self.points[f"{point_id}_merged"] = combined_point
                    self.manifold_graph.add_node(
                        f"{point_id}_merged", **asdict(combined_point)
                    )

        # Add connections
        for connection in individual_manifold.connections:
            if (
                connection.source_id in self.points
                and connection.target_id in self.points
            ):
                self.connections.append(connection)
                self.manifold_graph.add_edge(
                    connection.source_id,
                    connection.target_id,
                    weight=connection.strength,
                    type=connection.connection_type,
                    **connection.metadata,
                )

    def get_stats(self) -> Dict[str, Any]:
        """Get collective manifold statistics."""
        return {
            "total_points": len(self.points),
            "total_connections": len(self.connections),
            "total_gravity_wells": len(self.gravity_wells),
            "external_sources": list(self.external_sources.keys()),
            "is_connected": nx.is_connected(self.manifold_graph)
            if self.points
            else False,
        }


class DualManifoldEngine:
    """Engine for managing dual manifold construction and operations."""

    def __init__(
        self,
        data_dir: str,
        embedding_model,
        openai_api_key: str,
        openalex_api_key: Optional[str] = None,
    ):
        self.data_dir = Path(data_dir)
        self.data_dir.mkdir(parents=True, exist_ok=True)

        self.individual_manifold = IndividualManifold(
            str(self.data_dir), embedding_model
        )
        self.collective_manifold = CollectiveManifold(
            str(self.data_dir), openalex_api_key
        )

        # Semantic distiller for concept extraction
        from think_bigger.core.memory.semantic.semantic_distiller import (
            SemanticDistiller,
        )

        self.semantic_distiller = SemanticDistiller(
            str(self.data_dir / "semantic"), openai_api_key
        )

    async def process_memory_entries(
        self, entries: List[MemoryEntry]
    ) -> Dict[str, Any]:
        """Process memory entries through the dual manifold pipeline."""
        results = {
            "individual_points_added": 0,
            "concepts_extracted": 0,
            "connections_created": 0,
            "gravity_wells_computed": 0,
        }

        # Add entries to individual manifold
        for entry in entries:
            self.individual_manifold.add_memory_entry(entry)
            results["individual_points_added"] += 1

        # Extract semantic concepts
        concepts = await self.semantic_distiller.distill_entries(entries)
        results["concepts_extracted"] = len(concepts)

        # Add concepts to individual manifold
        for concept in concepts:
            self.individual_manifold.add_semantic_concept(concept)

        # Create connections in individual manifold
        self.individual_manifold.create_connections(entries, concepts)
        results["connections_created"] = len(self.individual_manifold.connections)

        # Compute gravity wells
        gravity_wells = self.individual_manifold.compute_gravity_wells()
        results["gravity_wells_computed"] = len(gravity_wells)

        # Optionally merge into collective manifold
        self.collective_manifold.merge_individual_manifold(self.individual_manifold)

        return results

    def search_dual_manifold(
        self, query: str, manifold_type: str = "individual", top_k: int = 10
    ) -> Dict[str, Any]:
        """Search across the dual manifold system."""
        results = {}

        if manifold_type in ["individual", "both"]:
            individual_results = self.individual_manifold.manifold_search(query, top_k)
            results["individual"] = [
                {
                    "point_id": point_id,
                    "similarity": similarity,
                    "content": self.individual_manifold.points[point_id].content[:200]
                    + "...",
                    "metadata": self.individual_manifold.points[point_id].metadata,
                }
                for point_id, similarity in individual_results
            ]

        if manifold_type in ["collective", "both"]:
            # Collective search would be more complex, involving external sources
            results["collective"] = []

        return results

    def get_manifold_stats(self) -> Dict[str, Any]:
        """Get statistics for both manifolds."""
        return {
            "individual": self.individual_manifold.get_stats(),
            "collective": self.collective_manifold.get_stats(),
            "semantic": self.semantic_distiller.get_stats(),
        }
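A minimal usage sketch of the engine above, assuming an existing episodic store and a real OpenAI key (the paths and the `"sk-your-key"` placeholder below are illustrative, not part of the source):

```python
import asyncio
from sentence_transformers import SentenceTransformer
from think_bigger.core.memory.episodic.episodic_memory import EpisodicMemory
from think_bigger.core.memory.manifolds.dual_manifold import DualManifoldEngine

embedding_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder paths and API key.
episodic = EpisodicMemory("./data/episodic")
engine = DualManifoldEngine("./data/manifolds", embedding_model, "sk-your-key")

# Feed recent entries through the pipeline: points -> concepts -> connections -> wells.
entries = episodic.get_recent_entries(limit=10)
results = asyncio.run(engine.process_memory_entries(entries))
print(results)

# Query the individual manifold by cosine similarity.
hits = engine.search_dual_manifold("manifold learning", manifold_type="individual", top_k=5)
print(hits["individual"][:1])
```

With an invalid key the distillation step logs an error and returns no concepts, so only the embedding-based parts of the pipeline produce output.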
Binary file not shown.
454  src/think_bigger/core/memory/persona/persona_graph.py  Normal file
@@ -0,0 +1,454 @@
"""Persona layer with knowledge graphs and centrality measures."""

import json
import networkx as nx
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional, Any, Tuple
from dataclasses import dataclass, asdict
from collections import defaultdict

from loguru import logger

from think_bigger.core.memory.episodic.episodic_memory import MemoryEntry
from think_bigger.core.memory.semantic.semantic_distiller import SemanticConcept


@dataclass
class KnowledgeNode:
    """Represents a node in the knowledge graph."""

    id: str
    type: str  # 'concept', 'entry', 'entity', 'topic'
    label: str
    properties: Dict[str, Any]
    created_at: datetime
    updated_at: datetime


@dataclass
class KnowledgeEdge:
    """Represents an edge in the knowledge graph."""

    source_id: str
    target_id: str
    relationship: str
    weight: float
    properties: Dict[str, Any]
    created_at: datetime


@dataclass
class GravityWell:
    """Represents a gravity well in the knowledge graph."""

    center_node_id: str
    radius: float
    mass: float  # Based on centrality and connectivity
    influenced_nodes: List[str]
    created_at: datetime


class PersonaGraph:
    """Knowledge graph for persona layer with centrality measures and gravity wells."""

    def __init__(self, data_dir: str):
        self.data_dir = Path(data_dir)
        self.data_dir.mkdir(parents=True, exist_ok=True)

        # NetworkX graph
        self.graph: nx.DiGraph = nx.DiGraph()

        # Storage
        self.nodes: Dict[str, KnowledgeNode] = {}
        self.edges: List[KnowledgeEdge] = []
        self.gravity_wells: Dict[str, GravityWell] = {}

        # Persistence paths
        self.graph_path = self.data_dir / "persona_graph.graphml"
        self.nodes_path = self.data_dir / "persona_nodes.json"
        self.edges_path = self.data_dir / "persona_edges.json"
        self.wells_path = self.data_dir / "gravity_wells.json"

        # Load existing data
        self._load_persistent_data()

    def _load_persistent_data(self):
        """Load persisted graph data."""
        try:
            # Load nodes
            if self.nodes_path.exists():
                with open(self.nodes_path, "r") as f:
                    nodes_data = json.load(f)
                for node_dict in nodes_data:
                    node = KnowledgeNode(**node_dict)
                    node.created_at = datetime.fromisoformat(
                        node_dict["created_at"]
                    )
                    node.updated_at = datetime.fromisoformat(
                        node_dict["updated_at"]
                    )
                    self.nodes[node.id] = node
                    self.graph.add_node(node.id, **asdict(node))

            # Load edges
            if self.edges_path.exists():
                with open(self.edges_path, "r") as f:
                    edges_data = json.load(f)
                for edge_dict in edges_data:
                    edge = KnowledgeEdge(**edge_dict)
                    edge.created_at = datetime.fromisoformat(
                        edge_dict["created_at"]
                    )
                    self.edges.append(edge)
                    self.graph.add_edge(
                        edge.source_id,
                        edge.target_id,
                        relationship=edge.relationship,
                        weight=edge.weight,
                        **edge.properties,
                    )

            # Load gravity wells
            if self.wells_path.exists():
                with open(self.wells_path, "r") as f:
                    wells_data = json.load(f)
                for well_dict in wells_data:
                    well = GravityWell(**well_dict)
                    well.created_at = datetime.fromisoformat(
                        well_dict["created_at"]
                    )
                    self.gravity_wells[well.center_node_id] = well

            logger.info(
                f"Loaded persona graph with {len(self.nodes)} nodes, {len(self.edges)} edges, {len(self.gravity_wells)} gravity wells"
            )

        except Exception as e:
            logger.error(f"Error loading graph data: {e}")
            # Reset to empty graph
            self.graph = nx.DiGraph()
            self.nodes = {}
            self.edges = []
            self.gravity_wells = {}

    def _save_persistent_data(self):
        """Save graph data to disk."""
        try:
            # Save nodes
            nodes_data = []
            for node in self.nodes.values():
                node_dict = asdict(node)
                node_dict["created_at"] = node.created_at.isoformat()
                node_dict["updated_at"] = node.updated_at.isoformat()
                nodes_data.append(node_dict)

            with open(self.nodes_path, "w") as f:
                json.dump(nodes_data, f, indent=2)

            # Save edges
            edges_data = []
            for edge in self.edges:
                edge_dict = asdict(edge)
                edge_dict["created_at"] = edge.created_at.isoformat()
                edges_data.append(edge_dict)

            with open(self.edges_path, "w") as f:
                json.dump(edges_data, f, indent=2)

            # Save gravity wells
            wells_data = []
            for well in self.gravity_wells.values():
                well_dict = asdict(well)
                well_dict["created_at"] = well.created_at.isoformat()
                wells_data.append(well_dict)

            with open(self.wells_path, "w") as f:
                json.dump(wells_data, f, indent=2)

            # Save NetworkX graph
            nx.write_graphml(self.graph, self.graph_path)

        except Exception as e:
            logger.error(f"Error saving graph data: {e}")

    def add_concept_node(self, concept: SemanticConcept) -> str:
        """Add a concept as a node in the knowledge graph."""
        node_id = f"concept_{concept.id}"

        node = KnowledgeNode(
            id=node_id,
            type="concept",
            label=concept.name,
            properties={
                "description": concept.description,
                "confidence": concept.confidence,
                "related_entries": concept.related_entries,
                "semantic_id": concept.id,
            },
            created_at=datetime.now(),
            updated_at=datetime.now(),
        )

        self.nodes[node_id] = node
        self.graph.add_node(node_id, **asdict(node))

        self._save_persistent_data()
        return node_id

    def add_entry_node(self, entry: MemoryEntry) -> str:
        """Add a memory entry as a node in the knowledge graph."""
        node_id = f"entry_{entry.id}"

        node = KnowledgeNode(
            id=node_id,
            type="entry",
            label=f"Entry from {entry.source_file}",
            properties={
                "content": entry.content[:200] + "..."
                if len(entry.content) > 200
                else entry.content,
                "timestamp": entry.timestamp.isoformat(),
                "source_file": entry.source_file,
                "entry_id": entry.id,
                "metadata": entry.metadata,
            },
            created_at=datetime.now(),
            updated_at=datetime.now(),
        )

        self.nodes[node_id] = node
        self.graph.add_node(node_id, **asdict(node))

        self._save_persistent_data()
        return node_id

    def add_relationship(
        self,
        source_id: str,
        target_id: str,
        relationship: str,
        weight: float = 1.0,
        properties: Optional[Dict[str, Any]] = None,
    ) -> None:
        """Add a relationship between nodes."""
        if source_id not in self.nodes or target_id not in self.nodes:
            logger.warning(
                f"Cannot add relationship: nodes {source_id} or {target_id} not found"
            )
            return

        edge = KnowledgeEdge(
            source_id=source_id,
            target_id=target_id,
            relationship=relationship,
            weight=weight,
            properties=properties or {},
            created_at=datetime.now(),
        )

        self.edges.append(edge)
        self.graph.add_edge(
            source_id,
            target_id,
            relationship=relationship,
            weight=weight,
            **(properties or {}),
        )

        self._save_persistent_data()

    def connect_concept_to_entries(
        self, concept_node_id: str, entry_node_ids: List[str]
    ) -> None:
        """Connect a concept node to its related entry nodes."""
        for entry_node_id in entry_node_ids:
            if entry_node_id in self.nodes:
                self.add_relationship(
                    concept_node_id,
                    entry_node_id,
                    "relates_to",
                    weight=0.8,
                    properties={"connection_type": "semantic"},
                )

    def compute_centrality_measures(self) -> Dict[str, Dict[str, float]]:
        """Compute various centrality measures for the graph."""
        centrality_measures = {}

        if len(self.graph.nodes) == 0:
            return centrality_measures

        try:
            # Degree centrality
            degree_centrality = nx.degree_centrality(self.graph)
            centrality_measures["degree"] = degree_centrality

            # Betweenness centrality
            betweenness_centrality = nx.betweenness_centrality(self.graph)
            centrality_measures["betweenness"] = betweenness_centrality

            # Eigenvector centrality
            eigenvector_centrality = nx.eigenvector_centrality_numpy(self.graph)
            centrality_measures["eigenvector"] = eigenvector_centrality

            # PageRank
            pagerank = nx.pagerank(self.graph)
            centrality_measures["pagerank"] = pagerank

        except Exception as e:
            logger.error(f"Error computing centrality measures: {e}")

        return centrality_measures

    def identify_gravity_wells(
        self, centrality_measures: Optional[Dict[str, Dict[str, float]]] = None
    ) -> List[GravityWell]:
        """Identify gravity wells based on centrality and connectivity."""
        if centrality_measures is None:
            centrality_measures = self.compute_centrality_measures()

        gravity_wells = []

        # Use combined centrality score to identify potential centers
        combined_scores = {}
        for node_id in self.graph.nodes():
            degree_score = centrality_measures.get("degree", {}).get(node_id, 0)
            betweenness_score = centrality_measures.get("betweenness", {}).get(
                node_id, 0
            )
            eigenvector_score = centrality_measures.get("eigenvector", {}).get(
                node_id, 0
            )
            pagerank_score = centrality_measures.get("pagerank", {}).get(node_id, 0)

            # Weighted combination
            combined_score = (
                0.3 * degree_score
                + 0.3 * betweenness_score
                + 0.2 * eigenvector_score
                + 0.2 * pagerank_score
            )
            combined_scores[node_id] = combined_score

        # Sort nodes by combined centrality
        sorted_nodes = sorted(combined_scores.items(), key=lambda x: x[1], reverse=True)

        # Create gravity wells for top nodes
        for node_id, score in sorted_nodes[:10]:  # Top 10 potential centers
            if score > 0.1:  # Minimum threshold
                # Find nodes within "gravitational influence"
                influenced_nodes = self._find_influenced_nodes(node_id, max_distance=3)

                # Calculate mass based on centrality and connections
                mass = score * (1 + len(influenced_nodes) * 0.1)

                well = GravityWell(
                    center_node_id=node_id,
                    radius=2.0,  # Fixed radius for now
                    mass=mass,
                    influenced_nodes=influenced_nodes,
                    created_at=datetime.now(),
                )

                gravity_wells.append(well)
                self.gravity_wells[node_id] = well

        self._save_persistent_data()
        return gravity_wells

    def _find_influenced_nodes(
        self, center_node: str, max_distance: int = 3
    ) -> List[str]:
        """Find nodes within gravitational influence of a center node."""
        try:
            # Get all nodes within max_distance
            distances = nx.single_source_shortest_path_length(
                self.graph, center_node, cutoff=max_distance
            )
            return list(distances.keys())[1:]  # Exclude the center node itself
        except Exception:
            return []

    def get_node_by_id(self, node_id: str) -> Optional[KnowledgeNode]:
        """Get a node by its ID."""
        return self.nodes.get(node_id)

    def get_neighbors(
        self, node_id: str, relationship_type: Optional[str] = None
    ) -> List[Tuple[str, Dict[str, Any]]]:
        """Get neighbors of a node, optionally filtered by relationship type."""
        if node_id not in self.graph:
            return []

        neighbors = []
        for neighbor_id in self.graph.neighbors(node_id):
            edge_data = self.graph.get_edge_data(node_id, neighbor_id)
            if (
                relationship_type is None
                or edge_data.get("relationship") == relationship_type
            ):
                neighbors.append((neighbor_id, edge_data))

        return neighbors

    def find_paths_between_concepts(
        self, start_concept: str, end_concept: str, max_length: int = 5
    ) -> List[List[str]]:
        """Find paths between two concepts in the graph."""
        try:
            paths = list(
                nx.all_simple_paths(
                    self.graph, start_concept, end_concept, cutoff=max_length
                )
            )
            return paths
        except Exception:
            return []

    def get_subgraph_around_node(self, node_id: str, radius: int = 2) -> nx.DiGraph:
        """Get a subgraph centered around a node."""
        try:
            subgraph_nodes = set()
            # Get nodes within radius
            distances = nx.single_source_shortest_path_length(
                self.graph, node_id, cutoff=radius
            )
            subgraph_nodes.update(distances.keys())

            # Also include nodes that point to this node (incoming edges)
            for predecessor in self.graph.predecessors(node_id):
                subgraph_nodes.add(predecessor)

            return self.graph.subgraph(subgraph_nodes).copy()
        except Exception:
            return nx.DiGraph()

    def get_stats(self) -> Dict[str, Any]:
        """Get graph statistics."""
        return {
            "total_nodes": len(self.nodes),
            "total_edges": len(self.edges),
            "total_gravity_wells": len(self.gravity_wells),
            "node_types": self._count_node_types(),
            "edge_types": self._count_edge_types(),
            "is_connected": nx.is_weakly_connected(self.graph) if self.graph else False,
            "average_degree": sum(dict(self.graph.degree()).values()) / len(self.graph)
            if self.graph
            else 0,
            "data_directory": str(self.data_dir),
        }

    def _count_node_types(self) -> Dict[str, int]:
        """Count nodes by type."""
        type_counts = defaultdict(int)
        for node in self.nodes.values():
            type_counts[node.type] += 1
        return dict(type_counts)

    def _count_edge_types(self) -> Dict[str, int]:
        """Count edges by relationship type."""
        type_counts = defaultdict(int)
        for edge in self.edges:
            type_counts[edge.relationship] += 1
        return dict(type_counts)
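A small sketch of how the persona graph's centrality-driven gravity wells might be inspected (the data path is a placeholder; it assumes concept and entry nodes have already been added):

```python
from think_bigger.core.memory.persona.persona_graph import PersonaGraph

graph = PersonaGraph("./data/persona")  # placeholder path

# The combined degree/betweenness/eigenvector/PageRank score drives well placement.
centrality = graph.compute_centrality_measures()
wells = graph.identify_gravity_wells(centrality)

for well in wells:
    node = graph.get_node_by_id(well.center_node_id)
    label = node.label if node else well.center_node_id
    print(f"{label}: mass={well.mass:.2f}, pulls {len(well.influenced_nodes)} nodes")
```

On an empty graph both calls simply return empty results, so the sketch is safe to run before any content has been ingested.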
Binary file not shown.
348  src/think_bigger/core/memory/semantic/semantic_distiller.py  Normal file
@@ -0,0 +1,348 @@
"""Semantic distillation layer for cognitive trajectory analysis."""

import json
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional, Any
from dataclasses import dataclass, asdict

import openai
from loguru import logger

from think_bigger.core.memory.episodic.episodic_memory import MemoryEntry


@dataclass
class SemanticConcept:
    """Represents a distilled semantic concept."""

    id: str
    name: str
    description: str
    confidence: float
    related_entries: List[str]
    created_at: datetime
    updated_at: datetime
    metadata: Dict[str, Any]


@dataclass
class CognitiveTrajectory:
    """Represents a cognitive trajectory through concepts."""

    id: str
    concepts: List[str]  # Concept IDs in chronological order
    transitions: List[Dict[str, Any]]  # Transition metadata
    strength: float
    entry_points: List[str]  # Entry IDs that form this trajectory
    created_at: datetime


class SemanticDistiller:
    """Semantic distillation using LLM analysis for cognitive trajectories."""

    def __init__(self, data_dir: str, openai_api_key: str, model: str = "gpt-4"):
        self.data_dir = Path(data_dir)
        self.data_dir.mkdir(parents=True, exist_ok=True)

        # OpenAI setup
        self.client = openai.OpenAI(api_key=openai_api_key)
        self.model = model

        # Storage
        self.concepts: Dict[str, SemanticConcept] = {}
        self.trajectories: Dict[str, CognitiveTrajectory] = {}

        # Persistence paths
        self.concepts_path = self.data_dir / "semantic_concepts.json"
        self.trajectories_path = self.data_dir / "cognitive_trajectories.json"

        # Load existing data
        self._load_persistent_data()

    def _load_persistent_data(self):
        """Load persisted semantic data."""
        try:
            # Load concepts
            if self.concepts_path.exists():
                with open(self.concepts_path, "r") as f:
                    concepts_data = json.load(f)
                for concept_dict in concepts_data:
                    concept = SemanticConcept(**concept_dict)
                    concept.created_at = datetime.fromisoformat(
                        concept_dict["created_at"]
                    )
                    concept.updated_at = datetime.fromisoformat(
                        concept_dict["updated_at"]
                    )
                    self.concepts[concept.id] = concept

            # Load trajectories
            if self.trajectories_path.exists():
                with open(self.trajectories_path, "r") as f:
                    trajectories_data = json.load(f)
                for traj_dict in trajectories_data:
                    trajectory = CognitiveTrajectory(**traj_dict)
                    trajectory.created_at = datetime.fromisoformat(
                        traj_dict["created_at"]
                    )
                    self.trajectories[trajectory.id] = trajectory

            logger.info(
                f"Loaded {len(self.concepts)} concepts and {len(self.trajectories)} trajectories"
            )

        except Exception as e:
            logger.error(f"Error loading semantic data: {e}")

    def _save_persistent_data(self):
        """Save semantic data to disk."""
        try:
            # Save concepts
            concepts_data = []
            for concept in self.concepts.values():
                concept_dict = asdict(concept)
                concept_dict["created_at"] = concept.created_at.isoformat()
                concept_dict["updated_at"] = concept.updated_at.isoformat()
                concepts_data.append(concept_dict)

            with open(self.concepts_path, "w") as f:
                json.dump(concepts_data, f, indent=2)

            # Save trajectories
            trajectories_data = []
            for trajectory in self.trajectories.values():
                traj_dict = asdict(trajectory)
                traj_dict["created_at"] = trajectory.created_at.isoformat()
                trajectories_data.append(traj_dict)

            with open(self.trajectories_path, "w") as f:
                json.dump(trajectories_data, f, indent=2)

        except Exception as e:
            logger.error(f"Error saving semantic data: {e}")

    async def distill_entries(
        self, entries: List[MemoryEntry]
    ) -> List[SemanticConcept]:
        """Distill semantic concepts from memory entries using LLM analysis."""

        if not entries:
            return []

        # Prepare content for analysis
        content_blocks = []
        for entry in entries:
            content_blocks.append(
                f"Entry {entry.id} ({entry.timestamp.date()}): {entry.content[:500]}..."
            )

        combined_content = "\n\n".join(content_blocks)

        # Create distillation prompt
        prompt = f"""
Analyze the following collection of memory entries and extract the key semantic concepts and themes.
For each concept, provide:
1. A concise name
2. A brief description
3. Confidence score (0-1) of how well it represents the content
4. Which entries are most related to this concept

Content to analyze:
{combined_content}

Return your analysis as a JSON array of concept objects with the following structure:
[
  {{
    "name": "Concept Name",
    "description": "Brief description of the concept",
    "confidence": 0.85,
    "related_entries": ["entry_id_1", "entry_id_2"]
  }}
]
"""

        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {
                        "role": "system",
                        "content": "You are a semantic analysis expert. Extract meaningful concepts from text collections.",
                    },
                    {"role": "user", "content": prompt},
                ],
                temperature=0.3,
                max_tokens=2000,
            )

            result_text = response.choices[0].message.content.strip()

            # Parse JSON response
            if result_text.startswith("```json"):
                result_text = result_text[7:-3].strip()
            elif result_text.startswith("```"):
                result_text = result_text[3:-3].strip()

            concepts_data = json.loads(result_text)

            # Create concept objects
            new_concepts = []
            for concept_data in concepts_data:
                concept_id = (
                    f"concept_{datetime.now().isoformat()}_{len(self.concepts)}"
                )

                concept = SemanticConcept(
                    id=concept_id,
                    name=concept_data["name"],
                    description=concept_data["description"],
                    confidence=concept_data["confidence"],
                    related_entries=concept_data["related_entries"],
                    created_at=datetime.now(),
                    updated_at=datetime.now(),
                    metadata={"source": "llm_distillation"},
                )

                self.concepts[concept_id] = concept
                new_concepts.append(concept)

            # Save changes
            self._save_persistent_data()

            logger.info(f"Distilled {len(new_concepts)} semantic concepts")
            return new_concepts

        except Exception as e:
            logger.error(f"Error in semantic distillation: {e}")
            return []

    async def build_trajectory(
        self, concept_ids: List[str], entries: List[MemoryEntry]
    ) -> Optional[CognitiveTrajectory]:
        """Build a cognitive trajectory from related concepts."""

        if len(concept_ids) < 2:
            return None

        # Get concept details
        concepts = [self.concepts[cid] for cid in concept_ids if cid in self.concepts]

        # Prepare trajectory analysis prompt
        concept_summaries = []
        for concept in concepts:
            concept_summaries.append(f"- {concept.name}: {concept.description}")

        prompt = f"""
Analyze the following concepts and determine if they form a coherent cognitive trajectory or learning path.
A cognitive trajectory shows how ideas evolve, build upon each other, or represent progressive understanding.

Concepts:
{chr(10).join(concept_summaries)}

Determine:
1. Whether these concepts form a meaningful trajectory
2. The logical order/sequence of the concepts
3. The strength of the trajectory connection (0-1)
4. Key transition points between concepts

Return as JSON:
{{
  "is_trajectory": true/false,
  "sequence": ["concept_name_1", "concept_name_2", ...],
  "strength": 0.75,
  "transitions": [
    {{"from": "concept_1", "to": "concept_2", "description": "transition explanation"}}
  ]
}}
"""

        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {
                        "role": "system",
                        "content": "You are a cognitive science expert analyzing learning trajectories.",
                    },
                    {"role": "user", "content": prompt},
                ],
                temperature=0.2,
                max_tokens=1500,
            )

            result_text = response.choices[0].message.content.strip()

            # Parse JSON response
            if result_text.startswith("```json"):
                result_text = result_text[7:-3].strip()
            elif result_text.startswith("```"):
                result_text = result_text[3:-3].strip()

            trajectory_data = json.loads(result_text)

            if not trajectory_data.get("is_trajectory", False):
                return None

            # Map concept names back to IDs
            name_to_id = {self.concepts[cid].name: cid for cid in concept_ids}
            sequence_ids = [
                name_to_id[name]
                for name in trajectory_data["sequence"]
                if name in name_to_id
            ]

            trajectory = CognitiveTrajectory(
                id=f"trajectory_{datetime.now().isoformat()}_{len(self.trajectories)}",
                concepts=sequence_ids,
                transitions=trajectory_data["transitions"],
                strength=trajectory_data["strength"],
                entry_points=[e.id for e in entries],
                created_at=datetime.now(),
            )

            self.trajectories[trajectory.id] = trajectory
            self._save_persistent_data()

            logger.info(f"Built cognitive trajectory: {trajectory.id}")
            return trajectory

        except Exception as e:
            logger.error(f"Error building trajectory: {e}")
            return None

    def get_related_concepts(self, entry_ids: List[str]) -> List[SemanticConcept]:
        """Get concepts related to specific entries."""
        related_concepts = []
        for concept in self.concepts.values():
            if any(entry_id in concept.related_entries for entry_id in entry_ids):
                related_concepts.append(concept)
        return related_concepts

    def get_trajectory_by_concepts(
        self, concept_ids: List[str]
    ) -> List[CognitiveTrajectory]:
        """Find trajectories that contain the given concepts."""
        matching_trajectories = []
        for trajectory in self.trajectories.values():
            if any(cid in trajectory.concepts for cid in concept_ids):
                matching_trajectories.append(trajectory)
        return matching_trajectories

    def get_stats(self) -> Dict[str, Any]:
        """Get semantic layer statistics."""
        return {
            "total_concepts": len(self.concepts),
            "total_trajectories": len(self.trajectories),
            "avg_concept_confidence": sum(c.confidence for c in self.concepts.values())
            / len(self.concepts)
            if self.concepts
            else 0,
            "avg_trajectory_strength": sum(
                t.strength for t in self.trajectories.values()
            )
            / len(self.trajectories)
            if self.trajectories
            else 0,
            "data_directory": str(self.data_dir),
        }
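A hedged sketch of driving the distiller directly, assuming a valid OpenAI key and an episodic store with some entries (paths and the key are placeholders):

```python
import asyncio
from think_bigger.core.memory.episodic.episodic_memory import EpisodicMemory
from think_bigger.core.memory.semantic.semantic_distiller import SemanticDistiller

episodic = EpisodicMemory("./data/episodic")                        # placeholder path
distiller = SemanticDistiller("./data/semantic", "sk-your-key")     # placeholder key

async def main():
    entries = episodic.get_recent_entries(limit=20)
    concepts = await distiller.distill_entries(entries)
    for concept in concepts:
        print(f"{concept.name} ({concept.confidence:.2f}): {concept.description}")

    # Optionally chain the extracted concepts into a cognitive trajectory.
    if len(concepts) >= 2:
        trajectory = await distiller.build_trajectory([c.id for c in concepts], entries)
        print(trajectory)

asyncio.run(main())
```

Because the LLM is asked to return raw JSON, the code strips optional Markdown fences before `json.loads`; any response that still fails to parse is caught and logged, and the call returns an empty list.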
166  test_phase_1.py  Normal file
@@ -0,0 +1,166 @@
#!/usr/bin/env python3
"""Test script for Phase 1 Dual Manifold Cognitive Architecture implementation."""

import asyncio
import os
from pathlib import Path

# Set up data directory
data_dir = Path.home() / "think_bigger_test_data"
data_dir.mkdir(exist_ok=True)


async def test_phase_1_implementation():
    """Test the complete Phase 1 implementation."""

    print("🧠 Testing Phase 1: Dual Manifold Cognitive Architecture")
    print("=" * 60)

    try:
        # Import required modules
        from sentence_transformers import SentenceTransformer
        from think_bigger.core.memory.episodic.episodic_memory import EpisodicMemory
        from think_bigger.core.memory.semantic.semantic_distiller import (
            SemanticDistiller,
        )
        from think_bigger.core.memory.persona.persona_graph import PersonaGraph
        from think_bigger.core.memory.manifolds.dual_manifold import DualManifoldEngine
        from think_bigger.core.memory.braiding.braiding_engine import BraidingEngine

        print("✅ Imports successful")

        # Initialize embedding model
        print("🔄 Initializing embedding model...")
        embedding_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
        print("✅ Embedding model ready")

        # Initialize memory systems
        print("🔄 Initializing memory systems...")

        # Episodic Memory
        episodic_memory = EpisodicMemory(str(data_dir / "episodic"))
        print("✅ Episodic memory initialized")

        # Semantic Distiller (using mock API key for testing)
        semantic_distiller = SemanticDistiller(
            str(data_dir / "semantic"), "mock-api-key"
        )
        print("✅ Semantic distiller initialized")

        # Persona Graph
        persona_graph = PersonaGraph(str(data_dir / "persona"))
        print("✅ Persona graph initialized")

        # Dual Manifold Engine
        dual_manifold_engine = DualManifoldEngine(
            str(data_dir / "manifolds"), embedding_model, "mock-api-key"
        )
        print("✅ Dual manifold engine initialized")

        # Braiding Engine
        braiding_engine = BraidingEngine(
            str(data_dir / "braiding"), dual_manifold_engine, "mock-api-key"
        )
        print("✅ Braiding engine initialized")

        print("\n📝 Testing memory entry addition...")

        # Add some test memory entries
        test_entries = [
            {
                "content": "The dual manifold cognitive architecture combines individual and collective knowledge representations through manifold learning techniques.",
                "source_file": "architecture_notes.md",
                "metadata": {"topic": "architecture", "importance": "high"},
            },
            {
                "content": "Episodic memory stores specific experiences with temporal preservation, using hybrid indexing with FAISS and BM25.",
                "source_file": "memory_system.md",
                "metadata": {"topic": "memory", "component": "episodic"},
            },
            {
                "content": "Semantic distillation extracts cognitive trajectories from memory entries using LLM analysis to identify meaningful concepts and their relationships.",
                "source_file": "semantic_layer.md",
                "metadata": {"topic": "memory", "component": "semantic"},
            },
            {
                "content": "Knowledge graphs in the persona layer use centrality measures and gravity wells to represent structured knowledge with NetworkX.",
                "source_file": "persona_layer.md",
                "metadata": {"topic": "memory", "component": "persona"},
            },
        ]

        added_entries = []
        for entry_data in test_entries:
            entry_id = episodic_memory.add_entry(
                entry_data["content"], entry_data["source_file"], entry_data["metadata"]
            )
            added_entries.append(episodic_memory.get_entry_by_id(entry_id))
            print(f"✅ Added entry: {entry_id}")

        print(f"\n📊 Episodic memory stats: {episodic_memory.get_stats()}")

        print("\n🔄 Processing entries through dual manifold...")

        # Process entries through dual manifold
        processing_results = await dual_manifold_engine.process_memory_entries(
            added_entries
        )
        print(f"✅ Processing results: {processing_results}")

        print("\n🔍 Testing hybrid search...")

        # Test hybrid search
        search_results = episodic_memory.hybrid_search(
            "cognitive architecture", top_k=3
        )
        print(f"✅ Found {len(search_results)} results for 'cognitive architecture'")
        for result in search_results[:2]:  # Show top 2
            print(f"  - {result.content[:100]}... (score: {result.combined_score:.3f})")

        print("\n🧵 Testing braiding engine...")

        # Test braiding engine
        braided_results = await braiding_engine.braid_search("memory systems", top_k=2)
        print(f"✅ Braided search found {len(braided_results)} results")
        for result in braided_results:
            print(f"  - {result.content[:80]}... (confidence: {result.confidence:.3f})")

        print("\n📈 System Statistics:")

        # Get system statistics
        manifold_stats = dual_manifold_engine.get_manifold_stats()
        braiding_stats = braiding_engine.get_gate_stats()

        print(
            f"Individual Manifold: {manifold_stats['individual']['total_points']} points"
        )
        print(
            f"Collective Manifold: {manifold_stats['collective']['total_points']} points"
        )
        print(f"Braiding Gates: {braiding_stats['total_gates']} gates")
        print(
            f"Semantic Concepts: {manifold_stats['semantic']['total_concepts']} concepts"
        )

        print("\n🎉 Phase 1 implementation test completed successfully!")
        print("\nKey Deliverables Implemented:")
        print("✅ Episodic memory with hybrid FAISS + BM25 indexing")
        print("✅ Semantic distillation pipeline")
        print("✅ Knowledge graph construction with NetworkX")
        print("✅ Dual manifold representation")
        print("✅ Braiding engine with structural gates")
        print("✅ FastAPI endpoints for all components")

        return True

    except Exception as e:
        print(f"❌ Test failed with error: {e}")
        import traceback

        traceback.print_exc()
        return False


if __name__ == "__main__":
    success = asyncio.run(test_phase_1_implementation())
    exit(0 if success else 1)
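A note on running this script: the hard-coded "mock-api-key" means the OpenAI-backed steps are expected to fail gracefully (the semantic distiller catches the API error, logs it, and returns an empty concept list), so the run mainly exercises wiring and the non-LLM components. Invoke it directly with `python test_phase_1.py` from an environment where the `think_bigger` package and its dependencies (sentence-transformers, networkx, the index backends) are importable; the exit code (0 on success, 1 on failure) makes it usable as a quick smoke test.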