Media Ingestion and Processing Workflow

This document outlines the complete user journey for ingesting media content into the Advanced Second Brain PKM system, from initial file placement to actionable insights.

Overview

The media ingestion workflow demonstrates the system's core value proposition: transforming passive media consumption into active knowledge management through automated processing, intelligent analysis, and seamless integration with the user's knowledge base.

User Journey Map

Phase 1: Content Acquisition (User Action)

Trigger: User discovers valuable content (lecture, podcast, video course)

User Actions:

  1. Download or acquire media file (MP4, MP3, WebM, etc.)
  2. Navigate to appropriate domain directory in file system
  3. Place file in correct subfolder (e.g., Neuroscience/Media/Lectures/)
  4. Optionally rename file for clarity

System State: File appears in domain directory, ready for processing

User Expectations:

  • File placement should be intuitive
  • No manual intervention required after placement
  • System should acknowledge file detection

Phase 2: Automated Detection and Processing (Background)

System Actions:

  1. File Watcher Detection: File system monitor detects the new file within 5 seconds (see the watcher sketch after this list)
  2. Metadata Extraction: Extract file metadata (duration, size, format, creation date)
  3. Format Validation: Verify file format is supported
  4. Queue Processing: Add to media processing queue with priority
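A minimal sketch of the detection step, assuming the Python watchdog library and a plain queue hand-off (the plan does not prescribe a specific watcher implementation):

from pathlib import Path
from queue import Queue

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

SUPPORTED_FORMATS = {".mp4", ".mp3", ".webm", ".wav", ".m4a"}  # hypothetical allow-list

class MediaFileHandler(FileSystemEventHandler):
    def __init__(self, job_queue: Queue):
        self.job_queue = job_queue

    def on_created(self, event):
        # Ignore directories and unsupported formats
        if event.is_directory:
            return
        path = Path(event.src_path)
        if path.suffix.lower() in SUPPORTED_FORMATS:
            self.job_queue.put(path)  # hand off to the processing queue

def watch_domain(domain_root: str, job_queue: Queue) -> Observer:
    observer = Observer()
    observer.schedule(MediaFileHandler(job_queue), domain_root, recursive=True)
    observer.start()  # monitoring runs in a background thread
    return observer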

Background Processing:

  1. Transcription Service: Send to Whisper/OpenAI/Google Speech-to-Text (see the sketch after this list)
  2. Transcript Generation: Convert audio/video to timestamped text
  3. Quality Validation: Check transcript accuracy (>90% confidence)
  4. Synchronization: Align transcript with video timeline (if video)
  5. Storage: Save the transcript next to its source media, in the domain's Media/Transcripts/ directory
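As one concrete possibility, a sketch of the transcription step using the open-source whisper package; the service choice is configurable, and per-segment confidence would come from the chosen service's own scoring (an assumption here):

import whisper  # pip install openai-whisper

def transcribe_media(media_path: str, model_name: str = "base") -> dict:
    model = whisper.load_model(model_name)
    result = model.transcribe(media_path)  # returns full text plus timestamped segments
    return {
        "metadata": {"source_file": media_path, "transcription_service": "whisper"},
        "segments": [
            {"start": s["start"], "end": s["end"], "text": s["text"].strip()}
            for s in result["segments"]
        ],
    }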

System State: Media file processed, transcript available

User Feedback: Notification in UI when processing complete

Phase 3: Knowledge Integration (User Interaction)

User Actions:

  1. Open Knowledge Browser for the domain
  2. Navigate to media file in file tree
  3. Click on video file to open in Content Viewer

System Response:

  1. Content Loading: Display video player with controls
  2. Transcript Display: Show synchronized transcript below video
  3. Navigation Integration: Enable click-to-jump between transcript and video

User Value: Can now consume content with a searchable, navigable transcript

Phase 4: Intelligent Analysis (User-Driven)

User Actions:

  1. Click "Run Fabric Pattern" button in Insight/Fabric pane
  2. Select analysis pattern (e.g., "Extract Ideas", "Summarize", "Find Action Items")
  3. Optionally adjust parameters

System Actions:

  1. Content Processing: Send transcript to domain agent
  2. Pattern Execution: Run selected Fabric analysis pattern
  3. Insight Generation: Extract structured insights from content
  4. Result Display: Show formatted results in right pane

Example Output:

## Extracted Ideas
- Neural networks can be understood as parallel distributed processors
- Backpropagation remains the most effective learning algorithm
- Attention mechanisms solve the bottleneck problem in RNNs

## Key Takeaways
- Deep learning has moved from art to science
- Transformer architecture enables better long-range dependencies
- Self-supervised learning reduces annotation requirements

Phase 5: Knowledge Graph Integration (Automatic)

System Actions:

  1. Concept Extraction: Identify key concepts from analysis results
  2. Graph Updates: Add new concepts and relationships to knowledge graph
  3. Embedding Generation: Create vector embeddings for new content (see the sketch after these lists)
  4. Relationship Discovery: Link to existing concepts in domain

Background Processing:

  • Update semantic search index
  • Recalculate concept centrality
  • Generate cross-references to related content
  • Update domain agent context
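A sketch of the embedding step, assuming sentence-transformers as the encoder (the plan does not fix a model):

from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def embed_segments(segments: list[dict]) -> list:
    # One vector per transcript segment; these feed the semantic search index
    texts = [seg["text"] for seg in segments]
    return _model.encode(texts, normalize_embeddings=True)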

Phase 6: Cross-Domain Connection (Optional Advanced Usage)

User Actions:

  1. Notice connection between current content and another domain
  2. Switch to Agent Studio mode
  3. Modify Dana agent code to include cross-domain relationships

Example Dana Code Modification:

agent NeuroscienceAgent {
    context: ["Neuroscience/Media/**", "CompSci/Papers/**"]

    query(query) {
        // Search the home domain for the query, and pull related neural network material from CompSci
        neuroscience_results = search_domain("Neuroscience", query)
        compsci_results = search_domain("CompSci", "neural networks")

        // Combine and synthesize results
        return synthesize_results(neuroscience_results, compsci_results)
    }
}

Technical Implementation Details

File System Integration

Directory Structure:

Domain_Name/
├── Media/
│   ├── Lectures/
│   ├── Podcasts/
│   ├── Videos/
│   └── Transcripts/  # Auto-generated
├── Papers/
├── Notes/
└── agent.na         # Domain agent configuration

File Naming Convention:

  • Original: lecture_neural_networks_fundamentals.mp4
  • Transcript: lecture_neural_networks_fundamentals.mp4.transcript.json (a helper that derives this path is sketched below)
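A small helper that derives the transcript path from this convention, assuming media files sit one level below Media/ as in the tree above:

from pathlib import Path

def transcript_path(media_path: Path) -> Path:
    # Media/Lectures/lecture.mp4 -> Media/Transcripts/lecture.mp4.transcript.json
    return media_path.parent.parent / "Transcripts" / (media_path.name + ".transcript.json")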

Processing Pipeline

Queue Management:

from dataclasses import dataclass
from enum import Enum

ProcessingStatus = Enum("ProcessingStatus", "PENDING PROCESSING COMPLETE FAILED")

@dataclass
class MediaProcessingJob:
    file_path: str
    domain_id: str
    priority: int = 1
    retry_count: int = 0
    status: ProcessingStatus = ProcessingStatus.PENDING

Processing Steps (a worker sketch tying these together follows the list):

  1. Validation: Check file integrity and format support
  2. Transcription: Call external API with error handling
  3. Post-processing: Clean transcript, add timestamps
  4. Storage: Save in structured JSON format
  5. Indexing: Update search indices
  6. Notification: Alert user of completion
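A sketch of a worker consuming the queue; the step functions (validate, transcribe_media, and so on) are hypothetical stand-ins for the components above:

import logging

def process_job(job: MediaProcessingJob, max_retries: int = 3) -> None:
    try:
        validate(job.file_path)                      # 1. integrity and format check
        transcript = transcribe_media(job.file_path) # 2. external API call
        transcript = postprocess(transcript)         # 3. clean text, align timestamps
        store_transcript(job.file_path, transcript)  # 4. structured JSON on disk
        update_indices(job.domain_id, transcript)    # 5. search indices
        notify_user(job, status="complete")          # 6. UI notification
        job.status = ProcessingStatus.COMPLETE
    except Exception:
        logging.exception("processing failed: %s", job.file_path)
        job.retry_count += 1
        job.status = (ProcessingStatus.PENDING if job.retry_count < max_retries
                      else ProcessingStatus.FAILED)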

Transcript Format

JSON Structure:

{
  "metadata": {
    "source_file": "lecture.mp4",
    "duration": 3600,
    "transcription_service": "whisper",
    "confidence_score": 0.95,
    "processing_timestamp": "2024-01-15T10:30:00Z"
  },
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Welcome to this lecture on neural networks.",
      "confidence": 0.98
    },
    {
      "start": 5.2,
      "end": 12.1,
      "text": "Today we'll cover the fundamental concepts...",
      "confidence": 0.96
    }
  ],
  "chapters": [
    {
      "title": "Introduction",
      "start": 0.0,
      "end": 180.0
    },
    {
      "title": "Basic Concepts",
      "start": 180.0,
      "end": 900.0
    }
  ]
}
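A typed loader for this structure (a sketch; field names follow the schema above):

import json
from dataclasses import dataclass

@dataclass
class Segment:
    start: float
    end: float
    text: str
    confidence: float

def load_segments(transcript_file: str) -> list[Segment]:
    with open(transcript_file) as f:
        data = json.load(f)
    return [Segment(**seg) for seg in data["segments"]]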

Synchronization Mechanism

Video-Transcript Sync:

  • Click Transcript: Jump to the corresponding video timestamp (see the lookup sketch after this list)
  • Video Playback: Highlight current transcript segment
  • Search: Find text and jump to video location
  • Export: Generate timestamped notes with video references
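A sketch of the segment lookup behind click-to-jump and playback highlighting, reusing the Segment loader above; segments are assumed sorted by start time:

import bisect

def segment_at(segments: list[Segment], t: float) -> Segment | None:
    # Find the last segment starting at or before time t
    starts = [seg.start for seg in segments]
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and segments[i].start <= t < segments[i].end:
        return segments[i]
    return None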

Fabric Analysis Patterns

Pattern Framework:

from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class FabricPattern:
    name: str
    description: str
    input_type: str  # "transcript", "document", "mixed"
    output_format: str  # "bullet_points", "summary", "structured"

    async def execute(self, content: str, context: Dict[str, Any]) -> "PatternResult":
        # Implementation varies by pattern; subclasses override this
        raise NotImplementedError

Built-in Patterns:

  1. Extract Ideas: Identify key concepts and insights (an example wiring is sketched after this list)
  2. Summarize: Create concise content summary
  3. Find Action Items: Extract tasks and follow-ups
  4. Generate Questions: Create study/discussion questions
  5. Extract References: Find citations and sources
  6. Timeline Analysis: Create chronological breakdown
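As an illustration, one way the first pattern might be wired up; run_llm is a hypothetical async completion helper, and the raw string return stands in for the framework's PatternResult:

class ExtractIdeasPattern(FabricPattern):
    async def execute(self, content, context):
        # run_llm is a hypothetical async LLM completion helper
        prompt = "List the key ideas in this transcript as bullet points:\n\n" + content
        return await run_llm(prompt)

extract_ideas = ExtractIdeasPattern(
    name="extract_ideas",
    description="Identify key concepts and insights",
    input_type="transcript",
    output_format="bullet_points",
)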

Error Handling and Recovery

Failure Scenarios:

  • Transcription Failure: Retry with a different service, notify user (see the fallback sketch after this list)
  • File Corruption: Skip processing, log error, allow manual retry
  • Storage Issues: Queue for later processing, alert admin
  • Analysis Errors: Fallback to basic processing, partial results
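A sketch of the transcription fallback, assuming interchangeable service callables (names hypothetical):

SERVICES = [transcribe_whisper, transcribe_openai, transcribe_google]  # hypothetical callables

def transcribe_with_fallback(media_path: str) -> dict:
    errors = []
    for service in SERVICES:
        try:
            return service(media_path)
        except Exception as exc:  # each service gets one attempt
            errors.append(f"{service.__name__}: {exc}")
    raise RuntimeError("all transcription services failed: " + "; ".join(errors))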

User Communication:

  • Processing status indicators in UI
  • Notification system for completion/failures
  • Manual retry options for failed jobs
  • Progress tracking for long-running tasks

Performance Requirements

Processing Times

  • File Detection: <5 seconds
  • Metadata Extraction: <1 second
  • Transcription: <10% of media duration (e.g., 6 min for 1-hour video)
  • Analysis: <30 seconds for typical content
  • UI Updates: <2 seconds for all operations

Scalability Targets

  • Concurrent Processing: 10 media files simultaneously (see the asyncio sketch after this list)
  • Queue Throughput: 50 files per hour
  • Storage Growth: Handle 100GB+ media libraries
  • Search Performance: <500ms for transcript searches
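A sketch of the 10-file concurrency cap using asyncio (assuming an async variant of the job processor):

import asyncio

MAX_CONCURRENT = 10  # matches the scalability target above

async def process_all(jobs: list, process_job_async) -> None:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)

    async def bounded(job):
        async with semaphore:
            await process_job_async(job)

    await asyncio.gather(*(bounded(job) for job in jobs))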

User Experience Considerations

Progressive Enhancement

  • Basic playback works immediately
  • Transcripts appear asynchronously
  • Analysis results load on demand
  • Advanced features available when processing complete

Accessibility

  • Keyboard navigation for all controls
  • Screen reader support for transcripts
  • High contrast mode for video controls
  • Adjustable playback speeds

Mobile Considerations

  • Responsive video player
  • Touch-friendly transcript navigation
  • Offline transcript access
  • Bandwidth-adaptive quality

Success Metrics

User Engagement

  • Completion Rate: % of videos watched with transcripts
  • Analysis Usage: % of content analyzed with Fabric patterns
  • Time Saved: Average time reduction vs. manual note-taking
  • Knowledge Retention: User-reported learning improvement

Technical Performance

  • Processing Success Rate: >95% of files processed successfully on the first attempt
  • Transcript Accuracy: >90% confidence scores
  • Analysis Quality: >80% user satisfaction with insights
  • System Reliability: <1% of files fail permanently after retries

Future Enhancements

Advanced Features

  • Multi-language Support: Automatic language detection and translation
  • Speaker Diarization: Identify different speakers in recordings
  • Emotion Analysis: Detect speaker enthusiasm and emphasis
  • Concept Mapping: Visual knowledge graphs from transcripts
  • Collaborative Annotations: Shared notes and highlights

Integration Opportunities

  • Calendar Integration: Sync with lecture schedules
  • Note-taking Apps: Export to Roam Research, Obsidian, etc.
  • Learning Platforms: Integration with Coursera, edX, etc.
  • Social Features: Share insights with study groups

This workflow transforms passive media consumption into an active, intelligent knowledge management process, demonstrating the system's core value proposition of making complex information accessible and actionable.