# Media Ingestion and Processing Workflow
This document outlines the complete user journey for ingesting media content into the Advanced Second Brain PKM system, from initial file placement to actionable insights.
## Overview
The media ingestion workflow demonstrates the system's core value proposition: transforming passive media consumption into active knowledge management through automated processing, intelligent analysis, and seamless integration with the user's knowledge base.
## User Journey Map
### Phase 1: Content Acquisition (User Action)
**Trigger**: User discovers valuable content (lecture, podcast, video course)
**User Actions**:
1. Download or acquire media file (MP4, MP3, WebM, etc.)
2. Navigate to appropriate domain directory in file system
3. Place file in correct subfolder (e.g., `Neuroscience/Media/Lectures/`)
4. Optionally rename file for clarity
**System State**: File appears in domain directory, ready for processing
**User Expectations**:
- File placement should be intuitive
- No manual intervention required
- System should acknowledge file detection
### Phase 2: Automated Detection and Processing (Background)
**System Actions**:
1. **File Watcher Detection**: File system monitor detects new file within 5 seconds
2. **Metadata Extraction**: Extract file metadata (duration, size, format, creation date)
3. **Format Validation**: Verify file format is supported
4. **Queue Processing**: Add to media processing queue with priority
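As a concrete illustration of the detection step, here is a minimal file-watcher sketch using the Python `watchdog` library (an assumption; this plan does not fix a monitoring library, and the supported-format set shown is illustrative):
```python
from pathlib import Path
from queue import Queue
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

SUPPORTED_FORMATS = {".mp4", ".mp3", ".webm"}  # assumed supported set

class MediaFileHandler(FileSystemEventHandler):
    """Enqueue newly created media files for the processing pipeline."""
    def __init__(self, queue: Queue):
        self.queue = queue

    def on_created(self, event):
        if event.is_directory:
            return
        path = Path(event.src_path)
        if path.suffix.lower() in SUPPORTED_FORMATS:
            self.queue.put(path)  # hand off to the processing queue (step 4)

def watch_domain(domain_root: str, queue: Queue) -> Observer:
    """Start watching a domain directory tree for new media files."""
    observer = Observer()
    observer.schedule(MediaFileHandler(queue), domain_root, recursive=True)
    observer.start()
    return observer
```
Watching recursively from the domain root keeps detection decoupled from the queue consumer, which handles the rest of the pipeline.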
**Background Processing**:
1. **Transcription Service**: Send to Whisper/OpenAI/Google Speech-to-Text
2. **Transcript Generation**: Convert audio/video to timestamped text
3. **Quality Validation**: Check transcript accuracy (>90% confidence)
4. **Synchronization**: Align transcript with video timeline (if video)
5. **Storage**: Save transcript alongside original file
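A hedged sketch of the transcription step using the open-source `whisper` package locally (one of the services named above; the service-selection logic is unspecified here, and the confidence and chapter fields of the full transcript format are omitted for brevity):
```python
import json
from pathlib import Path
import whisper  # openai-whisper package

def transcribe(media_path: Path, model_name: str = "base") -> Path:
    """Transcribe a media file and save a timestamped transcript beside it."""
    model = whisper.load_model(model_name)
    result = model.transcribe(str(media_path))  # dict with "text" and "segments"
    transcript = {
        "metadata": {
            "source_file": media_path.name,
            "transcription_service": "whisper",
        },
        "segments": [
            {"start": s["start"], "end": s["end"], "text": s["text"].strip()}
            for s in result["segments"]
        ],
    }
    out_path = media_path.parent / (media_path.name + ".transcript.json")
    out_path.write_text(json.dumps(transcript, indent=2))
    return out_path
```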
**System State**: Media file processed, transcript available
**User Feedback**: Notification in UI when processing complete
### Phase 3: Knowledge Integration (User Interaction)
**User Actions**:
1. Open Knowledge Browser for the domain
2. Navigate to media file in file tree
3. Click on video file to open in Content Viewer
**System Response**:
1. **Content Loading**: Display video player with controls
2. **Transcript Display**: Show synchronized transcript below video
3. **Navigation Integration**: Enable click-to-jump between transcript and video
**User Value**: Can now consume content with searchable, navigable transcript
### Phase 4: Intelligent Analysis (User-Driven)
**User Actions**:
1. Click "Run Fabric Pattern" button in Insight/Fabric pane
2. Select analysis pattern (e.g., "Extract Ideas", "Summarize", "Find Action Items")
3. Optionally adjust parameters
**System Actions**:
1. **Content Processing**: Send transcript to domain agent
2. **Pattern Execution**: Run selected Fabric analysis pattern
3. **Insight Generation**: Extract structured insights from content
4. **Result Display**: Show formatted results in right pane
**Example Output**:
```
## Extracted Ideas
- Neural networks can be understood as parallel distributed processors
- Backpropagation remains the most effective learning algorithm
- Attention mechanisms solve the bottleneck problem in RNNs
## Key Takeaways
- Deep learning has moved from art to science
- Transformer architecture enables better long-range dependencies
- Self-supervised learning reduces annotation requirements
```
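A minimal sketch of the dispatch step, assuming the `FabricPattern` framework defined under Technical Implementation Details below; the helper structure is illustrative:
```python
import json
from pathlib import Path
from typing import Any, Dict

async def run_pattern(pattern: "FabricPattern", transcript_path: Path,
                      domain_id: str) -> "PatternResult":
    """Run one Fabric analysis pattern over a stored transcript."""
    transcript = json.loads(transcript_path.read_text())
    # Flatten the timestamped segments into plain text for analysis.
    content = " ".join(seg["text"] for seg in transcript["segments"])
    context: Dict[str, Any] = {
        "domain_id": domain_id,
        "source_file": transcript["metadata"]["source_file"],
    }
    return await pattern.execute(content, context)
```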
### Phase 5: Knowledge Graph Integration (Automatic)
**System Actions**:
1. **Concept Extraction**: Identify key concepts from analysis results
2. **Graph Updates**: Add new concepts and relationships to knowledge graph
3. **Embedding Generation**: Create vector embeddings for new content
4. **Relationship Discovery**: Link to existing concepts in domain
**Background Processing**:
- Update semantic search index
- Recalculate concept centrality
- Generate cross-references to related content
- Update domain agent context
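As an illustration of the embedding step, a sketch using `sentence-transformers`; the `graph` and `search_index` interfaces are hypothetical, since the plan does not name a concrete store:
```python
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")

def index_concepts(concepts: list[str], graph, search_index) -> None:
    """Embed extracted concepts and register them in the graph and index."""
    embeddings = _model.encode(concepts)  # one vector per concept
    for concept, vector in zip(concepts, embeddings):
        node_id = graph.add_concept(concept)        # hypothetical API
        search_index.upsert(node_id, vector)        # hypothetical API
        # Link to the nearest existing concepts in the domain.
        for neighbor_id in graph.most_similar(vector, k=5):  # hypothetical API
            graph.add_relationship(node_id, neighbor_id, kind="related")
```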
### Phase 6: Cross-Domain Connection (Optional Advanced Usage)
**User Actions**:
1. Notice connection between current content and another domain
2. Switch to Agent Studio mode
3. Modify Dana agent code to include cross-domain relationships
**Example Dana Code Modification**:
```dana
agent NeuroscienceAgent {
    context: ["Neuroscience/Media/**", "CompSci/Papers/**"]

    query(query) {
        // Search both domains for neural network concepts
        neuroscience_results = search_domain("Neuroscience", query)
        compsci_results = search_domain("CompSci", "neural networks")

        // Combine and synthesize results
        return synthesize_results(neuroscience_results, compsci_results)
    }
}
```
## Technical Implementation Details
### File System Integration
**Directory Structure**:
```
Domain_Name/
├── Media/
│   ├── Lectures/
│   ├── Podcasts/
│   ├── Videos/
│   └── Transcripts/        # Auto-generated
├── Papers/
├── Notes/
└── agent.na                # Domain agent configuration
```
**File Naming Convention**:
- Original: `lecture_neural_networks_fundamentals.mp4`
- Transcript: `lecture_neural_networks_fundamentals.mp4.transcript.json`
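This convention is a pure string transformation, sketched below:
```python
from pathlib import Path

def transcript_path(media_path: Path) -> Path:
    """Map a media file to its transcript path per the naming convention."""
    return media_path.parent / (media_path.name + ".transcript.json")

# transcript_path(Path("lecture_neural_networks_fundamentals.mp4"))
# -> lecture_neural_networks_fundamentals.mp4.transcript.json
```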
### Processing Pipeline
**Queue Management**:
```python
from dataclasses import dataclass
from enum import Enum, auto

class ProcessingStatus(Enum):
    PENDING = auto()
    PROCESSING = auto()
    COMPLETE = auto()
    FAILED = auto()

@dataclass
class MediaProcessingJob:
    file_path: str
    domain_id: str
    priority: int = 1
    retry_count: int = 0
    status: ProcessingStatus = ProcessingStatus.PENDING
```
**Processing Steps**:
1. **Validation**: Check file integrity and format support
2. **Transcription**: Call external API with error handling
3. **Post-processing**: Clean transcript, add timestamps
4. **Storage**: Save in structured JSON format
5. **Indexing**: Update search indices
6. **Notification**: Alert user of completion
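A sketch of a worker that drives one job through these six steps; the step functions (`validate`, `transcribe_remote`, and so on) are hypothetical names for the stages listed above:
```python
import asyncio

async def process_job(job: MediaProcessingJob) -> None:
    """Drive one job through the six pipeline steps."""
    try:
        job.status = ProcessingStatus.PROCESSING
        validate(job.file_path)                        # 1. validation
        raw = await transcribe_remote(job.file_path)   # 2. external API call
        transcript = clean_and_timestamp(raw)          # 3. post-processing
        path = store_transcript(job, transcript)       # 4. structured JSON
        update_indices(job.domain_id, path)            # 5. search indices
        notify_user(job, status="complete")            # 6. completion alert
        job.status = ProcessingStatus.COMPLETE
    except Exception:
        job.retry_count += 1
        job.status = ProcessingStatus.FAILED
        notify_user(job, status="failed")

async def worker(queue: asyncio.Queue) -> None:
    """Consume jobs from the processing queue indefinitely."""
    while True:
        job = await queue.get()
        await process_job(job)
        queue.task_done()
```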
### Transcript Format
**JSON Structure**:
```json
{
  "metadata": {
    "source_file": "lecture.mp4",
    "duration": 3600,
    "transcription_service": "whisper",
    "confidence_score": 0.95,
    "processing_timestamp": "2024-01-15T10:30:00Z"
  },
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Welcome to this lecture on neural networks.",
      "confidence": 0.98
    },
    {
      "start": 5.2,
      "end": 12.1,
      "text": "Today we'll cover the fundamental concepts...",
      "confidence": 0.96
    }
  ],
  "chapters": [
    {
      "title": "Introduction",
      "start": 0.0,
      "end": 180.0
    },
    {
      "title": "Basic Concepts",
      "start": 180.0,
      "end": 900.0
    }
  ]
}
```
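A small loader sketch that parses this structure into typed segments (field names taken directly from the JSON above):
```python
import json
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Segment:
    start: float
    end: float
    text: str
    confidence: float

def load_segments(transcript_path: Path) -> list[Segment]:
    """Parse a stored transcript into typed, timestamped segments."""
    data = json.loads(transcript_path.read_text())
    return [Segment(**seg) for seg in data["segments"]]
```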
### Synchronization Mechanism
**Video-Transcript Sync**:
- **Click Transcript**: Jump to corresponding video timestamp
- **Video Playback**: Highlight current transcript segment
- **Search**: Find text and jump to video location
- **Export**: Generate timestamped notes with video references
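Click-to-jump and playback highlighting both reduce to finding the segment covering a given playback time; a sketch over the JSON `segments` array, assuming it is sorted by start time:
```python
from bisect import bisect_right

def segment_index_at(segments: list[dict], t: float) -> int | None:
    """Return the index of the segment covering playback time t (seconds)."""
    starts = [seg["start"] for seg in segments]
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < segments[i]["end"]:
        return i
    return None
```
Clicking a transcript segment does the inverse: seek the player to that segment's `start` value.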
### Fabric Analysis Patterns
**Pattern Framework**:
```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class PatternResult:
    """Structured output of a pattern run (fields assumed for this sketch)."""
    pattern_name: str
    content: str

@dataclass
class FabricPattern:
    name: str
    description: str
    input_type: str     # "transcript", "document", "mixed"
    output_format: str  # "bullet_points", "summary", "structured"

    async def execute(self, content: str, context: Dict[str, Any]) -> PatternResult:
        # Implementation varies by pattern; concrete patterns override this.
        raise NotImplementedError
```
**Built-in Patterns**:
1. **Extract Ideas**: Identify key concepts and insights
2. **Summarize**: Create concise content summary
3. **Find Action Items**: Extract tasks and follow-ups
4. **Generate Questions**: Create study/discussion questions
5. **Extract References**: Find citations and sources
6. **Timeline Analysis**: Create chronological breakdown
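As an illustration, the first pattern might be expressed in the framework above as follows; `call_llm` is a hypothetical async helper, since the plan does not fix a model provider:
```python
class ExtractIdeasPattern(FabricPattern):
    """Built-in pattern 1: identify key concepts and insights."""
    async def execute(self, content: str, context: Dict[str, Any]) -> PatternResult:
        prompt = ("Extract the key ideas from the following transcript "
                  "as markdown bullet points:\n\n" + content)
        ideas = await call_llm(prompt)  # hypothetical LLM helper
        return PatternResult(pattern_name=self.name, content=ideas)

extract_ideas = ExtractIdeasPattern(
    name="Extract Ideas",
    description="Identify key concepts and insights",
    input_type="transcript",
    output_format="bullet_points",
)
```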
### Error Handling and Recovery
**Failure Scenarios**:
- **Transcription Failure**: Retry with different service, notify user
- **File Corruption**: Skip processing, log error, allow manual retry
- **Storage Issues**: Queue for later processing, alert admin
- **Analysis Errors**: Fallback to basic processing, partial results
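A sketch of the first scenario, retrying with exponential backoff before falling back to the next service; `transcribe_via` and `TranscriptionError` are hypothetical names:
```python
import asyncio

SERVICES = ["whisper", "openai", "google"]  # preference order, per Phase 2

async def transcribe_with_fallback(job: MediaProcessingJob, max_retries: int = 3):
    """Try each transcription service in turn, backing off between attempts."""
    for service in SERVICES:
        for attempt in range(max_retries):
            try:
                return await transcribe_via(service, job.file_path)  # hypothetical
            except TranscriptionError:  # hypothetical error type
                await asyncio.sleep(2 ** attempt)  # back off 1s, 2s, 4s
    notify_user(job, status="failed")  # surface the failure in the UI
    return None
```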
**User Communication**:
- Processing status indicators in UI
- Notification system for completion/failures
- Manual retry options for failed jobs
- Progress tracking for long-running tasks
## Performance Requirements
### Processing Times
- **File Detection**: <5 seconds
- **Metadata Extraction**: <1 second
- **Transcription**: <10% of media duration (e.g., 6 min for 1-hour video)
- **Analysis**: <30 seconds for typical content
- **UI Updates**: <2 seconds for all operations
### Scalability Targets
- **Concurrent Processing**: 10 media files simultaneously
- **Queue Throughput**: 50 files per hour
- **Storage Growth**: Handle 100GB+ media libraries
- **Search Performance**: <500ms for transcript searches
## User Experience Considerations
### Progressive Enhancement
- Basic playback works immediately
- Transcripts appear asynchronously
- Analysis results load on demand
- Advanced features available when processing complete
### Accessibility
- Keyboard navigation for all controls
- Screen reader support for transcripts
- High contrast mode for video controls
- Adjustable playback speeds
### Mobile Considerations
- Responsive video player
- Touch-friendly transcript navigation
- Offline transcript access
- Bandwidth-adaptive quality
## Success Metrics
### User Engagement
- **Completion Rate**: % of videos watched with transcripts
- **Analysis Usage**: % of content analyzed with Fabric patterns
- **Time Saved**: Average time reduction vs. manual note-taking
- **Knowledge Retention**: User-reported learning improvement
### Technical Performance
- **Processing Success Rate**: >95% of files processed successfully
- **Transcript Accuracy**: >90% confidence scores
- **Analysis Quality**: >80% user satisfaction with insights
- **System Reliability**: <1% processing failures
## Future Enhancements
### Advanced Features
- **Multi-language Support**: Automatic language detection and translation
- **Speaker Diarization**: Identify different speakers in recordings
- **Emotion Analysis**: Detect speaker enthusiasm and emphasis
- **Concept Mapping**: Visual knowledge graphs from transcripts
- **Collaborative Annotations**: Shared notes and highlights
### Integration Opportunities
- **Calendar Integration**: Sync with lecture schedules
- **Note-taking Apps**: Export to Roam Research, Obsidian, etc.
- **Learning Platforms**: Integration with Coursera, edX, etc.
- **Social Features**: Share insights with study groups
This workflow transforms passive media consumption into an active, intelligent knowledge management process, demonstrating the system's core value proposition of making complex information accessible and actionable.