# Media Ingestion and Processing Workflow

This document outlines the complete user journey for ingesting media content into the Advanced Second Brain PKM system, from initial file placement to actionable insights.

## Overview

The media ingestion workflow demonstrates the system's core value proposition: transforming passive media consumption into active knowledge management through automated processing, intelligent analysis, and seamless integration with the user's knowledge base.

## User Journey Map

### Phase 1: Content Acquisition (User Action)

**Trigger**: User discovers valuable content (lecture, podcast, video course)

**User Actions**:

1. Download or acquire the media file (MP4, MP3, WebM, etc.)
2. Navigate to the appropriate domain directory in the file system
3. Place the file in the correct subfolder (e.g., `Neuroscience/Media/Lectures/`)
4. Optionally rename the file for clarity

**System State**: File appears in the domain directory, ready for processing

**User Expectations**:

- File placement should be intuitive
- No manual intervention should be required beyond placing the file
- The system should acknowledge file detection

### Phase 2: Automated Detection and Processing (Background)

**System Actions**:

1. **File Watcher Detection**: File system monitor detects the new file within 5 seconds
2. **Metadata Extraction**: Extract file metadata (duration, size, format, creation date)
3. **Format Validation**: Verify that the file format is supported
4. **Queue Processing**: Add the file to the media processing queue with a priority

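The detection and validation steps above can be sketched as a simple polling loop using only the standard library; a production watcher would more likely use OS-level notifications (e.g., inotify via a library such as `watchdog`). The function name and the exact set of supported formats are illustrative assumptions.

```python
import os

# Assumed format whitelist; the spec only names MP4, MP3, and WebM explicitly.
SUPPORTED_FORMATS = {".mp4", ".mp3", ".webm", ".wav", ".m4a"}

def scan_for_new_media(root: str, seen: set[str]) -> list[str]:
    """Return paths of supported media files not seen in a previous scan."""
    new_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower()
            if ext in SUPPORTED_FORMATS and path not in seen:
                seen.add(path)  # remember the file so later scans skip it
                new_files.append(path)
    return new_files
```

Calling this on a timer (every few seconds, matching the 5-second detection budget) gives format validation and detection in one pass.
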
**Background Processing**:

1. **Transcription Service**: Send the file to Whisper, OpenAI, or Google Speech-to-Text
2. **Transcript Generation**: Convert audio/video to timestamped text
3. **Quality Validation**: Check transcript accuracy (>90% confidence)
4. **Synchronization**: Align the transcript with the video timeline (if video)
5. **Storage**: Save the transcript alongside the original file

**System State**: Media file processed, transcript available

**User Feedback**: Notification in the UI when processing completes

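The quality-validation step (>90% confidence) could reduce to a simple gate over per-segment confidences; the segment shape follows the transcript JSON format defined later in this document, and the aggregation (mean over segments) is an assumption.

```python
def transcript_passes_quality_gate(segments, threshold=0.90):
    """Accept a transcript only if its mean segment confidence clears the threshold."""
    if not segments:
        return False  # an empty transcript never passes
    mean_confidence = sum(s["confidence"] for s in segments) / len(segments)
    return mean_confidence >= threshold
```
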
### Phase 3: Knowledge Integration (User Interaction)

**User Actions**:

1. Open the Knowledge Browser for the domain
2. Navigate to the media file in the file tree
3. Click on the video file to open it in the Content Viewer

**System Response**:

1. **Content Loading**: Display the video player with controls
2. **Transcript Display**: Show the synchronized transcript below the video
3. **Navigation Integration**: Enable click-to-jump between transcript and video

**User Value**: Can now consume content with a searchable, navigable transcript

### Phase 4: Intelligent Analysis (User-Driven)

**User Actions**:

1. Click the "Run Fabric Pattern" button in the Insight/Fabric pane
2. Select an analysis pattern (e.g., "Extract Ideas", "Summarize", "Find Action Items")
3. Optionally adjust parameters

**System Actions**:

1. **Content Processing**: Send the transcript to the domain agent
2. **Pattern Execution**: Run the selected Fabric analysis pattern
3. **Insight Generation**: Extract structured insights from the content
4. **Result Display**: Show formatted results in the right pane

**Example Output**:

```markdown
## Extracted Ideas

- Neural networks can be understood as parallel distributed processors
- Backpropagation remains the most effective learning algorithm
- Attention mechanisms solve the bottleneck problem in RNNs

## Key Takeaways

- Deep learning has moved from art to science
- The Transformer architecture enables better long-range dependencies
- Self-supervised learning reduces annotation requirements
```

### Phase 5: Knowledge Graph Integration (Automatic)

**System Actions**:

1. **Concept Extraction**: Identify key concepts from the analysis results
2. **Graph Updates**: Add new concepts and relationships to the knowledge graph
3. **Embedding Generation**: Create vector embeddings for the new content
4. **Relationship Discovery**: Link to existing concepts in the domain

**Background Processing**:

- Update the semantic search index
- Recalculate concept centrality
- Generate cross-references to related content
- Update the domain agent context

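The relationship-discovery step above can be illustrated with a cosine-similarity pass over concept embeddings. The embeddings, the 0.8 threshold, and the function names are assumptions for illustration; real values would come from the system's embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def discover_links(new_embedding, existing, threshold=0.8):
    """Return names of existing concepts similar enough to link to the new one."""
    return [name for name, emb in existing.items()
            if cosine_similarity(new_embedding, emb) >= threshold]
```
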
### Phase 6: Cross-Domain Connection (Optional Advanced Usage)

**User Actions**:

1. Notice a connection between the current content and another domain
2. Switch to Agent Studio mode
3. Modify the Dana agent code to include cross-domain relationships

**Example Dana Code Modification**:

```
agent NeuroscienceAgent {
    context: ["Neuroscience/Media/**", "CompSci/Papers/**"]

    query(query) {
        // Search both domains for neural network concepts
        neuroscience_results = search_domain("Neuroscience", query)
        compsci_results = search_domain("CompSci", "neural networks")

        // Combine and synthesize results
        return synthesize_results(neuroscience_results, compsci_results)
    }
}
```

## Technical Implementation Details

### File System Integration

**Directory Structure**:

```
Domain_Name/
├── Media/
│   ├── Lectures/
│   ├── Podcasts/
│   ├── Videos/
│   └── Transcripts/        # Auto-generated
├── Papers/
├── Notes/
└── agent.na                # Domain agent configuration
```

**File Naming Convention**:

- Original: `lecture_neural_networks_fundamentals.mp4`
- Transcript: `lecture_neural_networks_fundamentals.mp4.transcript.json`

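The naming convention above can be derived mechanically from the media path; this small helper (the function name is hypothetical) shows the rule:

```python
from pathlib import Path

def transcript_path(media_path: str) -> Path:
    """Append .transcript.json to the full media filename, per the convention."""
    p = Path(media_path)
    return p.with_name(p.name + ".transcript.json")
```

Keeping the original extension inside the transcript name avoids collisions when, say, `lecture.mp4` and `lecture.mp3` coexist in one folder.
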
### Processing Pipeline

**Queue Management**:

```python
from dataclasses import dataclass
from enum import Enum, auto


class ProcessingStatus(Enum):
    # Only PENDING appears in this spec; the other states are assumed.
    PENDING = auto()
    COMPLETED = auto()
    FAILED = auto()


@dataclass
class MediaProcessingJob:
    file_path: str
    domain_id: str
    priority: int = 1
    retry_count: int = 0
    status: ProcessingStatus = ProcessingStatus.PENDING
```

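A minimal queue over this job model could use the standard-library `heapq`. The convention here (lower numbers pop first, with a counter breaking ties in insertion order) is an assumption; the spec does not say which direction `priority` sorts.

```python
import heapq
import itertools

_counter = itertools.count()  # tie-breaker so equal priorities stay FIFO

def push_job(queue, priority, file_path):
    heapq.heappush(queue, (priority, next(_counter), file_path))

def pop_job(queue):
    priority, _, file_path = heapq.heappop(queue)
    return priority, file_path

queue = []
push_job(queue, 2, "Neuroscience/Media/Lectures/a.mp4")
push_job(queue, 1, "Neuroscience/Media/Podcasts/b.mp3")
```
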
**Processing Steps**:

1. **Validation**: Check file integrity and format support
2. **Transcription**: Call the external API with error handling
3. **Post-processing**: Clean the transcript and add timestamps
4. **Storage**: Save in a structured JSON format
5. **Indexing**: Update search indices
6. **Notification**: Alert the user on completion

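The "error handling" in step 2 might look like retry with exponential backoff; `transcribe` here is a hypothetical callable wrapping the external API, and the retry counts and delays are illustrative.

```python
import time

def call_with_retries(transcribe, file_path, max_retries=3, base_delay=1.0):
    """Retry a flaky transcription call with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return transcribe(file_path)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the queue
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

This pairs naturally with the `retry_count` field on `MediaProcessingJob` if retries need to survive process restarts.
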
### Transcript Format

**JSON Structure**:

```json
{
  "metadata": {
    "source_file": "lecture.mp4",
    "duration": 3600,
    "transcription_service": "whisper",
    "confidence_score": 0.95,
    "processing_timestamp": "2024-01-15T10:30:00Z"
  },
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Welcome to this lecture on neural networks.",
      "confidence": 0.98
    },
    {
      "start": 5.2,
      "end": 12.1,
      "text": "Today we'll cover the fundamental concepts...",
      "confidence": 0.96
    }
  ],
  "chapters": [
    {
      "title": "Introduction",
      "start": 0.0,
      "end": 180.0
    },
    {
      "title": "Basic Concepts",
      "start": 180.0,
      "end": 900.0
    }
  ]
}
```

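Given this structure, transcript search is a walk over `segments`. A minimal case-insensitive substring match is sketched below; the real system would use the semantic search index, so this is only the simplest baseline.

```python
def search_transcript(transcript: dict, query: str) -> list[dict]:
    """Return segments whose text contains the query, case-insensitively."""
    q = query.lower()
    return [s for s in transcript["segments"] if q in s["text"].lower()]
```

Each hit carries its `start` timestamp, which is exactly what the click-to-jump navigation needs.
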
### Synchronization Mechanism

**Video-Transcript Sync**:

- **Click Transcript**: Jump to the corresponding video timestamp
- **Video Playback**: Highlight the current transcript segment
- **Search**: Find text and jump to the matching video location
- **Export**: Generate timestamped notes with video references

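The playback-highlighting direction reduces to finding the segment containing the current playback time. With segments sorted by `start` (as the transcript format guarantees), `bisect` gives an O(log n) lookup; this sketch assumes half-open `[start, end)` intervals.

```python
import bisect

def segment_at(segments, t):
    """Return the segment whose [start, end) interval contains time t, or None."""
    starts = [s["start"] for s in segments]
    i = bisect.bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and segments[i]["start"] <= t < segments[i]["end"]:
        return segments[i]
    return None  # t falls in a gap or past the end
```
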
### Fabric Analysis Patterns

**Pattern Framework**:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class FabricPattern:
    name: str
    description: str
    input_type: str      # "transcript", "document", or "mixed"
    output_format: str   # "bullet_points", "summary", or "structured"

    async def execute(self, content: str, context: Dict[str, Any]) -> "PatternResult":
        # Implementation varies by pattern
        raise NotImplementedError
```

**Built-in Patterns**:

1. **Extract Ideas**: Identify key concepts and insights
2. **Summarize**: Create a concise content summary
3. **Find Action Items**: Extract tasks and follow-ups
4. **Generate Questions**: Create study/discussion questions
5. **Extract References**: Find citations and sources
6. **Timeline Analysis**: Create a chronological breakdown

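To make the pattern idea concrete, here is one way a "Find Action Items" pattern could post-process transcript text into the structured result the pane renders. The keyword heuristic is a deliberately naive stand-in for the LLM-backed analysis the framework would actually run.

```python
def find_action_items(transcript_text: str) -> list[str]:
    """Naive keyword heuristic standing in for an LLM-backed Fabric pattern."""
    markers = ("todo", "action item", "follow up", "we should")
    return [line.strip() for line in transcript_text.splitlines()
            if any(m in line.lower() for m in markers)]
```
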
### Error Handling and Recovery

**Failure Scenarios**:

- **Transcription Failure**: Retry with a different service; notify the user
- **File Corruption**: Skip processing, log the error, and allow manual retry
- **Storage Issues**: Queue for later processing; alert the admin
- **Analysis Errors**: Fall back to basic processing with partial results

**User Communication**:

- Processing status indicators in the UI
- Notification system for completions and failures
- Manual retry options for failed jobs
- Progress tracking for long-running tasks

## Performance Requirements

### Processing Times

- **File Detection**: <5 seconds
- **Metadata Extraction**: <1 second
- **Transcription**: <10% of media duration (e.g., 6 minutes for a 1-hour video)
- **Analysis**: <30 seconds for typical content
- **UI Updates**: <2 seconds for all operations

### Scalability Targets

- **Concurrent Processing**: 10 media files simultaneously
- **Queue Throughput**: 50 files per hour
- **Storage Growth**: Handle 100 GB+ media libraries
- **Search Performance**: <500 ms for transcript searches

## User Experience Considerations

### Progressive Enhancement

- Basic playback works immediately
- Transcripts appear asynchronously
- Analysis results load on demand
- Advanced features become available once processing completes

### Accessibility

- Keyboard navigation for all controls
- Screen reader support for transcripts
- High-contrast mode for video controls
- Adjustable playback speeds

### Mobile Considerations

- Responsive video player
- Touch-friendly transcript navigation
- Offline transcript access
- Bandwidth-adaptive quality

## Success Metrics

### User Engagement

- **Completion Rate**: Percentage of videos watched with transcripts
- **Analysis Usage**: Percentage of content analyzed with Fabric patterns
- **Time Saved**: Average time reduction versus manual note-taking
- **Knowledge Retention**: User-reported learning improvement

### Technical Performance

- **Processing Success Rate**: >95% of files processed successfully
- **Transcript Accuracy**: >90% confidence scores
- **Analysis Quality**: >80% user satisfaction with insights
- **System Reliability**: <1% processing failures

## Future Enhancements

### Advanced Features

- **Multi-language Support**: Automatic language detection and translation
- **Speaker Diarization**: Identify different speakers in recordings
- **Emotion Analysis**: Detect speaker enthusiasm and emphasis
- **Concept Mapping**: Visual knowledge graphs built from transcripts
- **Collaborative Annotations**: Shared notes and highlights

### Integration Opportunities

- **Calendar Integration**: Sync with lecture schedules
- **Note-taking Apps**: Export to Roam Research, Obsidian, etc.
- **Learning Platforms**: Integration with Coursera, edX, etc.
- **Social Features**: Share insights with study groups

This workflow transforms passive media consumption into an active, intelligent knowledge management process, demonstrating the system's core value proposition of making complex information accessible and actionable.