# Media Ingestion and Processing Workflow
This document outlines the complete user journey for ingesting media content into the Advanced Second Brain PKM system, from initial file placement to actionable insights.
## Overview
The media ingestion workflow demonstrates the system's core value proposition: transforming passive media consumption into active knowledge management through automated processing, intelligent analysis, and seamless integration with the user's knowledge base.
## User Journey Map

### Phase 1: Content Acquisition (User Action)
Trigger: User discovers valuable content (lecture, podcast, video course)
User Actions:
- Download or acquire media file (MP4, MP3, WebM, etc.)
- Navigate to appropriate domain directory in file system
- Place file in the correct subfolder (e.g., `Neuroscience/Media/Lectures/`)
- Optionally rename the file for clarity
System State: File appears in domain directory, ready for processing
User Expectations:
- File placement should be intuitive
- No manual intervention required
- System should acknowledge file detection
### Phase 2: Automated Detection and Processing (Background)
System Actions:
- File Watcher Detection: File system monitor detects new file within 5 seconds
- Metadata Extraction: Extract file metadata (duration, size, format, creation date)
- Format Validation: Verify file format is supported
- Queue Processing: Add to media processing queue with priority
Background Processing:
- Transcription Service: Send to Whisper/OpenAI/Google Speech-to-Text
- Transcript Generation: Convert audio/video to timestamped text
- Quality Validation: Check transcript accuracy (>90% confidence)
- Synchronization: Align transcript with video timeline (if video)
- Storage: Save transcript alongside original file
System State: Media file processed, transcript available
User Feedback: Notification in UI when processing complete
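The detection step can be illustrated with a short sketch. The following is a minimal example using Python's `watchdog` library; `enqueue_media_job` is a hypothetical hook into the processing queue, and the actual watcher implementation is not specified in this document.

```python
from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

SUPPORTED_FORMATS = {".mp4", ".mp3", ".webm"}  # formats listed in Phase 1

def enqueue_media_job(path: Path) -> None:
    """Hypothetical hook into the media processing queue (Phase 2, step 4)."""
    print(f"queued for processing: {path}")

class MediaFileHandler(FileSystemEventHandler):
    def on_created(self, event):
        if event.is_directory:
            return
        path = Path(event.src_path)
        # Format validation: skip unsupported files before queueing
        if path.suffix.lower() not in SUPPORTED_FORMATS:
            return
        enqueue_media_job(path)

observer = Observer()
observer.schedule(MediaFileHandler(), "Neuroscience/Media", recursive=True)
observer.start()  # a real watcher would keep the process alive (observer.join())
```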
### Phase 3: Knowledge Integration (User Interaction)
User Actions:
- Open Knowledge Browser for the domain
- Navigate to media file in file tree
- Click on video file to open in Content Viewer
System Response:
- Content Loading: Display video player with controls
- Transcript Display: Show synchronized transcript below video
- Navigation Integration: Enable click-to-jump between transcript and video
User Value: Can now consume content with searchable, navigable transcript
### Phase 4: Intelligent Analysis (User-Driven)
User Actions:
- Click "Run Fabric Pattern" button in Insight/Fabric pane
- Select analysis pattern (e.g., "Extract Ideas", "Summarize", "Find Action Items")
- Optionally adjust parameters
System Actions:
- Content Processing: Send transcript to domain agent
- Pattern Execution: Run selected Fabric analysis pattern
- Insight Generation: Extract structured insights from content
- Result Display: Show formatted results in right pane
Example Output:

```markdown
## Extracted Ideas

- Neural networks can be understood as parallel distributed processors
- Backpropagation remains the most effective learning algorithm
- Attention mechanisms solve the bottleneck problem in RNNs

## Key Takeaways

- Deep learning has moved from art to science
- Transformer architecture enables better long-range dependencies
- Self-supervised learning reduces annotation requirements
```
### Phase 5: Knowledge Graph Integration (Automatic)
System Actions:
- Concept Extraction: Identify key concepts from analysis results
- Graph Updates: Add new concepts and relationships to knowledge graph
- Embedding Generation: Create vector embeddings for new content
- Relationship Discovery: Link to existing concepts in domain
Background Processing:
- Update semantic search index
- Recalculate concept centrality
- Generate cross-references to related content
- Update domain agent context
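A minimal sketch of this step, assuming `networkx` for the in-memory graph; the co-occurrence linking rule and helper names are illustrative assumptions, not the system's actual graph model.

```python
import networkx as nx

graph = nx.Graph()  # stand-in for the persistent domain knowledge graph

def integrate_analysis(domain: str, concepts: list[str], source: str) -> None:
    """Add extracted concepts and link those that co-occur in one analysis."""
    for concept in concepts:
        graph.add_node(concept, domain=domain, source=source)
    # Relationship discovery: assumed rule linking co-occurring concepts
    for i, a in enumerate(concepts):
        for b in concepts[i + 1:]:
            graph.add_edge(a, b, source=source)

integrate_analysis(
    "Neuroscience",
    ["neural networks", "backpropagation", "attention mechanisms"],
    "lecture_neural_networks_fundamentals.mp4",
)
centrality = nx.degree_centrality(graph)  # recalculate concept centrality
```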
### Phase 6: Cross-Domain Connection (Optional Advanced Usage)
User Actions:
- Notice connection between current content and another domain
- Switch to Agent Studio mode
- Modify Dana agent code to include cross-domain relationships
Example Dana Code Modification:

```dana
agent NeuroscienceAgent {
    context: ["Neuroscience/Media/**", "CompSci/Papers/**"]

    query(query) {
        // Search both domains for neural network concepts
        neuroscience_results = search_domain("Neuroscience", query)
        compsci_results = search_domain("CompSci", "neural networks")

        // Combine and synthesize results
        return synthesize_results(neuroscience_results, compsci_results)
    }
}
```
## Technical Implementation Details

### File System Integration
Directory Structure:

```
Domain_Name/
├── Media/
│   ├── Lectures/
│   ├── Podcasts/
│   ├── Videos/
│   └── Transcripts/        # Auto-generated
├── Papers/
├── Notes/
└── agent.na                # Domain agent configuration
```
File Naming Convention:

- Original: `lecture_neural_networks_fundamentals.mp4`
- Transcript: `lecture_neural_networks_fundamentals.mp4.transcript.json`
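A one-line helper makes the convention explicit; this sketch assumes the transcript sits alongside the source file, and the `transcript_path` name is chosen for illustration only.

```python
from pathlib import Path

def transcript_path(media_file: Path) -> Path:
    """Derive the transcript location from the naming convention above."""
    return media_file.with_name(media_file.name + ".transcript.json")

src = Path("Neuroscience/Media/Lectures/lecture_neural_networks_fundamentals.mp4")
print(transcript_path(src).name)  # lecture_neural_networks_fundamentals.mp4.transcript.json
```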
### Processing Pipeline
Queue Management:

```python
from dataclasses import dataclass
from enum import Enum

class ProcessingStatus(Enum):  # minimal status set; values assumed, not specified here
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"

@dataclass
class MediaProcessingJob:
    file_path: str
    domain_id: str
    priority: int = 1
    retry_count: int = 0
    status: ProcessingStatus = ProcessingStatus.PENDING
```
Processing Steps:
- Validation: Check file integrity and format support
- Transcription: Call external API with error handling
- Post-processing: Clean transcript, add timestamps
- Storage: Save in structured JSON format
- Indexing: Update search indices
- Notification: Alert user of completion
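Tying the steps above together, a worker loop might look like the sketch below; `transcribe`, `post_process`, `update_indices`, and `notify_user` are hypothetical stand-ins for the transcription client, cleanup, indexing, and notification layers, and `transcript_path`, `SUPPORTED_FORMATS`, and the queue types reuse the earlier sketches.

```python
import json
from pathlib import Path

def process_job(job: MediaProcessingJob) -> None:
    path = Path(job.file_path)
    if path.suffix.lower() not in SUPPORTED_FORMATS:   # 1. Validation
        raise ValueError(f"unsupported format: {path.suffix}")
    transcript = transcribe(path)                      # 2. Transcription (external API)
    transcript = post_process(transcript)              # 3. Clean text, add timestamps
    out = transcript_path(path)                        # 4. Storage (JSON format below)
    out.write_text(json.dumps(transcript, indent=2))
    update_indices(job.domain_id, transcript)          # 5. Indexing
    job.status = ProcessingStatus.DONE
    notify_user(f"Transcript ready: {out.name}")       # 6. Notification
```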
### Transcript Format

JSON Structure:

```json
{
  "metadata": {
    "source_file": "lecture.mp4",
    "duration": 3600,
    "transcription_service": "whisper",
    "confidence_score": 0.95,
    "processing_timestamp": "2024-01-15T10:30:00Z"
  },
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Welcome to this lecture on neural networks.",
      "confidence": 0.98
    },
    {
      "start": 5.2,
      "end": 12.1,
      "text": "Today we'll cover the fundamental concepts...",
      "confidence": 0.96
    }
  ],
  "chapters": [
    {
      "title": "Introduction",
      "start": 0.0,
      "end": 180.0
    },
    {
      "title": "Basic Concepts",
      "start": 180.0,
      "end": 900.0
    }
  ]
}
```
### Synchronization Mechanism
Video-Transcript Sync:
- Click Transcript: Jump to corresponding video timestamp
- Video Playback: Highlight current transcript segment
- Search: Find text and jump to video location
- Export: Generate timestamped notes with video references
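A minimal sketch of the time-to-segment lookup behind playback highlighting, assuming segments in the JSON format above; `bisect` finds the segment containing the current playback position.

```python
import bisect

def current_segment(segments: list[dict], playback_time: float) -> dict | None:
    """Return the transcript segment active at `playback_time` (seconds)."""
    starts = [seg["start"] for seg in segments]
    i = bisect.bisect_right(starts, playback_time) - 1
    if i >= 0 and playback_time < segments[i]["end"]:
        return segments[i]
    return None  # playback falls in a gap between segments

segments = [
    {"start": 0.0, "end": 5.2, "text": "Welcome to this lecture on neural networks."},
    {"start": 5.2, "end": 12.1, "text": "Today we'll cover the fundamental concepts..."},
]
assert current_segment(segments, 6.0)["text"].startswith("Today")
```

Click-to-jump is the inverse direction: the UI seeks the player to the clicked segment's `start` timestamp.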
### Fabric Analysis Patterns

Pattern Framework:

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class FabricPattern:
    name: str
    description: str
    input_type: str     # "transcript", "document", "mixed"
    output_format: str  # "bullet_points", "summary", "structured"

    async def execute(self, content: str, context: Dict[str, Any]) -> "PatternResult":
        # Implementation varies by pattern
        raise NotImplementedError
```
Built-in Patterns:
- Extract Ideas: Identify key concepts and insights
- Summarize: Create concise content summary
- Find Action Items: Extract tasks and follow-ups
- Generate Questions: Create study/discussion questions
- Extract References: Find citations and sources
- Timeline Analysis: Create chronological breakdown
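As a concrete example of how a built-in could plug into the framework above, this sketch implements "Summarize" as a `FabricPattern` subclass; the `PatternResult` fields and the `llm_complete` client are assumptions, not the actual Fabric interfaces.

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class PatternResult:  # assumed shape; the real type is defined elsewhere
    pattern_name: str
    markdown: str

@dataclass
class SummarizePattern(FabricPattern):
    name: str = "Summarize"
    description: str = "Create concise content summary"
    input_type: str = "transcript"
    output_format: str = "summary"

    async def execute(self, content: str, context: Dict[str, Any]) -> PatternResult:
        prompt = f"Summarize the following transcript:\n\n{content}"
        summary = await llm_complete(prompt)  # hypothetical LLM client call
        return PatternResult(self.name, summary)
```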
### Error Handling and Recovery
Failure Scenarios:
- Transcription Failure: Retry with different service, notify user
- File Corruption: Skip processing, log error, allow manual retry
- Storage Issues: Queue for later processing, alert admin
- Analysis Errors: Fallback to basic processing, partial results
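A minimal sketch of the retry-with-fallback behavior for the first scenario, building on the queue types sketched earlier; the service order, `transcribe_with`, and `TranscriptionError` are illustrative assumptions.

```python
from pathlib import Path

FALLBACK_SERVICES = ["whisper", "openai", "google"]  # services named in Phase 2

def transcribe_with_fallback(job: MediaProcessingJob, path: Path) -> dict:
    last_error = None
    for service in FALLBACK_SERVICES:
        try:
            return transcribe_with(service, path)  # hypothetical per-service client
        except TranscriptionError as exc:          # hypothetical error type
            job.retry_count += 1
            last_error = exc
    job.status = ProcessingStatus.FAILED           # allow manual retry from the UI
    notify_user(f"Transcription failed: {path.name}")
    raise last_error
```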
User Communication:
- Processing status indicators in UI
- Notification system for completion/failures
- Manual retry options for failed jobs
- Progress tracking for long-running tasks
## Performance Requirements

### Processing Times
- File Detection: <5 seconds
- Metadata Extraction: <1 second
- Transcription: <10% of media duration (e.g., 6 min for 1-hour video)
- Analysis: <30 seconds for typical content
- UI Updates: <2 seconds for all operations
### Scalability Targets
- Concurrent Processing: 10 media files simultaneously
- Queue Throughput: 50 files per hour
- Storage Growth: Handle 100GB+ media libraries
- Search Performance: <500ms for transcript searches
## User Experience Considerations

### Progressive Enhancement
- Basic playback works immediately
- Transcripts appear asynchronously
- Analysis results load on demand
- Advanced features available when processing complete
### Accessibility
- Keyboard navigation for all controls
- Screen reader support for transcripts
- High contrast mode for video controls
- Adjustable playback speeds
### Mobile Considerations
- Responsive video player
- Touch-friendly transcript navigation
- Offline transcript access
- Bandwidth-adaptive quality
## Success Metrics

### User Engagement
- Completion Rate: % of videos watched with transcripts
- Analysis Usage: % of content analyzed with Fabric patterns
- Time Saved: Average time reduction vs. manual note-taking
- Knowledge Retention: User-reported learning improvement
### Technical Performance
- Processing Success Rate: >95% of files processed successfully
- Transcript Accuracy: >90% confidence scores
- Analysis Quality: >80% user satisfaction with insights
- System Reliability: <1% processing failures
## Future Enhancements

### Advanced Features
- Multi-language Support: Automatic language detection and translation
- Speaker Diarization: Identify different speakers in recordings
- Emotion Analysis: Detect speaker enthusiasm and emphasis
- Concept Mapping: Visual knowledge graphs from transcripts
- Collaborative Annotations: Shared notes and highlights
### Integration Opportunities
- Calendar Integration: Sync with lecture schedules
- Note-taking Apps: Export to Roam Research, Obsidian, etc.
- Learning Platforms: Integration with Coursera, edX, etc.
- Social Features: Share insights with study groups
This workflow transforms passive media consumption into an active, intelligent knowledge management process, demonstrating the system's core value proposition of making complex information accessible and actionable.