dictation-service/PROJECT_STRUCTURE.md
Kade Heyborne 73a15d03cd
Fix dictation service: state detection, async processing, and performance optimizations
- Fix state detection priority: dictation now takes precedence over conversation
- Fix critical bug: event loop was created but never started, preventing async coroutines from executing
- Optimize audio processing: reorder AcceptWaveform/PartialResult checks
- Switch to faster Vosk model: vosk-model-en-us-0.22-lgraph for 2-3x speed improvement
- Reduce block size from 8000 to 4000 for lower latency
- Add filtering to remove spurious 'the', 'a', 'an' words from start/end of transcriptions
- Update toggle-dictation.sh to properly clean up conversation lock file
- Improve batch audio processing for better responsiveness
2025-12-04 11:49:07 -07:00


# AI Dictation Service - Clean Project Structure
## 📁 **Directory Organization**
```
dictation-service/
├── 📁 src/
│ └── 📁 dictation_service/
│     ├── 🔧 ai_dictation_simple.py # Main AI dictation service (ACTIVE)
│     ├── 🔧 ai_dictation.py # Full version with GTK GUI
│     ├── 🔧 enhanced_dictation.py # Original enhanced dictation
│     ├── 🔧 vosk_dictation.py # Basic dictation
│     └── 🔧 main.py # Entry point
├── 📁 scripts/
│ ├── 🔧 fix_service.sh # Service setup with sudo
│ ├── 🔧 setup-dual-keybindings.sh # Alt+D & Super+Alt+D setup
│ ├── 🔧 setup_super_d_manual.sh # Manual Super+Alt+D setup
│ ├── 🔧 setup-keybindings.sh # Original Alt+D setup
│ ├── 🔧 setup-keybindings-manual.sh # Manual setup
│ ├── 🔧 switch-model.sh # Model switching tool
│ ├── 🔧 toggle-conversation.sh # Conversation mode toggle
│ └── 🔧 toggle-dictation.sh # Dictation mode toggle
├── 📁 tests/
│ ├── 🔧 run_all_tests.sh # Comprehensive test runner
│ ├── 🔧 test_original_dictation.py # Original dictation tests
│ ├── 🔧 test_suite.py # AI conversation tests
│ ├── 🔧 test_vllm_integration.py # VLLM integration tests
│ ├── 🔧 test_imports.py # Import tests
│ └── 🔧 test_run.py # Runtime tests
├── 📁 docs/
│ ├── 📖 AI_DICTATION_GUIDE.md # Complete user guide
│ ├── 📖 INSTALL.md # Installation instructions
│ ├── 📖 TESTING_SUMMARY.md # Test coverage overview
│ ├── 📖 TEST_RESULTS_AND_FIXES.md # Test results and fixes
│ ├── 📖 README.md # Project overview
│ └── 📖 CLAUDE.md # Claude configuration
├── 📁 ~/.shared/models/vosk-models/ # Shared model directory (in the home directory, outside the repo)
│ ├── 🧠 vosk-model-en-us-0.22/ # Best accuracy model
│ ├── 🧠 vosk-model-en-us-0.22-lgraph/ # Good balance model
│ └── 🧠 vosk-model-small-en-us-0.15/ # Fast model
├── ⚙️ pyproject.toml # Python dependencies
├── ⚙️ uv.lock # Dependency lock file
├── ⚙️ .python-version # Python version
├── ⚙️ dictation.service # systemd service config
├── ⚙️ .gitignore # Git ignore rules
└── ⚙️ .venv/ # Python virtual environment
```
## 🎯 **Key Features by Directory**
### **src/** - Core Application Logic
- **Main Service**: `ai_dictation_simple.py` (currently active)
- **VLLM Integration**: OpenAI-compatible API client
- **TTS Engine**: Text-to-speech synthesis
- **Conversation Manager**: Persistent context management
- **Audio Processing**: Real-time speech recognition
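
The commit notes mention filtering spurious 'the', 'a', and 'an' from the start and end of transcriptions. A minimal sketch of that post-processing step might look like the following; the function and constant names are hypothetical, and the actual filter in `ai_dictation_simple.py` may differ.

```python
# Hypothetical helper: names are illustrative, not taken from the project.
SPURIOUS_EDGE_WORDS = {"the", "a", "an"}


def strip_spurious_edge_words(text: str) -> str:
    """Drop 'the', 'a', 'an' from the start and end of a transcription."""
    words = text.split()
    # Trim from the front while the first word is a spurious article.
    while words and words[0].lower() in SPURIOUS_EDGE_WORDS:
        words.pop(0)
    # Trim from the back while the last word is a spurious article.
    while words and words[-1].lower() in SPURIOUS_EDGE_WORDS:
        words.pop()
    return " ".join(words)
```

Articles in the middle of a phrase are left alone, so `strip_spurious_edge_words("the open the door a")` keeps the inner "the".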
### **scripts/** - System Integration
- **Keybinding Setup**: Super+Alt+D for AI conversation, Alt+D for dictation
- **Service Management**: systemd service configuration
- **Model Switching**: Easy switching between VOSK models
- **Mode Toggling**: Scripts to start/stop dictation and conversation modes
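
The commit notes say dictation takes precedence over conversation and refer to a conversation lock file. Assuming both modes are signalled by lock files, state detection could be sketched as below. The lock file names and the `detect_mode` function are assumptions for illustration, not the project's actual paths or API.

```python
from pathlib import Path

# Hypothetical lock file names; the toggle scripts may use different ones.
DICTATION_LOCK = "listening.lock"
CONVERSATION_LOCK = "conversation.lock"


def detect_mode(lock_dir: Path) -> str:
    """Return 'dictation', 'conversation', or 'idle' based on lock files."""
    # Dictation is checked first so it wins when both lock files exist.
    if (lock_dir / DICTATION_LOCK).exists():
        return "dictation"
    if (lock_dir / CONVERSATION_LOCK).exists():
        return "conversation"
    return "idle"
```

Checking dictation first is what gives it priority: a stale conversation lock cannot mask an active dictation session.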
### **tests/** - Comprehensive Testing
- **100+ Test Cases**: Covering all functionality
- **Integration Tests**: VLLM, audio, and system integration
- **Performance Tests**: Response time and resource usage
- **Error Handling**: Failure and recovery scenarios
### **docs/** - Documentation
- **User Guide**: Complete setup and usage instructions
- **Test Results**: Comprehensive testing coverage report
- **Installation**: Step-by-step setup instructions
## 🚀 **Quick Start Commands**
```bash
# Setup keybindings (Super+Alt+D for AI, Alt+D for dictation)
./scripts/setup-dual-keybindings.sh
# Set up the systemd service (requires sudo)
./scripts/fix_service.sh
# Test VLLM integration
python tests/test_vllm_integration.py
# Run all tests
cd tests && ./run_all_tests.sh
# Switch speech recognition models
./scripts/switch-model.sh
```
## 🔧 **Configuration**
### **Keybindings:**
- **Super+Alt+D**: AI conversation mode (with persistent context)
- **Alt+D**: Traditional dictation mode
### **Models:**
- **Speech**: VOSK models from `~/.shared/models/vosk-models/`
- **AI**: Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 (VLLM)
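
As a rough sketch of how a model directory under the shared path could be resolved before loading, consider the helper below. `resolve_model` and `MODEL_ROOT` are hypothetical names; only the directory path comes from this document.

```python
from pathlib import Path

# Shared model directory named in this document.
MODEL_ROOT = Path("~/.shared/models/vosk-models").expanduser()


def resolve_model(name: str, root: Path = MODEL_ROOT) -> Path:
    """Return the directory for a named model, or raise if it is missing."""
    model_dir = root / name
    if not model_dir.is_dir():
        # List what is installed so the error message is actionable.
        available = (
            sorted(p.name for p in root.iterdir() if p.is_dir())
            if root.is_dir()
            else []
        )
        raise FileNotFoundError(
            f"Model {name!r} not found in {root}; available: {available}"
        )
    return model_dir
```

The returned path can then be passed to the speech recognizer, for example `vosk.Model(str(resolve_model("vosk-model-en-us-0.22-lgraph")))`, assuming the `vosk` package is installed.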
### **API Endpoints:**
- **VLLM**: `http://127.0.0.1:8000/v1`
- **API Key**: `vllm-api-key`
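
Because the endpoint is OpenAI-compatible, a chat request can be sketched with only the standard library. The helper names below are hypothetical, and the project's own client may use a different library or add parameters; the URL, API key, and model name are the ones listed above.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000/v1"
API_KEY = "vllm-api-key"
MODEL = "Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4"


def build_chat_request(messages: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    body = json.dumps({"model": MODEL, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def send_chat(messages: list[dict], timeout: float = 30.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(messages)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Persistent context, in this sketch, would amount to appending each user and assistant message to the same `messages` list between calls to `send_chat`.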
## 📊 **Clean Project Benefits**
### **✅ Organization:**
- **Logical Structure**: Separate concerns into distinct directories
- **Easy Navigation**: Clear purpose for each directory
- **Scalable**: Easy to add new features and tests
### **✅ Maintainability:**
- **Modular Code**: Independent components and services
- **Version Control**: Clean git history without clutter
- **Testing Isolation**: Tests separate from production code
### **✅ Deployment:**
- **Service Ready**: systemd configuration included
- **Shared Resources**: Models in shared directory for multi-project use
- **Dependency Management**: uv package manager with lock file
---
**🎉 Your AI Dictation Service is now perfectly organized and ready for production use!**
The clean structure makes the service easy to maintain, extend, and deploy, including its conversational AI mode with persistent conversation context.