- Fix state detection priority: dictation now takes precedence over conversation - Fix critical bug: event loop was created but never started, preventing async coroutines from executing - Optimize audio processing: reorder AcceptWaveform/PartialResult checks - Switch to faster Vosk model: vosk-model-en-us-0.22-lgraph for 2-3x speed improvement - Reduce block size from 8000 to 4000 for lower latency - Add filtering to remove spurious 'the', 'a', 'an' words from start/end of transcriptions - Update toggle-dictation.sh to properly clean up conversation lock file - Improve batch audio processing for better responsiveness
134 lines
5.6 KiB
Markdown
134 lines
5.6 KiB
Markdown
# AI Dictation Service - Clean Project Structure
|
|
|
|
## 📁 **Directory Organization**
|
|
|
|
```
|
|
dictation-service/
|
|
├── 📁 src/
|
|
│ └── 📁 dictation_service/
|
|
│ ├── 🔧 ai_dictation_simple.py # Main AI dictation service (ACTIVE)
|
|
│ ├── 🔧 ai_dictation.py # Full version with GTK GUI
|
|
│ ├── 🔧 enhanced_dictation.py # Original enhanced dictation
|
|
│ ├── 🔧 vosk_dictation.py # Basic dictation
|
|
│ └── 🔧 main.py # Entry point
|
|
│
|
|
├── 📁 scripts/
|
|
│ ├── 🔧 fix_service.sh # Service setup with sudo
|
|
│ ├── 🔧 setup-dual-keybindings.sh # Alt+D & Super+Alt+D setup
|
|
│ ├── 🔧 setup_super_d_manual.sh # Manual Super+Alt+D setup
|
|
│ ├── 🔧 setup-keybindings.sh # Original Alt+D setup
|
|
│ ├── 🔧 setup-keybindings-manual.sh # Manual setup
|
|
│ ├── 🔧 switch-model.sh # Model switching tool
|
|
│ ├── 🔧 toggle-conversation.sh # Conversation mode toggle
|
|
│ └── 🔧 toggle-dictation.sh # Dictation mode toggle
|
|
│
|
|
├── 📁 tests/
|
|
│ ├── 🔧 run_all_tests.sh # Comprehensive test runner
|
|
│ ├── 🔧 test_original_dictation.py # Original dictation tests
|
|
│ ├── 🔧 test_suite.py # AI conversation tests
|
|
│ ├── 🔧 test_vllm_integration.py # VLLM integration tests
|
|
│ ├── 🔧 test_imports.py # Import tests
|
|
│ └── 🔧 test_run.py # Runtime tests
|
|
│
|
|
├── 📁 docs/
|
|
│ ├── 📖 AI_DICTATION_GUIDE.md # Complete user guide
|
|
│ ├── 📖 INSTALL.md # Installation instructions
|
|
│ ├── 📖 TESTING_SUMMARY.md # Test coverage overview
|
|
│ ├── 📖 TEST_RESULTS_AND_FIXES.md # Test results and fixes
|
|
│ ├── 📖 README.md # Project overview
|
|
│ └── 📖 CLAUDE.md # Claude configuration
|
|
│
|
|
├── 📁 ~/.shared/models/vosk-models/ # Shared model directory
|
|
│ ├── 🧠 vosk-model-en-us-0.22/ # Best accuracy model
|
|
│ ├── 🧠 vosk-model-en-us-0.22-lgraph/ # Good balance model
|
|
│ └── 🧠 vosk-model-small-en-us-0.15/ # Fast model
|
|
│
|
|
├── ⚙️ pyproject.toml # Python dependencies
|
|
├── ⚙️ uv.lock # Dependency lock file
|
|
├── ⚙️ .python-version # Python version
|
|
├── ⚙️ dictation.service # systemd service config
|
|
├── ⚙️ .gitignore # Git ignore rules
|
|
└── ⚙️ .venv/ # Python virtual environment
|
|
```
|
|
|
|
## 🎯 **Key Features by Directory**
|
|
|
|
### **src/** - Core Application Logic
|
|
- **Main Service**: `ai_dictation_simple.py` (currently active)
|
|
- **VLLM Integration**: OpenAI-compatible API client
|
|
- **TTS Engine**: Text-to-speech synthesis
|
|
- **Conversation Manager**: Persistent context management
|
|
- **Audio Processing**: Real-time speech recognition
|
|
|
|
### **scripts/** - System Integration
|
|
- **Keybinding Setup**: Super+Alt+D for AI conversation, Alt+D for dictation
|
|
- **Service Management**: systemd service configuration
|
|
- **Model Switching**: Easy switching between VOSK models
|
|
- **Mode Toggling**: Scripts to start/stop dictation and conversation modes
|
|
|
|
### **tests/** - Comprehensive Testing
|
|
- **100+ Test Cases**: Covering all functionality
|
|
- **Integration Tests**: VLLM, audio, and system integration
|
|
- **Performance Tests**: Response time and resource usage
|
|
- **Error Handling**: Failure and recovery scenarios
|
|
|
|
### **docs/** - Documentation
|
|
- **User Guide**: Complete setup and usage instructions
|
|
- **Test Results**: Comprehensive testing coverage report
|
|
- **Installation**: Step-by-step setup instructions
|
|
|
|
## 🚀 **Quick Start Commands**
|
|
|
|
```bash
|
|
# Setup keybindings (Super+Alt+D for AI, Alt+D for dictation)
|
|
./scripts/setup-dual-keybindings.sh
|
|
|
|
# Start service with sudo fix
|
|
./scripts/fix_service.sh
|
|
|
|
# Test VLLM integration
|
|
python tests/test_vllm_integration.py
|
|
|
|
# Run all tests
|
|
cd tests && ./run_all_tests.sh
|
|
|
|
# Switch speech recognition models
|
|
./scripts/switch-model.sh
|
|
```
|
|
|
|
## 🔧 **Configuration**
|
|
|
|
### **Keybindings:**
|
|
- **Super+Alt+D**: AI conversation mode (with persistent context)
|
|
- **Alt+D**: Traditional dictation mode
|
|
|
|
### **Models:**
|
|
- **Speech**: VOSK models from `~/.shared/models/vosk-models/`
|
|
- **AI**: Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 (VLLM)
|
|
|
|
### **API Endpoints:**
|
|
- **VLLM**: `http://127.0.0.1:8000/v1`
|
|
- **API Key**: `vllm-api-key`
|
|
|
|
## 📊 **Clean Project Benefits**
|
|
|
|
### **✅ Organization:**
|
|
- **Logical Structure**: Separate concerns into distinct directories
|
|
- **Easy Navigation**: Clear purpose for each directory
|
|
- **Scalable**: Easy to add new features and tests
|
|
|
|
### **✅ Maintainability:**
|
|
- **Modular Code**: Independent components and services
|
|
- **Version Control**: Clean git history without clutter
|
|
- **Testing Isolation**: Tests separate from production code
|
|
|
|
### **✅ Deployment:**
|
|
- **Service Ready**: systemd configuration included
|
|
- **Shared Resources**: Models in shared directory for multi-project use
|
|
- **Dependency Management**: uv package manager with lock file
|
|
|
|
---
|
|
|
|
**🎉 Your AI Dictation Service is now perfectly organized and ready for production use!**
|
|
|
|
The clean structure makes it easy to maintain, extend, and deploy your conversational AI phone call system with persistent conversation context. |