Fix dictation service: state detection, async processing, and performance optimizations

- Fix state detection priority: dictation now takes precedence over conversation
- Fix critical bug: event loop was created but never started, preventing async coroutines from executing
- Optimize audio processing: reorder AcceptWaveform/PartialResult checks
- Switch to faster Vosk model: vosk-model-en-us-0.22-lgraph for 2-3x speed improvement
- Reduce block size from 8000 to 4000 for lower latency
- Add filtering to remove spurious 'the', 'a', 'an' words from start/end of transcriptions
- Update toggle-dictation.sh to properly clean up conversation lock file
- Improve batch audio processing for better responsiveness

2025-12-04 11:49:07 -07:00

5.6 KiB

Raw Blame History

AI Dictation Service - Clean Project Structure

📁 Directory Organization

dictation-service/
├── 📁 src/
│   └── 📁 dictation_service/
│       ├── 🔧 ai_dictation_simple.py      # Main AI dictation service (ACTIVE)
│       ├── 🔧 ai_dictation.py             # Full version with GTK GUI
│       ├── 🔧 enhanced_dictation.py       # Original enhanced dictation
│       ├── 🔧 vosk_dictation.py           # Basic dictation
│       └── 🔧 main.py                     # Entry point
│
├── 📁 scripts/
│   ├── 🔧 fix_service.sh                  # Service setup with sudo
│   ├── 🔧 setup-dual-keybindings.sh       # Alt+D & Super+Alt+D setup
│   ├── 🔧 setup_super_d_manual.sh         # Manual Super+Alt+D setup
│   ├── 🔧 setup-keybindings.sh            # Original Alt+D setup
│   ├── 🔧 setup-keybindings-manual.sh     # Manual setup
│   ├── 🔧 switch-model.sh                 # Model switching tool
│   ├── 🔧 toggle-conversation.sh          # Conversation mode toggle
│   └── 🔧 toggle-dictation.sh             # Dictation mode toggle
│
├── 📁 tests/
│   ├── 🔧 run_all_tests.sh                # Comprehensive test runner
│   ├── 🔧 test_original_dictation.py      # Original dictation tests
│   ├── 🔧 test_suite.py                   # AI conversation tests
│   ├── 🔧 test_vllm_integration.py        # VLLM integration tests
│   ├── 🔧 test_imports.py                 # Import tests
│   └── 🔧 test_run.py                     # Runtime tests
│
├── 📁 docs/
│   ├── 📖 AI_DICTATION_GUIDE.md            # Complete user guide
│   ├── 📖 INSTALL.md                      # Installation instructions
│   ├── 📖 TESTING_SUMMARY.md              # Test coverage overview
│   ├── 📖 TEST_RESULTS_AND_FIXES.md       # Test results and fixes
│   ├── 📖 README.md                       # Project overview
│   └── 📖 CLAUDE.md                       # Claude configuration
│
├── 📁 ~/.shared/models/vosk-models/       # Shared model directory
│   ├── 🧠 vosk-model-en-us-0.22/          # Best accuracy model
│   ├── 🧠 vosk-model-en-us-0.22-lgraph/   # Good balance model
│   └── 🧠 vosk-model-small-en-us-0.15/    # Fast model
│
├── ⚙️ pyproject.toml                      # Python dependencies
├── ⚙️ uv.lock                             # Dependency lock file
├── ⚙️ .python-version                     # Python version
├── ⚙️ dictation.service                   # systemd service config
├── ⚙️ .gitignore                          # Git ignore rules
└── ⚙️ .venv/                              # Python virtual environment

🎯 Key Features by Directory

src/ - Core Application Logic

Main Service: ai_dictation_simple.py (currently active)
VLLM Integration: OpenAI-compatible API client
TTS Engine: Text-to-speech synthesis
Conversation Manager: Persistent context management
Audio Processing: Real-time speech recognition

scripts/ - System Integration

Keybinding Setup: Super+Alt+D for AI conversation, Alt+D for dictation
Service Management: systemd service configuration
Model Switching: Easy switching between VOSK models
Mode Toggling: Scripts to start/stop dictation and conversation modes

tests/ - Comprehensive Testing

100+ Test Cases: Covering all functionality
Integration Tests: VLLM, audio, and system integration
Performance Tests: Response time and resource usage
Error Handling: Failure and recovery scenarios

docs/ - Documentation

User Guide: Complete setup and usage instructions
Test Results: Comprehensive testing coverage report
Installation: Step-by-step setup instructions

🚀 Quick Start Commands

# Setup keybindings (Super+Alt+D for AI, Alt+D for dictation)
./scripts/setup-dual-keybindings.sh

# Start service with sudo fix
./scripts/fix_service.sh

# Test VLLM integration
python tests/test_vllm_integration.py

# Run all tests
cd tests && ./run_all_tests.sh

# Switch speech recognition models
./scripts/switch-model.sh

🔧 Configuration

Keybindings:

Super+Alt+D: AI conversation mode (with persistent context)
Alt+D: Traditional dictation mode

Models:

Speech: VOSK models from ~/.shared/models/vosk-models/
AI: Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 (VLLM)

API Endpoints:

VLLM: http://127.0.0.1:8000/v1
API Key: vllm-api-key

📊 Clean Project Benefits

✅ Organization:

Logical Structure: Separate concerns into distinct directories
Easy Navigation: Clear purpose for each directory
Scalable: Easy to add new features and tests

✅ Maintainability:

Modular Code: Independent components and services
Version Control: Clean git history without clutter
Testing Isolation: Tests separate from production code

✅ Deployment:

Service Ready: systemd configuration included
Shared Resources: Models in shared directory for multi-project use
Dependency Management: uv package manager with lock file

🎉 Your AI Dictation Service is now perfectly organized and ready for production use!

The clean structure makes it easy to maintain, extend, and deploy your conversational AI phone call system with persistent conversation context.

5.6 KiB Raw Blame History