
AI Dictation Service - Clean Project Structure

📁 Directory Organization

dictation-service/
├── 📁 src/
│   └── 📁 dictation_service/
│       ├── 🔧 ai_dictation_simple.py      # Main AI dictation service (ACTIVE)
│       ├── 🔧 ai_dictation.py             # Full version with GTK GUI
│       ├── 🔧 enhanced_dictation.py       # Original enhanced dictation
│       ├── 🔧 vosk_dictation.py           # Basic dictation
│       └── 🔧 main.py                     # Entry point
│
├── 📁 scripts/
│   ├── 🔧 fix_service.sh                  # Service setup with sudo
│   ├── 🔧 setup-dual-keybindings.sh       # Alt+D & Super+Alt+D setup
│   ├── 🔧 setup_super_d_manual.sh         # Manual Super+Alt+D setup
│   ├── 🔧 setup-keybindings.sh            # Original Alt+D setup
│   ├── 🔧 setup-keybindings-manual.sh     # Manual setup
│   ├── 🔧 switch-model.sh                 # Model switching tool
│   ├── 🔧 toggle-conversation.sh          # Conversation mode toggle
│   └── 🔧 toggle-dictation.sh             # Dictation mode toggle
│
├── 📁 tests/
│   ├── 🔧 run_all_tests.sh                # Comprehensive test runner
│   ├── 🔧 test_original_dictation.py      # Original dictation tests
│   ├── 🔧 test_suite.py                   # AI conversation tests
│   ├── 🔧 test_vllm_integration.py        # VLLM integration tests
│   ├── 🔧 test_imports.py                 # Import tests
│   └── 🔧 test_run.py                     # Runtime tests
│
├── 📁 docs/
│   ├── 📖 AI_DICTATION_GUIDE.md            # Complete user guide
│   ├── 📖 INSTALL.md                      # Installation instructions
│   ├── 📖 TESTING_SUMMARY.md              # Test coverage overview
│   ├── 📖 TEST_RESULTS_AND_FIXES.md       # Test results and fixes
│   ├── 📖 README.md                       # Project overview
│   └── 📖 CLAUDE.md                       # Claude configuration
│
├── 📁 ~/.shared/models/vosk-models/       # Shared model directory (outside the repo)
│   ├── 🧠 vosk-model-en-us-0.22/          # Best accuracy model
│   ├── 🧠 vosk-model-en-us-0.22-lgraph/   # Good balance model
│   └── 🧠 vosk-model-small-en-us-0.15/    # Fast model
│
├── ⚙️ pyproject.toml                      # Python dependencies
├── ⚙️ uv.lock                             # Dependency lock file
├── ⚙️ .python-version                     # Python version
├── ⚙️ dictation.service                   # systemd service config
├── ⚙️ .gitignore                          # Git ignore rules
└── ⚙️ .venv/                              # Python virtual environment

🎯 Key Features by Directory

src/ - Core Application Logic

  • Main Service: ai_dictation_simple.py (currently active)
  • VLLM Integration: OpenAI-compatible API client
  • TTS Engine: Text-to-speech synthesis
  • Conversation Manager: Persistent context management
  • Audio Processing: Real-time speech recognition
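
To illustrate the audio-processing path, here is a minimal real-time recognition sketch using Vosk and sounddevice. The model path follows the shared directory shown above; the sample rate, block size, and stream settings are assumptions for illustration, not the exact configuration of ai_dictation_simple.py.

# Minimal real-time Vosk recognition sketch (parameters are assumptions,
# not the exact configuration of ai_dictation_simple.py).
import json
import queue
from pathlib import Path

import sounddevice as sd
from vosk import KaldiRecognizer, Model

MODEL_DIR = Path.home() / ".shared/models/vosk-models/vosk-model-en-us-0.22-lgraph"
SAMPLE_RATE = 16000   # assumed; must match what the recognizer expects
BLOCK_SIZE = 4000     # smaller blocks give lower latency, more callbacks

audio_q: "queue.Queue[bytes]" = queue.Queue()

def on_audio(indata, frames, time_info, status):
    """sounddevice callback: push raw PCM bytes onto the queue."""
    audio_q.put(bytes(indata))

def main() -> None:
    recognizer = KaldiRecognizer(Model(str(MODEL_DIR)), SAMPLE_RATE)
    with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=BLOCK_SIZE,
                           dtype="int16", channels=1, callback=on_audio):
        print("Listening... Ctrl+C to stop")
        while True:
            data = audio_q.get()
            if recognizer.AcceptWaveform(data):
                text = json.loads(recognizer.Result()).get("text", "")
                if text:
                    print("final:", text)
            else:
                partial = json.loads(recognizer.PartialResult()).get("partial", "")
                if partial:
                    print("partial:", partial, end="\r")

if __name__ == "__main__":
    main()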

scripts/ - System Integration

  • Keybinding Setup: Super+Alt+D for AI conversation, Alt+D for dictation
  • Service Management: systemd service configuration
  • Model Switching: Easy switching between VOSK models
  • Mode Toggling: Scripts to start/stop dictation and conversation modes
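
The toggle scripts themselves are shell scripts, but the pattern they implement is easy to sketch: a lock file records whether a mode is running, and toggling either starts the service or stops the recorded process and removes the lock. The lock-file path and launch command below are illustrative assumptions, not the exact behaviour of toggle-dictation.sh.

# Illustrative Python sketch of the lock-file toggle pattern used by the
# toggle-*.sh scripts; the lock path and launch command are assumptions.
import os
import signal
import subprocess
from pathlib import Path

LOCK_FILE = Path("/tmp/dictation.lock")   # hypothetical lock location

def toggle_dictation() -> None:
    if LOCK_FILE.exists():
        # Mode is active: stop the recorded process and clear the lock.
        pid = int(LOCK_FILE.read_text().strip())
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass                           # stale lock; process already gone
        LOCK_FILE.unlink(missing_ok=True)
        print("dictation stopped")
    else:
        # Mode is inactive: start the service and record its PID.
        proc = subprocess.Popen(["python", "-m", "dictation_service.main"])
        LOCK_FILE.write_text(str(proc.pid))
        print(f"dictation started (pid {proc.pid})")

if __name__ == "__main__":
    toggle_dictation()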

tests/ - Comprehensive Testing

  • 100+ Test Cases: Covering dictation, AI conversation, and VLLM integration
  • Integration Tests: VLLM, audio, and system integration
  • Performance Tests: Response time and resource usage
  • Error Handling: Failure and recovery scenarios
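
As a flavour of the integration tests, the sketch below checks that the VLLM server is reachable and serving the expected model before the conversation path is exercised. It is not copied from test_vllm_integration.py; the use of requests and the exact assertions are assumptions, while the endpoint, key, and model name come from the Configuration section below.

# Hypothetical smoke test for the OpenAI-compatible VLLM server.
import requests

VLLM_BASE = "http://127.0.0.1:8000/v1"
HEADERS = {"Authorization": "Bearer vllm-api-key"}

def test_vllm_server_is_reachable():
    resp = requests.get(f"{VLLM_BASE}/models", headers=HEADERS, timeout=5)
    assert resp.status_code == 200
    model_ids = [m["id"] for m in resp.json()["data"]]
    assert "Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4" in model_ids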

docs/ - Documentation

  • User Guide: Complete setup and usage instructions
  • Test Results: Comprehensive testing coverage report
  • Installation: Step-by-step setup instructions

🚀 Quick Start Commands

# Setup keybindings (Super+Alt+D for AI, Alt+D for dictation)
./scripts/setup-dual-keybindings.sh

# Start service with sudo fix
./scripts/fix_service.sh

# Test VLLM integration
python tests/test_vllm_integration.py

# Run all tests
cd tests && ./run_all_tests.sh

# Switch speech recognition models
./scripts/switch-model.sh

🔧 Configuration

Keybindings:

  • Super+Alt+D: AI conversation mode (with persistent context)
  • Alt+D: Traditional dictation mode

Models:

  • Speech: VOSK models from ~/.shared/models/vosk-models/
  • AI: Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 (VLLM)

API Endpoints:

  • VLLM: http://127.0.0.1:8000/v1
  • API Key: vllm-api-key
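
For reference, here is a minimal sketch of calling the VLLM endpoint through the OpenAI-compatible Python client, using the endpoint, key, and model listed above; the prompt and sampling parameters are placeholders.

# Minimal OpenAI-compatible client sketch against the local VLLM server.
# Endpoint, key, and model come from this document; the prompt and
# sampling parameters are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="vllm-api-key")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4",
    messages=[
        {"role": "system", "content": "You are a helpful voice assistant."},
        {"role": "user", "content": "Transcribed speech goes here."},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response.choices[0].message.content)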

📊 Clean Project Benefits

Organization:

  • Logical Structure: Concerns separated into distinct directories
  • Easy Navigation: Clear purpose for each directory
  • Scalable: Easy to add new features and tests

Maintainability:

  • Modular Code: Independent components and services
  • Version Control: Clean git history without clutter
  • Testing Isolation: Tests separate from production code

Deployment:

  • Service Ready: systemd configuration included
  • Shared Resources: Models in shared directory for multi-project use
  • Dependency Management: uv package manager with lock file

🎉 The AI Dictation Service is now cleanly organized and ready for production use!

The clean structure makes the conversational AI dictation service, with its persistent conversation context, easy to maintain, extend, and deploy.