Kade Heyborne 71c305a201
Major refactoring: v0.2.0 - Simplify to core dictation & read-aloud features
This is a comprehensive refactoring that transforms the dictation service from a
complex multi-mode application into two clean, focused features:
1. Voice dictation with system tray icon
2. On-demand read-aloud via Ctrl+middle-click

## Key Changes

### Dictation Service Enhancements
- Add GTK/AppIndicator3 system tray icon for visual status
- Remove all notification spam (dictation start/stop/status)
- Icon states: microphone-sensitivity-muted (OFF) → microphone-sensitivity-high (ON)
- Click tray icon to toggle dictation (same as Alt+D)
- Simplify ai_dictation_simple.py by removing conversation mode

### Read-Aloud Service Redesign
- Replace automatic clipboard reader with on-demand Ctrl+middle-click
- New middle_click_reader.py service
- Works anywhere: highlight text, Ctrl+middle-click to read
- Uses Edge-TTS (Christopher voice) with mpv playback
- Lock file prevents feedback with dictation service

### Conversation Mode Removed
- Delete all VLLM/conversation code (VLLMClient, ConversationManager, TTS)
- Archive 5 old implementations to archive/old_implementations/
- Remove conversation-related scripts and services
- Clean separation of concerns for future reintegration if needed

### Dependencies Cleanup
- Remove: openai, aiohttp, pyttsx3, requests (conversation deps)
- Keep: PyGObject, pynput, sounddevice, vosk, numpy, edge-tts
- Net reduction: 4 packages removed, 6 core packages retained

### Testing Improvements
- Add test_dictation_service.py (8 tests) 
- Add test_middle_click.py (11 tests) 
- Fix test_run.py to use correct model path
- Total: 19 unit tests passing
- Delete obsolete test files (test_suite, test_vllm_integration, etc.)

### Documentation
- Add CHANGES.md with complete changelog
- Add docs/MIGRATION_GUIDE.md for upgrading
- Add README.md with quick start guide
- Update docs/README.md with current features only
- Add justfile for common tasks

### New Services & Scripts
- Add middle-click-reader.service (systemd)
- Add scripts/setup-middle-click-reader.sh
- Add desktop files for autostart
- Remove toggle-conversation.sh (obsolete)

## Impact

**Code Quality**
- Net change: -6,007 lines (596 added, 6,603 deleted)
- Simpler architecture, easier maintenance
- Better test coverage (19 passing tests vs. mixed results before)
- Cleaner separation of concerns

**User Experience**
- No notification spam during dictation
- Clean visual status via tray icon
- Full control over read-aloud (no unwanted readings)
- Better performance (fewer background processes)

**Privacy**
- No conversation data stored
- No VLLM connection needed
- All processing local except Edge-TTS, which sends the text being read to Microsoft's online speech service

## Migration Notes

Users upgrading should:
1. Run `uv sync` to update dependencies
2. Restart dictation.service to get tray icon
3. Run scripts/setup-middle-click-reader.sh for new read-aloud
4. Remove old read-aloud.service if present

See docs/MIGRATION_GUIDE.md for details.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-10 19:11:06 -07:00


Changes Summary

Overview

Complete refactoring of the dictation service to focus on two core features:

  1. Voice Dictation with system tray icon
  2. On-Demand Read-Aloud via middle-click

All conversation mode functionality has been removed.


Completed Changes

1. Dictation Service Enhancements

System Tray Icon Integration

  • Added: GTK/AppIndicator3-based system tray icon
  • Icon States:
    • OFF: microphone-sensitivity-muted
    • ON: microphone-sensitivity-high
  • Features:
    • Click to toggle dictation (same as Alt+D)
    • Visual status indicator
    • Quit option from tray menu
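The tray logic comes down to one boolean and an icon lookup. The sketch below uses hypothetical names (`DictationState`, `toggle`) and leaves out the GTK wiring. In the service, the returned name would presumably be passed to AppIndicator3's `Indicator.set_icon_full(icon_name, description)`.

```python
from dataclasses import dataclass

# Icon names from the "Icon States" list above.
ICON_OFF = "microphone-sensitivity-muted"
ICON_ON = "microphone-sensitivity-high"


@dataclass
class DictationState:
    """Hypothetical single-boolean state (stands in for the old AppState enum)."""

    active: bool = False

    @property
    def icon_name(self) -> str:
        # OFF -> muted icon, ON -> high-sensitivity icon
        return ICON_ON if self.active else ICON_OFF

    def toggle(self) -> str:
        """Flip dictation on/off and return the icon the tray should show."""
        self.active = not self.active
        return self.icon_name
```

This sketch assumes that Alt+D and the tray click both go through the same `toggle()`. That is not confirmed by the source. If it holds, the icon and the real dictation state cannot drift apart.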

Notification Removal

  • Removed all dictation notifications:
    • "Dictation Active" → Now shown via tray icon
    • "Dictating... (N words)" → Silent operation
    • "Dictation Complete" → Silent operation
    • "Dictation Stopped" → Shown via tray icon state
  • Kept: Error notifications (typing errors, etc.)
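Here is a minimal sketch of the "errors only" policy. It assumes hypothetical event names and uses the standard `notify-send` CLI. The real error path in ai_dictation_simple.py may differ.

```python
import shutil
import subprocess
from typing import List, Optional

# Hypothetical status events that no longer produce a notification.
STATUS_EVENTS = {"started", "progress", "completed", "stopped"}


def notification_command(kind: str, message: str) -> Optional[List[str]]:
    """Return a notify-send argv for errors, or None for status events."""
    if kind == "error":
        return ["notify-send", "--urgency=critical", "Dictation error", message]
    if kind in STATUS_EVENTS:
        return None  # status is conveyed by the tray icon instead
    raise ValueError(f"unknown event kind: {kind!r}")


def notify(kind: str, message: str) -> None:
    """Send the notification only if one is needed and notify-send exists."""
    cmd = notification_command(kind, message)
    if cmd is not None and shutil.which("notify-send"):
        subprocess.run(cmd, check=False)
```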

Code Simplification

  • File: src/dictation_service/ai_dictation_simple.py
  • Removed: All conversation mode logic
    • VLLMClient class
    • ConversationManager class
    • TTSManager for conversations
    • AppState enum (simplified to boolean)
    • Persistent conversation history
  • Kept: Core dictation functionality only

2. Read-Aloud Service Redesign

Removed Automatic Service

  • Deleted: Old read_aloud_service.py (automatic reader)
  • Deleted: System tray service for read-aloud
  • Deleted: Toggle scripts for old service

New Middle-Click Implementation

  • Created: src/dictation_service/middle_click_reader.py
  • Trigger: Middle-click (scroll wheel press) on selected text
  • Features:
    • On-demand only (no automatic reading)
    • Works in any application
    • Uses Edge-TTS (Christopher voice)
    • Lock file prevents feedback with dictation
    • Lightweight (runs in background)
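The trigger side of the reader can be sketched as two parts: a pynput-style click callback, and a non-blocking lock that prevents concurrent readings. `MiddleClickTrigger` and `read_selection` are hypothetical names. `read_selection` stands for whatever fetches the highlighted text and speaks it.

```python
import threading
from typing import Callable


class MiddleClickTrigger:
    """Hypothetical trigger logic for middle_click_reader.py (names assumed)."""

    def __init__(self, read_selection: Callable[[], None]) -> None:
        # read_selection: fetches the highlighted text and speaks it.
        self._read_selection = read_selection
        self._busy = threading.Lock()  # held for the whole reading

    def handle(self, button, pressed: bool) -> bool:
        """Start a reading on a middle-button press; return True if started."""
        # Ignore releases and every button except the middle one.
        if not pressed or getattr(button, "name", None) != "middle":
            return False
        # Non-blocking acquire: a click during playback is dropped rather
        # than queued, so two readings never overlap.
        if not self._busy.acquire(blocking=False):
            return False
        # Speak on a worker thread so the mouse listener is never blocked.
        threading.Thread(target=self._run, daemon=True).start()
        return True

    def on_click(self, x, y, button, pressed) -> None:
        # pynput stops a Listener whose callback returns False, so this
        # wrapper always returns None.
        self.handle(button, pressed)

    def _run(self) -> None:
        try:
            self._read_selection()
        finally:
            self._busy.release()
```

To wire it up, pass `trigger.on_click` to `pynput.mouse.Listener(on_click=...)`, then start and join the listener. The worker thread matters: speaking inside the callback would stall the listener for the whole reading.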

3. Dependencies Cleanup

Removed from pyproject.toml:

  • openai>=1.0.0 (conversation mode)
  • aiohttp>=3.8.0 (async API calls)
  • pyttsx3>=2.90 (local TTS for conversations)
  • requests>=2.28.0 (HTTP requests)

Kept:

  • PyGObject>=3.42.0 (system tray)
  • pynput>=1.8.1 (mouse events)
  • sounddevice>=0.5.3 (audio)
  • vosk>=0.3.45 (speech recognition)
  • numpy>=2.3.5 (audio processing)
  • edge-tts>=7.2.3 (read-aloud TTS)
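After `uv sync`, you can confirm that the six kept packages are installed by checking installed metadata. This sketch uses only the standard library's `importlib.metadata`. The script itself is hypothetical and not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Iterable, List, Optional

# Direct dependencies kept in pyproject.toml for v0.2.0 (list above).
KEPT = ["PyGObject", "pynput", "sounddevice", "vosk", "numpy", "edge-tts"]


def installed_versions(
    names: Iterable[str], lookup: Callable[[str], str] = version
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if absent)."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = lookup(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def missing(names: Iterable[str], lookup: Callable[[str], str] = version) -> List[str]:
    """Names from `names` that are not installed."""
    return [n for n, v in installed_versions(names, lookup).items() if v is None]


def report(names: Iterable[str] = KEPT) -> None:
    """Print one line per package: name and version, or MISSING."""
    for name, ver in installed_versions(names).items():
        print(f"{name:12} {ver or 'MISSING'}")
```

Run it with `uv run python` so it sees the project environment. The script does not check that removed packages are absent. A package dropped from pyproject.toml can still be installed if another dependency requires it.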

4. File Cleanup

Deleted (11 deprecated files):

```
docs/AI_DICTATION_GUIDE.md.deprecated
docs/READ_ALOUD_GUIDE.md.deprecated
tests/test_vllm_integration.py.deprecated
tests/test_suite.py.deprecated
tests/test_original_dictation.py.deprecated
tests/test_read_aloud.py.deprecated
read-aloud.service.deprecated
scripts/toggle-conversation.sh.deprecated
scripts/toggle-read-aloud.sh.deprecated
scripts/setup-read-aloud.sh.deprecated
src/dictation_service/read_aloud_service.py.deprecated
```

Archived (5 old implementations):

```
archive/old_implementations/
├── ai_dictation.py (full version with GUI)
├── enhanced_dictation.py (original enhanced)
├── new_dictation.py (experimental)
├── streaming_dictation.py (streaming focus)
└── vosk_dictation.py (basic version)
```

5. New Documentation

Created:

  • README.md - Project overview and quick start
  • docs/README.md - Complete guide for current features
  • docs/MIGRATION_GUIDE.md - Migration from old version
  • CHANGES.md - This file

Updated:

  • Removed all conversation mode references
  • Updated installation instructions
  • Added middle-click reader setup
  • Simplified architecture diagrams

6. Test Suite Overhaul

New Tests:

  • tests/test_dictation_service.py - 8 tests for dictation
  • tests/test_middle_click.py - 11 tests for read-aloud
  • Total: 19 tests, all passing

Test Coverage:

  • Dictation core functionality
  • System tray icon integration
  • Lock file management
  • Audio processing
  • Middle-click detection
  • Edge-TTS integration
  • Text selection handling
  • Concurrent reading prevention
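A test for the lock-file behavior might look like the one below. The reader holds the lock while audio plays, and the dictation side stops listening while the file exists. The lock path and the helpers `speaking_lock` and `dictation_should_listen` are hypothetical, not the names used in tests/test_middle_click.py.

```python
import os
import tempfile
import unittest
from contextlib import contextmanager
from pathlib import Path

# Hypothetical lock location; the real service's path is not documented here.
DEFAULT_LOCK = Path(tempfile.gettempdir()) / "dictation-read-aloud.lock"


@contextmanager
def speaking_lock(path: Path = DEFAULT_LOCK):
    """Create the lock file for the duration of playback, then remove it."""
    # O_EXCL makes creation fail if another reader already holds the lock.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        with os.fdopen(fd, "w") as f:  # fdopen takes ownership of fd
            f.write(str(os.getpid()))
        yield path
    finally:
        path.unlink(missing_ok=True)


def dictation_should_listen(path: Path = DEFAULT_LOCK) -> bool:
    """Dictation side: skip recognition while read-aloud audio is playing."""
    return not path.exists()


class LockFileTests(unittest.TestCase):
    def setUp(self):
        self.tmp = tempfile.TemporaryDirectory()
        self.lock = Path(self.tmp.name) / "reader.lock"

    def tearDown(self):
        self.tmp.cleanup()

    def test_lock_present_only_while_speaking(self):
        self.assertTrue(dictation_should_listen(self.lock))
        with speaking_lock(self.lock):
            self.assertFalse(dictation_should_listen(self.lock))
        self.assertTrue(dictation_should_listen(self.lock))

    def test_second_reader_is_refused(self):
        with speaking_lock(self.lock):
            with self.assertRaises(FileExistsError):
                with speaking_lock(self.lock):
                    pass

    def test_lock_removed_after_error(self):
        with self.assertRaises(RuntimeError):
            with speaking_lock(self.lock):
                raise RuntimeError("playback failed")
        self.assertFalse(self.lock.exists())
```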

7. New Services & Scripts

Created:

  • middle-click-reader.service - Systemd service
  • scripts/setup-middle-click-reader.sh - Installation script

Kept:

  • dictation.service - Main dictation service
  • scripts/setup-keybindings.sh - Alt+D keybinding
  • scripts/toggle-dictation.sh - Manual toggle

Current Project Structure

```
dictation-service/
├── src/dictation_service/
│   ├── __init__.py
│   ├── ai_dictation_simple.py      # Main dictation service
│   ├── middle_click_reader.py      # Read-aloud service
│   └── main.py
├── tests/
│   ├── test_dictation_service.py   # 8 tests ✅
│   ├── test_middle_click.py        # 11 tests ✅
│   ├── test_e2e.py                 # End-to-end tests
│   ├── test_imports.py             # Import validation
│   └── test_run.py                 # Runtime tests
├── scripts/
│   ├── setup-keybindings.sh
│   ├── setup-middle-click-reader.sh
│   ├── toggle-dictation.sh
│   └── switch-model.sh
├── docs/
│   ├── README.md                   # Complete guide
│   ├── MIGRATION_GUIDE.md
│   ├── INSTALL.md
│   └── TESTING_SUMMARY.md
├── archive/
│   └── old_implementations/        # 5 archived files
├── dictation.service
├── middle-click-reader.service
├── README.md                       # Quick start
├── CHANGES.md                      # This file
└── pyproject.toml                  # v0.2.0
```


Feature Comparison

| Feature | Before | After |
|---|---|---|
| Dictation | Notifications | System tray icon |
| Read-Aloud | Automatic polling | Middle-click on-demand |
| Conversation Mode | Included | Removed completely |
| Dependencies | 10 packages | 6 packages |
| Source Files | 9 Python files | 4 Python files |
| Test Files | 6 test files | 5 test files |
| Tests Passing | Mixed | 19/19 |
| Documentation | Conversation-focused | Dictation+Read-Aloud focused |

How to Use

Dictation

  1. Look for microphone icon in system tray
  2. Press Alt+D or click icon → Icon turns "on"
  3. Speak → Text is typed
  4. Press Alt+D or click icon → Icon turns "off"
  5. No notifications - status shown in tray only
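In outline, step 3 is a capture, recognize, type loop. The sketch follows Vosk's standard `KaldiRecognizer` pattern with a sounddevice input stream. The function names, the 16 kHz sample rate and the trailing-space rule are assumptions, not taken from ai_dictation_simple.py. `type_text` could be, for example, pynput's `keyboard.Controller().type`.

```python
import json
from typing import Callable


def text_to_type(vosk_result_json: str) -> str:
    """Turn a Vosk final result into the string to type ("" for silence).

    Appends a space so consecutive utterances do not run together.
    """
    text = json.loads(vosk_result_json).get("text", "").strip()
    return f"{text} " if text else ""


def run_dictation(
    model_path: str,
    is_active: Callable[[], bool],
    type_text: Callable[[str], None],
    samplerate: int = 16000,
) -> None:
    """Capture loop sketch; blocks forever."""
    import queue

    import sounddevice as sd
    from vosk import KaldiRecognizer, Model

    audio = queue.Queue()

    def callback(indata, frames, time_info, status):
        # Runs on the audio thread: just hand raw bytes to the main loop.
        audio.put(bytes(indata))

    recognizer = KaldiRecognizer(Model(model_path), samplerate)
    with sd.RawInputStream(samplerate=samplerate, blocksize=8000,
                           dtype="int16", channels=1, callback=callback):
        while True:
            chunk = audio.get()
            if not is_active():
                continue  # tray icon shows OFF: drop audio
            if recognizer.AcceptWaveform(chunk):
                out = text_to_type(recognizer.Result())
                if out:
                    type_text(out)
```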

Read-Aloud

  1. Highlight any text
  2. Middle-click (press scroll wheel)
  3. Text is read aloud
  4. Always ready - no enable/disable needed

Testing

All tests pass successfully:

```bash
# Run all tests
uv run python tests/test_dictation_service.py -v  # 8 tests ✅
uv run python tests/test_middle_click.py -v       # 11 tests ✅

# Results:
# - Dictation: 8/8 passed
# - Middle-click: 11/11 passed
# - Total: 19/19 passed ✅
```

Installation

```bash
# 1. Sync dependencies
uv sync

# 2. Setup dictation
./scripts/setup-keybindings.sh
systemctl --user enable --now dictation.service

# 3. Setup read-aloud (optional)
./scripts/setup-middle-click-reader.sh

# 4. Verify
systemctl --user status dictation.service
systemctl --user status middle-click-reader.service
```

Benefits

User Experience

  • No notification spam
  • Clean visual status (tray icon)
  • Full control over read-aloud
  • Simple, focused features
  • Better performance

Code Quality

  • Reduced complexity (net reduction of 6,007 lines)
  • Fewer dependencies
  • Better test coverage
  • Cleaner architecture
  • Easier to maintain

Privacy

  • No conversation data stored
  • No VLLM connection needed
  • Speech recognition runs locally (Vosk)
  • Only external call: Edge-TTS, which sends the text being read aloud for synthesis


Next Steps (Optional)

If conversation mode is added back in the future:

  1. It would be a separate application
  2. It can reuse the Vosk speech recognition from this service
  3. It can integrate via D-Bus or similar IPC
  4. The old conversation code is in git history if needed

Version

  • Before: v0.1.0 (conversation-focused)
  • After: v0.2.0 (dictation+read-aloud focused)

Summary

This refactoring successfully transformed the dictation service from a complex multi-mode application into two clean, focused features:

  1. Dictation: Voice-to-text with visual tray icon feedback
  2. Read-Aloud: On-demand text-to-speech via middle-click

All conversation mode functionality has been cleanly removed, the codebase has been simplified, dependencies have been reduced, and 19 unit tests have been added. The project is now cleaner, easier to maintain, and focused on doing two things well.