Major refactoring: v0.2.0 - Simplify to core dictation & read-aloud features

This is a comprehensive refactoring that transforms the dictation service from a
complex multi-mode application into two clean, focused features:
1. Voice dictation with system tray icon
2. On-demand read-aloud via Ctrl+middle-click

## Key Changes

### Dictation Service Enhancements
- Add GTK/AppIndicator3 system tray icon for visual status
- Remove all notification spam (dictation start/stop/status)
- Icon states: microphone-sensitivity-muted (OFF) → microphone-sensitivity-high (ON)
- Click tray icon to toggle dictation (same as Alt+D)
- Simplify ai_dictation_simple.py by removing conversation mode

### Read-Aloud Service Redesign
- Replace automatic clipboard reader with on-demand Ctrl+middle-click
- New middle_click_reader.py service
- Works anywhere: highlight text, Ctrl+middle-click to read
- Uses Edge-TTS (Christopher voice) with mpv playback
- Lock file prevents feedback with dictation service

### Conversation Mode Removed
- Delete all VLLM/conversation code (VLLMClient, ConversationManager, TTS)
- Archive 5 old implementations to archive/old_implementations/
- Remove conversation-related scripts and services
- Clean separation of concerns for future reintegration if needed

### Dependencies Cleanup
- Remove: openai, aiohttp, pyttsx3, requests (conversation deps)
- Keep: PyGObject, pynput, sounddevice, vosk, numpy, edge-tts
- Net reduction: 4 packages removed, 6 core packages retained

### Testing Improvements
- Add test_dictation_service.py (8 tests) ✅
- Add test_middle_click.py (11 tests) ✅
- Fix test_run.py to use correct model path
- Total: 19 unit tests passing
- Delete obsolete test files (test_suite, test_vllm_integration, etc.)

### Documentation
- Add CHANGES.md with complete changelog
- Add docs/MIGRATION_GUIDE.md for upgrading
- Add README.md with quick start guide
- Update docs/README.md with current features only
- Add justfile for common tasks

### New Services & Scripts
- Add middle-click-reader.service (systemd)
- Add scripts/setup-middle-click-reader.sh
- Add desktop files for autostart
- Remove toggle-conversation.sh (obsolete)

## Impact

**Code Quality**
- Net change: -6,007 lines (596 added, 6,603 deleted)
- Simpler architecture, easier maintenance
- Better test coverage (19 passing unit tests; earlier results were mixed)
- Cleaner separation of concerns

**User Experience**
- No notification spam during dictation
- Clean visual status via tray icon
- Full control over read-aloud (no unwanted readings)
- Better performance (fewer background processes)

**Privacy**
- No conversation data stored
- No VLLM connection needed
- All processing local except Edge-TTS text

## Migration Notes

Users upgrading should:
1. Run `uv sync` to update dependencies
2. Restart dictation.service to get tray icon
3. Run scripts/setup-middle-click-reader.sh for new read-aloud
4. Remove old read-aloud.service if present

See docs/MIGRATION_GUIDE.md for details.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2025-12-10 19:11:06 -07:00


Changes Summary

Overview

Complete refactoring of the dictation service to focus on two core features:

Voice Dictation with system tray icon
On-Demand Read-Aloud via middle-click

All conversation mode functionality has been removed as requested.

✅ Completed Changes

1. Dictation Service Enhancements

System Tray Icon Integration

Added: GTK/AppIndicator3-based system tray icon
Icon States:
- OFF: microphone-sensitivity-muted
- ON: microphone-sensitivity-high
Features:
- Click to toggle dictation (same as Alt+D)
- Visual status indicator
- Quit option from tray menu
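
The tray behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in ai_dictation_simple.py: the `DictationToggle` class, the `run_tray` function, and the indicator id are hypothetical names, and the sketch wires the toggle to a menu item as an assumption about how the click is exposed. Only the two icon names come from this changelog.

```python
"""Sketch of a tray-icon toggle; names here are illustrative, not the project's."""

ICON_OFF = "microphone-sensitivity-muted"
ICON_ON = "microphone-sensitivity-high"


class DictationToggle:
    """Holds the on/off flag and reports which icon should be shown."""

    def __init__(self):
        self.active = False

    def toggle(self):
        """Flip the flag and return the new state."""
        self.active = not self.active
        return self.active

    @property
    def icon_name(self):
        return ICON_ON if self.active else ICON_OFF


def run_tray(state):
    """Show the indicator. GTK is imported lazily so the class above stays testable."""
    import gi

    gi.require_version("Gtk", "3.0")
    gi.require_version("AppIndicator3", "0.1")
    from gi.repository import AppIndicator3, Gtk

    indicator = AppIndicator3.Indicator.new(
        "dictation-sketch",  # hypothetical indicator id
        state.icon_name,
        AppIndicator3.IndicatorCategory.APPLICATION_STATUS,
    )
    indicator.set_status(AppIndicator3.IndicatorStatus.ACTIVE)

    def on_toggle(_item):
        # Update the flag first, then swap the icon to match it.
        state.toggle()
        indicator.set_icon_full(state.icon_name, "dictation status")

    menu = Gtk.Menu()
    toggle_item = Gtk.MenuItem(label="Toggle dictation")
    toggle_item.connect("activate", on_toggle)
    menu.append(toggle_item)
    quit_item = Gtk.MenuItem(label="Quit")
    quit_item.connect("activate", lambda _item: Gtk.main_quit())
    menu.append(quit_item)
    menu.show_all()
    indicator.set_menu(menu)
    Gtk.main()
```

Calling `run_tray(DictationToggle())` would start the GTK main loop. Keeping the state in a plain class means the icon logic can be unit-tested without a display.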

Notification Removal

Removed all dictation notifications:
- "Dictation Active" → Now shown via tray icon
- "Dictating... (N words)" → Silent operation
- "Dictation Complete" → Silent operation
- "Dictation Stopped" → Shown via tray icon state
Kept: Error notifications (typing errors, etc.)

Code Simplification

File: src/dictation_service/ai_dictation_simple.py
Removed: All conversation mode logic
- VLLMClient class
- ConversationManager class
- TTSManager for conversations
- AppState enum (simplified to boolean)
- Persistent conversation history
Kept: Core dictation functionality only

2. Read-Aloud Service Redesign

Removed Automatic Service

Deleted: Old read_aloud_service.py (automatic reader)
Deleted: System tray service for read-aloud
Deleted: Toggle scripts for old service

New Middle-Click Implementation

Created: src/dictation_service/middle_click_reader.py
Trigger: Middle-click (scroll wheel press) on selected text
Features:
- On-demand only (no automatic reading)
- Works in any application
- Uses Edge-TTS (Christopher voice)
- Lock file prevents feedback with dictation
- Lightweight (runs in background)
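
One way the lock file could prevent feedback is sketched below. The lock path, `speaking_lock`, and `dictation_should_pause` are hypothetical names chosen for illustration, and using the same lock to reject a second concurrent reader is an assumption of this sketch rather than a description of middle_click_reader.py.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Hypothetical lock path; the real service may use a different location.
LOCK_PATH = Path(tempfile.gettempdir()) / "read_aloud_sketch.lock"


@contextmanager
def speaking_lock(path=LOCK_PATH):
    """Create the lock file while audio plays and remove it afterwards."""
    # O_EXCL makes creation fail with FileExistsError if a reader already holds the lock.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        yield path
    finally:
        Path(path).unlink(missing_ok=True)


def dictation_should_pause(path=LOCK_PATH):
    """Dictation side: ignore microphone input while the reader is speaking."""
    return Path(path).exists()
```

The reader would wrap playback in `with speaking_lock(): ...`, and the dictation loop would check `dictation_should_pause()` before processing audio, so synthesized speech is not typed back as dictated text.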

3. Dependencies Cleanup

Removed from `pyproject.toml`:

openai>=1.0.0 (conversation mode)
aiohttp>=3.8.0 (async API calls)
pyttsx3>=2.90 (local TTS for conversations)
requests>=2.28.0 (HTTP requests)

Kept:

PyGObject>=3.42.0 (system tray)
pynput>=1.8.1 (mouse events)
sounddevice>=0.5.3 (audio)
vosk>=0.3.45 (speech recognition)
numpy>=2.3.5 (audio processing)
edge-tts>=7.2.3 (read-aloud TTS)

4. File Cleanup

Deleted (11 deprecated files):

docs/AI_DICTATION_GUIDE.md.deprecated
docs/READ_ALOUD_GUIDE.md.deprecated
tests/test_vllm_integration.py.deprecated
tests/test_suite.py.deprecated
tests/test_original_dictation.py.deprecated
tests/test_read_aloud.py.deprecated
read-aloud.service.deprecated
scripts/toggle-conversation.sh.deprecated
scripts/toggle-read-aloud.sh.deprecated
scripts/setup-read-aloud.sh.deprecated
src/dictation_service/read_aloud_service.py.deprecated

Archived (5 old implementations):

archive/old_implementations/
├── ai_dictation.py (full version with GUI)
├── enhanced_dictation.py (original enhanced)
├── new_dictation.py (experimental)
├── streaming_dictation.py (streaming focus)
└── vosk_dictation.py (basic version)

5. New Documentation

Created:

README.md - Project overview and quick start
docs/README.md - Complete guide for current features
docs/MIGRATION_GUIDE.md - Migration from old version
CHANGES.md - This file

Updated:

Removed all conversation mode references
Updated installation instructions
Added middle-click reader setup
Simplified architecture diagrams

6. Test Suite Overhaul

New Tests:

tests/test_dictation_service.py - 8 tests for dictation
tests/test_middle_click.py - 11 tests for read-aloud
Total: 19 tests, all passing ✅

Test Coverage:

Dictation core functionality
System tray icon integration
Lock file management
Audio processing
Middle-click detection
Edge-TTS integration
Text selection handling
Concurrent reading prevention

7. New Services & Scripts

Created:

middle-click-reader.service - Systemd service
scripts/setup-middle-click-reader.sh - Installation script

Kept:

dictation.service - Main dictation service
scripts/setup-keybindings.sh - Alt+D keybinding
scripts/toggle-dictation.sh - Manual toggle

Current Project Structure

dictation-service/
├── src/dictation_service/
│   ├── __init__.py
│   ├── ai_dictation_simple.py      # Main dictation service
│   ├── middle_click_reader.py      # Read-aloud service
│   └── main.py
├── tests/
│   ├── test_dictation_service.py   # 8 tests ✅
│   ├── test_middle_click.py        # 11 tests ✅
│   ├── test_e2e.py                 # End-to-end tests
│   ├── test_imports.py             # Import validation
│   └── test_run.py                 # Runtime tests
├── scripts/
│   ├── setup-keybindings.sh
│   ├── setup-middle-click-reader.sh
│   ├── toggle-dictation.sh
│   └── switch-model.sh
├── docs/
│   ├── README.md                   # Complete guide
│   ├── MIGRATION_GUIDE.md
│   ├── INSTALL.md
│   └── TESTING_SUMMARY.md
├── archive/
│   └── old_implementations/        # 5 archived files
├── dictation.service
├── middle-click-reader.service
├── README.md                       # Quick start
├── CHANGES.md                      # This file
└── pyproject.toml                  # v0.2.0

Feature Comparison

| Feature | Before | After |
| --- | --- | --- |
| Dictation | Notifications | System tray icon |
| Read-Aloud | Automatic polling | Middle-click on-demand |
| Conversation Mode | ✅ Included | ❌ Removed completely |
| Dependencies | 10 packages | 6 packages |
| Source Files | 9 Python files | 4 Python files |
| Test Files | 6 test files | 5 test files |
| Tests Passing | Mixed | 19/19 ✅ |
| Documentation | Conversation-focused | Dictation+Read-Aloud focused |

How to Use

Dictation

1. Look for microphone icon in system tray
2. Press Alt+D or click icon → Icon turns "on"
3. Speak → Text is typed
4. Press Alt+D or click icon → Icon turns "off"

No notifications - status shown in tray only

Read-Aloud

1. Highlight any text
2. Middle-click (press scroll wheel)
3. Text is read aloud

Always ready - no enable/disable needed
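
The read-aloud flow above (click, fetch selection, synthesize, play) might look roughly like this. It is a sketch under several assumptions: the function names are hypothetical, `xclip` is assumed as the way to read the X11 primary selection, `en-US-ChristopherNeural` is assumed to be the Edge-TTS id for the Christopher voice, and the `require_ctrl` flag exists only because the commit message mentions Ctrl+middle-click while this file mentions a plain middle-click.

```python
import subprocess
import tempfile
from pathlib import Path

VOICE = "en-US-ChristopherNeural"  # assumed Edge-TTS id for the Christopher voice


def should_trigger(button_name, pressed, ctrl_held, require_ctrl=False):
    """React to the press (not the release) of the middle button only."""
    if button_name != "middle" or not pressed:
        return False
    return ctrl_held or not require_ctrl


def tts_command(text, out_path):
    """Command line for the edge-tts CLI that writes speech to out_path."""
    # "--text=..." keeps text starting with "-" from being parsed as an option.
    return ["edge-tts", "--voice", VOICE, f"--text={text}", "--write-media", str(out_path)]


def play_command(path):
    """Command line that plays an audio file with mpv."""
    return ["mpv", "--no-terminal", str(path)]


def read_selection():
    """Fetch the X11 primary selection; xclip is an assumption of this sketch."""
    result = subprocess.run(
        ["xclip", "-o", "-selection", "primary"], capture_output=True, text=True
    )
    return result.stdout.strip()


def speak(text):
    """Synthesize text to a temporary file, then play it."""
    with tempfile.TemporaryDirectory() as tmp:
        audio = Path(tmp) / "speech.mp3"
        subprocess.run(tts_command(text, audio), check=True)
        subprocess.run(play_command(audio), check=True)


def listen(require_ctrl=False):
    """Block and read the selection aloud on each qualifying click."""
    from pynput import keyboard, mouse  # imported lazily: needs a display

    ctrl = {"held": False}
    ctrl_keys = (keyboard.Key.ctrl, keyboard.Key.ctrl_l, keyboard.Key.ctrl_r)

    def on_press(key):
        if key in ctrl_keys:
            ctrl["held"] = True

    def on_release(key):
        if key in ctrl_keys:
            ctrl["held"] = False

    def on_click(x, y, button, pressed):
        if should_trigger(button.name, pressed, ctrl["held"], require_ctrl):
            text = read_selection()
            if text:
                speak(text)

    keyboard.Listener(on_press=on_press, on_release=on_release).start()
    with mouse.Listener(on_click=on_click) as listener:
        listener.join()
```

In this sketch `speak` runs inside the mouse callback, so further clicks are not handled until playback ends; a real service would more likely hand the work to a separate thread.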

Testing

All tests pass successfully:

# Run all tests
uv run python tests/test_dictation_service.py -v  # 8 tests ✅
uv run python tests/test_middle_click.py -v       # 11 tests ✅

# Results:
# - Dictation: 8/8 passed
# - Middle-click: 11/11 passed
# - Total: 19/19 passed ✅

Installation

# 1. Sync dependencies
uv sync

# 2. Setup dictation
./scripts/setup-keybindings.sh
systemctl --user enable --now dictation.service

# 3. Setup read-aloud (optional)
./scripts/setup-middle-click-reader.sh

# 4. Verify
systemctl --user status dictation.service
systemctl --user status middle-click-reader

Benefits

User Experience

- ✅ No notification spam
- ✅ Clean visual status (tray icon)
- ✅ Full control over read-aloud
- ✅ Simple, focused features
- ✅ Better performance

Code Quality

- ✅ Reduced complexity (removed 5000+ lines)
- ✅ Fewer dependencies
- ✅ Better test coverage
- ✅ Cleaner architecture
- ✅ Easier to maintain

Privacy

- ✅ No conversation data stored
- ✅ No VLLM connection needed
- ✅ All processing local
- ✅ Minimal external calls (only Edge-TTS text)

Next Steps (Optional)

If you want to add conversation mode back in the future:

It will be a separate application (as you mentioned)
Can reuse the Vosk speech recognition from this service
Can integrate via D-Bus or similar IPC
Old conversation code is in git history if needed

Version

Before: v0.1.0 (conversation-focused)
After: v0.2.0 (dictation+read-aloud focused)

Summary

This refactoring successfully transformed the dictation service from a complex multi-mode application into two clean, focused features:

Dictation: Voice-to-text with visual tray icon feedback
Read-Aloud: On-demand text-to-speech via middle-click

All conversation mode functionality has been cleanly removed, the codebase has been simplified, dependencies reduced, and comprehensive tests added. The project is now cleaner, more maintainable, and focused on doing two things very well.
