dictation-service/CHANGES.md
Kade Heyborne 71c305a201
Major refactoring: v0.2.0 - Simplify to core dictation & read-aloud features
This is a comprehensive refactoring that transforms the dictation service from a
complex multi-mode application into two clean, focused features:
1. Voice dictation with system tray icon
2. On-demand read-aloud via Ctrl+middle-click

## Key Changes

### Dictation Service Enhancements
- Add GTK/AppIndicator3 system tray icon for visual status
- Remove all notification spam (dictation start/stop/status)
- Icon states: microphone-sensitivity-muted (OFF) → microphone-sensitivity-high (ON)
- Click tray icon to toggle dictation (same as Alt+D)
- Simplify ai_dictation_simple.py by removing conversation mode

### Read-Aloud Service Redesign
- Replace automatic clipboard reader with on-demand Ctrl+middle-click
- New middle_click_reader.py service
- Works anywhere: highlight text, Ctrl+middle-click to read
- Uses Edge-TTS (Christopher voice) with mpv playback
- Lock file prevents feedback with dictation service

### Conversation Mode Removed
- Delete all VLLM/conversation code (VLLMClient, ConversationManager, TTS)
- Archive 5 old implementations to archive/old_implementations/
- Remove conversation-related scripts and services
- Clean separation of concerns for future reintegration if needed

### Dependencies Cleanup
- Remove: openai, aiohttp, pyttsx3, requests (conversation deps)
- Keep: PyGObject, pynput, sounddevice, vosk, numpy, edge-tts
- Net reduction: 4 packages removed, 6 core packages retained

### Testing Improvements
- Add test_dictation_service.py (8 tests) 
- Add test_middle_click.py (11 tests) 
- Fix test_run.py to use correct model path
- Total: 19 unit tests passing
- Delete obsolete test files (test_suite, test_vllm_integration, etc.)

### Documentation
- Add CHANGES.md with complete changelog
- Add docs/MIGRATION_GUIDE.md for upgrading
- Add README.md with quick start guide
- Update docs/README.md with current features only
- Add justfile for common tasks

### New Services & Scripts
- Add middle-click-reader.service (systemd)
- Add scripts/setup-middle-click-reader.sh
- Add desktop files for autostart
- Remove toggle-conversation.sh (obsolete)

## Impact

**Code Quality**
- Net change: -6,007 lines (596 added, 6,603 deleted)
- Simpler architecture, easier maintenance
- Better test coverage (19 passing unit tests; previous results were mixed)
- Cleaner separation of concerns

**User Experience**
- No notification spam during dictation
- Clean visual status via tray icon
- Full control over read-aloud (no unwanted readings)
- Better performance (fewer background processes)

**Privacy**
- No conversation data stored
- No VLLM connection needed
- All processing is local except read-aloud text, which is sent to Edge-TTS

## Migration Notes

Users upgrading should:
1. Run `uv sync` to update dependencies
2. Restart dictation.service to get tray icon
3. Run scripts/setup-middle-click-reader.sh for new read-aloud
4. Remove old read-aloud.service if present

See docs/MIGRATION_GUIDE.md for details.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-10 19:11:06 -07:00


# Changes Summary
## Overview
Complete refactoring of the dictation service to focus on two core features:
1. **Voice Dictation** with system tray icon
2. **On-Demand Read-Aloud** via middle-click

All conversation mode functionality has been removed.

---
## ✅ Completed Changes
### 1. Dictation Service Enhancements
#### System Tray Icon Integration
- **Added**: GTK/AppIndicator3-based system tray icon
- **Icon States**:
  - OFF: `microphone-sensitivity-muted`
  - ON: `microphone-sensitivity-high`
- **Features**:
  - Click to toggle dictation (same as Alt+D)
  - Visual status indicator
  - Quit option from tray menu
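
The tray behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the service's actual code: `DictationTray`, `icon_for_state`, `run_tray`, and the indicator id are hypothetical names. AppIndicator3 exposes actions through a menu, so the sketch wires the toggle to a menu item; how the real service handles a direct click is not shown in this changelog.

```python
"""Sketch of a tray indicator that mirrors dictation state (hypothetical names)."""

ICON_OFF = "microphone-sensitivity-muted"
ICON_ON = "microphone-sensitivity-high"


def icon_for_state(active: bool) -> str:
    """Map the dictation on/off flag to an icon name."""
    return ICON_ON if active else ICON_OFF


class DictationTray:
    """Holds the on/off flag and keeps the indicator icon in sync."""

    def __init__(self, indicator=None):
        self.active = False
        self.indicator = indicator  # None allows use without a display

    def toggle(self, *_args) -> bool:
        # *_args absorbs the widget argument GTK passes to signal handlers.
        self.active = not self.active
        if self.indicator is not None:
            self.indicator.set_icon_full(icon_for_state(self.active), "dictation status")
        return self.active


def run_tray() -> None:
    """Build the indicator and menu; needs PyGObject and a desktop session."""
    import gi

    gi.require_version("Gtk", "3.0")
    gi.require_version("AppIndicator3", "0.1")
    from gi.repository import AppIndicator3, Gtk

    indicator = AppIndicator3.Indicator.new(
        "dictation-service",  # hypothetical indicator id
        ICON_OFF,
        AppIndicator3.IndicatorCategory.APPLICATION_STATUS,
    )
    indicator.set_status(AppIndicator3.IndicatorStatus.ACTIVE)
    tray = DictationTray(indicator)

    menu = Gtk.Menu()
    toggle_item = Gtk.MenuItem(label="Toggle dictation")
    toggle_item.connect("activate", tray.toggle)
    quit_item = Gtk.MenuItem(label="Quit")
    quit_item.connect("activate", lambda _item: Gtk.main_quit())
    menu.append(toggle_item)
    menu.append(quit_item)
    menu.show_all()
    indicator.set_menu(menu)
    Gtk.main()
```

Calling `run_tray()` starts the GTK main loop. The state is a plain boolean, in line with the removal of the `AppState` enum noted under Code Simplification.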
#### Notification Removal
- **Removed all dictation notifications**:
  - "Dictation Active" → Now shown via tray icon
  - "Dictating... (N words)" → Silent operation
  - "Dictation Complete" → Silent operation
  - "Dictation Stopped" → Shown via tray icon state
- **Kept**: Error notifications (typing errors, etc.)
#### Code Simplification
- **File**: `src/dictation_service/ai_dictation_simple.py`
- **Removed**: All conversation mode logic
  - `VLLMClient` class
  - `ConversationManager` class
  - `TTSManager` for conversations
  - `AppState` enum (simplified to boolean)
  - Persistent conversation history
- **Kept**: Core dictation functionality only
### 2. Read-Aloud Service Redesign
#### Removed Automatic Service
- **Deleted**: Old `read_aloud_service.py` (automatic reader)
- **Deleted**: System tray service for read-aloud
- **Deleted**: Toggle scripts for old service
#### New Middle-Click Implementation
- **Created**: `src/dictation_service/middle_click_reader.py`
- **Trigger**: Middle-click (scroll wheel press) on selected text
- **Features**:
  - On-demand only (no automatic reading)
  - Works in any application
  - Uses Edge-TTS (Christopher voice)
  - Lock file prevents feedback with dictation
  - Lightweight (runs in background)
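
To illustrate the lock-file idea, here is a minimal sketch of how a reader could signal "audio is playing" so the dictation side ignores the microphone in the meantime. The lock path and the function names `speaking_lock` and `dictation_should_listen` are assumptions for illustration; the actual services may use a different path or mechanism.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Hypothetical lock location; the real services may use a different path.
LOCK_PATH = Path(tempfile.gettempdir()) / "dictation_speaking.lock"


@contextmanager
def speaking_lock(path: Path = LOCK_PATH):
    """Create the lock file while audio plays and remove it afterwards."""
    path.write_text(str(os.getpid()))
    try:
        yield path
    finally:
        # Remove the lock even if playback raised an exception.
        path.unlink(missing_ok=True)


def dictation_should_listen(path: Path = LOCK_PATH) -> bool:
    """The dictation side skips audio while the lock file exists."""
    return not path.exists()
```

The reader would wrap playback in `with speaking_lock(): ...`, and the dictation loop would check `dictation_should_listen()` before feeding audio to the recognizer, so spoken output is not transcribed back as dictation.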
### 3. Dependencies Cleanup
#### Removed from `pyproject.toml`:
- `openai>=1.0.0` (conversation mode)
- `aiohttp>=3.8.0` (async API calls)
- `pyttsx3>=2.90` (local TTS for conversations)
- `requests>=2.28.0` (HTTP requests)
#### Kept:
- `PyGObject>=3.42.0` (system tray)
- `pynput>=1.8.1` (mouse events)
- `sounddevice>=0.5.3` (audio)
- `vosk>=0.3.45` (speech recognition)
- `numpy>=2.3.5` (audio processing)
- `edge-tts>=7.2.3` (read-aloud TTS)
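
After `uv sync`, a quick check that the six retained packages are importable might look like the sketch below. The helper name `missing_packages` is hypothetical. The mapping pairs each distribution name with its import name, which differs for PyGObject (`gi`) and edge-tts (`edge_tts`).

```python
from importlib.util import find_spec

# Distribution name -> import name for the six retained packages.
CORE_MODULES = {
    "PyGObject": "gi",
    "pynput": "pynput",
    "sounddevice": "sounddevice",
    "vosk": "vosk",
    "numpy": "numpy",
    "edge-tts": "edge_tts",
}


def missing_packages(modules=CORE_MODULES):
    """Return the distribution names whose top-level module cannot be found."""
    return sorted(name for name, mod in modules.items() if find_spec(mod) is None)
```

Running `uv run python -c "..."` with a call to `missing_packages()` should return an empty list when the environment is complete.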
### 4. File Cleanup
#### Deleted (11 deprecated files):
```
docs/AI_DICTATION_GUIDE.md.deprecated
docs/READ_ALOUD_GUIDE.md.deprecated
tests/test_vllm_integration.py.deprecated
tests/test_suite.py.deprecated
tests/test_original_dictation.py.deprecated
tests/test_read_aloud.py.deprecated
read-aloud.service.deprecated
scripts/toggle-conversation.sh.deprecated
scripts/toggle-read-aloud.sh.deprecated
scripts/setup-read-aloud.sh.deprecated
src/dictation_service/read_aloud_service.py.deprecated
```
#### Archived (5 old implementations):
```
archive/old_implementations/
├── ai_dictation.py (full version with GUI)
├── enhanced_dictation.py (original enhanced)
├── new_dictation.py (experimental)
├── streaming_dictation.py (streaming focus)
└── vosk_dictation.py (basic version)
```
### 5. New Documentation
#### Created:
- `README.md` - Project overview and quick start
- `docs/README.md` - Complete guide for current features
- `docs/MIGRATION_GUIDE.md` - Migration from old version
- `CHANGES.md` - This file
#### Updated:
- Removed all conversation mode references
- Updated installation instructions
- Added middle-click reader setup
- Simplified architecture diagrams
### 6. Test Suite Overhaul
#### New Tests:
- `tests/test_dictation_service.py` - 8 tests for dictation
- `tests/test_middle_click.py` - 11 tests for read-aloud
- **Total**: 19 tests, all passing ✅
#### Test Coverage:
- Dictation core functionality
- System tray icon integration
- Lock file management
- Audio processing
- Middle-click detection
- Edge-TTS integration
- Text selection handling
- Concurrent reading prevention
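
As an illustration of what "concurrent reading prevention" can mean in practice, the following sketch uses a non-blocking lock so that a second trigger is dropped while a reading is in progress. `ReadGuard` and the test case are hypothetical and are not taken from `tests/test_middle_click.py`.

```python
import threading
import unittest


class ReadGuard:
    """Allows one read-aloud at a time; extra triggers are dropped."""

    def __init__(self):
        self._lock = threading.Lock()

    def try_start(self) -> bool:
        # Non-blocking acquire: returns False if a reading is in progress.
        return self._lock.acquire(blocking=False)

    def finish(self) -> None:
        """Mark the current reading as done so the next trigger is accepted."""
        self._lock.release()


class TestConcurrentReading(unittest.TestCase):
    def test_second_trigger_is_rejected(self):
        guard = ReadGuard()
        self.assertTrue(guard.try_start())   # first trigger starts reading
        self.assertFalse(guard.try_start())  # second trigger is ignored
        guard.finish()
        self.assertTrue(guard.try_start())   # accepted again after finishing
```

Dropping the second trigger, rather than queueing it, keeps repeated clicks from stacking up several readings of the same text.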
### 7. New Services & Scripts
#### Created:
- `middle-click-reader.service` - Systemd service
- `scripts/setup-middle-click-reader.sh` - Installation script
#### Kept:
- `dictation.service` - Main dictation service
- `scripts/setup-keybindings.sh` - Alt+D keybinding
- `scripts/toggle-dictation.sh` - Manual toggle
---
## Current Project Structure
```
dictation-service/
├── src/dictation_service/
│ ├── __init__.py
│ ├── ai_dictation_simple.py # Main dictation service
│ ├── middle_click_reader.py # Read-aloud service
│ └── main.py
├── tests/
│ ├── test_dictation_service.py # 8 tests ✅
│ ├── test_middle_click.py # 11 tests ✅
│ ├── test_e2e.py # End-to-end tests
│ ├── test_imports.py # Import validation
│ └── test_run.py # Runtime tests
├── scripts/
│ ├── setup-keybindings.sh
│ ├── setup-middle-click-reader.sh
│ ├── toggle-dictation.sh
│ └── switch-model.sh
├── docs/
│ ├── README.md # Complete guide
│ ├── MIGRATION_GUIDE.md
│ ├── INSTALL.md
│ └── TESTING_SUMMARY.md
├── archive/
│ └── old_implementations/ # 5 archived files
├── dictation.service
├── middle-click-reader.service
├── README.md # Quick start
├── CHANGES.md # This file
└── pyproject.toml # v0.2.0
```
---
## Feature Comparison
| Feature | Before | After |
|---------|--------|-------|
| **Dictation** | Notifications | System tray icon |
| **Read-Aloud** | Automatic polling | Middle-click on-demand |
| **Conversation Mode** | ✅ Included | ❌ Removed completely |
| **Dependencies** | 10 packages | 6 packages |
| **Source Files** | 9 Python files | 4 Python files |
| **Test Files** | 6 test files | 5 test files |
| **Tests Passing** | Mixed | 19/19 ✅ |
| **Documentation** | Conversation-focused | Dictation+Read-Aloud focused |
---
## How to Use
### Dictation
1. Look for microphone icon in system tray
2. Press `Alt+D` or click icon → Icon turns "on"
3. Speak → Text is typed
4. Press `Alt+D` or click icon → Icon turns "off"
5. **No notifications** - status shown in tray only
### Read-Aloud
1. Highlight any text
2. Middle-click (press scroll wheel)
3. Text is read aloud
4. **Always ready** - no enable/disable needed
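
The read-aloud flow above can be sketched as follows. This is a rough outline under several assumptions, not the contents of `middle_click_reader.py`: it assumes an X11 session with `xclip` installed for reading the primary selection, assumes `en-US-ChristopherNeural` is the Edge-TTS id of the Christopher voice, and uses hypothetical helper names throughout. Playback blocks the listener callback here for simplicity.

```python
import asyncio
import os
import subprocess
import tempfile

VOICE = "en-US-ChristopherNeural"  # assumed Edge-TTS id for the Christopher voice


def is_trigger(button_name: str, pressed: bool) -> bool:
    """React only to the press (not the release) of the middle button."""
    return pressed and button_name == "middle"


def clean_selection(raw: str) -> str:
    """Collapse whitespace so wrapped lines are read as one passage."""
    return " ".join(raw.split())


def selected_text() -> str:
    """Read the X11 primary selection; assumes xclip is installed."""
    result = subprocess.run(
        ["xclip", "-o", "-selection", "primary"],
        capture_output=True, text=True, check=False,
    )
    return clean_selection(result.stdout)


def speak(text: str) -> None:
    """Synthesize with edge-tts, then play the file with mpv."""
    import edge_tts  # imported lazily so the helpers above work without it

    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)
    try:
        asyncio.run(edge_tts.Communicate(text, VOICE).save(path))
        subprocess.run(["mpv", "--no-video", "--really-quiet", path], check=False)
    finally:
        os.unlink(path)  # clean up the temporary audio file


def on_click(_x, _y, button, pressed) -> None:
    """pynput mouse callback: read the selection aloud on middle-click."""
    if is_trigger(button.name, pressed):
        text = selected_text()
        if text:
            speak(text)


def main() -> None:
    from pynput import mouse

    # The listener runs until the process is stopped.
    with mouse.Listener(on_click=on_click) as listener:
        listener.join()
```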
---
## Testing
All tests pass successfully:
```bash
# Run all tests
uv run python tests/test_dictation_service.py -v # 8 tests ✅
uv run python tests/test_middle_click.py -v # 11 tests ✅
# Results:
# - Dictation: 8/8 passed
# - Middle-click: 11/11 passed
# - Total: 19/19 passed ✅
```
---
## Installation
```bash
# 1. Sync dependencies
uv sync
# 2. Setup dictation
./scripts/setup-keybindings.sh
systemctl --user enable --now dictation.service
# 3. Setup read-aloud (optional)
./scripts/setup-middle-click-reader.sh
# 4. Verify
systemctl --user status dictation.service
systemctl --user status middle-click-reader
```
---
## Benefits
### User Experience
- ✅ No notification spam
- ✅ Clean visual status (tray icon)
- ✅ Full control over read-aloud
- ✅ Simple, focused features
- ✅ Better performance
### Code Quality
- ✅ Reduced complexity (net reduction of about 6,000 lines)
- ✅ Fewer dependencies
- ✅ Better test coverage
- ✅ Cleaner architecture
- ✅ Easier to maintain
### Privacy
- ✅ No conversation data stored
- ✅ No VLLM connection needed
- ✅ All processing is local except read-aloud text, which is sent to Edge-TTS

---
## Next Steps (Optional)
If conversation mode is added back in the future:
1. It will be a separate application
2. Can reuse the Vosk speech recognition from this service
3. Can integrate via D-Bus or similar IPC
4. Old conversation code is in git history if needed
---
## Version
- **Before**: v0.1.0 (conversation-focused)
- **After**: v0.2.0 (dictation+read-aloud focused)
---
## Summary
This refactoring successfully transformed the dictation service from a complex multi-mode application into two clean, focused features:
1. **Dictation**: Voice-to-text with visual tray icon feedback
2. **Read-Aloud**: On-demand text-to-speech via middle-click

All conversation mode functionality has been cleanly removed, the codebase has been simplified, dependencies reduced, and comprehensive tests added. The project is now cleaner, more maintainable, and focused on doing two things very well.