dictation-service/docs/MIGRATION_GUIDE.md
Kade Heyborne 71c305a201
Major refactoring: v0.2.0 - Simplify to core dictation & read-aloud features
This is a comprehensive refactoring that transforms the dictation service from a
complex multi-mode application into two clean, focused features:
1. Voice dictation with system tray icon
2. On-demand read-aloud via Ctrl+middle-click

## Key Changes

### Dictation Service Enhancements
- Add GTK/AppIndicator3 system tray icon for visual status
- Remove all notification spam (dictation start/stop/status)
- Icon states: microphone-muted (OFF) → microphone-high (ON)
- Click tray icon to toggle dictation (same as Alt+D)
- Simplify ai_dictation_simple.py by removing conversation mode

### Read-Aloud Service Redesign
- Replace automatic clipboard reader with on-demand Ctrl+middle-click
- New middle_click_reader.py service
- Works anywhere: highlight text, Ctrl+middle-click to read
- Uses Edge-TTS (Christopher voice) with mpv playback
- Lock file prevents feedback with dictation service

### Conversation Mode Removed
- Delete all VLLM/conversation code (VLLMClient, ConversationManager, TTS)
- Archive 5 old implementations to archive/old_implementations/
- Remove conversation-related scripts and services
- Clean separation of concerns for future reintegration if needed

### Dependencies Cleanup
- Remove: openai, aiohttp, pyttsx3, requests (conversation deps)
- Keep: PyGObject, pynput, sounddevice, vosk, numpy, edge-tts
- Net reduction: 4 packages removed, 6 core packages retained

### Testing Improvements
- Add test_dictation_service.py (8 tests) 
- Add test_middle_click.py (11 tests) 
- Fix test_run.py to use correct model path
- Total: 19 unit tests passing
- Delete obsolete test files (test_suite, test_vllm_integration, etc.)

### Documentation
- Add CHANGES.md with complete changelog
- Add docs/MIGRATION_GUIDE.md for upgrading
- Add README.md with quick start guide
- Update docs/README.md with current features only
- Add justfile for common tasks

### New Services & Scripts
- Add middle-click-reader.service (systemd)
- Add scripts/setup-middle-click-reader.sh
- Add desktop files for autostart
- Remove toggle-conversation.sh (obsolete)

## Impact

**Code Quality**
- Net change: -6,007 lines (596 added, 6,603 deleted)
- Simpler architecture, easier maintenance
- Better test coverage (19 tests vs mixed before)
- Cleaner separation of concerns

**User Experience**
- No notification spam during dictation
- Clean visual status via tray icon
- Full control over read-aloud (no unwanted readings)
- Better performance (fewer background processes)

**Privacy**
- No conversation data stored
- No VLLM connection needed
- All processing local except Edge-TTS text

## Migration Notes

Users upgrading should:
1. Run `uv sync` to update dependencies
2. Restart dictation.service to get tray icon
3. Run scripts/setup-middle-click-reader.sh for new read-aloud
4. Remove old read-aloud.service if present

See docs/MIGRATION_GUIDE.md for details.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-10 19:11:06 -07:00


# Migration Guide - Updated Features
## Summary of Changes
This update introduces significant UX improvements based on user feedback:
### ✅ Changes Made
1. **Dictation Mode: System Tray Icon Instead of Notifications**
- **Old:** System notifications for every dictation start/stop/status
- **New:** Clean system tray icon that changes based on state
- **Benefit:** No more notification spam, cleaner UX
2. **Read-Aloud: Middle-Click Instead of Automatic**
- **Old:** Automatic reading of all highlighted text via system tray service
- **New:** On-demand reading via middle-click on selected text
- **Benefit:** More control; text is read only when you ask for it
3. **Conversation Mode: Unchanged**
- Still works with Super+Alt+D (Windows+Alt+D)
- Still maintains persistent context across calls
- Still sends notifications (intentionally kept for this feature)
## Migration Steps
### 1. Update the Dictation Service
The main dictation service now includes a system tray icon:
```bash
# Stop the old service
systemctl --user stop dictation.service
# Start it again so it picks up the new code (already updated)
systemctl --user restart dictation.service
```
**What to expect:**
- A microphone icon will appear in your system tray
- Icon changes from `microphone-muted` (OFF) to `microphone-high` (ON) while dictation is active
- Click the icon to toggle dictation, or continue using Alt+D
- No more notifications when dictating
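The toggle behaviour described above can be pictured as a small state holder. This is only a minimal sketch, not the service's actual code: the class name `TrayState` is hypothetical, and the icon names are simply the two states named in this guide (the names in your icon theme may differ).

```python
from dataclasses import dataclass

# Icon names as this guide refers to them; actual theme names may differ.
ICON_OFF = "microphone-muted"
ICON_ON = "microphone-high"


@dataclass
class TrayState:
    """Hypothetical holder for the dictation on/off state."""

    active: bool = False

    @property
    def icon(self) -> str:
        # The tray shows one of two icons depending on the current state.
        return ICON_ON if self.active else ICON_OFF

    def toggle(self) -> str:
        # Alt+D and a click on the tray icon would both end up here.
        self.active = not self.active
        return self.icon
```

Keeping the state in one place like this means the hotkey and the tray click cannot disagree about whether dictation is on.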
### 2. Remove Old Read-Aloud Service
The automatic read-aloud service has been replaced:
```bash
# Stop and disable old service
systemctl --user stop read-aloud.service 2>/dev/null || true
systemctl --user disable read-aloud.service 2>/dev/null || true
# Remove old service file
rm -f ~/.config/systemd/user/read-aloud.service
# Reload systemd
systemctl --user daemon-reload
```
### 3. Install New Middle-Click Reader
Set up the new on-demand read-aloud service:
```bash
# Run the setup script from your checkout (adjust the path to your system)
cd /mnt/storage/Development/dictation-service
./scripts/setup-middle-click-reader.sh
```
**What to expect:**
- No visible tray icon (runs in background)
- Highlight text anywhere
- Middle-click (press scroll wheel) to read it
- Only reads when you explicitly request it
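To make the flow concrete, here is a rough sketch of what reading a selection aloud could look like, assuming the pipeline is xclip for the primary selection, the `edge-tts` command-line tool for synthesis, and mpv for playback. It is not the code in `middle_click_reader.py`: the function names are hypothetical, and the voice ID `en-US-ChristopherNeural` is an assumption about which "Christopher" voice is used.

```python
import subprocess
import tempfile
from pathlib import Path

VOICE = "en-US-ChristopherNeural"  # assumed ID for the "Christopher" voice


def selection_cmd() -> list[str]:
    # Print the current PRIMARY selection (the highlighted text).
    return ["xclip", "-o", "-selection", "primary"]


def tts_cmd(text: str, out_file: Path) -> list[str]:
    # Ask the edge-tts CLI to synthesize `text` into an audio file.
    return ["edge-tts", "--voice", VOICE, "--text", text,
            "--write-media", str(out_file)]


def play_cmd(audio_file: Path) -> list[str]:
    # Play the audio file quietly, without opening a video window.
    return ["mpv", "--no-video", "--really-quiet", str(audio_file)]


def read_selection_aloud() -> bool:
    """Read the highlighted text aloud; return False if nothing is selected."""
    result = subprocess.run(selection_cmd(), capture_output=True, text=True)
    text = result.stdout.strip()
    if not text:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        audio = Path(tmp) / "speech.mp3"
        subprocess.run(tts_cmd(text, audio), check=True)
        subprocess.run(play_cmd(audio), check=True)
    return True
```

Calling `read_selection_aloud()` by hand after highlighting some text is a quick way to check the three tools independently of the mouse listener.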
### 4. Test Everything
**Test Dictation:**
1. Look for the microphone icon in the system tray
2. Press Alt+D or click the icon
3. The icon should change to "microphone-high"
4. Speak; the text should be typed
5. Press Alt+D or click the icon again to stop
6. No notifications should appear
**Test Read-Aloud:**
1. Highlight some text in a browser or editor
2. Middle-click on the highlighted text
3. It should be read aloud
4. Try highlighting different text and middle-clicking again
**Test Conversation (unchanged):**
1. Press Super+Alt+D
2. You should see a "Conversation Started" notification (this is kept)
3. Speak with the AI
4. Press Super+Alt+D again to end the conversation
## Deprecated Files
These files have been renamed with a `.deprecated` suffix and are no longer used:
- `read-aloud.service.deprecated` (old automatic service)
- `scripts/setup-read-aloud.sh.deprecated` (old setup script)
- `scripts/toggle-read-aloud.sh.deprecated` (old toggle script)
- `src/dictation_service/read_aloud_service.py.deprecated` (old implementation)
You can safely delete these files if desired.
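If you prefer to clean these up in one go, a small helper along the following lines could do it. This is a sketch under assumptions: the function names are hypothetical, and it treats every file ending in `.deprecated` under the given directory as a candidate. It defaults to a dry run so nothing is deleted until you ask for it.

```python
from pathlib import Path


def find_deprecated(repo_root: Path) -> list[Path]:
    # Collect every file whose name ends in ".deprecated", at any depth.
    return sorted(p for p in repo_root.rglob("*.deprecated") if p.is_file())


def remove_deprecated(repo_root: Path, dry_run: bool = True) -> list[Path]:
    """Return the matching files; delete them only when dry_run is False."""
    found = find_deprecated(repo_root)
    if not dry_run:
        for path in found:
            path.unlink()
    return found
```

Run it with `dry_run=True` first and review the list. Keep in mind that the rollback instructions in this guide rely on these files still being present.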
## New Files
- `src/dictation_service/middle_click_reader.py` - New middle-click service
- `middle-click-reader.service` - Systemd service file
- `scripts/setup-middle-click-reader.sh` - Setup script
## Troubleshooting
### System Tray Icon Not Appearing
1. Make sure AppIndicator3 is installed:
```bash
sudo apt-get install gir1.2-appindicator3-0.1
```
2. Check service logs:
```bash
journalctl --user -u dictation.service -f
```
3. Some desktop environments need additional packages:
```bash
# For GNOME Shell
sudo apt-get install gnome-shell-extension-appindicator
```
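As a complement to the package checks above, the following sketch asks PyGObject directly whether the AppIndicator3 bindings can be loaded. The function name `appindicator_status` is hypothetical; it assumes the service loads the `AppIndicator3` namespace at version `0.1`, matching the package name in step 1.

```python
def appindicator_status() -> str:
    """Report whether the AppIndicator3 GObject bindings can be loaded."""
    try:
        import gi  # provided by PyGObject
    except ImportError:
        return "PyGObject (gi) is not importable"
    try:
        # Raises ValueError when this namespace/version is not available.
        gi.require_version("AppIndicator3", "0.1")
    except ValueError:
        return "AppIndicator3 typelib is missing"
    return "AppIndicator3 is available"
```

Run it with the same Python interpreter the service uses, since a virtual environment may not see system-wide GObject bindings.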
### Middle-Click Not Working
1. Check if service is running:
```bash
systemctl --user status middle-click-reader
```
2. Check logs:
```bash
journalctl --user -u middle-click-reader -f
```
3. Test xclip manually:
```bash
echo "test" | xclip -selection primary
xclip -o -selection primary
```
4. Verify edge-tts is installed:
```bash
edge-tts --list-voices | grep Christopher
```
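The manual checks above can also be run as a single preflight step. The sketch below only tests whether the external commands are on `PATH`; the tool list is an assumption based on the tools this project mentions (xclip, edge-tts, mpv), and `missing_tools` is a hypothetical helper name.

```python
import shutil
from typing import Callable, Iterable, Optional

# Assumed set of external commands the read-aloud service depends on.
REQUIRED_TOOLS = ("xclip", "edge-tts", "mpv")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    # A tool counts as missing when it cannot be found on PATH.
    return [tool for tool in tools if which(tool) is None]
```

An empty list from `missing_tools()` means all three commands were found. If `edge-tts` is installed only inside a virtual environment, run the check from that environment.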
### Notifications Still Appearing for Dictation
This usually means an old version of the code is still running:
```bash
# Force restart the service
systemctl --user restart dictation.service
# Verify the new code is running (case-insensitive match on the log text)
journalctl --user -u dictation.service -n 20 | grep -i "system tray"
```
## Rollback Instructions
If you need to revert to the old behavior:
```bash
# Run from the repository root.
# Restore old files (if you didn't delete them)
mv read-aloud.service.deprecated read-aloud.service
mv scripts/setup-read-aloud.sh.deprecated scripts/setup-read-aloud.sh
mv scripts/toggle-read-aloud.sh.deprecated scripts/toggle-read-aloud.sh
mv src/dictation_service/read_aloud_service.py.deprecated src/dictation_service/read_aloud_service.py
# Use git to restore the old dictation code
# (HEAD~1 assumes this update is the most recent commit)
git checkout HEAD~1 -- src/dictation_service/ai_dictation_simple.py
# Restart services
systemctl --user restart dictation.service
./scripts/setup-read-aloud.sh
```
## Benefits of New Approach
### Dictation
- ✅ No notification spam
- ✅ Visual status always visible in tray
- ✅ One-click toggle from tray menu
- ✅ Cleaner, less intrusive UX
### Read-Aloud
- ✅ Only reads when you want it to
- ✅ No background polling
- ✅ Lower resource usage
- ✅ Works everywhere (not just when the service is "on")
- ✅ No accidental readings
## Questions?
Check the updated [AI_DICTATION_GUIDE.md](./AI_DICTATION_GUIDE.md) for complete usage instructions.