dictation-service/docs/MIGRATION_GUIDE.md
Kade Heyborne 71c305a201
Major refactoring: v0.2.0 - Simplify to core dictation & read-aloud features
This is a comprehensive refactoring that transforms the dictation service from a
complex multi-mode application into two clean, focused features:
1. Voice dictation with system tray icon
2. On-demand read-aloud via Ctrl+middle-click

## Key Changes

### Dictation Service Enhancements
- Add GTK/AppIndicator3 system tray icon for visual status
- Remove all notification spam (dictation start/stop/status)
- Icon states: microphone-muted (OFF) → microphone-high (ON)
- Click tray icon to toggle dictation (same as Alt+D)
- Simplify ai_dictation_simple.py by removing conversation mode

### Read-Aloud Service Redesign
- Replace automatic clipboard reader with on-demand Ctrl+middle-click
- New middle_click_reader.py service
- Works anywhere: highlight text, Ctrl+middle-click to read
- Uses Edge-TTS (Christopher voice) with mpv playback
- Lock file prevents feedback with dictation service

### Conversation Mode Removed
- Delete all VLLM/conversation code (VLLMClient, ConversationManager, TTS)
- Archive 5 old implementations to archive/old_implementations/
- Remove conversation-related scripts and services
- Clean separation of concerns for future reintegration if needed

### Dependencies Cleanup
- Remove: openai, aiohttp, pyttsx3, requests (conversation deps)
- Keep: PyGObject, pynput, sounddevice, vosk, numpy, edge-tts
- Net reduction: 4 packages removed, 6 core packages retained

### Testing Improvements
- Add test_dictation_service.py (8 tests) 
- Add test_middle_click.py (11 tests) 
- Fix test_run.py to use correct model path
- Total: 19 unit tests passing
- Delete obsolete test files (test_suite, test_vllm_integration, etc.)

### Documentation
- Add CHANGES.md with complete changelog
- Add docs/MIGRATION_GUIDE.md for upgrading
- Add README.md with quick start guide
- Update docs/README.md with current features only
- Add justfile for common tasks

### New Services & Scripts
- Add middle-click-reader.service (systemd)
- Add scripts/setup-middle-click-reader.sh
- Add desktop files for autostart
- Remove toggle-conversation.sh (obsolete)

## Impact

**Code Quality**
- Net change: -6,007 lines (596 added, 6,603 deleted)
- Simpler architecture, easier maintenance
- Better test coverage (19 tests vs mixed before)
- Cleaner separation of concerns

**User Experience**
- No notification spam during dictation
- Clean visual status via tray icon
- Full control over read-aloud (no unwanted readings)
- Better performance (fewer background processes)

**Privacy**
- No conversation data stored
- No VLLM connection needed
- All processing local except Edge-TTS text

## Migration Notes

Users upgrading should:
1. Run `uv sync` to update dependencies
2. Restart dictation.service to get tray icon
3. Run scripts/setup-middle-click-reader.sh for new read-aloud
4. Remove old read-aloud.service if present

See docs/MIGRATION_GUIDE.md for details.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-10 19:11:06 -07:00


Migration Guide - Updated Features

Summary of Changes

This update changes how dictation status is displayed and how read-aloud is triggered, based on user feedback.

Changes Made

  1. Dictation Mode: System Tray Icon Instead of Notifications

    • Old: System notifications for every dictation start/stop/status
    • New: Clean system tray icon that changes based on state
    • Benefit: No more notification spam, cleaner UX
  2. Read-Aloud: Middle-Click Instead of Automatic

    • Old: Automatic reading of all highlighted text via system tray service
    • New: On-demand reading via middle-click on selected text
    • Benefit: Text is read only when you explicitly ask for it, so there are no unwanted readings
  3. Conversation Mode: Unchanged

    • Still works with Super+Alt+D (Windows+Alt+D)
    • Still maintains persistent context across calls
    • Still sends notifications (intentionally kept for this feature)

Migration Steps

1. Update the Dictation Service

The main dictation service now includes a system tray icon:

# Stop the old service
systemctl --user stop dictation.service

# Restart the service so it picks up the updated code
systemctl --user restart dictation.service

What to expect:

  • A microphone icon will appear in your system tray
  • Icon changes from "muted" (OFF) to "high" (ON) when dictating
  • Click the icon to toggle dictation, or continue using Alt+D
  • No more notifications when dictating
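The tray behaviour described above can be pictured as a two-state toggle. The following is a minimal sketch, not the service's actual code: `DictationState` is a hypothetical name, and the icon names are copied from this guide, so the theme icon names the real service uses may differ.

```python
from dataclasses import dataclass

# Icon names as written in this guide (assumption: the real service may use
# different theme icon names).
ICON_OFF = "microphone-muted"
ICON_ON = "microphone-high"


@dataclass
class DictationState:
    """Hypothetical holder for the on/off state shown by the tray icon."""

    active: bool = False

    def toggle(self) -> bool:
        """Flip the state, as a tray click or Alt+D would, and return it."""
        self.active = not self.active
        return self.active

    @property
    def icon_name(self) -> str:
        """Name of the icon the tray should display for the current state."""
        return ICON_ON if self.active else ICON_OFF


if __name__ == "__main__":
    state = DictationState()
    print(state.icon_name)  # icon while dictation is OFF
    state.toggle()
    print(state.icon_name)  # icon while dictation is ON
```

Keeping the state separate from the GTK/AppIndicator3 calls would let the tray click and the Alt+D hotkey share one toggle path, which is presumably why both behave the same.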

2. Remove Old Read-Aloud Service

The automatic read-aloud service has been replaced:

# Stop and disable old service
systemctl --user stop read-aloud.service 2>/dev/null || true
systemctl --user disable read-aloud.service 2>/dev/null || true

# Remove old service file
rm -f ~/.config/systemd/user/read-aloud.service

# Reload systemd
systemctl --user daemon-reload

3. Install New Middle-Click Reader

Set up the new on-demand read-aloud service:

# Run the setup script from your checkout of the repository
# (adjust the path below if your clone lives elsewhere)
cd /mnt/storage/Development/dictation-service
./scripts/setup-middle-click-reader.sh

What to expect:

  • No visible tray icon (runs in background)
  • Highlight text anywhere
  • Middle-click (press scroll wheel) to read it
  • Only reads when you explicitly request it
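To make the flow above concrete, here is a rough sketch of what happens once a click has been detected: fetch the highlighted text from the X11 primary selection with xclip, synthesize it with edge-tts, and play it with mpv. The function names, the output path, and the full voice name `en-US-ChristopherNeural` are assumptions for illustration. Click detection is left out, and the real `middle_click_reader.py` may be structured differently.

```python
import subprocess
from typing import Callable, List

# Assumed full name of the "Christopher" voice mentioned in this guide.
VOICE = "en-US-ChristopherNeural"


def read_primary_selection(run: Callable = subprocess.run) -> str:
    """Return the currently highlighted text (X11 primary selection) via xclip."""
    result = run(
        ["xclip", "-o", "-selection", "primary"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else ""


def build_tts_commands(text: str, out_path: str = "/tmp/read_aloud.mp3") -> List[List[str]]:
    """Build the edge-tts and mpv command lines for one reading.

    Returns an empty list when there is nothing to read.
    """
    if not text:
        return []
    return [
        ["edge-tts", "--voice", VOICE, "--text", text, "--write-media", out_path],
        ["mpv", "--no-terminal", out_path],
    ]


def read_selection_aloud(run: Callable = subprocess.run) -> bool:
    """Read the highlighted text aloud; return False if nothing was selected."""
    commands = build_tts_commands(read_primary_selection(run))
    for cmd in commands:
        run(cmd, check=False)
    return bool(commands)
```

Passing `run` as a parameter is only a convenience for trying the logic without xclip, edge-tts, or mpv installed. In normal use, `read_selection_aloud()` would be called from the click handler.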

4. Test Everything

Test Dictation:

  1. Look for the microphone icon in the system tray
  2. Press Alt+D or click the icon
  3. The icon should change to "microphone-high"
  4. Speak; your words should be typed out as text
  5. Press Alt+D or click the icon again to stop
  6. No notifications should appear

Test Read-Aloud:

  1. Highlight some text in a browser or editor
  2. Middle-click on the highlighted text
  3. It should be read aloud
  4. Try highlighting different text and middle-clicking again

Test Conversation (unchanged):

  1. Press Super+Alt+D
  2. You should see a "Conversation Started" notification (this is kept)
  3. Speak with the AI
  4. Press Super+Alt+D to end
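The manual tests above still need a person at the keyboard, but a script can first confirm that both user services are running. The sketch below wraps `systemctl --user is-active`. The helper names are hypothetical, and the unit names are taken from this guide.

```python
import subprocess
from typing import Callable, Dict, Iterable

# Unit names as used in this guide.
UNITS = ("dictation.service", "middle-click-reader.service")


def service_is_active(unit: str, run: Callable = subprocess.run) -> bool:
    """Return True if `systemctl --user is-active <unit>` reports "active"."""
    result = run(
        ["systemctl", "--user", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0 and result.stdout.strip() == "active"


def check_services(units: Iterable[str] = UNITS, run: Callable = subprocess.run) -> Dict[str, bool]:
    """Map each unit name to whether it is currently active."""
    return {unit: service_is_active(unit, run) for unit in units}


if __name__ == "__main__":
    for unit, ok in check_services().items():
        print(f"{unit}: {'running' if ok else 'NOT running'}")
```

If either unit is reported as not running, the Troubleshooting section is the next place to look.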

Deprecated Files

These files have been renamed with a .deprecated suffix and are no longer used:

  • read-aloud.service.deprecated (old automatic service)
  • scripts/setup-read-aloud.sh.deprecated (old setup script)
  • scripts/toggle-read-aloud.sh.deprecated (old toggle script)
  • src/dictation_service/read_aloud_service.py.deprecated (old implementation)

You can safely delete these files if desired.
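If you prefer to review the leftovers before deleting them, a short script can list every `*.deprecated` file under the repository and remove them only on request. This is an illustrative sketch: `find_deprecated` and `delete_deprecated` are hypothetical helpers, not part of the project.

```python
from pathlib import Path
from typing import List, Union


def find_deprecated(repo_root: Union[str, Path]) -> List[Path]:
    """Return all files under repo_root whose name ends in .deprecated."""
    return sorted(p for p in Path(repo_root).rglob("*.deprecated") if p.is_file())


def delete_deprecated(repo_root: Union[str, Path], dry_run: bool = True) -> List[Path]:
    """List deprecated files and delete them unless dry_run is True."""
    found = find_deprecated(repo_root)
    if not dry_run:
        for path in found:
            path.unlink()
    return found


if __name__ == "__main__":
    # Dry run by default: only prints what would be removed.
    for path in delete_deprecated(".", dry_run=True):
        print(f"would delete: {path}")
```

Note that deleting these files also removes the option of restoring them by renaming, as described under Rollback Instructions.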

New Files

  • src/dictation_service/middle_click_reader.py - New middle-click service
  • middle-click-reader.service - Systemd service file
  • scripts/setup-middle-click-reader.sh - Setup script

Troubleshooting

System Tray Icon Not Appearing

  1. Make sure AppIndicator3 is installed:

    sudo apt-get install gir1.2-appindicator3-0.1
    
  2. Check service logs:

    journalctl --user -u dictation.service -f
    
  3. Some desktop environments need additional packages:

    # For GNOME Shell
    sudo apt-get install gnome-shell-extension-appindicator
    

Middle-Click Not Working

  1. Check that the service is running:

    systemctl --user status middle-click-reader
    
  2. Check logs:

    journalctl --user -u middle-click-reader -f
    
  3. Test xclip manually:

    echo "test" | xclip -selection primary
    xclip -o -selection primary
    
  4. Verify edge-tts is installed:

    edge-tts --list-voices | grep Christopher
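The checks above can be partly automated by confirming that the external tools the reader depends on are on `PATH`. The sketch below assumes the tool list is xclip, edge-tts, and mpv, as named in this guide and in the v0.2.0 commit notes. The helper name `missing_tools` is hypothetical.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Command-line tools the middle-click reader is assumed to rely on.
REQUIRED_TOOLS = ("xclip", "edge-tts", "mpv")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of required tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found")
```

A tool reported as missing may still be installed inside a virtual environment that is not active in the current shell, so treat the result as a hint rather than proof.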
    

Notifications Still Appearing for Dictation

The service is probably still running the old code. Restart it and confirm that the new version is active:

# Force restart the service
systemctl --user restart dictation.service

# Verify the new code is running
journalctl --user -u dictation.service -n 20 | grep "system tray"

Rollback Instructions

If you need to revert to the old behavior:

# Restore old files (if you didn't delete them)
mv read-aloud.service.deprecated read-aloud.service
mv scripts/setup-read-aloud.sh.deprecated scripts/setup-read-aloud.sh
mv scripts/toggle-read-aloud.sh.deprecated scripts/toggle-read-aloud.sh

# Use git to restore the old dictation code
# (HEAD~1 assumes the update is the most recent commit; adjust the revision if not)
git checkout HEAD~1 -- src/dictation_service/ai_dictation_simple.py

# Restart services
systemctl --user restart dictation.service
./scripts/setup-read-aloud.sh

Benefits of New Approach

Dictation

  • No notification spam
  • Visual status always visible in tray
  • One-click toggle from tray menu
  • Cleaner, less intrusive UX

Read-Aloud

  • Only reads when you want it to
  • No background polling
  • Lower resource usage
  • Works everywhere (not just when service is "on")
  • No accidental readings

Questions?

Check the updated AI_DICTATION_GUIDE.md for complete usage instructions.