History

Major refactoring: v0.2.0 - Simplify to core dictation & read-aloud features

This is a comprehensive refactoring that transforms the dictation service from a
complex multi-mode application into two clean, focused features:
1. Voice dictation with system tray icon
2. On-demand read-aloud via Ctrl+middle-click

## Key Changes

### Dictation Service Enhancements
- Add GTK/AppIndicator3 system tray icon for visual status
- Remove all notification spam (dictation start/stop/status)
- Icon states: microphone-muted (OFF) → microphone-high (ON)
- Click tray icon to toggle dictation (same as Alt+D)
- Simplify ai_dictation_simple.py by removing conversation mode

### Read-Aloud Service Redesign
- Replace automatic clipboard reader with on-demand Ctrl+middle-click
- New middle_click_reader.py service
- Works anywhere: highlight text, Ctrl+middle-click to read
- Uses Edge-TTS (Christopher voice) with mpv playback
- Lock file prevents feedback with dictation service

### Conversation Mode Removed
- Delete all VLLM/conversation code (VLLMClient, ConversationManager, TTS)
- Archive 5 old implementations to archive/old_implementations/
- Remove conversation-related scripts and services
- Clean separation of concerns for future reintegration if needed

### Dependencies Cleanup
- Remove: openai, aiohttp, pyttsx3, requests (conversation deps)
- Keep: PyGObject, pynput, sounddevice, vosk, numpy, edge-tts
- Net reduction: 4 packages removed, 6 core packages retained

### Testing Improvements
- Add test_dictation_service.py (8 tests) ✅
- Add test_middle_click.py (11 tests) ✅
- Fix test_run.py to use correct model path
- Total: 19 unit tests passing
- Delete obsolete test files (test_suite, test_vllm_integration, etc.)

### Documentation
- Add CHANGES.md with complete changelog
- Add docs/MIGRATION_GUIDE.md for upgrading
- Add README.md with quick start guide
- Update docs/README.md with current features only
- Add justfile for common tasks

### New Services & Scripts
- Add middle-click-reader.service (systemd)
- Add scripts/setup-middle-click-reader.sh
- Add desktop files for autostart
- Remove toggle-conversation.sh (obsolete)

## Impact

**Code Quality**
- Net change: -6,007 lines (596 added, 6,603 deleted)
- Simpler architecture, easier maintenance
- Better test coverage (19 tests vs mixed before)
- Cleaner separation of concerns

**User Experience**
- No notification spam during dictation
- Clean visual status via tray icon
- Full control over read-aloud (no unwanted readings)
- Better performance (fewer background processes)

**Privacy**
- No conversation data stored
- No VLLM connection needed
- All processing local except Edge-TTS text

## Migration Notes

Users upgrading should:
1. Run `uv sync` to update dependencies
2. Restart dictation.service to get tray icon
3. Run scripts/setup-middle-click-reader.sh for new read-aloud
4. Remove old read-aloud.service if present

See docs/MIGRATION_GUIDE.md for details.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2025-12-10 19:11:06 -07:00

CLAUDE.md - Fix dictation service: state detection, async processing, and performance optimizations (2025-12-04 11:49:07 -07:00)
INSTALL.md - Fix dictation service: state detection, async processing, and performance optimizations (2025-12-04 11:49:07 -07:00)
MIGRATION_GUIDE.md - Major refactoring: v0.2.0 - Simplify to core dictation & read-aloud features (2025-12-10 19:11:06 -07:00)
README.md - Major refactoring: v0.2.0 - Simplify to core dictation & read-aloud features (2025-12-10 19:11:06 -07:00)
TEST_RESULTS_AND_FIXES.md - Fix dictation service: state detection, async processing, and performance optimizations (2025-12-04 11:49:07 -07:00)
TESTING_SUMMARY.md - Fix dictation service: state detection, async processing, and performance optimizations (2025-12-04 11:49:07 -07:00)

README.md

Dictation Service - Complete Guide

Voice dictation with system tray control and on-demand text-to-speech for Linux.

Overview
Features
Installation
Usage
Configuration
Troubleshooting
Architecture

Overview

This service provides two main features:

Voice Dictation: Real-time speech-to-text that types into any application
Read-Aloud: On-demand text-to-speech for highlighted text

The two features run side by side: a lock file keeps dictation from transcribing the read-aloud audio.

Features

Dictation Mode

✅ Real-time voice recognition using Vosk (offline)
✅ System tray icon for status (no notification spam)
✅ Toggle via Alt+D or tray icon click
✅ Automatic spurious word filtering
✅ Works with all applications

Read-Aloud

✅ Middle-click to read selected text
✅ High-quality neural voice (Microsoft Edge TTS)
✅ Works in any application
✅ On-demand only (no automatic reading)
✅ Prevents feedback loops with dictation

Installation

See INSTALL.md for detailed installation instructions.

Quick install:

uv sync
./scripts/setup-keybindings.sh
./scripts/setup-middle-click-reader.sh
systemctl --user enable --now dictation.service

Usage

Dictation

Starting:

Press Alt+D (or click the tray icon)
The microphone icon in the system tray switches to its "on" state
Speak normally
Words are typed into the focused application
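
The steps above amount to a capture, recognize, type loop. The following is a minimal sketch of that loop, not the service's actual code: it assumes the `vosk` and `sounddevice` packages and the `ydotool` CLI mentioned elsewhere in this guide, and the function names (`extract_text`, `type_text`, `run_dictation`) are hypothetical. The third-party imports are kept inside `run_dictation` so the helpers can be used without audio hardware.

```python
import json
import queue
import subprocess

SAMPLE_RATE = 16000  # same values as the Audio Settings section
BLOCK_SIZE = 4000


def extract_text(result_json: str) -> str:
    """Pull the recognized text out of a Vosk result (a JSON string)."""
    return json.loads(result_json).get("text", "").strip()


def type_text(text: str) -> None:
    """Type text into the focused window via the ydotool CLI."""
    subprocess.run(["ydotool", "type", text + " "], check=False)


def run_dictation(model_path: str) -> None:
    """Stream microphone audio into Vosk and type each finished phrase."""
    import sounddevice as sd  # assumed dependency, imported lazily
    from vosk import KaldiRecognizer, Model

    audio = queue.Queue()

    def on_audio(indata, frames, time, status):
        # Runs on the audio thread: hand the raw block to the main loop.
        audio.put(bytes(indata))

    recognizer = KaldiRecognizer(Model(model_path), SAMPLE_RATE)
    with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=BLOCK_SIZE,
                           dtype="int16", channels=1, callback=on_audio):
        while True:
            # AcceptWaveform returns True once a phrase is complete.
            if recognizer.AcceptWaveform(audio.get()):
                text = extract_text(recognizer.Result())
                if text:
                    type_text(text)
```

Calling `run_dictation` with the path to one of the model directories listed under Configuration would start the loop; toggling and lock-file handling are left out of this sketch.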

Stopping:

Press Alt+D again (or click tray icon)
Icon returns to "muted" state
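
As a rough illustration of the icon switching between its "on" and "muted" states, here is a sketch assuming PyGObject with AppIndicator3 (the package installed in the troubleshooting section). The icon names come from the v0.2.0 changelog; `icon_name`, `build_indicator`, the indicator id, and the menu-item toggle are illustrative assumptions, and the real service may wire the click differently.

```python
def icon_name(listening: bool) -> str:
    """Map the dictation state to the icon names given in the changelog."""
    return "microphone-high" if listening else "microphone-muted"


def build_indicator(on_toggle):
    """Create a tray indicator with one menu item that calls on_toggle()."""
    import gi  # PyGObject, imported lazily so icon_name() works without GTK

    gi.require_version("Gtk", "3.0")
    gi.require_version("AppIndicator3", "0.1")
    from gi.repository import AppIndicator3, Gtk

    indicator = AppIndicator3.Indicator.new(
        "dictation-sketch",  # hypothetical indicator id
        icon_name(False),
        AppIndicator3.IndicatorCategory.APPLICATION_STATUS,
    )
    indicator.set_status(AppIndicator3.IndicatorStatus.ACTIVE)

    menu = Gtk.Menu()
    item = Gtk.MenuItem(label="Toggle dictation")

    def handle_activate(_widget):
        # on_toggle() is expected to return the new state (True = listening).
        listening = on_toggle()
        indicator.set_icon_full(icon_name(listening), "dictation status")

    item.connect("activate", handle_activate)
    menu.append(item)
    menu.show_all()
    indicator.set_menu(menu)
    return indicator
```

A caller would keep a reference to the returned indicator and then enter `Gtk.main()`.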

Tips:

Speak clearly and at a normal pace
Avoid filler words like "um" and "uh" where possible (they are also filtered automatically)
Pause briefly between thoughts for better accuracy
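
To make the automatic filtering concrete, here is a small sketch of how filler words could be dropped from recognized text. The word list and the `filter_spurious` helper are assumptions for illustration; the service's actual list and logic may differ.

```python
# Hypothetical filler list; the service's real list may differ.
FILLER_WORDS = {"um", "uh", "er", "hmm"}


def filter_spurious(text: str) -> str:
    """Drop filler words and collapse the remaining whitespace."""
    kept = [
        word for word in text.split()
        if word.lower().strip(".,!?") not in FILLER_WORDS
    ]
    return " ".join(kept)
```

Matching is done on whole words, so a word such as "umbrella" passes through untouched.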

Read-Aloud

Using:

Highlight any text (in browser, PDF, editor, etc.)
Middle-click (press scroll wheel)
Text is read aloud

Tips:

Works on any highlighted text
No need to enable/disable - always ready
Only reads when you middle-click
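
One way the trigger could work is a mouse listener that reads the X11 primary selection when the middle button is pressed. This sketch assumes `pynput` (listed among the retained dependencies) and the `xclip` CLI; `read_primary_selection`, `is_read_trigger`, and `run_listener` are hypothetical names, and the sketch reacts to a plain middle-click with no modifier check.

```python
import subprocess


def read_primary_selection() -> str:
    """Return the X11 primary selection (the currently highlighted text)."""
    result = subprocess.run(
        ["xclip", "-o", "-selection", "primary"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else ""


def is_read_trigger(button_name: str, pressed: bool) -> bool:
    """True only for the press (not the release) of the middle button."""
    return pressed and button_name == "middle"


def run_listener(on_text) -> None:
    """Block forever, calling on_text(selection) on each middle-click."""
    from pynput import mouse  # assumed dependency, imported lazily

    def on_click(x, y, button, pressed):
        if is_read_trigger(button.name, pressed):
            text = read_primary_selection()
            if text:
                on_text(text)

    with mouse.Listener(on_click=on_click) as listener:
        listener.join()
```

For a quick check without any speech output, `run_listener(print)` would print each selection instead of reading it aloud.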

Configuration

Speech Recognition Models

Switch models for different speed/accuracy trade-offs:

./scripts/switch-model.sh

Available models:

vosk-model-small-en-us-0.15 - Fast, basic accuracy
vosk-model-en-us-0.22-lgraph - Balanced (default)
vosk-model-en-us-0.22 - Best accuracy (~5.69% WER)

TTS Voice

Edit src/dictation_service/middle_click_reader.py:

EDGE_TTS_VOICE = "en-US-ChristopherNeural"

List available voices:

edge-tts --list-voices

Popular options:

en-US-JennyNeural (female, friendly)
en-US-GuyNeural (male, professional)
en-GB-RyanNeural (British male)
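
The same voice list is also available from Python. The sketch below assumes the `edge_tts` package's `list_voices()` coroutine returns dictionaries with `ShortName` and `Locale` keys; `filter_voices` and `english_us_voices` are hypothetical helper names.

```python
import asyncio


def filter_voices(voices, locale_prefix):
    """Return sorted ShortName values whose Locale starts with locale_prefix."""
    return sorted(
        voice["ShortName"] for voice in voices
        if voice["Locale"].startswith(locale_prefix)
    )


async def english_us_voices():
    """Fetch the voice list from the Edge TTS service (needs network access)."""
    import edge_tts  # assumed dependency, imported lazily

    return filter_voices(await edge_tts.list_voices(), "en-US")
```

Running `asyncio.run(english_us_voices())` would return names that can be assigned to `EDGE_TTS_VOICE`.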

Audio Settings

Edit src/dictation_service/ai_dictation_simple.py:

SAMPLE_RATE = 16000   # Higher = better quality, more CPU
BLOCK_SIZE = 4000     # Lower = less latency, less accurate
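
To see what the latency comment means in numbers, the sketch below computes how much audio one block holds. It assumes `BLOCK_SIZE` counts audio frames, as the `blocksize` parameter of `sounddevice` streams does; `block_duration_ms` is a hypothetical helper.

```python
SAMPLE_RATE = 16000  # frames per second
BLOCK_SIZE = 4000    # frames per block handed to the recognizer


def block_duration_ms(block_size: int, sample_rate: int) -> float:
    """Length of one audio block in milliseconds."""
    return 1000.0 * block_size / sample_rate


# With the defaults, each block carries 250 ms of audio.
default_block_ms = block_duration_ms(BLOCK_SIZE, SAMPLE_RATE)
```

Under that assumption, halving `BLOCK_SIZE` to 2000 would halve the block length to 125 ms.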

Troubleshooting

System Tray Icon Missing

# Install AppIndicator
sudo apt-get install gir1.2-appindicator3-0.1

# For GNOME Shell
sudo apt-get install gnome-shell-extension-appindicator

# Restart
systemctl --user restart dictation.service

Dictation Not Typing

# Check ydotool status
systemctl status ydotool

# Start if needed
sudo systemctl enable --now ydotool

# Add user to input group
sudo usermod -aG input $USER
# Log out and back in for the group change to take effect

Middle-Click Not Working

# Check service
systemctl --user status middle-click-reader

# View logs
journalctl --user -u middle-click-reader -f

# Test selection
echo "test" | xclip -selection primary
xclip -o -selection primary

Poor Recognition Accuracy

Check microphone:
```
arecord -d 3 test.wav
aplay test.wav
```

Try a more accurate model:

./scripts/switch-model.sh
# Select vosk-model-en-us-0.22

Reduce background noise
Speak more clearly and slowly

Service Won't Start

# View detailed logs
journalctl --user -u dictation.service -n 50

# Check for errors
tail -f ~/.cache/dictation_service.log

# Verify model exists
ls ~/.shared/models/vosk-models/
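
The checks in this section can be collected into a small preflight script. This is a sketch only: the tool list is drawn from the commands used in this guide, the model directory from the `ls` command above, and `missing_tools` and `installed_models` are hypothetical helpers rather than part of the project.

```python
import shutil
from pathlib import Path

# External commands this guide relies on.
REQUIRED_TOOLS = ("ydotool", "xclip", "mpv", "edge-tts")
MODEL_DIR = Path.home() / ".shared" / "models" / "vosk-models"


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def installed_models(model_dir=MODEL_DIR):
    """Return the model directory names present under model_dir."""
    if not model_dir.is_dir():
        return []
    return sorted(p.name for p in model_dir.iterdir() if p.is_dir())
```

An empty result from `installed_models()` or a non-empty one from `missing_tools()` points to the likely reason the service fails to start.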

Architecture

Components

┌─────────────────────────────────┐
│     System Tray Icon (GTK)      │
│   - Visual status indicator     │
│   - Click to toggle dictation   │
└─────────────────────────────────┘
              ↓
┌─────────────────────────────────┐
│   Dictation Service (Main)      │
│   - Audio capture               │
│   - Speech recognition (Vosk)   │
│   - Text typing (ydotool)       │
│   - Lock file management        │
└─────────────────────────────────┘
              ↓
         Focused App


┌─────────────────────────────────┐
│  Middle-Click Reader Service    │
│   - Mouse event monitoring      │
│   - Selection capture (xclip)   │
│   - Text-to-speech (edge-tts)   │
│   - Audio playback (mpv)        │
└─────────────────────────────────┘
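
The last two steps in the reader box (synthesis with edge-tts, playback with mpv) could look roughly like the sketch below. It assumes the `edge_tts` Python package and the `mpv` CLI; `mpv_command`, `synthesize`, and `speak` are hypothetical names, and the real service may stream audio rather than write a temporary file.

```python
import asyncio
import os
import subprocess
import tempfile

EDGE_TTS_VOICE = "en-US-ChristopherNeural"


def mpv_command(path: str) -> list:
    """Build an mpv invocation that plays audio only, without console output."""
    return ["mpv", "--no-video", "--really-quiet", path]


async def synthesize(text: str, path: str, voice: str = EDGE_TTS_VOICE) -> None:
    """Render text to an audio file with Edge TTS (needs network access)."""
    import edge_tts  # assumed dependency, imported lazily

    await edge_tts.Communicate(text, voice).save(path)


def speak(text: str) -> None:
    """Synthesize text to a temporary file, play it, then clean up."""
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)
    try:
        asyncio.run(synthesize(text, path))
        subprocess.run(mpv_command(path), check=False)
    finally:
        os.unlink(path)
```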

Lock Files

listening.lock - Dictation active
/tmp/dictation_speaking.lock - TTS playing (prevents feedback)
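
A lock file like this can be managed with a context manager on the reader side and a simple existence check on the dictation side. The sketch below is an assumption about how that might be done, not the project's code; `speaking_lock` and `tts_is_playing` are hypothetical names.

```python
import contextlib
from pathlib import Path

SPEAKING_LOCK = Path("/tmp/dictation_speaking.lock")


@contextlib.contextmanager
def speaking_lock(lock_path: Path = SPEAKING_LOCK):
    """Hold the lock file for the duration of TTS playback."""
    lock_path.touch()
    try:
        yield
    finally:
        # Remove the lock even if playback raised an error.
        lock_path.unlink(missing_ok=True)


def tts_is_playing(lock_path: Path = SPEAKING_LOCK) -> bool:
    """Dictation side: skip incoming audio while the lock exists."""
    return lock_path.exists()
```

The reader would wrap playback in `with speaking_lock():`, and the dictation loop would discard audio blocks whenever `tts_is_playing()` returns True.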

Logs

Dictation: ~/.cache/dictation_service.log
Read-aloud: ~/.cache/middle_click_reader.log
Systemd: journalctl --user -u <service-name>

Managing Services