Kade Heyborne 71c305a201
Major refactoring: v0.2.0 - Simplify to core dictation & read-aloud features
This is a comprehensive refactoring that transforms the dictation service from a
complex multi-mode application into two clean, focused features:
1. Voice dictation with system tray icon
2. On-demand read-aloud via Ctrl+middle-click

## Key Changes

### Dictation Service Enhancements
- Add GTK/AppIndicator3 system tray icon for visual status
- Remove all notification spam (dictation start/stop/status)
- Icon states: microphone-muted (OFF) → microphone-high (ON)
- Click tray icon to toggle dictation (same as Alt+D)
- Simplify ai_dictation_simple.py by removing conversation mode

### Read-Aloud Service Redesign
- Replace automatic clipboard reader with on-demand Ctrl+middle-click
- New middle_click_reader.py service
- Works anywhere: highlight text, Ctrl+middle-click to read
- Uses Edge-TTS (Christopher voice) with mpv playback
- Lock file prevents feedback with dictation service
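
The lock-file coordination can be sketched roughly as follows. This is a minimal illustration, not the code in middle_click_reader.py: the path `SPEAKING_LOCK` and the helpers `speaking_lock` and `should_ignore_audio` are hypothetical names, and the sketch assumes the reader holds the lock while audio plays and the dictation side discards microphone input while the file exists.

```python
import os
import tempfile
from contextlib import contextmanager

# Hypothetical location; the real lock file name is not shown in this commit.
SPEAKING_LOCK = os.path.join(tempfile.gettempdir(), "read_aloud_speaking.lock")


@contextmanager
def speaking_lock(path=SPEAKING_LOCK):
    """Create the lock file for the duration of playback, then remove it."""
    with open(path, "w") as fh:
        fh.write(str(os.getpid()))
    try:
        yield path
    finally:
        # Remove the lock even if playback raised, so dictation is never stuck.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass


def should_ignore_audio(path=SPEAKING_LOCK):
    """Dictation side: True while the reader is speaking."""
    return os.path.exists(path)
```

The reader would wrap its playback call in `with speaking_lock(): ...`, and the dictation loop would skip audio blocks while `should_ignore_audio()` returns True, so synthesized speech is not transcribed back into text.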

### Conversation Mode Removed
- Delete all VLLM/conversation code (VLLMClient, ConversationManager, TTS)
- Archive 5 old implementations to archive/old_implementations/
- Remove conversation-related scripts and services
- Clean separation of concerns for future reintegration if needed

### Dependencies Cleanup
- Remove: openai, aiohttp, pyttsx3, requests (conversation deps)
- Keep: PyGObject, pynput, sounddevice, vosk, numpy, edge-tts
- Net reduction: 4 packages removed, 6 core packages retained

### Testing Improvements
- Add test_dictation_service.py (8 tests) 
- Add test_middle_click.py (11 tests) 
- Fix test_run.py to use correct model path
- Total: 19 unit tests passing
- Delete obsolete test files (test_suite, test_vllm_integration, etc.)

### Documentation
- Add CHANGES.md with complete changelog
- Add docs/MIGRATION_GUIDE.md for upgrading
- Add README.md with quick start guide
- Update docs/README.md with current features only
- Add justfile for common tasks

### New Services & Scripts
- Add middle-click-reader.service (systemd)
- Add scripts/setup-middle-click-reader.sh
- Add desktop files for autostart
- Remove toggle-conversation.sh (obsolete)

## Impact

**Code Quality**
- Net change: -6,007 lines (596 added, 6,603 deleted)
- Simpler architecture, easier maintenance
- Better test coverage (19 passing unit tests, replacing a mix of obsolete test files)
- Cleaner separation of concerns

**User Experience**
- No notification spam during dictation
- Clean visual status via tray icon
- Full control over read-aloud (no unwanted readings)
- Better performance (fewer background processes)

**Privacy**
- No conversation data stored
- No VLLM connection needed
- All processing is local, except that text to be read aloud is sent to the Edge-TTS online service

## Migration Notes

Users upgrading should:
1. Run `uv sync` to update dependencies
2. Restart dictation.service to get tray icon
3. Run scripts/setup-middle-click-reader.sh for new read-aloud
4. Remove old read-aloud.service if present

See docs/MIGRATION_GUIDE.md for details.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-10 19:11:06 -07:00


#!/mnt/storage/Development/dictation-service/.venv/bin/python
import os
import sys
import queue
import json
import time
import subprocess
import threading
import logging

import sounddevice as sd
from vosk import Model, KaldiRecognizer
from pynput.keyboard import Controller

logging.basicConfig(filename='/home/universal/.gemini/tmp/428d098e581799ff7817b2001dd545f7b891975897338dd78498cc16582e004f/debug.log', level=logging.DEBUG)

# Configuration
MODEL_NAME = "vosk-model-small-en-us-0.15"  # Small model (fast)
# MODEL_NAME = "vosk-model-en-us-0.22"  # Larger model (more accurate, higher RAM)
SAMPLE_RATE = 16000
BLOCK_SIZE = 8000
LOCK_FILE = "listening.lock"

# Global State
is_listening = False
keyboard = Controller()
q = queue.Queue()


def send_notification(title, message):
    """Sends a system notification to let the user know state changed."""
    try:
        subprocess.run(["notify-send", "-t", "2000", title, message])
    except FileNotFoundError:
        pass  # notify-send might not be installed


def download_model_if_needed():
    """Checks if model exists, otherwise downloads the small English model."""
    if not os.path.exists(MODEL_NAME):
        logging.info(f"Model '{MODEL_NAME}' not found.")
        logging.info("Downloading default model (approx 40MB)...")
        try:
            # Requires requests and zipfile, simplified here to system call for robustness
            subprocess.check_call(["wget", f"https://alphacephei.com/vosk/models/{MODEL_NAME}.zip"])
            subprocess.check_call(["unzip", f"{MODEL_NAME}.zip"])
            logging.info("Download complete.")
        except Exception as e:
            logging.error(f"Error downloading model: {e}")
            sys.exit(1)


def audio_callback(indata, frames, time, status):
    """This is called (from a separate thread) for each audio block."""
    if status:
        logging.warning(status)
    if is_listening:
        q.put(bytes(indata))


def process_text(text):
    """Formats text slightly before typing (capitalization)."""
    if not text:
        return ""
    # Basic Sentence Case
    formatted = text[0].upper() + text[1:]
    return formatted + " "


def main():
    try:
        logging.info("Starting main function")
        global is_listening

        # 2. Model Setup
        download_model_if_needed()
        logging.info("Model check complete")
        logging.info("Loading model... (this may take a moment)")
        try:
            model = Model(MODEL_NAME)
            logging.info("Model loaded successfully")
        except Exception as e:
            logging.error(f"Failed to load model: {e}")
            sys.exit(1)

        recognizer = KaldiRecognizer(model, SAMPLE_RATE)
        logging.info("Recognizer created")

        logging.info("\n=== Ready ===")
        logging.info("Waiting for lock file to start dictation...")

        # 3. Main Audio Loop
        # We use raw input stream to keep latency low
        try:
            with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=BLOCK_SIZE, dtype='int16',
                                   channels=1, callback=audio_callback):
                logging.info("Audio stream opened")

                while True:
                    # If lock file exists, start listening
                    if os.path.exists(LOCK_FILE) and not is_listening:
                        is_listening = True
                        logging.info("\n[Dictation] STARTED listening...")
                        send_notification("Dictation", "🎤 Listening...")

                    # If lock file does not exist, stop listening
                    elif not os.path.exists(LOCK_FILE) and is_listening:
                        is_listening = False
                        logging.info("\n[Dictation] STOPPED listening.")
                        send_notification("Dictation", "🛑 Stopped.")

                    # If not listening, just sleep to save CPU
                    if not is_listening:
                        time.sleep(0.1)
                        continue

                    # If listening, process the queue
                    try:
                        data = q.get(timeout=0.1)
                        if recognizer.AcceptWaveform(data):
                            result = json.loads(recognizer.Result())
                            text = result.get("text", "")
                            if text:
                                typed_text = process_text(text)
                                logging.info(f"Typing: {text}")
                                keyboard.type(typed_text)
                    except queue.Empty:
                        pass

        except KeyboardInterrupt:
            logging.info("\nExiting...")
        except Exception as e:
            logging.error(f"\nError in audio loop: {e}")

    except Exception as e:
        logging.error(f"Error in main function: {e}")


if __name__ == "__main__":
    main()
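
The main loop above starts listening when `listening.lock` exists and stops when it disappears, so anything that creates or removes that file acts as a toggle. The helper below is a minimal sketch of such a toggle. `toggle_dictation` is a hypothetical name, and how the actual hotkey or tray icon drives the lock file is not shown in this file. Because `LOCK_FILE` is a relative path, the toggle must resolve it against the same working directory as the service.

```python
from pathlib import Path


def toggle_dictation(lock_path="listening.lock"):
    """Create the lock file if absent, remove it if present.

    Returns True if dictation should now be listening, False otherwise.
    """
    lock = Path(lock_path)
    if lock.exists():
        lock.unlink()
        return False
    lock.touch()
    return True
```

Calling `toggle_dictation()` once from the service's working directory should make the loop log "STARTED listening" within roughly one polling interval (0.1 s); calling it again stops dictation.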