dictation-service

4 Commits 2 Branches 0 Tags

Author	SHA1	Message	Date
Kade Heyborne	cca6bd2aee	Refactor middle-click reader to read-aloud service with Alt+R hotkey and Piper TTS - Rename middle-click-reader to read-aloud service - Change hotkey from Ctrl+middle-click to Alt+R - Replace edge-tts with Piper TTS for local neural voices - Update desktop and service files - Add piper-tts dependency - Update tests and setup scripts	2025-12-10 20:10:44 -07:00
Kade Heyborne	71c305a201	Major refactoring: v0.2.0 - Simplify to core dictation & read-aloud features This is a comprehensive refactoring that transforms the dictation service from a complex multi-mode application into two clean, focused features: 1. Voice dictation with system tray icon 2. On-demand read-aloud via Ctrl+middle-click ## Key Changes ### Dictation Service Enhancements - Add GTK/AppIndicator3 system tray icon for visual status - Remove all notification spam (dictation start/stop/status) - Icon states: microphone-muted (OFF) → microphone-high (ON) - Click tray icon to toggle dictation (same as Alt+D) - Simplify ai_dictation_simple.py by removing conversation mode ### Read-Aloud Service Redesign - Replace automatic clipboard reader with on-demand Ctrl+middle-click - New middle_click_reader.py service - Works anywhere: highlight text, Ctrl+middle-click to read - Uses Edge-TTS (Christopher voice) with mpv playback - Lock file prevents feedback with dictation service ### Conversation Mode Removed - Delete all VLLM/conversation code (VLLMClient, ConversationManager, TTS) - Archive 5 old implementations to archive/old_implementations/ - Remove conversation-related scripts and services - Clean separation of concerns for future reintegration if needed ### Dependencies Cleanup - Remove: openai, aiohttp, pyttsx3, requests (conversation deps) - Keep: PyGObject, pynput, sounddevice, vosk, numpy, edge-tts - Net reduction: 4 packages removed, 6 core packages retained ### Testing Improvements - Add test_dictation_service.py (8 tests) ✅ - Add test_middle_click.py (11 tests) ✅ - Fix test_run.py to use correct model path - Total: 19 unit tests passing - Delete obsolete test files (test_suite, test_vllm_integration, etc.) ### Documentation - Add CHANGES.md with complete changelog - Add docs/MIGRATION_GUIDE.md for upgrading - Add README.md with quick start guide - Update docs/README.md with current features only - Add justfile for common tasks ### New Services & Scripts - Add middle-click-reader.service (systemd) - Add scripts/setup-middle-click-reader.sh - Add desktop files for autostart - Remove toggle-conversation.sh (obsolete) ## Impact Code Quality - Net change: -6,007 lines (596 added, 6,603 deleted) - Simpler architecture, easier maintenance - Better test coverage (19 tests vs mixed before) - Cleaner separation of concerns User Experience - No notification spam during dictation - Clean visual status via tray icon - Full control over read-aloud (no unwanted readings) - Better performance (fewer background processes) Privacy - No conversation data stored - No VLLM connection needed - All processing local except Edge-TTS text ## Migration Notes Users upgrading should: 1. Run `uv sync` to update dependencies 2. Restart dictation.service to get tray icon 3. Run scripts/setup-middle-click-reader.sh for new read-aloud 4. Remove old read-aloud.service if present See docs/MIGRATION_GUIDE.md for details. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-10 19:11:06 -07:00
Kade Heyborne	cf2ebc9afa	Add privacy: hide exact words in dictation notifications - Replace exact text preview with word count in notifications - Change 'Speaking' notification to show 'Dictating... (X words)' instead of text - Change 'Dictation' notification to show 'Text typed successfully (X words)' instead of preview - Improves privacy by not displaying sensitive dictation content in notifications	2025-12-04 11:50:21 -07:00
Kade Heyborne	73a15d03cd	Fix dictation service: state detection, async processing, and performance optimizations - Fix state detection priority: dictation now takes precedence over conversation - Fix critical bug: event loop was created but never started, preventing async coroutines from executing - Optimize audio processing: reorder AcceptWaveform/PartialResult checks - Switch to faster Vosk model: vosk-model-en-us-0.22-lgraph for 2-3x speed improvement - Reduce block size from 8000 to 4000 for lower latency - Add filtering to remove spurious 'the', 'a', 'an' words from start/end of transcriptions - Update toggle-dictation.sh to properly clean up conversation lock file - Improve batch audio processing for better responsiveness	2025-12-04 11:49:07 -07:00