Fix dictation service: state detection, async processing, and performance optimizations

- Fix state detection priority: dictation now takes precedence over conversation
- Fix critical bug: event loop was created but never started, preventing async coroutines from executing
- Optimize audio processing: reorder AcceptWaveform/PartialResult checks
- Switch to faster Vosk model: vosk-model-en-us-0.22-lgraph for 2-3x speed improvement
- Reduce block size from 8000 to 4000 for lower latency
- Add filtering to remove spurious 'the', 'a', 'an' words from start/end of transcriptions
- Update toggle-dictation.sh to properly clean up conversation lock file
- Improve batch audio processing for better responsiveness

2025-12-04 11:49:07 -07:00

3.2 KiB


Dictation Service Setup Guide

This guide sets up the dictation service as a systemd user service with a global keybinding for voice-to-text input.

Prerequisites

Ubuntu/GNOME desktop environment
Python 3.12+ (already specified in project)
uv package manager
Microphone access
Audio system (PulseAudio)
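
Before installing, it can help to confirm that the tools this guide calls are actually on PATH. The helper below is a minimal sketch and not part of the project: `missing_tools` is a hypothetical name, and the list of commands you pass to it is up to you.

```shell
# Print the name of every listed command that is not found on PATH.
# `missing_tools` is a hypothetical helper, not shipped with the project.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the tool cannot be resolved
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools python3.12 uv arecord pacmd gsettings systemctl` prints nothing when every command used in this guide is available.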

Installation Steps

1. Install Dependencies

# Install system dependencies
sudo apt update
sudo apt install python3.12 python3.12-venv portaudio19-dev

# Install Python dependencies with uv
uv sync

2. Set Up System Service

# Copy service file to the systemd user unit directory
sudo cp dictation.service /etc/systemd/user/

# Reload the user systemd manager
systemctl --user daemon-reload

# Enable and start the service
systemctl --user enable dictation.service
systemctl --user start dictation.service

3. Configure Global Keybinding

# Run the keybinding setup script
./setup-keybindings.sh

This configures Alt+D as the global shortcut that toggles dictation.

4. Verify Installation

# Check service status
systemctl --user status dictation.service

# Test the toggle script
./toggle-dictation.sh
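
A service can take a moment to reach the active state after it starts, so a single status check may run too early. The sketch below polls `systemctl --user is-active` a few times before giving up. The name `wait_for_unit` and the default of 10 one-second attempts are illustrative assumptions.

```shell
# Poll until a user unit reports "active", or give up after N attempts.
# `wait_for_unit` and the default of 10 attempts are illustrative choices.
wait_for_unit() {
    unit=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        # is-active exits 0 only when the unit is active; --quiet hides output
        if systemctl --user is-active --quiet "$unit"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

A typical call would be `wait_for_unit dictation.service && echo "dictation service is running"`.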

Usage

Start Dictation: Press Alt+D (or run ./toggle-dictation.sh)
Wait for notification: You'll see "Dictation Started"
Speak clearly: The service will transcribe your voice to text
Text appears: Transcribed text will be typed wherever your cursor is
Stop Dictation: Press Alt+D again
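
How `./toggle-dictation.sh` switches state is not shown in this guide. One common pattern for a toggle like this is a lock file whose presence means dictation is on. The sketch below illustrates that pattern only: the function name, the lock file names, the "Dictation Stopped" message, and the directory argument are all assumptions, and the real script may work differently.

```shell
# Toggle dictation by creating or removing a lock file (illustrative only).
# All file names here are hypothetical; the project's script may differ.
toggle_dictation() {
    lock_dir=${1:-${XDG_RUNTIME_DIR:-/tmp}}
    dictation_lock="$lock_dir/dictation.lock"
    conversation_lock="$lock_dir/conversation.lock"

    if [ -e "$dictation_lock" ]; then
        # Lock present: dictation is on, so turn it off.
        rm -f "$dictation_lock"
        echo "Dictation Stopped"
    else
        # Clear any stale conversation lock so dictation takes precedence.
        rm -f "$conversation_lock"
        : > "$dictation_lock"
        echo "Dictation Started"
    fi
}
```

Calling `toggle_dictation` twice in a row returns to the starting state, which is the behavior the Alt+D shortcut relies on.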

Troubleshooting

Service Issues

# Check service logs
journalctl --user -u dictation.service -f

# Restart service
systemctl --user restart dictation.service

Audio Issues

# Test microphone
arecord -D pulse -f cd -d 5 test.wav
aplay test.wav

# Check PulseAudio
pulseaudio --check -v
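
If the test recording is silent, the default input may not be the microphone you expect. The sketch below pulls the default source name out of `pactl info`. `default_source` is a hypothetical helper, and it assumes `pactl` is installed.

```shell
# Print the name of the default PulseAudio input source.
# `default_source` is a hypothetical helper; it assumes pactl is installed.
default_source() {
    # LC_ALL=C keeps the "Default Source:" label in English
    LC_ALL=C pactl info | sed -n 's/^Default Source: //p'
}
```

If the printed name is a monitor of an output device and not a microphone, pick the correct input in the desktop sound settings before testing again.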

Keybinding Issues

# Check current keybindings
gsettings list-recursively org.gnome.settings-daemon.plugins.media-keys

# Reset keybindings if needed (this clears all custom shortcuts, not only Alt+D)
gsettings reset org.gnome.settings-daemon.plugins.media-keys custom-keybindings
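
The name, command, and binding of each GNOME custom shortcut are stored in a relocatable schema that is queried per path, so they typically do not show up in the recursive listing above. A minimal sketch for inspecting one entry, assuming the shortcut was registered under a path such as `/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/` (the path your setup uses may differ). `show_binding` is a hypothetical helper.

```shell
# Print the name, command, and binding stored at one custom-keybinding path.
# `show_binding` is a hypothetical helper; the path argument is an assumption.
show_binding() {
    path=$1
    schema="org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:$path"
    for key in name command binding; do
        printf '%s: %s\n' "$key" "$(gsettings get "$schema" "$key")"
    done
}
```

The paths to try are the ones listed by `gsettings get org.gnome.settings-daemon.plugins.media-keys custom-keybindings`.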

Permission Issues

# Add user to audio group (log out and back in for the change to take effect)
sudo usermod -a -G audio "$USER"

# List audio input sources to confirm the microphone is detected
pacmd list-sources | grep -A 10 index
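
To confirm that the group change was recorded, the sketch below checks the group database for a given user. `in_group` is a hypothetical helper. Because `id -nG` with a user name reads the database, it can succeed before the running session has picked up the new group.

```shell
# Succeed if the named user is listed as a member of the named group.
# `in_group` is a hypothetical helper built on the standard id command.
in_group() {
    # One group per line, then match the whole line exactly
    id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}
```

For example, `in_group "$USER" audio && echo "audio group membership is recorded"`.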

Configuration

Service Configuration

Edit /etc/systemd/user/dictation.service to modify:

User account
Working directory
Environment variables
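
As an alternative to editing the unit file in place, systemd also reads per-unit drop-in files, which survive a later re-copy of `dictation.service`. The sketch below writes an `Environment=` drop-in under the per-user configuration directory. `write_env_override` and the variable name in the usage line are hypothetical, not settings the service is known to read.

```shell
# Write Environment= lines into a user-level drop-in for the given unit.
# `write_env_override` is a hypothetical helper, not part of the project.
write_env_override() {
    unit=$1
    shift
    dir="${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user/$unit.d"
    mkdir -p "$dir" || return 1
    {
        printf '[Service]\n'
        # One Environment= line per NAME=value argument
        for pair in "$@"; do
            printf 'Environment=%s\n' "$pair"
        done
    } > "$dir/override.conf"
}
```

After something like `write_env_override dictation.service EXAMPLE_VAR=value`, run `systemctl --user daemon-reload` and restart the service so the change is picked up.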

Keybinding Configuration

Run ./setup-keybindings.sh again to change the keybinding, or edit the script to use a different shortcut.

Dictation Behavior

The dictation service can be configured by modifying:

src/dictation_service/vosk_dictation.py - Main dictation logic
Model files for different languages
Audio settings and formatting

Files Created

dictation.service - Systemd service file
toggle-dictation.sh - Dictation control script
setup-keybindings.sh - Keybinding configuration script

Removing the Service

# Stop and disable service
systemctl --user stop dictation.service
systemctl --user disable dictation.service

# Remove service file
sudo rm /etc/systemd/user/dictation.service
systemctl --user daemon-reload

# Remove keybinding (this resets all custom shortcuts, not only Alt+D)
gsettings reset org.gnome.settings-daemon.plugins.media-keys custom-keybindings
