dictation-service/docs/INSTALL.md
Kade Heyborne 73a15d03cd
Fix dictation service: state detection, async processing, and performance optimizations
- Fix state detection priority: dictation now takes precedence over conversation
- Fix critical bug: event loop was created but never started, preventing async coroutines from executing
- Optimize audio processing: reorder AcceptWaveform/PartialResult checks
- Switch to faster Vosk model: vosk-model-en-us-0.22-lgraph for 2-3x speed improvement
- Reduce block size from 8000 to 4000 for lower latency
- Add filtering to remove spurious 'the', 'a', 'an' words from start/end of transcriptions
- Update toggle-dictation.sh to properly clean up conversation lock file
- Improve batch audio processing for better responsiveness
2025-12-04 11:49:07 -07:00


# Dictation Service Setup Guide
This guide sets up the dictation service as a systemd user service with a global keybinding for voice-to-text input.
## Prerequisites
- Ubuntu/GNOME desktop environment
- Python 3.12+ (already specified in project)
- uv package manager
- Microphone access
- Audio system (PulseAudio)
## Installation Steps
### 1. Install Dependencies
```bash
# Install system dependencies
sudo apt update
sudo apt install python3.12 python3.12-venv portaudio19-dev
# Install Python dependencies with uv
uv sync
```
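Before moving on, it can help to confirm that the tools this guide relies on are actually on `PATH`. The following is a minimal sketch; the list of command names is an assumption drawn from the prerequisites and the commands used later in this guide, so adjust it to your setup.

```bash
# Print each command from the argument list that is not found on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Command names assumed from this guide; edit as needed.
missing=$(missing_commands python3.12 uv arecord systemctl)
if [ -n "$missing" ]; then
    printf 'Missing commands:\n%s\n' "$missing" >&2
fi
```

If nothing is printed, every listed command was found.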
### 2. Set Up System Service
```bash
# Copy the service file to the systemd user unit directory
sudo cp dictation.service /etc/systemd/user/
# Reload the user systemd manager so it picks up the new unit
systemctl --user daemon-reload
# Enable and start the service
systemctl --user enable dictation.service
systemctl --user start dictation.service
```
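As an alternative that needs no `sudo`, systemd also reads user units from the per-user directory `~/.config/systemd/user`. The helper below is a sketch of that approach; the function name `install_user_unit` is made up for illustration, and the optional second argument exists only so the copy can be tried against a scratch directory first.

```bash
# Copy a unit file into the per-user systemd directory (no sudo needed).
# An optional second argument overrides the destination directory.
install_user_unit() {
    local unit="$1"
    local dest="${2:-${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user}"
    mkdir -p "$dest" && cp "$unit" "$dest/"
}

# Typical use, run from the directory that contains dictation.service:
#   install_user_unit dictation.service
#   systemctl --user daemon-reload
#   systemctl --user enable --now dictation.service
```

`enable --now` enables the unit and starts it in one step.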
### 3. Configure Global Keybinding
```bash
# Run the keybinding setup script
./setup-keybindings.sh
```
This configures Alt+D as the global shortcut that toggles dictation.
### 4. Verify Installation
```bash
# Check service status
systemctl --user status dictation.service
# Test the toggle script
./toggle-dictation.sh
```
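For scripted checks, `systemctl --user is-active --quiet` reports the unit state through its exit code instead of printing the full status. A small sketch that wraps it follows; the function name `service_state` is illustrative and not part of the project.

```bash
# Print "active" if the user unit is running, otherwise "inactive".
service_state() {
    local unit="${1:-dictation.service}"
    if systemctl --user is-active --quiet "$unit"; then
        echo "active"
    else
        echo "inactive"
    fi
}

# Example: show recent log lines when the service is not running.
#   [ "$(service_state)" = "active" ] || journalctl --user -u dictation.service -n 20 --no-pager
```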
## Usage
1. **Start Dictation**: Press Alt+D (or run `./toggle-dictation.sh`)
2. **Wait for notification**: You'll see "Dictation Started"
3. **Speak clearly**: The service will transcribe your voice to text
4. **Text appears**: Transcribed text will be typed wherever your cursor is
5. **Stop Dictation**: Press Alt+D again
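The start/stop behaviour above is a simple two-state toggle. The contents of `toggle-dictation.sh` are not reproduced in this guide, so the following is only a hypothetical sketch of one common way to implement such a toggle with a lock file; the lock file path and the function name are assumptions, not the project's actual values.

```bash
# Hypothetical lock-file toggle; the path is an assumption, not the project's real file.
LOCK_FILE="${LOCK_FILE:-${XDG_RUNTIME_DIR:-/tmp}/dictation.lock}"

# Flip the state and print the new one: "started" or "stopped".
toggle_dictation() {
    if [ -e "$LOCK_FILE" ]; then
        rm -f "$LOCK_FILE"      # lock present: dictation was on, turn it off
        echo "stopped"
    else
        : > "$LOCK_FILE"        # no lock: create it to mark dictation as on
        echo "started"
    fi
}
```

A running service would then only need to check whether the lock file exists to decide if it should transcribe.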
## Troubleshooting
### Service Issues
```bash
# Check service logs
journalctl --user -u dictation.service -f
# Restart service
systemctl --user restart dictation.service
```
### Audio Issues
```bash
# Test microphone
arecord -D pulse -f cd -d 5 test.wav
aplay test.wav
# Check PulseAudio
pulseaudio --check -v
```
### Keybinding Issues
```bash
# Check current keybindings
gsettings list-recursively org.gnome.settings-daemon.plugins.media-keys
# Reset keybindings if needed (this clears ALL custom shortcuts, not only the dictation one)
gsettings reset org.gnome.settings-daemon.plugins.media-keys custom-keybindings
```
### Permission Issues
```bash
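# Add user to audio group (log out and back in for the change to take effect)
sudo usermod -a -G audio "$USER"
# List audio input sources (pacmd needs PulseAudio; on PipeWire use `pactl list sources short`)
pacmd list-sources | grep -A 10 index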
```
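Group changes made with `usermod` only apply to new login sessions, so it is worth checking what the current session actually sees. A minimal sketch follows; the function name `session_in_group` is illustrative.

```bash
# Succeed if the current session's group list contains the given group.
# `id -nG` with no user argument reports the groups of the running process,
# which is what matters until you log out and back in.
session_in_group() {
    local group="$1"
    id -nG | tr ' ' '\n' | grep -qx -- "$group"
}

# Example:
#   session_in_group audio || echo "audio group not active in this session yet"
```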
## Configuration
### Service Configuration
Edit `/etc/systemd/user/dictation.service` to modify:
- User account
- Working directory
- Environment variables
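Instead of editing the installed unit directly, systemd also supports drop-in files in a `dictation.service.d/` directory, which survive reinstalling the unit. The sketch below writes such a drop-in; the variable name `DICTATION_MODEL_DIR` is made up purely for illustration, since this guide does not list the environment variables the service reads.

```bash
# Write a drop-in that sets one environment variable for the unit.
# Arguments: drop-in directory, variable name, variable value.
write_override() {
    local dropin_dir="$1" name="$2" value="$3"
    mkdir -p "$dropin_dir"
    printf '[Service]\nEnvironment="%s=%s"\n' "$name" "$value" > "$dropin_dir/override.conf"
}

# Typical use (DICTATION_MODEL_DIR is a hypothetical variable name):
#   write_override "$HOME/.config/systemd/user/dictation.service.d" DICTATION_MODEL_DIR "$HOME/models"
#   systemctl --user daemon-reload && systemctl --user restart dictation.service
```

`systemctl --user edit dictation.service` opens an editor on an equivalent drop-in if you prefer to do this interactively.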
### Keybinding Configuration
Run `./setup-keybindings.sh` again to change the keybinding, or edit the script to use a different shortcut.
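For reference, a GNOME custom shortcut can also be registered by hand with `gsettings`. What `setup-keybindings.sh` does internally is not shown in this guide, so treat the following as an independent sketch: the slot name `dictation` and the function names are assumptions. It appends to the existing list of custom shortcuts rather than replacing it.

```bash
SCHEMA="org.gnome.settings-daemon.plugins.media-keys"
BASE="/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings"

# Given the current custom-keybindings value and a path, print the new list.
# Pure string handling, so it can be checked without a GNOME session.
append_binding_path() {
    local current="$1" path="$2"
    case "$current" in
        *"'$path'"*) printf '%s\n' "$current" ;;           # already registered
        "@as []" | "[]") printf "['%s']\n" "$path" ;;       # list is empty
        *) printf "%s, '%s']\n" "${current%]}" "$path" ;;   # append to existing entries
    esac
}

# Register a shortcut under the (assumed) slot name "dictation".
bind_shortcut() {
    local name="$1" command="$2" binding="$3"
    local path="$BASE/dictation/"
    local current
    current=$(gsettings get "$SCHEMA" custom-keybindings)
    gsettings set "$SCHEMA" custom-keybindings "$(append_binding_path "$current" "$path")"
    gsettings set "$SCHEMA.custom-keybinding:$path" name "$name"
    gsettings set "$SCHEMA.custom-keybinding:$path" command "$command"
    gsettings set "$SCHEMA.custom-keybinding:$path" binding "$binding"
}

# Example (not run here):
#   bind_shortcut "Toggle Dictation" "$PWD/toggle-dictation.sh" "<Alt>d"
```

Using an absolute path for the command matters because GNOME does not run shortcuts from the project directory.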
### Dictation Behavior
The dictation service can be configured by modifying:
- `src/dictation_service/vosk_dictation.py` - Main dictation logic
- Model files for different languages
- Audio settings and formatting
## Files Created
- `dictation.service` - Systemd service file
- `toggle-dictation.sh` - Dictation control script
- `setup-keybindings.sh` - Keybinding configuration script
## Removing the Service
```bash
# Stop and disable service
systemctl --user stop dictation.service
systemctl --user disable dictation.service
# Remove service file
sudo rm /etc/systemd/user/dictation.service
systemctl --user daemon-reload
# Remove keybinding (this resets ALL custom shortcuts, not only the dictation one)
gsettings reset org.gnome.settings-daemon.plugins.media-keys custom-keybindings
```