A powerful speech-to-text application that integrates with Ollama AI models for seamless voice interaction
This project provides three Python applications for speech-to-text conversion that work seamlessly with Ollama AI models:

- `ollama_stt_app.py` - Advanced STT application with silence detection
- `ollama_stt_simple.py` - Enhanced STT with additional audio processing features
- `web_portal.py` - Web interface for easy STT interaction (NEW!)
- Real-time microphone recording with automatic silence detection
- Ambient noise adjustment for optimal recognition
- Configurable recording duration and silence thresholds
- Multiple audio format support (via PyDub)
- Google Speech Recognition (default, online)
- OpenAI Whisper support (offline capable)
- Sphinx fallback for offline recognition
- Automatic engine fallback on failure
- Ollama AI model integration (LLaMA 3.1 ready)
- Automatic TTS forwarding to companion scripts
- Transcription history saving with timestamps
- Command-line interface with full argument support
- Modern web portal with beautiful UI on localhost:55667
- Real-time voice recording with visual feedback
- Drag & drop file uploads for audio transcription
- Ollama integration directly from the web interface
- Transcription history with easy management
- Responsive design for desktop and mobile
- One-click setup with automated dependency installation
- Automatic package installation on first run
- Dependency checking and error handling
- Cross-platform compatibility (Windows optimized)
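The silence-detection behavior described above can be sketched as a simple energy check over recorded audio chunks. This is an illustrative sketch, not the code the applications actually ship; the RMS energy threshold (500) and the 0.1 s chunk size are assumptions:

```python
import math

def rms(chunk):
    """Root-mean-square energy of a chunk of 16-bit audio samples."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))

def trailing_silence_chunks(chunks, energy_threshold=500):
    """Count consecutive chunks at the end of the recording below the energy threshold."""
    count = 0
    for chunk in reversed(chunks):
        if rms(chunk) < energy_threshold:
            count += 1
        else:
            break
    return count

def should_stop(chunks, chunk_seconds=0.1, silence_threshold=2.0, energy_threshold=500):
    """Stop recording once trailing silence exceeds the configured threshold (seconds)."""
    return trailing_silence_chunks(chunks, energy_threshold) * chunk_seconds >= silence_threshold
```

In this scheme, the recording loop appends one chunk per iteration and calls `should_stop` after each append; `--silence-threshold 3.0` would simply raise the cutoff to 30 trailing quiet chunks.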
The application now automatically installs all required dependencies on first run!
Simply run the application and it will:
- Check for missing dependencies
- Automatically install any missing packages
- Start the application once everything is ready
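A minimal version of this bootstrap logic might look like the following. The package set and the pip invocation are illustrative; note that pip distribution names (e.g. `SpeechRecognition`) do not always match import names (`speech_recognition`), so the sketch keeps an explicit mapping:

```python
import importlib.util
import subprocess
import sys

# Import name -> pip distribution name (they don't always match).
# The exact package set here is an assumption, not the apps' real list.
REQUIRED = {
    "flask": "Flask",
    "speech_recognition": "SpeechRecognition",
    "pyaudio": "pyaudio",
    "pydub": "pydub",
}

def missing_packages(required):
    """Return pip names for modules that cannot currently be imported."""
    return [pip_name for mod, pip_name in required.items()
            if importlib.util.find_spec(mod) is None]

def ensure_dependencies(required=REQUIRED):
    """Install missing packages with pip; return True if everything is importable."""
    missing = missing_packages(required)
    if missing:
        print(f"Installing missing packages: {', '.join(missing)}")
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
    return not missing_packages(required)
```

Running `pip` through `sys.executable -m pip` keeps the install tied to the same interpreter that runs the application, which avoids the classic "installed into the wrong Python" failure.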
This repository MUST be located in your Ollama home directory for proper functionality.
- Windows: `C:\Users\{username}\.ollama\`
- macOS: `~/.ollama/`
- Linux: `~/.ollama/`
The application relies on the Ollama model storage structure and configuration files that are located in the .ollama directory. Running from any other location is not guaranteed to work properly.
- Python 3.7 or higher
- Ollama installed and running
- Microphone access
- Repository cloned/extracted to your `.ollama` directory
```bash
# IMPORTANT: Clone to your .ollama directory
# Windows:
cd C:\Users\{username}\.ollama
git clone <repository-url> .

# macOS/Linux:
cd ~/.ollama
git clone <repository-url> .
```

```bash
# Run the web portal (dependencies auto-install!)
python web_portal.py

# Or run the console versions
python ollama_stt_app.py
python ollama_stt_simple.py

# On Windows, you can also use the batch files:
start_portal.bat
# or
run_portal.bat
```

If automatic installation fails, you can install dependencies manually:

```bash
pip install -r requirements.txt
```

Or install packages individually:

```bash
pip install Flask==3.0.0 SpeechRecognition==3.10.0 pyaudio==0.2.11 pydub==0.25.1 Werkzeug==3.0.1
```

```bash
# Start the STT application
python ollama_stt_app.py

# With custom settings
python ollama_stt_app.py --duration 30 --silence-threshold 3.0
```

```bash
python ollama_stt_simple.py \
  --duration 60 \
  --silence-threshold 2.0 \
  --engine whisper \
  --model llama3.1:latest \
  --tts_script ollama_tts_app.py \
  --voice default \
  --save-path ~/Documents/transcriptions/ \
  --verbose
```

| Argument | Description | Default |
|---|---|---|
| `--duration` | Maximum recording duration (seconds) | 60 |
| `--silence-threshold` | Silence detection threshold (seconds) | 2.0 |
| `--engine` | Speech recognition engine | google |
| `--model` | Ollama model to use | llama3.1:latest |
| `--tts_script` | Path to TTS script | ollama_tts_app.py |
| `--voice` | Voice for TTS output | default |
| `--save-path` | Custom save path for transcriptions | ~/Documents/SchmidtSims/STTHistory/ |
| `--verbose` | Enable verbose output | False |
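For reference, the argument table above corresponds to an `argparse` configuration along these lines. This is a sketch of the interface, not the scripts' actual parser; details such as the `choices` list are assumptions:

```python
import argparse

def build_parser():
    """Build a CLI parser mirroring the argument table above (illustrative sketch)."""
    p = argparse.ArgumentParser(description="Ollama STT")
    p.add_argument("--duration", type=int, default=60,
                   help="Maximum recording duration (seconds)")
    p.add_argument("--silence-threshold", type=float, default=2.0,
                   help="Silence detection threshold (seconds)")
    p.add_argument("--engine", default="google",
                   help="Speech recognition engine (e.g. google, whisper, sphinx)")
    p.add_argument("--model", default="llama3.1:latest",
                   help="Ollama model to use")
    p.add_argument("--tts_script", default="ollama_tts_app.py",
                   help="Path to TTS script")
    p.add_argument("--voice", default="default",
                   help="Voice for TTS output")
    p.add_argument("--save-path", default="~/Documents/SchmidtSims/STTHistory/",
                   help="Custom save path for transcriptions")
    p.add_argument("--verbose", action="store_true",
                   help="Enable verbose output")
    return p
```

Note that hyphenated flags become underscored attributes after parsing (`--silence-threshold` → `args.silence_threshold`).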
- LLaMA 3.1 (default) - High-quality text processing
- Custom Ollama models - Any model available in your Ollama installation
Project Structure

```
├── ollama_stt_app.py        # Main STT application
├── ollama_stt_simple.py     # Enhanced STT with audio processing
├── readme                   # This file
├── id_xxxxx                 # SSH key (private)
├── id_xxxxx.pub             # SSH key (public)
└── models/                  # Ollama model storage
    ├── blobs/               # Model binary data
    └── manifests/           # Model metadata
        └── registry.ollama.ai/
            └── library/
                └── llama3.1/
                    └── latest   # LLaMA 3.1 model manifest
```
```mermaid
graph TD
    A[Start Recording] --> B[Detect Audio]
    B --> C{Silence > Threshold?}
    C -->|No| B
    C -->|Yes| D[Stop Recording]
    D --> E[Transcribe Audio]
    E --> F[Save Transcription]
    F --> G[Forward to TTS]
    G --> H[Ollama Processing]
    H --> I[Audio Output]
```
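The "Save Transcription" step in the diagram amounts to writing the recognized text to a timestamped file under the configured save path. A minimal sketch — the filename scheme and line format here are assumptions, not necessarily what the apps produce:

```python
from datetime import datetime
from pathlib import Path

def save_transcription(text, save_dir):
    """Write a transcription to a timestamped file and return its path."""
    directory = Path(save_dir).expanduser()
    directory.mkdir(parents=True, exist_ok=True)  # create the history folder if absent
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = directory / f"transcription_{stamp}.txt"
    path.write_text(f"[{datetime.now().isoformat()}] {text}\n", encoding="utf-8")
    return path
```

Expanding `~` with `expanduser()` is what lets a default like `~/Documents/SchmidtSims/STTHistory/` work on both Windows and Unix.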
The application provides rich console output with emojis and status indicators:
- **Recording status** - Visual feedback during audio capture
- **Recording indicator** - Shows when actively recording
- **Audio detection** - Real-time audio level indication
- **Silence detection** - Shows when silence is detected
- **Transcription** - Displays recognized text
- **Success messages** - Confirmation of operations
- **Error handling** - Clear error messages and fallbacks
If you're experiencing issues with model loading or file paths:
- Ensure the repository is in your `.ollama` directory:
  - Windows: `C:\Users\{username}\.ollama\`
  - macOS/Linux: `~/.ollama/`
- Check current directory: use `pwd` (Unix) or `cd` (Windows) to verify your location
- Reinstall if needed: move the entire repository to the correct location
```bash
pip install --upgrade pip setuptools wheel
pip install pyaudio
```

- Check microphone permissions
- Ensure the microphone is not in use by other applications
- Try running as administrator
- Verify Ollama is running: `ollama serve`
- Check model availability: `ollama list`
- Ensure the model is pulled: `ollama pull llama3.1`
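These checks can also be automated against Ollama's HTTP API, which listens on port 11434 by default; `GET /api/tags` lists locally installed models. A small illustrative helper:

```python
import json
import urllib.error
import urllib.request

def ollama_models(base_url="http://localhost:11434"):
    """Return installed model names, or None if the Ollama server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError):
        return None
```

If this returns `None`, start the server with `ollama serve`; if `llama3.1:latest` is absent from the returned list, run `ollama pull llama3.1`.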
- Recording latency: < 100ms
- Transcription speed: ~2-5 seconds (Google API)
- Offline recognition: Available via Sphinx
- Memory usage: < 50MB typical
- Supported audio formats: WAV, MP3, FLAC, OGG (via PyDub)
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Ollama - For providing excellent local AI model serving
- SpeechRecognition - For the robust speech recognition library
- PyAudio - For audio capture capabilities
- OpenAI Whisper - For offline speech recognition
- Google Speech API - For high-quality online recognition
For issues, questions, or contributions, please:
- Open an issue on the repository
- Check the troubleshooting section above
- Review the Ollama documentation
Made with ❤️ for the Ollama community
Experience the easiest way to use Ollama STT with our beautiful web interface!
```bash
# Windows users - navigate to .ollama directory first:
cd C:\Users\{username}\.ollama
start_portal.bat

# macOS/Linux users:
cd ~/.ollama
pip install -r requirements.txt
python web_portal.py
```

- **Voice Recording**: Click to record, automatic transcription
- **File Upload**: Drag & drop audio files for instant transcription
- **Ollama Integration**: Send transcribed text directly to AI models
- **Live Status**: Real-time system status and health monitoring
- **History**: Complete transcription history with search and management
- **Beautiful UI**: Modern, responsive design with smooth animations
- **Real-time**: Instant feedback and processing indicators
1. Navigate to the `.ollama` directory: `cd ~/.ollama` (or `cd C:\Users\{username}\.ollama` on Windows)
2. Start the server: run `start_portal.bat` or `python web_portal.py`
3. Open a browser: navigate to `http://localhost:55667`
4. Check status: green indicators mean systems are ready
5. Start transcribing: use the microphone or upload audio files!