KoJesko/ollama-TTS-STT-App

🎙️ Ollama STT (Speech-to-Text) Application

🌟 About

A powerful speech-to-text application that integrates with Ollama AI models for seamless voice interaction

🌟 Overview

This project provides three Python applications for speech-to-text conversion that work with Ollama AI models: two console applications and a web portal.

  • ollama_stt_app.py - Advanced STT application with silence detection
  • ollama_stt_simple.py - Enhanced STT with additional audio processing features
  • web_portal.py - Beautiful web interface for easy STT interaction (NEW! 🌐)

🎯 Features

🔊 Audio Processing

  • Real-time microphone recording with automatic silence detection
  • Ambient noise adjustment for optimal recognition
  • Configurable recording duration and silence thresholds
  • Multiple audio format support (via PyDub)

🤖 Speech Recognition

  • Google Speech Recognition (default, online)
  • OpenAI Whisper support (offline capable)
  • Sphinx fallback for offline recognition
  • Automatic engine fallback on failure
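
The automatic engine fallback listed above can be sketched as a loop that tries each engine in order and moves on when one raises. This is a minimal illustration, not the application's actual code: `transcribe_with_fallback`, the `Engine` type, and the two stub engines are hypothetical names. In a real setup the engines would presumably wrap SpeechRecognition's `Recognizer.recognize_google`, `recognize_whisper`, and `recognize_sphinx`.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical engine type: takes raw audio, returns text, raises on failure.
Engine = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes, engines: Iterable[Tuple[str, Engine]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each engine in order; return (engine_name, text) for the first success."""
    for name, engine in engines:
        try:
            text = engine(audio)
        except Exception as exc:  # any failure moves on to the next engine
            print(f"{name} failed: {exc}")
            continue
        if text:
            return name, text
    return None, None  # every engine failed or returned empty text


def _online_engine_down(audio: bytes) -> str:
    # Stub standing in for an online engine with no network access.
    raise ConnectionError("no network")


def _offline_engine(audio: bytes) -> str:
    # Stub standing in for an offline engine that succeeds.
    return "hello world"


result = transcribe_with_fallback(
    b"\x00\x01", [("google", _online_engine_down), ("sphinx", _offline_engine)]
)
print(result)
```

Keeping the engines as plain callables means the order of preference is just the order of the list, so an offline-first setup only needs the list reversed.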

🚀 Integration

  • Ollama AI model integration (LLaMA 3.1 ready)
  • Automatic TTS forwarding to companion scripts
  • Transcription history saving with timestamps
  • Command-line interface with full argument support
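
As a rough sketch of what saving transcription history with timestamps could look like, the helper below writes each transcription to its own timestamped text file. The function name, file naming scheme, and line format are assumptions made for illustration; the application's real on-disk format may differ.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def save_transcription(text: str, save_dir: Path, now: Optional[datetime] = None) -> Path:
    """Write one transcription to a timestamped file and return its path."""
    now = now or datetime.now()
    save_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: transcription_YYYYMMDD_HHMMSS.txt
    path = save_dir / f"transcription_{now:%Y%m%d_%H%M%S}.txt"
    path.write_text(f"[{now.isoformat(timespec='seconds')}] {text}\n", encoding="utf-8")
    return path


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_transcription("hello", Path(tmp), datetime(2024, 1, 2, 3, 4, 5))
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")
```

In practice `save_dir` would come from the `--save-path` argument, expanded with `Path(...).expanduser()`.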

🌐 Web Interface

  • Modern web portal with beautiful UI on localhost:55667
  • Real-time voice recording with visual feedback
  • Drag & drop file uploads for audio transcription
  • Ollama integration directly from the web interface
  • Transcription history with easy management
  • Responsive design for desktop and mobile
  • One-click setup with automated dependency installation

🛠️ Smart Dependencies

  • Automatic package installation on first run
  • Dependency checking and error handling
  • Cross-platform compatibility (Windows optimized)

📦 Installation

⚡ Automatic Installation (Recommended)

The application now automatically installs all required dependencies on first run!

Simply run the application and it will:

  • ✅ Check for missing dependencies
  • 📦 Automatically install any missing packages
  • 🚀 Start the application once everything is ready
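
The check-then-install steps above might be implemented along these lines. This is a sketch under assumptions: the `REQUIRED` mapping is an illustrative subset taken from the manual installation command, and `missing_packages` and `install` are hypothetical helper names rather than functions from the repository.

```python
import importlib.util
import subprocess
import sys
from typing import Dict, List

# Import name -> pip package name (illustrative subset, not the full list).
REQUIRED: Dict[str, str] = {
    "flask": "Flask",
    "speech_recognition": "SpeechRecognition",
    "pyaudio": "pyaudio",
    "pydub": "pydub",
}


def missing_packages(required: Dict[str, str]) -> List[str]:
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


def install(packages: List[str]) -> None:
    """Install packages with the same interpreter that runs this script."""
    if packages:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])


# At startup an application could call: install(missing_packages(REQUIRED))
```

Calling pip through `sys.executable -m pip` keeps the install inside the active interpreter or virtual environment, rather than whichever `pip` happens to be first on the PATH.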

⚠️ Important: Repository Location

This repository MUST be located in your Ollama home directory for proper functionality.

  • Windows: C:\Users\{username}\.ollama\
  • macOS: ~/.ollama/
  • Linux: ~/.ollama/

The application relies on the Ollama model storage structure and configuration files that are located in the .ollama directory. Running from any other location is not guaranteed to work properly.
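
A script can verify this location requirement before doing anything else. The check below is a minimal sketch; `is_in_ollama_home` is a hypothetical helper, and it assumes the default `.ollama` directory directly under the user's home directory.

```python
from pathlib import Path


def is_in_ollama_home(script_dir: Path, home: Path) -> bool:
    """Return True if script_dir is the .ollama directory under home."""
    return script_dir.resolve() == (home / ".ollama").resolve()


# Typical use at the top of a script (not executed here):
#   if not is_in_ollama_home(Path(__file__).parent, Path.home()):
#       raise SystemExit("Please move this repository into your .ollama directory.")
```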

Prerequisites

  • Python 3.8 or higher (the pinned Flask 3.0.0 and Werkzeug 3.0.1 do not support Python 3.7)
  • Ollama installed and running
  • Microphone access
  • Repository cloned/extracted to your .ollama directory

Quick Start

# IMPORTANT: Clone to your .ollama directory
# Note: "git clone <repository-url> ." only succeeds if the target directory is empty.
# Windows:
cd C:\Users\{username}\.ollama
git clone <repository-url> .

# macOS/Linux:
cd ~/.ollama
git clone <repository-url> .

# Run the web portal (dependencies auto-install!)
python web_portal.py

# Or run the console versions
python ollama_stt_app.py
python ollama_stt_simple.py

# On Windows, you can also use the batch files:
start_portal.bat
# or
run_portal.bat

Manual Installation (Only if needed)

If automatic installation fails, you can manually install dependencies:

pip install -r requirements.txt

Or install packages individually:

pip install Flask==3.0.0 SpeechRecognition==3.10.0 pyaudio==0.2.11 pydub==0.25.1 Werkzeug==3.0.1

🎮 Usage

Basic Usage

# Start the STT application
python ollama_stt_app.py

# With custom settings
python ollama_stt_app.py --duration 30 --silence-threshold 3.0

Advanced Options

python ollama_stt_simple.py \
    --duration 60 \
    --silence-threshold 2.0 \
    --engine whisper \
    --model llama3.1:latest \
    --tts_script ollama_tts_app.py \
    --voice default \
    --save-path ~/Documents/transcriptions/ \
    --verbose

🔧 Configuration

Command Line Arguments

| Argument | Description | Default |
| --- | --- | --- |
| --duration | Maximum recording duration (seconds) | 60 |
| --silence-threshold | Silence detection threshold (seconds) | 2.0 |
| --engine | Speech recognition engine | google |
| --model | Ollama model to use | llama3.1:latest |
| --tts_script | Path to TTS script | ollama_tts_app.py |
| --voice | Voice for TTS output | default |
| --save-path | Custom save path for transcriptions | ~/Documents/SchmidtSims/STTHistory/ |
| --verbose | Enable verbose output | False |
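
For reference, the arguments in the table map naturally onto an `argparse` parser like the one below. The flag names and defaults come from the table; the value types (`int`, `float`) and the `build_parser` helper are assumptions, so the real scripts may parse them differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (types are assumed)."""
    parser = argparse.ArgumentParser(description="Ollama STT (illustrative sketch)")
    parser.add_argument("--duration", type=int, default=60,
                        help="Maximum recording duration (seconds)")
    parser.add_argument("--silence-threshold", type=float, default=2.0,
                        help="Silence detection threshold (seconds)")
    parser.add_argument("--engine", default="google",
                        help="Speech recognition engine")
    parser.add_argument("--model", default="llama3.1:latest",
                        help="Ollama model to use")
    parser.add_argument("--tts_script", default="ollama_tts_app.py",
                        help="Path to TTS script")
    parser.add_argument("--voice", default="default",
                        help="Voice for TTS output")
    parser.add_argument("--save-path", default="~/Documents/SchmidtSims/STTHistory/",
                        help="Custom save path for transcriptions")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose output")
    return parser


# Equivalent to: python ollama_stt_app.py --duration 30 --silence-threshold 3.0
args = build_parser().parse_args(["--duration", "30", "--silence-threshold", "3.0"])
print(args.duration, args.silence_threshold, args.engine, args.verbose)
```

Note that argparse exposes `--silence-threshold` and `--save-path` as `args.silence_threshold` and `args.save_path`, with hyphens turned into underscores.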

Supported Models

  • LLaMA 3.1 (default) - High-quality text processing
  • Custom Ollama models - Any model available in your Ollama installation

🏗️ Architecture

📁 Project Structure
├── 🎙️ ollama_stt_app.py      # Main STT application
├── 🎙️ ollama_stt_simple.py   # Enhanced STT with audio processing
├── 📄 readme                  # This file
├── 🔑 id_xxxxx             # SSH key (private)
├── 🔑 id_xxxxx.pub         # SSH key (public)
└── 📁 models/                # Ollama model storage
    ├── 📁 blobs/             # Model binary data
    └── 📁 manifests/         # Model metadata
        └── 📁 registry.ollama.ai/
            └── 📁 library/
                └── 📁 llama3.1/
                    └── latest # LLaMA 3.1 model manifest

🔄 Workflow

graph TD
    A[🎤 Start Recording] --> B[🔊 Detect Audio]
    B --> C{Silence > Threshold?}
    C -->|No| B
    C -->|Yes| D[⏹️ Stop Recording]
    D --> E[🔤 Transcribe Audio]
    E --> F[💾 Save Transcription]
    F --> G[🔄 Forward to TTS]
    G --> H[🤖 Ollama Processing]
    H --> I[🔊 Audio Output]
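
The recording loop at the top of the workflow (detect audio, then stop once silence exceeds the threshold) can be illustrated with a simple energy-based detector. This is a sketch under assumptions, not the application's implementation: it assumes 16-bit little-endian mono PCM chunks, and `rms`, `record_until_silence`, and the `level_floor` value of 500 are hypothetical choices.

```python
import math
import struct
from typing import Iterable, List


def rms(chunk: bytes) -> float:
    """Root-mean-square level of a 16-bit little-endian mono PCM chunk."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def record_until_silence(
    chunks: Iterable[bytes],
    chunk_seconds: float,
    silence_threshold: float = 2.0,  # seconds of continuous silence before stopping
    max_duration: float = 60.0,      # hard cap on recording length (seconds)
    level_floor: float = 500.0,      # assumed RMS level below which a chunk is silent
) -> List[bytes]:
    """Collect chunks until silence lasts silence_threshold or max_duration is hit."""
    kept: List[bytes] = []
    elapsed = 0.0
    silent_for = 0.0
    for chunk in chunks:
        kept.append(chunk)
        elapsed += chunk_seconds
        # Any loud chunk resets the silence timer.
        silent_for = silent_for + chunk_seconds if rms(chunk) < level_floor else 0.0
        if silent_for >= silence_threshold or elapsed >= max_duration:
            break
    return kept


# Simulated input: 1.5 s of sound followed by 5 s of silence, in 0.5 s chunks.
loud = struct.pack("<4h", 8000, -8000, 8000, -8000)
quiet = struct.pack("<4h", 0, 0, 0, 0)
recorded = record_until_silence([loud] * 3 + [quiet] * 10, chunk_seconds=0.5)
print(len(recorded))
```

With live audio, `chunks` would be an iterator over microphone reads, and `level_floor` would typically be calibrated from ambient noise rather than hard-coded.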

🎨 Visual Feedback

The application provides rich console output with emojis and status indicators:

  • 🎤 Recording status - Visual feedback during audio capture
  • 🔴 Recording indicator - Shows when actively recording
  • 🎵 Audio detection - Real-time audio level indication
  • 🔇 Silence detection - Shows when silence is detected
  • 📝 Transcription - Displays recognized text
  • ✅ Success messages - Confirmation of operations
  • ❌ Error handling - Clear error messages and fallbacks

🔍 Troubleshooting

Common Issues

Repository Location Error (CRITICAL)

If you're experiencing issues with model loading or file paths:

  • Ensure the repository is in your .ollama directory
  • Windows: C:\Users\{username}\.ollama\
  • macOS/Linux: ~/.ollama/
  • Check current directory: Use pwd (Unix) or cd (Windows) to verify location
  • Reinstall if needed: Move the entire repository to the correct location

PyAudio Installation Error (Windows)

pip install --upgrade pip setuptools wheel
pip install pyaudio

Microphone Not Found

  • Check microphone permissions
  • Ensure microphone is not used by other applications
  • Try running as administrator

Ollama Connection Error

  • Make sure the Ollama server is running; if it is not, start it with: ollama serve
  • Check model availability: ollama list
  • Ensure model is pulled: ollama pull llama3.1
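
The same checks can be done from Python by querying Ollama's HTTP API, whose `/api/tags` endpoint lists locally available models. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"  # assumed default; adjust if configured differently


def model_names(tags_payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in tags_payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> List[str]:
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))


def has_model(names: List[str], wanted: str) -> bool:
    """Match either the full tag ("llama3.1:latest") or the bare name ("llama3.1")."""
    return any(name == wanted or name.split(":")[0] == wanted for name in names)


# Typical use (requires a running server, so it is not executed here):
#   try:
#       ready = has_model(list_local_models(), "llama3.1")
#   except OSError:
#       ready = False  # server not reachable; start it with `ollama serve`
```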

📊 Performance

  • Recording latency: < 100ms
  • Transcription speed: ~2-5 seconds (Google API)
  • Offline recognition: Available via Sphinx
  • Memory usage: < 50MB typical
  • Supported audio formats: WAV, MP3, FLAC, OGG (via PyDub)

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Ollama - For providing excellent local AI model serving
  • SpeechRecognition - For the robust speech recognition library
  • PyAudio - For audio capture capabilities
  • OpenAI Whisper - For offline speech recognition
  • Google Speech API - For high-quality online recognition

📞 Support

For issues, questions, or contributions, please:

  • Open an issue on the repository
  • Check the troubleshooting section above
  • Review the Ollama documentation

Made with ❤️ for the Ollama community

🌐 Web Portal (NEW!)

Experience the easiest way to use Ollama STT with our beautiful web interface!

Quick Start Web Portal

# Windows users - navigate to .ollama directory first:
cd C:\Users\{username}\.ollama
start_portal.bat

# macOS/Linux users:
cd ~/.ollama
pip install -r requirements.txt
python web_portal.py

Web Portal Features

  • 🎙️ Voice Recording: Click to record, automatic transcription
  • 📁 File Upload: Drag & drop audio files for instant transcription
  • 🤖 Ollama Integration: Send transcribed text directly to AI models
  • 📊 Live Status: Real-time system status and health monitoring
  • 📝 History: Complete transcription history with search and management
  • 🎨 Beautiful UI: Modern, responsive design with smooth animations
  • ⚡ Real-time: Instant feedback and processing indicators

Accessing the Web Portal

  1. Navigate to .ollama directory: cd ~/.ollama (or cd C:\Users\{username}\.ollama on Windows)
  2. Start the server: Run start_portal.bat or python web_portal.py
  3. Open browser: Navigate to http://localhost:55667
  4. Check status: Green indicators mean systems are ready
  5. Start transcribing: Use microphone or upload audio files!
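
If the browser cannot reach the portal, a quick way to tell whether the server is actually listening is to try a TCP connection to the port. This is a small diagnostic sketch; `port_is_open` is a hypothetical helper, and 55667 is the port stated in this README.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False


# Typical use after starting web_portal.py:
#   port_is_open("localhost", 55667)
```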
