guntas-13/multimodal-input-with-gaze-switch-voice-llm
Multimodal Input App

An accessible text-entry web application combining multiple input modalities (head-gaze, voice recognition, and switch control) with an LLM-powered predictive keyboard. Text generation, speech-to-text, and text-to-speech run in the browser via huggingface/transformers.js, and head-tracking-based cursor control uses the Tracky-Mouse API.

Find the detailed README here.

Setup

```shell
cd App
npm install
npm run dev
```

Features

1. LLM-Based Text Prediction Keyboard

LLM-powered predictions using Xenova/distilgpt2 from huggingface/transformers.js

Blue keys represent the LLM's next-word predictions
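A minimal sketch of how such predictions could be produced with the transformers.js text-generation pipeline. The package name `@xenova/transformers`, the `predictNextWords` function, and the generation options are assumptions for illustration, not the app's actual code; `firstNewWord` is a hypothetical helper that extracts what a "blue key" would display.

```javascript
// Pure helper: given the prompt and a generated continuation,
// return the first new word (what a prediction key would show).
function firstNewWord(prompt, generated) {
  const rest = generated.slice(prompt.length).trim();
  return rest.split(/\s+/)[0] ?? "";
}

// Hypothetical wiring (not executed here): load Xenova/distilgpt2 via the
// transformers.js text-generation pipeline and surface candidate next words.
async function predictNextWords(prompt, numKeys = 3) {
  const { pipeline } = await import("@xenova/transformers"); // assumed package name
  const generator = await pipeline("text-generation", "Xenova/distilgpt2");
  const words = [];
  for (let i = 0; i < numKeys; i++) {
    // Sample a short continuation and keep only its first word.
    const [output] = await generator(prompt, {
      max_new_tokens: 5,
      do_sample: true,
    });
    words.push(firstNewWord(prompt, output.generated_text));
  }
  return [...new Set(words)].filter(Boolean); // dedupe, drop empties
}
```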

2. Speech Recognition

Speech-to-text using Xenova/whisper-tiny.en (transformers.js) and text-to-speech using the Web Speech API
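The speech layer can be sketched as below, assuming the transformers.js automatic-speech-recognition pipeline and the standard Web Speech API. The package name `@xenova/transformers` and the helper names (`transcribe`, `speak`, `splitSentences`) are illustrative assumptions, not the app's actual code.

```javascript
// Hypothetical helper: split text into sentences so TTS speaks one at a time.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter(Boolean);
}

// Speech-to-text (not executed here): run whisper-tiny.en on audio samples.
async function transcribe(audioFloat32) {
  const { pipeline } = await import("@xenova/transformers"); // assumed package name
  const asr = await pipeline("automatic-speech-recognition", "Xenova/whisper-tiny.en");
  const { text } = await asr(audioFloat32);
  return text;
}

// Text-to-speech via the Web Speech API (browser only).
function speak(text) {
  for (const sentence of splitSentences(text)) {
    speechSynthesis.speak(new SpeechSynthesisUtterance(sentence));
  }
}
```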

3. Head Tracking

Head-movement-based cursor control using the Tracky-Mouse API
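Tracky-Mouse's own API is not shown in this README, so the sketch below only illustrates the underlying idea: converting noisy per-frame head positions into a stable cursor position via exponential smoothing and a sensitivity gain. `makeCursorSmoother` and its parameters are hypothetical.

```javascript
// Returns an update function mapping raw head positions to cursor positions.
// alpha: smoothing factor (higher = more responsive, noisier)
// gain:  how far the cursor moves per unit of smoothed head movement
function makeCursorSmoother({ alpha = 0.3, gain = 8 } = {}) {
  let sx = null, sy = null; // smoothed head position
  let cx = 0, cy = 0;       // cursor position
  return function update(headX, headY) {
    if (sx === null) {
      // First frame: initialize the filter, leave cursor at origin.
      sx = headX; sy = headY;
      return { x: cx, y: cy };
    }
    const prevX = sx, prevY = sy;
    sx = alpha * headX + (1 - alpha) * sx; // exponential smoothing
    sy = alpha * headY + (1 - alpha) * sy;
    cx += (sx - prevX) * gain; // move cursor by the smoothed delta
    cy += (sy - prevY) * gain;
    return { x: cx, y: cy };
  };
}
```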

4. Switch Control

Single-switch scanning interface for accessibility. Auto-scanning through keyboard rows with individual key highlighting.
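The row-then-key scanning described above can be sketched as a small state machine: a timer tick auto-advances the highlight, and a switch press first selects the highlighted row, then a key within it. `makeScanner` and its interface are hypothetical, not the app's actual implementation.

```javascript
// rows: array of arrays of key labels, e.g. keyboard rows.
function makeScanner(rows) {
  let mode = "row"; // "row" = scanning rows, "key" = scanning within a row
  let row = 0, key = 0;
  return {
    // Called by the auto-scan timer: advance the highlight, wrapping around.
    tick() {
      if (mode === "row") row = (row + 1) % rows.length;
      else key = (key + 1) % rows[row].length;
    },
    // Called on switch activation: select the row, or select the key.
    press() {
      if (mode === "row") {
        mode = "key";
        key = 0;
        return null; // no key selected yet, now scanning within the row
      }
      const selected = rows[row][key];
      mode = "row"; row = 0; key = 0; // restart scanning after a selection
      return selected;
    },
    // What the UI should currently highlight.
    highlight() {
      return mode === "row" ? { row } : { row, key };
    },
  };
}
```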