guntas-13/multimodal-input-with-gaze-switch-voice-llm
Multimodal Input App

An accessible text-entry web application combining multiple input modalities (head-gaze, voice recognition, and switch control) with an LLM-powered predictive keyboard. Text generation, speech-to-text, and text-to-speech run in the browser via huggingface/transformers.js, and head-tracking-based cursor control uses the Tracky-Mouse API.

Find the detailed README here.

Setup

```shell
cd App
npm install
npm run dev
```

Features

1. LLM-Based Text Prediction Keyboard

LLM-powered predictions using Xenova/distilgpt2 from huggingface/transformers.js

Blue keys represent the LLM's next-word predictions
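A minimal sketch of how such predictions could be produced with the transformers.js text-generation pipeline. The package name `@xenova/transformers`, the `predictNextWords` function, and the generation options are assumptions for illustration, not the app's actual code; `firstNewWord` is a hypothetical helper that extracts what a "blue key" would display.

```javascript
// Pure helper: given the prompt and a generated continuation,
// return the first new word (what a prediction key would show).
function firstNewWord(prompt, generated) {
  const rest = generated.slice(prompt.length).trim();
  return rest.split(/\s+/)[0] ?? "";
}

// Hypothetical wiring (not executed here): load Xenova/distilgpt2 via the
// transformers.js text-generation pipeline and surface candidate next words.
async function predictNextWords(prompt, numKeys = 3) {
  const { pipeline } = await import("@xenova/transformers"); // assumed package name
  const generator = await pipeline("text-generation", "Xenova/distilgpt2");
  const words = [];
  for (let i = 0; i < numKeys; i++) {
    // Sample a short continuation and keep only its first word.
    const [output] = await generator(prompt, {
      max_new_tokens: 5,
      do_sample: true,
    });
    words.push(firstNewWord(prompt, output.generated_text));
  }
  return [...new Set(words)].filter(Boolean); // dedupe, drop empties
}
```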

2. Speech Recognition

Speech-to-text using Xenova/whisper-tiny.en (transformers.js) and text-to-speech using the Web Speech API
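The speech layer can be sketched as below, assuming the transformers.js automatic-speech-recognition pipeline and the standard Web Speech API. The package name `@xenova/transformers` and the helper names (`transcribe`, `speak`, `splitSentences`) are illustrative assumptions, not the app's actual code.

```javascript
// Hypothetical helper: split text into sentences so TTS speaks one at a time.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter(Boolean);
}

// Speech-to-text (not executed here): run whisper-tiny.en on audio samples.
async function transcribe(audioFloat32) {
  const { pipeline } = await import("@xenova/transformers"); // assumed package name
  const asr = await pipeline("automatic-speech-recognition", "Xenova/whisper-tiny.en");
  const { text } = await asr(audioFloat32);
  return text;
}

// Text-to-speech via the Web Speech API (browser only).
function speak(text) {
  for (const sentence of splitSentences(text)) {
    speechSynthesis.speak(new SpeechSynthesisUtterance(sentence));
  }
}
```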

3. Head Tracking

Head-movement-based cursor control using the Tracky-Mouse API
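Tracky-Mouse's own API is not shown in this README, so the sketch below only illustrates the underlying idea: converting noisy per-frame head positions into a stable cursor position via exponential smoothing and a sensitivity gain. `makeCursorSmoother` and its parameters are hypothetical.

```javascript
// Returns an update function mapping raw head positions to cursor positions.
// alpha: smoothing factor (higher = more responsive, noisier)
// gain:  how far the cursor moves per unit of smoothed head movement
function makeCursorSmoother({ alpha = 0.3, gain = 8 } = {}) {
  let sx = null, sy = null; // smoothed head position
  let cx = 0, cy = 0;       // cursor position
  return function update(headX, headY) {
    if (sx === null) {
      // First frame: initialize the filter, leave cursor at origin.
      sx = headX; sy = headY;
      return { x: cx, y: cy };
    }
    const prevX = sx, prevY = sy;
    sx = alpha * headX + (1 - alpha) * sx; // exponential smoothing
    sy = alpha * headY + (1 - alpha) * sy;
    cx += (sx - prevX) * gain; // move cursor by the smoothed delta
    cy += (sy - prevY) * gain;
    return { x: cx, y: cy };
  };
}
```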

4. Switch Control

Single-switch scanning interface for accessibility. Auto-scanning through keyboard rows with individual key highlighting.
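The row-then-key scanning described above can be sketched as a small state machine: a timer tick auto-advances the highlight, and a switch press first selects the highlighted row, then a key within it. `makeScanner` and its interface are hypothetical, not the app's actual implementation.

```javascript
// rows: array of arrays of key labels, e.g. keyboard rows.
function makeScanner(rows) {
  let mode = "row"; // "row" = scanning rows, "key" = scanning within a row
  let row = 0, key = 0;
  return {
    // Called by the auto-scan timer: advance the highlight, wrapping around.
    tick() {
      if (mode === "row") row = (row + 1) % rows.length;
      else key = (key + 1) % rows[row].length;
    },
    // Called on switch activation: select the row, or select the key.
    press() {
      if (mode === "row") {
        mode = "key";
        key = 0;
        return null; // no key selected yet, now scanning within the row
      }
      const selected = rows[row][key];
      mode = "row"; row = 0; key = 0; // restart scanning after a selection
      return selected;
    },
    // What the UI should currently highlight.
    highlight() {
      return mode === "row" ? { row } : { row, key };
    },
  };
}
```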