A SQLite extension for generating text embeddings from remote APIs (OpenAI, Nomic, Ollama, llamafile...)

rsp2k/sqlite-rembed

sqlite-rembed

Turn SQLite into an AI powerhouse. Generate embeddings from any AI provider with pure SQL.

-- One line. Any provider. Instant embeddings.
SELECT rembed('openai', 'Hello, universe');

Why This Exists

You have data in SQLite. You need embeddings. This bridges that gap with zero friction.

Features that matter:

  • Every major AI provider - OpenAI, Gemini, Anthropic, Ollama, and 10+ more
  • Batch processing - 1000 embeddings in one API call instead of 1000 calls
  • Multimodal - Text today, images tomorrow
  • Just SQL - No new languages, no new tools
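
The batch feature above implies splitting a large corpus into JSON arrays before handing them to rembed_batch. A minimal Python sketch of that splitting step follows. The helper name json_batches is hypothetical, and the default of 1000 simply mirrors the figure quoted above; real provider limits may be lower.

```python
import json
from typing import Iterator, List


def json_batches(texts: List[str], batch_size: int = 1000) -> Iterator[str]:
    """Yield JSON array strings, each holding at most batch_size texts.

    Hypothetical helper: batch_size=1000 mirrors the README's figure and is
    an assumption, not a documented provider limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield json.dumps(texts[start:start + batch_size])


texts = [f"text{i}" for i in range(2500)]
payloads = list(json_batches(texts))
print(len(payloads))  # 3 (batches of 1000, 1000 and 500 texts)
```

Each payload could then be bound as the second argument of a query such as SELECT rembed_batch('openai', ?).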

Install

# Coming to PyPI. For now:
git clone https://github.com/rsp2k/sqlite-rembed && cd sqlite-rembed
make loadable

Or grab a binary release.

Use It

.load ./rembed0

-- Pick your provider
INSERT INTO temp.rembed_clients(name, options) VALUES
  ('openai', 'openai:sk-YOUR-KEY'),
  ('gemini', 'gemini:AIza-YOUR-KEY'),
  ('local', 'ollama::nomic-embed-text');  -- No key needed

-- Generate embeddings
SELECT rembed('openai', 'The future is distributed');

-- Batch mode: 1000 texts, 1 API call
SELECT rembed_batch('openai',
  json_array('text1', 'text2', 'text3', /*...*/ 'text1000')
);

-- Images? We do that too
SELECT rembed_image('local', readfile('photo.jpg'));

Python: pip install sqlite-rembed is coming soon. Until then, see the Python docs.
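
Until a packaged release exists, the same steps can probably be driven from Python's standard sqlite3 module. The sketch below assumes the compiled extension sits at ./rembed0 (the path used with .load above) and that your Python build permits extension loading. The helper names load_rembed, register_client and embed are hypothetical and are not part of any published package.

```python
import sqlite3


def load_rembed(conn: sqlite3.Connection, path: str = "./rembed0") -> None:
    """Load the compiled extension (path is an assumption; adjust to your build)."""
    conn.enable_load_extension(True)
    try:
        conn.load_extension(path)
    finally:
        # Switch extension loading back off once the library is loaded.
        conn.enable_load_extension(False)


def register_client(conn: sqlite3.Connection, name: str, options: str) -> None:
    """Register a client with the same INSERT the SQL examples use."""
    conn.execute(
        "INSERT INTO temp.rembed_clients(name, options) VALUES (?, ?)",
        (name, options),
    )


def embed(conn: sqlite3.Connection, client: str, text: str) -> bytes:
    """Return whatever rembed() yields for a single text."""
    row = conn.execute("SELECT rembed(?, ?)", (client, text)).fetchone()
    return row[0]


# Typical use, once the extension has been built:
#   conn = sqlite3.connect(":memory:")
#   load_rembed(conn)
#   register_client(conn, "local", "ollama::nomic-embed-text")
#   vector = embed(conn, "local", "The future is distributed")
```

Binding values as parameters keeps API keys and input text out of the SQL string itself.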

Real World Example: Semantic Search

-- Your data
CREATE TABLE articles(headline TEXT);
INSERT INTO articles VALUES
  ('Shohei Ohtani''s ex-interpreter pleads guilty'),
  ('Hunter Biden''s gun trial jury selected'),
  ('Larry Allen, Dallas Cowboys legend, dies at 52');

-- Add vector search (requires sqlite-vec)
CREATE VIRTUAL TABLE vec_articles USING vec0(embedding float[1536]);

-- Generate embeddings for all articles (one API call!)
WITH batch AS (
  SELECT json_group_array(headline) as texts,
         json_group_array(rowid) as ids
  FROM articles
)
INSERT INTO vec_articles(rowid, embedding)
SELECT json_extract(ids, '$[' || key || ']'),
       base64_decode(value)
FROM batch, json_each(rembed_batch('openai', texts));

-- Search semantically
SELECT headline FROM articles
WHERE rowid IN (
  SELECT rowid FROM vec_articles
  WHERE embedding MATCH rembed('openai', 'legal proceedings')
  LIMIT 2
);
-- Returns: Hunter Biden and Shohei Ohtani articles
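
The batch insert above relies on json_each exposing each array element's index as key, which is then used to look up the matching rowid. The Python sketch below reproduces only that pairing logic with the standard sqlite3 module. fake_rembed_batch is a hypothetical stand-in that returns placeholder strings instead of calling a provider, so it shows the row alignment and not real embeddings. It assumes an SQLite build with the JSON functions available.

```python
import json
import sqlite3


def fake_rembed_batch(client: str, texts_json: str) -> str:
    """Hypothetical stand-in for rembed_batch: one placeholder per input text,
    returned in input order. No network call is made."""
    return json.dumps([f"vec({text})" for text in json.loads(texts_json)])


conn = sqlite3.connect(":memory:")
conn.create_function("fake_rembed_batch", 2, fake_rembed_batch)
conn.execute("CREATE TABLE articles(headline TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?)", [("alpha",), ("beta",), ("gamma",)]
)

# Both arrays come from the same aggregate, so index N in texts and index N
# in ids refer to the same row; json_each's key is that index.
rows = conn.execute(
    """
    WITH batch AS (
      SELECT json_group_array(headline) AS texts,
             json_group_array(rowid)    AS ids
      FROM articles
    )
    SELECT json_extract(ids, '$[' || key || ']') AS id, value
    FROM batch, json_each(fake_rembed_batch('openai', texts))
    ORDER BY key
    """
).fetchall()

for row_id, value in rows:
    print(row_id, value)
```

The pairing only holds if the batch function returns results in input order, which this sketch assumes.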

Configuration

-- Method 1: Direct
INSERT INTO temp.rembed_clients(name, options)
VALUES ('fast', 'openai:sk-YOUR-KEY');

-- Method 2: Environment variable
-- export OPENAI_API_KEY="sk-YOUR-KEY"
INSERT INTO temp.rembed_clients(name, options)
VALUES ('fast', 'openai::text-embedding-3-small');

-- Method 3: Advanced options
INSERT INTO temp.rembed_clients(name, options) VALUES
('custom', rembed_client_options(
    'format', 'openai',
    'model', 'text-embedding-3-large',
    'key', 'sk-YOUR-KEY'
));
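
The option strings in Methods 1 and 2 appear to follow a provider:key:model layout, where an empty key (two consecutive colons) leaves the key to be resolved elsewhere, for example from an environment variable. The Python sketch below parses that layout. It is inferred from the examples in this README only; the extension's real parser may accept other forms, and both ClientSpec and parse_client_options are hypothetical names.

```python
from typing import NamedTuple, Optional


class ClientSpec(NamedTuple):
    provider: str
    key: Optional[str]
    model: Optional[str]


def parse_client_options(options: str) -> ClientSpec:
    """Split 'provider:key:model' (assumed layout) into its parts.

    Empty fields become None, so 'openai::text-embedding-3-small' has no key
    and 'openai:sk-YOUR-KEY' has no model.
    """
    provider, _, rest = options.partition(":")
    if not provider:
        raise ValueError("options must start with a provider name")
    # Split only once more so a model name containing ':' stays intact.
    key, _, model = rest.partition(":")
    return ClientSpec(provider, key or None, model or None)


print(parse_client_options("openai:sk-YOUR-KEY"))
print(parse_client_options("ollama::nomic-embed-text"))
```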

Supported Providers

Powered by genai. Example client option strings, by provider:

  • OpenAI - openai::text-embedding-3-small
  • Gemini - gemini::text-embedding-004
  • Anthropic - anthropic::voyage-3
  • Ollama - ollama::nomic-embed-text (local, free)
  • Groq - groq::llama-3.3-70b
  • Cohere - cohere::embed-english-v3.0
  • Mistral - mistral::mistral-embed
  • DeepSeek, XAI, and more...

API

-- Core functions
rembed(client, text)                    -- Single embedding
rembed_batch(client, json_array)        -- Batch embeddings
rembed_image(client, image_blob)        -- Image embedding

-- Multimodal batch processing
rembed_images_batch(client, json_array)
rembed_images_concurrent(client, json_array)

-- Utilities
rembed_version()                        -- Extension version
rembed_debug()                          -- Debug info
rembed_client_options(...)              -- Advanced config

-- Virtual table for client management
INSERT INTO temp.rembed_clients(name, options) VALUES (...);
SELECT * FROM temp.rembed_clients;

Full docs: API Reference
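
When reading results back in application code, an embedding returned by rembed() is likely a BLOB of packed float32 values, the compact layout sqlite-vec accepts for float columns. That layout, and little-endian byte order, are assumptions here and not something this README states. Under them, a small Python sketch for converting between blobs and lists of floats looks like this (both function names are hypothetical):

```python
import struct
from typing import List, Sequence


def decode_f32_blob(blob: bytes) -> List[float]:
    """Unpack a blob of little-endian float32 values (assumed layout)."""
    if len(blob) % 4 != 0:
        raise ValueError("blob length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def encode_f32_blob(values: Sequence[float]) -> bytes:
    """Pack floats into the same assumed little-endian float32 layout."""
    return struct.pack(f"<{len(values)}f", *values)


blob = encode_f32_blob([0.5, -1.0, 2.0])
print(len(blob))              # 12 (three 4-byte floats)
print(decode_f32_blob(blob))  # [0.5, -1.0, 2.0]
```

A 1536-dimensional embedding, as in the vec0 table above, would occupy 6144 bytes in this layout.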

Related

  • sqlite-vec - Vector search that pairs perfectly with this
  • sqlite-lembed - Local embeddings when you need offline
  • genai - The engine under the hood

License

MIT/Apache-2.0. Use it however you want.
