MindSQL: A Python Text-to-SQL RAG Library simplifying database interactions. Seamlessly integrates with MariaDB, PostgreSQL, MySQL, SQLite, Snowflake, and BigQuery. Powered by GPT-4 and Llama 2, it enables natural language queries. Supports MariaDB Vector, ChromaDB and Faiss for context-aware responses.


Native MariaDB Vector Store Support for MindSQL

MariaDB Python Hackathon 2025 - Integration Track

Important Links

Submission Checklist

  • Code Repository: Complete implementation with MariaDB native VECTOR(384) support
  • Pull Request: PR #34 submitted to MindSQL repository
  • Documentation: Comprehensive README, API reference, usage examples
  • Demo Video: 2-4 minute YouTube video showcasing the project
  • LinkedIn Post: Announced submission on LinkedIn

Overview


This project integrates MariaDB's native VECTOR(384) data type with MindSQL, a Python RAG framework for text-to-SQL conversion. It provides a production-ready vector store implementation that enables unified vector-relational storage, eliminating the need to run separate vector database infrastructure alongside production MariaDB instances.

Key Benefits

  • Unified Infrastructure: Single MariaDB instance for relational data and vector embeddings
  • Native Performance: Leverages MariaDB's VECTOR(384) data type with ACID guarantees
  • Hybrid Search: Combines FULLTEXT indexing with vector similarity
  • Query Learning: Persistent memory system that improves accuracy over time
  • Production Ready: Comprehensive error handling, connection management, and a full test suite

Problem & Solution

The Problem

Organizations using MindSQL with MariaDB face infrastructure fragmentation: a separate vector database (ChromaDB or FAISS) must run alongside MariaDB, increasing operational complexity, cost, and network latency.

Our Solution

A native MariaDB vector store implementing MindSQL's IVectorstore interface, with three core capabilities:

  1. Semantic Schema Intelligence: Automatically vectorizes DDL schemas using VECTOR(384) columns
  2. AI-Powered Query Learning: Stores successful question-SQL pairs for continuous improvement
  3. Intelligent Query Optimization: Combines FULLTEXT search with vector similarity for optimal context retrieval

Technology Stack

  • Database: MariaDB 11.7+ (VECTOR support required)
  • Embeddings: sentence-transformers/all-MiniLM-L6-v2 (384 dimensions)
  • Connector: Official mariadb Python package
  • Framework: MindSQL RAG Core
  • LLM Support: Google Gemini, OpenAI, Ollama, Llama

Architecture


Database Schema

Main Collection (mindsql_vectors):

CREATE TABLE mindsql_vectors (
    id VARCHAR(36) PRIMARY KEY,
    document TEXT NOT NULL,
    embedding VECTOR(384) NOT NULL,
    metadata JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_created_at (created_at),
    FULLTEXT(document)
) ENGINE=InnoDB;

Query Memory (mindsql_vectors_sql_pairs):

CREATE TABLE mindsql_vectors_sql_pairs (
    id VARCHAR(36) PRIMARY KEY,
    question TEXT NOT NULL,
    sql_query TEXT NOT NULL,
    embedding VECTOR(384) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FULLTEXT(question, sql_query)
) ENGINE=InnoDB;
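
For illustration, the storage model above can be exercised directly from Python. The following is a minimal sketch, independent of MindSQL's own implementation, assuming MariaDB 11.7+'s VEC_FromText() and VEC_DISTANCE_COSINE() functions, the mindsql_vectors table above, the all-MiniLM-L6-v2 embedding model from the technology stack, and the demo credentials created in the installation steps below; all sample values are illustrative.

import json
import uuid

import mariadb
from sentence_transformers import SentenceTransformer

# 384-dimensional embeddings, matching the VECTOR(384) column
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
conn = mariadb.connect(user="mindsql_user", password="mindsql_password",
                       host="localhost", port=3306, database="mindsql_demo")
cur = conn.cursor()

# Store one schema document together with its embedding
ddl = "CREATE TABLE customers (customer_id INT PRIMARY KEY, name VARCHAR(100), city VARCHAR(50))"
cur.execute(
    "INSERT INTO mindsql_vectors (id, document, embedding) VALUES (?, ?, VEC_FromText(?))",
    (str(uuid.uuid4()), ddl, json.dumps(model.encode(ddl).tolist())),
)
conn.commit()

# Retrieve the documents closest to a natural-language question
question_vec = json.dumps(model.encode("Which customers are from New York?").tolist())
cur.execute(
    "SELECT document, VEC_DISTANCE_COSINE(embedding, VEC_FromText(?)) AS distance "
    "FROM mindsql_vectors ORDER BY distance LIMIT 3",
    (question_vec,),
)
for document, distance in cur:
    print(f"{distance:.4f}  {document[:60]}")
conn.close()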

RAG Pipeline

  1. Embedding: User query → 384-dimensional vector
  2. Retrieval: FULLTEXT + vector similarity → relevant DDLs and examples
  3. Augmentation: Retrieved context enriches the LLM prompt (see the sketch after this list)
  4. Generation: LLM generates SQL with context
  5. Learning: Successful pairs stored for future use
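
Steps 2-4 can be pictured with a small sketch. The helper below is illustrative only, not a MindSQL internal; it shows how retrieved DDLs and question-SQL examples are folded into the prompt that the configured LLM turns into SQL.

# Illustrative only: assemble retrieved context into an augmented LLM prompt.
def build_prompt(question, ddls, examples):
    schema_block = "\n\n".join(ddls)
    example_block = "\n".join(f"-- Q: {q}\n{sql}" for q, sql in examples)
    return (
        "You are a text-to-SQL assistant. Use only the tables below.\n\n"
        f"Schema:\n{schema_block}\n\n"
        f"Known good examples:\n{example_block}\n\n"
        f"Question: {question}\nSQL:"
    )

prompt = build_prompt(
    question="Show pending orders",
    ddls=["CREATE TABLE orders (order_id INT PRIMARY KEY, status VARCHAR(20))"],
    examples=[("Show all customers", "SELECT * FROM customers;")],
)
print(prompt)  # this string is what the configured LLM would receive (step 4)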

Installation

Prerequisites

  • Python 3.11+
  • MariaDB 11.7+ with VECTOR support
  • 4GB RAM minimum

Installation & Setup

Step 1: Install MariaDB

Windows:

choco install mariadb

Or download from MariaDB Downloads

Linux:

sudo apt update
sudo apt install mariadb-server mariadb-client
sudo systemctl start mariadb
sudo systemctl enable mariadb

macOS:

brew install mariadb
brew services start mariadb

Verify installation:

mariadb --version

The version must be 11.7 or higher for VECTOR support.

Step 2: Setup MariaDB Database

mariadb -u root -p

Create database and user:

CREATE DATABASE mindsql_demo;
CREATE USER 'mindsql_user'@'localhost' IDENTIFIED BY 'mindsql_password';
GRANT ALL PRIVILEGES ON mindsql_demo.* TO 'mindsql_user'@'localhost';
FLUSH PRIVILEGES;
EXIT;

Step 3: Clone and Install Python Dependencies

git clone https://github.com/Mindinventory/MindSQL.git
cd MindSQL

Install Python packages:

pip install mariadb
pip install sentence-transformers
pip install google-generativeai
pip install rich
pip install python-dotenv
pip install pandas
pip install numpy

Or install all at once:

pip install -r requirements_demo.txt
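
As an optional sanity check before continuing, you can confirm that the connector can reach the server from Python. This short sketch assumes the credentials created in Step 2; adjust them if yours differ.

import mariadb

# Connect with the user and database created in Step 2
conn = mariadb.connect(
    user="mindsql_user",
    password="mindsql_password",
    host="localhost",
    port=3306,
    database="mindsql_demo",
)
cur = conn.cursor()
cur.execute("SELECT VERSION()")
print(cur.fetchone()[0])  # should report 11.7 or newer for VECTOR support
conn.close()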

Step 4: Get Google Gemini API Key

  1. Go to Google AI Studio
  2. Create a new API key
  3. Copy the key

Step 5: Configure Environment

Create .env file in project root:

API_KEY=your_google_gemini_api_key_here
LLM_MODEL=gemini-1.5-flash
DB_URL=mariadb://mindsql_user:mindsql_password@localhost:3306/mindsql_demo
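
To verify the configuration is picked up, you can load it with python-dotenv (installed in Step 3). A minimal sketch using the variable names from the file above:

import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
api_key = os.getenv("API_KEY")
llm_model = os.getenv("LLM_MODEL", "gemini-1.5-flash")
db_url = os.getenv("DB_URL")
print(f"model={llm_model} db={db_url} key_loaded={bool(api_key)}")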

Step 6: Add Sample Data

mariadb -u mindsql_user -p mindsql_demo

Create sample tables:

CREATE TABLE customers (
    customer_id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100),
    email VARCHAR(100),
    city VARCHAR(50),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2),
    status VARCHAR(20),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

INSERT INTO customers (name, email, city) VALUES
('John Doe', 'john@email.com', 'New York'),
('Jane Smith', 'jane@email.com', 'Los Angeles'),
('Bob Johnson', 'bob@email.com', 'Chicago');

INSERT INTO orders (customer_id, order_date, total_amount, status) VALUES
(1, '2024-10-15', 150.00, 'completed'),
(2, '2024-10-20', 200.00, 'completed'),
(1, '2024-10-25', 75.00, 'pending');

EXIT;

Step 7: Run Demo CLI

cd tests
python mindsql_demo_cli.py

The demo will:

  1. Connect to MariaDB
  2. Discover your tables automatically
  3. Index table schemas into vector store
  4. Let you ask questions in natural language

Step 8: Try Sample Queries

Once the demo is running, try these questions:

Show all customers
Which customers are from New York?
What are the total orders for each customer?
Show pending orders

Usage

Basic Example

from mindsql.core import MindSQLCore
from mindsql.databases import MariaDB
from mindsql.vectorstores import MariaDBVectorStore
from mindsql.llms import GoogleGenAi

# Configure components
vector_config = {
    'host': 'localhost',
    'port': 3306,
    'user': 'your_username',
    'password': 'your_password',
    'database': 'your_database',
    'collection_name': 'mindsql_vectors'
}

llm_config = {
    'api_key': 'your_api_key',
    'model': 'gemini-1.5-flash'
}

# Initialize MindSQL with MariaDB Vector Store
minds = MindSQLCore(
    database=MariaDB(),
    vectorstore=MariaDBVectorStore(config=vector_config),
    llm=GoogleGenAi(config=llm_config)
)

# Create connection and index schemas
connection = minds.database.create_connection(
    url="mariadb://user:pass@localhost:3306/mydb"
)
minds.index_all_ddls(connection=connection, db_name='mydb')

# Natural language to SQL
response = minds.ask_db(
    question="Find customers who haven't ordered in 3 months",
    connection=connection
)

print(response['sql'])
print(response['result'])
connection.close()

API Reference

MariaDBVectorStore Class

class MariaDBVectorStore(IVectorstore):
    """MariaDB Vector Store implementation."""
    
    def __init__(self, config: dict):
        """Initialize with connection parameters.
        
        Args:
            config: Dict with host, port, user, password, database, collection_name
        """
    
    def index_ddl(self, ddl: str, **kwargs) -> str:
        """Index a DDL statement. Returns success/error message."""
    
    def index_question_sql(self, question: str, sql: str, **kwargs) -> str:
        """Index question-SQL pair for learning."""
    
    def retrieve_relevant_ddl(self, question: str, **kwargs) -> list:
        """Retrieve relevant DDL statements."""
    
    def retrieve_relevant_question_sql(self, question: str, **kwargs) -> list:
        """Retrieve similar question-SQL pairs with scores."""
    
    def index_documentation(self, documentation: str, **kwargs) -> str:
        """Index documentation text."""
    
    def fetch_all_vectorstore_data(self, **kwargs) -> pd.DataFrame:
        """Fetch all stored data as DataFrame."""
    
    def delete_vectorstore_data(self, item_id: str, **kwargs) -> bool:
        """Delete specific entry. Returns success boolean."""

Contributing

We welcome contributions! This integration was created for the MariaDB Python Hackathon 2025.

Quick Start

git clone https://github.com/YOUR_USERNAME/MindSQL.git
cd MindSQL
git checkout -b feature/your-feature

python -m venv venv
source venv/bin/activate
pip install -r requirements_demo.txt
pip install pytest black flake8

# Make changes and test
pytest tests/ -v
black mindsql/ tests/
flake8 mindsql/ tests/

git commit -m "feat: your feature"
git push origin feature/your-feature

See CONTRIBUTING.md for detailed guidelines.

Reporting Issues

Bug Reports: Description, reproduction steps, environment details, error messages

Feature Requests: Clear description, use case, proposed solution


Acknowledgments

MariaDB Foundation

Thank you for organizing the MariaDB Python Hackathon 2025, developing native VECTOR data type support, and maintaining the MariaDB Python connector.

Hackathon: MariaDB Python Hackathon 2025 - Integration Track

MindSQL Project

Thank you to the MindSQL maintainers for creating an excellent RAG framework with clean, extensible architecture.

Repository: https://github.com/Mindinventory/MindSQL

Open Source Community

Built upon outstanding work from sentence-transformers, MariaDB Server, and the Python ecosystem.



Team

Team Name: Squirtle Squad
Track: Integration
Project: Native MariaDB Vector Store for MindSQL RAG Framework
