- Pull Request: #34 - Add MariaDB Integration with Native VECTOR Support
- Hackathon Platform: MariaDB Python Hackathon
- Demo Video: YouTube Demo
- Original Repository: MindSQL by Mindinventory
- Code Repository: Complete implementation with MariaDB native VECTOR(384) support
- Pull Request: PR #34 submitted to MindSQL repository
- Documentation: Comprehensive README, API reference, usage examples
- Demo Video: 2-4 minute YouTube video showcasing the project
- LinkedIn Post: Announced submission on LinkedIn
This project integrates MariaDB's native VECTOR(384) data type with MindSQL, a Python RAG framework for text-to-SQL conversion. It provides a production-ready vector store implementation that enables unified vector-relational storage, eliminating the need for separate vector database infrastructure alongside production MariaDB instances.
- Unified Infrastructure: Single MariaDB instance for relational data and vector embeddings
- Native Performance: Leverages MariaDB's VECTOR(384) data type with ACID guarantees
- Hybrid Search: Combines FULLTEXT indexing with vector similarity
- Query Learning: Persistent memory system that improves accuracy over time
- Production Ready: Comprehensive error handling, connection management, full testing
Organizations using MindSQL with MariaDB face infrastructure fragmentation: a separate vector database (ChromaDB, FAISS) must run alongside MariaDB, increasing operational complexity, cost, and network latency.
The solution is a native MariaDB vector store implementing MindSQL's IVectorstore interface, with three core capabilities:
- Semantic Schema Intelligence: Automatically vectorizes DDL schemas using VECTOR(384) columns
- AI-Powered Query Learning: Stores successful question-SQL pairs for continuous improvement
- Intelligent Query Optimization: Combines FULLTEXT search with vector similarity for optimal context retrieval
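The query-learning capability can be sketched as a parameterized insert into the query-memory table. This is an illustration only, not MindSQL's actual code: it assumes the mindsql_vectors_sql_pairs schema shown below, MariaDB's VEC_FromText() text-to-vector constructor, and the mariadb connector's qmark (`?`) placeholders; `store_question_sql_pair` is a hypothetical helper name.

```python
import json
import uuid

def store_question_sql_pair(question: str, sql: str, embedding: list) -> tuple:
    """Build a parameterized INSERT for the query-memory table.

    A sketch only: assumes the mindsql_vectors_sql_pairs schema,
    MariaDB's VEC_FromText() constructor, and qmark placeholders.
    """
    vector_text = json.dumps(embedding)  # '[0.12, -0.03, ...]' text literal
    statement = (
        "INSERT INTO mindsql_vectors_sql_pairs "
        "(id, question, sql_query, embedding) "
        "VALUES (?, ?, ?, VEC_FromText(?))"
    )
    params = (str(uuid.uuid4()), question, sql, vector_text)
    return statement, params

# After a successful run: cursor.execute(*store_question_sql_pair(...))
stmt, params = store_question_sql_pair(
    "Show all customers", "SELECT * FROM customers", [0.0] * 384
)
```

Storing the pair only after the generated SQL executes successfully keeps the memory table from accumulating bad examples.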
- Database: MariaDB 11.7+ (VECTOR support required)
- Embeddings: sentence-transformers/all-MiniLM-L6-v2 (384 dimensions)
- Connector: Official mariadb Python package
- Framework: MindSQL RAG Core
- LLM Support: Google Gemini, OpenAI, Ollama, Llama
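The embedding model (all-MiniLM-L6-v2) emits 384-dimensional float vectors, matching the VECTOR(384) column. A vector can be sent to MariaDB either as a text literal via VEC_FromText() or as packed binary; the sketch below assumes the binary form is little-endian float32 (verify against your server version's documentation before relying on it):

```python
import struct

DIM = 384  # all-MiniLM-L6-v2 produces 384-dimensional embeddings

def to_vector_bytes(embedding) -> bytes:
    """Pack an embedding as little-endian float32 bytes for a VECTOR(384) column.

    Assumption: the server accepts this binary layout; otherwise fall
    back to the VEC_FromText('[...]') text form.
    """
    if len(embedding) != DIM:
        raise ValueError(f"expected {DIM} dimensions, got {len(embedding)}")
    return struct.pack(f"<{DIM}f", *embedding)

blob = to_vector_bytes([0.0] * DIM)  # 384 floats x 4 bytes = 1536 bytes
```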
Main Collection (mindsql_vectors):

```sql
CREATE TABLE mindsql_vectors (
    id VARCHAR(36) PRIMARY KEY,
    document TEXT NOT NULL,
    embedding VECTOR(384) NOT NULL,
    metadata JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_created_at (created_at),
    FULLTEXT(document)
) ENGINE=InnoDB;
```

Query Memory (mindsql_vectors_sql_pairs):

```sql
CREATE TABLE mindsql_vectors_sql_pairs (
    id VARCHAR(36) PRIMARY KEY,
    question TEXT NOT NULL,
    sql_query TEXT NOT NULL,
    embedding VECTOR(384) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FULLTEXT(question, sql_query)
) ENGINE=InnoDB;
```

- Embedding: User query → 384-dimensional vector
- Retrieval: FULLTEXT + vector similarity → relevant DDLs and examples
- Augmentation: Context enriches LLM prompt
- Generation: LLM generates SQL with context
- Learning: Successful pairs stored for future use
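The retrieval step above can be sketched as a single hybrid query against the main collection. This is an illustrative builder, not MindSQL's actual implementation; it assumes MariaDB 11.7's VEC_DISTANCE_COSINE() and VEC_FromText() vector functions, with the FULLTEXT search string and the query embedding (as a '[...]' text literal) bound at execution time:

```python
def build_hybrid_retrieval_sql(top_k: int = 5) -> str:
    """Return a hybrid FULLTEXT + vector-similarity retrieval query.

    Illustrative sketch: ranks rows by cosine distance to the query
    embedding while also surfacing a keyword relevance score.
    """
    return (
        "SELECT id, document, "
        "MATCH(document) AGAINST (?) AS keyword_score, "
        "VEC_DISTANCE_COSINE(embedding, VEC_FromText(?)) AS distance "
        "FROM mindsql_vectors "
        f"ORDER BY distance ASC LIMIT {int(top_k)}"
    )

sql = build_hybrid_retrieval_sql()
```

Ordering by cosine distance while exposing the FULLTEXT score lets the caller re-rank or filter results that match on keywords but not semantics.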
- Python 3.11+
- MariaDB 11.7+ with VECTOR support
- 4GB RAM minimum
Windows:
```shell
choco install mariadb
```

Or download from MariaDB Downloads.
Linux:
```shell
sudo apt update
sudo apt install mariadb-server mariadb-client
sudo systemctl start mariadb
sudo systemctl enable mariadb
```

macOS:
```shell
brew install mariadb
brew services start mariadb
```

Verify installation:
```shell
mariadb --version
```

The version must be 11.7 or higher for VECTOR support.
```shell
mariadb -u root -p
```

Create the database and user:
```sql
CREATE DATABASE mindsql_demo;
CREATE USER 'mindsql_user'@'localhost' IDENTIFIED BY 'mindsql_password';
GRANT ALL PRIVILEGES ON mindsql_demo.* TO 'mindsql_user'@'localhost';
FLUSH PRIVILEGES;
EXIT;
```

Clone the repository:

```shell
git clone https://github.com/Mindinventory/MindSQL.git
cd MindSQL
```

Install Python packages:
```shell
pip install mariadb
pip install sentence-transformers
pip install google-generativeai
pip install rich
pip install python-dotenv
pip install pandas
pip install numpy
```

Or install all at once:

```shell
pip install -r requirements_demo.txt
```

- Go to Google AI Studio
- Create a new API key
- Copy the key
Create a .env file in the project root:

```
API_KEY=your_google_gemini_api_key_here
LLM_MODEL=gemini-1.5-flash
DB_URL=mariadb://mindsql_user:mindsql_password@localhost:3306/mindsql_demo
```

Connect to the demo database:

```shell
mariadb -u mindsql_user -p mindsql_demo
```

Create sample tables:
```sql
CREATE TABLE customers (
    customer_id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100),
    email VARCHAR(100),
    city VARCHAR(50),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2),
    status VARCHAR(20),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

INSERT INTO customers (name, email, city) VALUES
('John Doe', 'john@email.com', 'New York'),
('Jane Smith', 'jane@email.com', 'Los Angeles'),
('Bob Johnson', 'bob@email.com', 'Chicago');

INSERT INTO orders (customer_id, order_date, total_amount, status) VALUES
(1, '2024-10-15', 150.00, 'completed'),
(2, '2024-10-20', 200.00, 'completed'),
(1, '2024-10-25', 75.00, 'pending');

EXIT;
```

Run the demo:

```shell
cd tests
python mindsql_demo_cli.py
```

The demo will:
- Connect to MariaDB
- Discover your tables automatically
- Index table schemas into vector store
- Let you ask questions in natural language
Once the demo is running, try these questions:
- Show all customers
- Which customers are from New York?
- What are the total orders for each customer?
- Show pending orders
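As a sanity check for the totals question, the seed data above aggregates as follows (plain Python mirroring the GROUP BY that the generated SQL should perform):

```python
# Seed rows from the orders INSERT above: (customer_id, total_amount)
orders = [(1, 150.00), (2, 200.00), (1, 75.00)]

totals = {}
for customer_id, amount in orders:
    totals[customer_id] = totals.get(customer_id, 0) + amount
# totals == {1: 225.0, 2: 200.0}: John Doe 225.00, Jane Smith 200.00
```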
```python
from mindsql.core import MindSQLCore
from mindsql.databases import MariaDB
from mindsql.vectorstores import MariaDBVectorStore
from mindsql.llms import GoogleGenAi

# Configure components
vector_config = {
    'host': 'localhost',
    'port': 3306,
    'user': 'your_username',
    'password': 'your_password',
    'database': 'your_database',
    'collection_name': 'mindsql_vectors'
}
llm_config = {
    'api_key': 'your_api_key',
    'model': 'gemini-1.5-flash'
}

# Initialize MindSQL with MariaDB Vector Store
minds = MindSQLCore(
    database=MariaDB(),
    vectorstore=MariaDBVectorStore(config=vector_config),
    llm=GoogleGenAi(config=llm_config)
)

# Create connection and index schemas
connection = minds.database.create_connection(
    url="mariadb://user:pass@localhost:3306/mydb"
)
minds.index_all_ddls(connection=connection, db_name='mydb')

# Natural language to SQL
response = minds.ask_db(
    question="Find customers who haven't ordered in 3 months",
    connection=connection
)
print(response['sql'])
print(response['result'])
connection.close()
```

```python
class MariaDBVectorStore(IVectorstore):
    """MariaDB Vector Store implementation."""

    def __init__(self, config: dict):
        """Initialize with connection parameters.

        Args:
            config: Dict with host, port, user, password, database, collection_name
        """

    def index_ddl(self, ddl: str, **kwargs) -> str:
        """Index a DDL statement. Returns a success/error message."""

    def index_question_sql(self, question: str, sql: str, **kwargs) -> str:
        """Index a question-SQL pair for learning."""

    def retrieve_relevant_ddl(self, question: str, **kwargs) -> list:
        """Retrieve relevant DDL statements."""

    def retrieve_relevant_question_sql(self, question: str, **kwargs) -> list:
        """Retrieve similar question-SQL pairs with scores."""

    def index_documentation(self, documentation: str, **kwargs) -> str:
        """Index documentation text."""

    def fetch_all_vectorstore_data(self, **kwargs) -> pd.DataFrame:
        """Fetch all stored data as a DataFrame."""

    def delete_vectorstore_data(self, item_id: str, **kwargs) -> bool:
        """Delete a specific entry. Returns a success boolean."""
```

We welcome contributions! This integration was created for the MariaDB Python Hackathon 2025.
```shell
git clone https://github.com/YOUR_USERNAME/MindSQL.git
cd MindSQL
git checkout -b feature/your-feature
python -m venv venv
source venv/bin/activate
pip install -r requirements_demo.txt
pip install pytest black flake8

# Make changes and test
pytest tests/ -v
black mindsql/ tests/
flake8 mindsql/ tests/

git commit -m "feat: your feature"
git push origin feature/your-feature
```

See CONTRIBUTING.md for detailed guidelines.
Bug Reports: Description, reproduction steps, environment details, error messages
Feature Requests: Clear description, use case, proposed solution
Thank you for organizing the MariaDB Python Hackathon 2025, developing native VECTOR data type support, and maintaining the MariaDB Python connector.
Hackathon: MariaDB Python Hackathon 2025 - Integration Track
Thank you to the MindSQL maintainers for creating an excellent RAG framework with clean, extensible architecture.
Repository: https://github.com/Mindinventory/MindSQL
Built upon outstanding work from sentence-transformers, MariaDB Server, and the Python ecosystem.
- GitHub Issues: MindSQL Issues
- Security: See SECURITY.md for vulnerability reporting
Team Name: Squirtle Squad
Track: Integration
Project: Native MariaDB Vector Store for MindSQL RAG Framework
