isaacus-dev / semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

python nlp text splitting chunking text-chunking text-splitting semantic-chunking isaacus

Updated Oct 28, 2025
Python

lazyFrogLOL / llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

nlp ocr chunking document-analysis pdf-parser pdfparser rag llm text-chunking

Updated Aug 6, 2024
Python

jparkerweb / semantic-chunking

🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows

vector embeddings chunking text-splitter llm text-chunking text-splitting semantic-chunking

Updated Oct 22, 2025
JavaScript

drittich / SemanticSlicer

🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents.

ai embeddings openai gpt chunking chunker gpt-4 azure-openai llm chatgpt chat-gpt langchain text-chunking

Updated Dec 1, 2025
C#

GregorBiswanger / SemanticChunker.NET

Embedding-driven, context-aware text chunking for Semantic Kernel and RAG workflows in .NET

library ai csharp dotnet chunking slm embedding rag llm semantic-kernel semantickernel text-chunking semanticchunker

Updated Nov 11, 2025
C#

ChenTaHung / HTML-Text-Parser

Extracts text from documents and prepares it for processing by large language models (LLMs). It stores and uses text style information to identify and segment content based on likely headers and titles.

python data-processing text-parsing large-language-models llms text-chunking

Updated Nov 17, 2024
HTML

smart-models / Sentences-Chunker

Cutting-edge tool designed to intelligently segment text documents into optimally-sized chunks

nlp docker-compose gpu-acceleration document-processing rag fastapi text-chunking

Updated Sep 30, 2025
Python

betcorg / llm-text-splitter

A lightweight TypeScript text splitter for RAG applications

chatbots rag text-splitter text-chunking

Updated Mar 9, 2025
TypeScript

philnash / chunkers

An exploration of text splitting and chunking in JavaScript

text-splitter llamaindex langchain-js text-chunking text-splitting

Updated Nov 20, 2025
TypeScript

Besthope-Official / predoc

Document preprocessing service for RAG (Retrieval-Augmented Generation)

api microservice yolo pdf-parser text-embedding document-parser rag text-chunking

Updated Oct 22, 2025
Python

ushakiranmai / text_summarization

This Text Summarization Tool uses advanced machine learning models to create concise, meaningful summaries of lengthy texts. Built with Hugging Face Transformers and Gradio, it efficiently handles various input lengths, ideal for summarizing articles, reports, and more

web-development file-handling text-summarization gradio-interface text-chunking model-handling output-formats python-libraries-and-tools

Updated Jan 23, 2025
Python

vivet / Vivet.AI

A service-oriented .NET library for AI with interchangeable orchestrations and vector stores.

chat ai knowledge memory azure inference openai summarization embedding huggingface llm metadata-retrieval ollama amazon-bedrock text-chunking google-gemini context-deduplication

Updated Nov 20, 2025
C#

cspnms / MSchunker

Smart text chunker for LLM preprocessing (sections → paragraphs → sentences → hard splits).

Updated Nov 28, 2025
Python
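
The fallback order named in the MSchunker description (sections → paragraphs → sentences → hard splits) can be illustrated with a minimal sketch. This is not MSchunker's code or API: the function name `split_text`, the `SEPARATORS` patterns, and the character-based size limit are all assumptions chosen for illustration.

```python
import re

# Hypothetical separators, ordered coarse to fine (not taken from MSchunker).
SEPARATORS = [
    r"\n\s*\n\s*\n",    # section breaks: two or more blank lines
    r"\n\s*\n",         # paragraph breaks: one blank line
    r"(?<=[.!?])\s+",   # sentence ends: whitespace after ., ! or ?
]


def split_text(text, max_chars, level=0):
    """Recursively split text until every piece is at most max_chars long."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars:
        return [text]
    if level >= len(SEPARATORS):
        # Hard split: cut at fixed character offsets as a last resort.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    chunks = []
    for piece in re.split(SEPARATORS[level], text):
        # Pieces that are still too long fall through to the next, finer level.
        chunks.extend(split_text(piece, max_chars, level + 1))
    return chunks
```

This sketch only splits; it does not merge small neighbouring pieces back together up to the limit, and it measures size in characters rather than tokens, both of which a production chunker would likely handle differently.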

Yashraj-Muthyapwar / NotionAtlas-AI-Semantic-Search-And-RAG-Assistant-for-Notion

End-to-end Retrieval Augmented Generation (RAG) pipeline using Notion, Qdrant, Sentence Transformers, and Streamlit for interactive question answering on private Notion workspaces.

retrieval chatbot embeddings llama knowledge-base semantic-search rag huggingface vector-database notion-api ai-assistant qdrant llm text-chunking

Updated Oct 19, 2025
Jupyter Notebook

samay-jain / Retrieval-Augmented-Generation-RAG-simple-program

A lightweight, modular Retrieval-Augmented Generation (RAG) system built with Streamlit, FAISS, and LLMs like OpenAI and Ollama. Upload documents, embed them, and ask intelligent questions with real-time context-aware responses.

embeddings openai nomic chroma faiss python-nlp rag vector-search streamlit gpt4 langchain ollama text-chunking llama3 llm-app simple-rag document-question-answering pdf-nlp qa-application

Updated Jun 26, 2025
Python

mariamarmolejo / agentes-girly

Complete course on AI agents with LangChain, LangRAP, and n8n. Includes practical examples, simple agents, agents that summarize PDFs, girly agents, environment management, environment variables, and good practices with GitHub.

environment-variables intelligent-agents practical-exercises faiss virtual-environment n8n langchain text-chunking custom-prompts faiss-vector-database huggingface-embeddings chatgroq n8n-automation pypdfloader langrap rag-flow simple-agents summarizer-agents line-by-line-comments

Updated Nov 17, 2025
Python

adityapathak-cubastion / cubastion-hr-chatbot

Cubastion's HR chatbot answers queries based on the latest HR documents published by Cubastion's HR team, so employees can resolve their questions without combing through the documents themselves. Developed with Python, sentence-transformers, Pinecone, llama3.2, and Streamlit.

python text-generation text-extraction cosine-similarity pinecone huggingface streamlit text-embeddings sentence-transformers prompt-engineering text-chunking llama3

Updated Jan 30, 2025
Python

adityapathakk / cubastion-hr-chatbot

Cubastion's HR chatbot answers queries based on the latest HR documents published by Cubastion's HR team, so employees can resolve their questions without combing through the documents themselves. Developed with Python, sentence-transformers, Pinecone, llama3.2, and Streamlit.

python text-generation text-extraction cosine-similarity pinecone huggingface streamlit text-embeddings sentence-transformers prompt-engineering text-chunking llama3

Updated Jan 30, 2025
Python

andrewschenck / ragl

Vector Storage and Retrieval for RAG

python redis information-retrieval semantic-search nlp-machine-learning rag vector-search llm retrieval-augmented-generation text-chunking

Updated Oct 6, 2025
Python

mohsinraza2999 / Legal-Advisor-using-gpt-neo-1.3B

This project aims to build an AI-powered Legal Advisor that leverages natural language processing and vector search technology to provide users with legal guidance based on authoritative legal texts.

embeddings tokenization similarity-search huggingface vector-database prompt-engineering llms langchain retrieval-augmented-generation llm-pipeline text-chunking

Updated Aug 8, 2025
Jupyter Notebook

text-chunking

Here are 22 public repositories matching this topic...
