Commit c70a0f0

localai-bot and mudler authored
chore(model gallery): 🤖 add 1 new models via gallery agent (#6989)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
1 parent f85e2dd commit c70a0f0

1 file changed: +56 -0 lines changed
gallery/index.yaml

Lines changed: 56 additions & 0 deletions
@@ -22938,3 +22938,59 @@
     - filename: ReForm-32B.i1-Q4_K_M.gguf
       sha256: a7f69d6e2efe002368bc896fc5682d34a1ac63669a4db0f42faf44a29012dc3f
       uri: huggingface://mradermacher/ReForm-32B-i1-GGUF/ReForm-32B.i1-Q4_K_M.gguf
+- !!merge <<: *qwen3
+  name: "qwen3-4b-thinking-2507-gspo-easy"
+  urls:
+    - https://huggingface.co/mradermacher/Qwen3-4B-Thinking-2507-GSPO-Easy-GGUF
+  description: |
+    **Model Name:** Qwen3-4B-Thinking-2507-GSPO-Easy
+    **Base Model:** Qwen3-4B (by Alibaba Cloud)
+    **Fine-tuned With:** GRPO (Group Relative Policy Optimization)
+    **Framework:** Hugging Face TRL (Transformer Reinforcement Learning)
+    **License:** [MIT](https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy/blob/main/LICENSE)
+
+    ---
+
+    ### 📌 Description:
+    A fine-tuned 4-billion-parameter version of **Qwen3-4B**, optimized for **step-by-step reasoning and complex problem-solving** using **GRPO**, a reinforcement learning method designed to enhance mathematical and logical reasoning in language models.
+
+    This model excels in tasks requiring **structured thinking**, such as solving math problems, logical puzzles, and multi-step reasoning, making it ideal for applications in education, AI assistants, and reasoning benchmarks.
+
+    ### 🔧 Key Features:
+    - Trained with **TRL 0.23.1** and **Transformers 4.57.1**
+    - Optimized for **high-quality reasoning output**
+    - Part of the **Qwen3-4B-Thinking** series, designed to simulate human-like thought processes
+    - Compatible with Hugging Face `transformers` and `pipeline` API
+
+    ### 📚 Use Case:
+    Perfect for applications demanding **deep reasoning**, such as:
+    - AI tutoring systems
+    - Advanced chatbots with explanation capabilities
+    - Automated problem-solving in STEM domains
+
+    ### 📌 Quick Start (Python):
+    ```python
+    from transformers import pipeline
+
+    question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+    generator = pipeline("text-generation", model="leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy", device="cuda")
+    output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+    print(output["generated_text"])
+    ```
+
+    > ✅ **Note**: The details above describe the **original, non-quantized model**. This gallery entry installs a quantized GGUF build (Q4_K_M) from [mradermacher/Qwen3-4B-Thinking-2507-GSPO-Easy-GGUF](https://huggingface.co/mradermacher/Qwen3-4B-Thinking-2507-GSPO-Easy-GGUF) for efficient inference on consumer hardware.
+
+    ---
+
+    🔗 **Model Page:** [https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy](https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy)
+    📝 **Training Details & Visualizations:** [WandB Dashboard](https://wandb.ai/leonwenderoth-tu-darmstadt/huggingface/runs/t42skrc7)
+
+    ---
+    *Fine-tuned using GRPO, a method introduced to improve mathematical reasoning in open language models. Cite: Shao et al., 2024 (arXiv:2402.03300)*
+  overrides:
+    parameters:
+      model: Qwen3-4B-Thinking-2507-GSPO-Easy.Q4_K_M.gguf
+  files:
+    - filename: Qwen3-4B-Thinking-2507-GSPO-Easy.Q4_K_M.gguf
+      sha256: f75798ff521ce54c1663fb59d2d119e5889fd38ce76d9e07c3a28ceb13cf2eb2
+      uri: huggingface://mradermacher/Qwen3-4B-Thinking-2507-GSPO-Easy-GGUF/Qwen3-4B-Thinking-2507-GSPO-Easy.Q4_K_M.gguf
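
The `files` entry above pairs a `huggingface://` URI with a `sha256` digest, so a reviewer may want to check a downloaded GGUF against it by hand. The following is a minimal sketch, not the gallery's own loader: the helper names `resolve_hf_uri`, `sha256_of`, and `matches_gallery_checksum` are hypothetical, and the mapping of `huggingface://owner/repo/file` onto a `resolve/main` download URL is an assumption about how the URI should be read.

```python
import hashlib
from pathlib import Path

HF_PREFIX = "huggingface://"


def resolve_hf_uri(uri: str, branch: str = "main") -> str:
    """Map huggingface://owner/repo/path to an HTTPS download URL.

    Assumption: the URI is laid out as owner/repo/file-path and the file
    lives on the given branch ("main" unless stated otherwise).
    """
    if not uri.startswith(HF_PREFIX):
        raise ValueError(f"not a huggingface:// URI: {uri}")
    # Split into exactly three parts; the file path may itself contain slashes.
    owner, repo, file_path = uri[len(HF_PREFIX):].split("/", 2)
    return f"https://huggingface.co/{owner}/{repo}/resolve/{branch}/{file_path}"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-gigabyte GGUF files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_gallery_checksum(path: Path, expected_sha256: str) -> bool:
    """Compare a local file's digest with the sha256 value from the gallery entry."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    uri = (
        "huggingface://mradermacher/Qwen3-4B-Thinking-2507-GSPO-Easy-GGUF/"
        "Qwen3-4B-Thinking-2507-GSPO-Easy.Q4_K_M.gguf"
    )
    print(resolve_hf_uri(uri))
```

After downloading the file, `matches_gallery_checksum(Path("Qwen3-4B-Thinking-2507-GSPO-Easy.Q4_K_M.gguf"), "f75798ff521ce54c1663fb59d2d119e5889fd38ce76d9e07c3a28ceb13cf2eb2")` should return `True` if the local copy matches the entry added in this commit.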
