22938 | 22938 | - filename: ReForm-32B.i1-Q4_K_M.gguf |
22939 | 22939 | sha256: a7f69d6e2efe002368bc896fc5682d34a1ac63669a4db0f42faf44a29012dc3f |
22940 | 22940 | uri: huggingface://mradermacher/ReForm-32B-i1-GGUF/ReForm-32B.i1-Q4_K_M.gguf |
| 22941 | +- !!merge <<: *qwen3 |
| 22942 | + name: "qwen3-4b-thinking-2507-gspo-easy" |
| 22943 | + urls: |
| 22944 | + - https://huggingface.co/mradermacher/Qwen3-4B-Thinking-2507-GSPO-Easy-GGUF |
| 22945 | + description: | |
| 22946 | + **Model Name:** Qwen3-4B-Thinking-2507-GSPO-Easy |
| 22947 | + **Base Model:** Qwen3-4B (by Alibaba Cloud) |
| 22948 | + **Fine-tuned With:** GRPO (Group Relative Policy Optimization) |
| 22949 | + **Framework:** Hugging Face TRL (Transformer Reinforcement Learning) |
| 22950 | + **License:** [MIT](https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy/blob/main/LICENSE) |
| 22951 | + |
| 22952 | + --- |
| 22953 | + |
| 22954 | + ### 📌 Description: |
| 22955 | + A fine-tuned version of the 4-billion-parameter **Qwen3-4B**, optimized for **step-by-step reasoning and complex problem-solving** using **GRPO**, a reinforcement learning method designed to enhance mathematical and logical reasoning in language models. |
| 22956 | + |
| 22957 | + This model excels in tasks requiring **structured thinking**, such as solving math problems, logical puzzles, and multi-step reasoning, making it ideal for applications in education, AI assistants, and reasoning benchmarks. |
| 22958 | + |
| 22959 | + ### 🔧 Key Features: |
| 22960 | + - Trained with **TRL 0.23.1** and **Transformers 4.57.1** |
| 22961 | + - Optimized for **high-quality reasoning output** |
| 22962 | + - Part of the **Qwen3-4B-Thinking** series, designed to simulate human-like thought processes |
| 22963 | + - Compatible with Hugging Face `transformers` and `pipeline` API |
| 22964 | + |
| 22965 | + ### 📚 Use Case: |
| 22966 | + Perfect for applications demanding **deep reasoning**, such as: |
| 22967 | + - AI tutoring systems |
| 22968 | + - Advanced chatbots with explanation capabilities |
| 22969 | + - Automated problem-solving in STEM domains |
| 22970 | + |
| 22971 | + ### 📌 Quick Start (Python): |
| 22972 | + ```python |
| 22973 | + from transformers import pipeline |
| 22974 | + |
| 22975 | + question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?" |
| 22976 | + generator = pipeline("text-generation", model="leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy", device="cuda") |
| 22977 | + output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0] |
| 22978 | + print(output["generated_text"]) |
| 22979 | + ``` |
| 22980 | + |
| 22981 | + > ✅ **Note**: The description above refers to the **original, non-quantized model**. Quantized **GGUF** builds for efficient inference on consumer hardware are provided separately in the [mradermacher GGUF repository](https://huggingface.co/mradermacher/Qwen3-4B-Thinking-2507-GSPO-Easy-GGUF), which this gallery entry installs. |
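Gallery entries like this one pin a `sha256` for each GGUF file, so a quick integrity check before loading the model can catch a corrupted or partial download. A minimal sketch (the file and checksum below are tiny stand-ins, not the real multi-gigabyte model):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large GGUF files need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Stand-in demo: a tiny file in place of the real GGUF download.
with open("demo.bin", "wb") as f:
    f.write(b"hello gguf")

expected = hashlib.sha256(b"hello gguf").hexdigest()
assert sha256_of("demo.bin") == expected
print("checksum OK")  # prints "checksum OK"
```

In practice you would compare `sha256_of(...)` against the `sha256` value pinned in the gallery entry before handing the file to your inference runtime.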
| 22982 | + |
| 22983 | + --- |
| 22984 | + |
| 22985 | + 🔗 **Model Page:** [https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy](https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy) |
| 22986 | + 📝 **Training Details & Visualizations:** [WandB Dashboard](https://wandb.ai/leonwenderoth-tu-darmstadt/huggingface/runs/t42skrc7) |
| 22987 | + |
| 22988 | + --- |
| 22989 | + *Fine-tuned using GRPO — a method proven to boost mathematical reasoning in open language models. Cite: Shao et al., 2024 (arXiv:2402.03300)* |
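As a pointer for readers, the group-relative advantage at the heart of GRPO (Shao et al., 2024) normalizes each sampled completion's reward against the other completions drawn for the same prompt, replacing a learned value model:

$$\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}{\operatorname{std}(\{r_1, \dots, r_G\})}$$

where $G$ completions are sampled per prompt and $r_i$ is the reward of the $i$-th completion.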
| 22990 | + overrides: |
| 22991 | + parameters: |
| 22992 | + model: Qwen3-4B-Thinking-2507-GSPO-Easy.Q4_K_M.gguf |
| 22993 | + files: |
| 22994 | + - filename: Qwen3-4B-Thinking-2507-GSPO-Easy.Q4_K_M.gguf |
| 22995 | + sha256: f75798ff521ce54c1663fb59d2d119e5889fd38ce76d9e07c3a28ceb13cf2eb2 |
| 22996 | + uri: huggingface://mradermacher/Qwen3-4B-Thinking-2507-GSPO-Easy-GGUF/Qwen3-4B-Thinking-2507-GSPO-Easy.Q4_K_M.gguf |