Unsloth: Fine-tune LLMs 2x faster with half the VRAM

### Train gpt-oss, DeepSeek, Gemma, Qwen and Llama 2x faster with 70% less VRAM! ![](https://i.ibb.co/sJ7RhGG/image-41.png)

✨ Train for free

The notebooks are beginner friendly. Read our guide, add your dataset, run the notebook, then deploy your trained model.

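| Model | Free Notebooks | Performance | Memory use |
|---|---|---|---|
| gpt-oss (20B) | ▶️ Start for free | 1.5x faster | 70% less |
| gpt-oss (20B): GRPO | ▶️ Start for free | 2x faster | 80% less |
| Qwen3: Advanced GRPO | ▶️ Start for free | 2x faster | 50% less |
| Qwen3-VL (8B): GSPO | ▶️ Start for free | 1.5x faster | 80% less |
| Gemma 3 (4B) Vision | ▶️ Start for free | 1.7x faster | 60% less |
| Gemma 3n (e4B) | ▶️ Start for free | 1.5x faster | 50% less |
| embeddinggemma (300M) | ▶️ Start for free | 2x faster | 20% less |
| Mistral Ministral 3 (3B) | ▶️ Start for free | 1.5x faster | 60% less |
| Llama 3.1 (8B) Alpaca | ▶️ Start for free | 2x faster | 70% less |
| Llama 3.2 Conversational | ▶️ Start for free | 2x faster | 70% less |
| Orpheus-TTS (3B) | ▶️ Start for free | 1.5x faster | 50% less |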
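
To read the table's figures as concrete numbers, here is a small arithmetic sketch. It assumes "Nx faster" means wall-clock time divided by N and "P% less" means VRAM multiplied by (100 - P)/100. The baseline of 10 hours and 24 GB is made up purely for illustration and is not a measurement from Unsloth.

```python
def estimate(baseline_hours, baseline_vram_gb, speedup, vram_reduction_pct):
    """Scale a baseline run by a claimed speedup and VRAM reduction."""
    hours = baseline_hours / speedup
    vram_gb = baseline_vram_gb * (100 - vram_reduction_pct) / 100
    return hours, vram_gb


# Hypothetical baseline: a 10-hour run that needs 24 GB of VRAM,
# scaled by the "2x faster, 70% less" row for Llama 3.1 (8B) Alpaca.
hours, vram = estimate(10.0, 24.0, speedup=2.0, vram_reduction_pct=70)
print(f"{hours:.1f} h, {vram:.1f} GB")  # prints: 5.0 h, 7.2 GB
```

Actual savings depend on the model, sequence length, batch size and hardware, so treat the result as a rough planning number only.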

- See all our notebooks for: Kaggle, GRPO, TTS, embedding and Vision
- See all our models and all our notebooks
- See the detailed Unsloth documentation here

⚡ Quickstart

Linux or WSL

pip install unsloth

Windows

On Windows, `pip install unsloth` works only if PyTorch is already installed. Read our Windows guide.
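
Since the Windows install assumes PyTorch is already present, it can help to check for it before running pip. The following is a minimal sketch using only the Python standard library. The helper name `has_module` is hypothetical and is not part of Unsloth.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("torch"):
        print("PyTorch found: `pip install unsloth` can proceed.")
    else:
        print("PyTorch not found: install it first (see the Windows guide).")
```

This only confirms that a `torch` package is importable. It does not verify that the build matches your GPU or CUDA version.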

Docker

Use our official Unsloth Docker image, `unsloth/unsloth`. Read our Docker guide.

AMD, Intel, Blackwell and DGX Spark

For RTX 50x, B200 and 6000 series GPUs: `pip install unsloth`. Read our guides for Blackwell and DGX Spark.

To install Unsloth on AMD and Intel GPUs, follow our AMD guide and Intel guide.

🦥 Unsloth News

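- **Qwen3.5** is now supported, including 35-A3B, 27B and 112B-A10B. Guide + notebooks
- Train **MoE LLMs** 12x faster with 35% less VRAM - DeepSeek, GLM, Qwen and gpt-oss. Blog
- **Embedding models**: Unsloth now supports ~1.8-3.3x faster embedding fine-tuning. Blog • Notebooks
- New **7x longer context RL** compared with all other setups, via our new batching algorithms. Blog
- New **RoPE and MLP Triton kernels** plus **Padding Free + Packing**: 3x faster training and 30% less VRAM. Blog
- **500K context**: training a 20B model with >500K context is now possible on an 80GB GPU. Blog
- **FP8 and Vision RL**: you can now do FP8 and VLM GRPO on consumer GPUs. FP8 Blog • Vision RL
- **Docker**: use Unsloth with no setup or environment issues with our new image. Guide • Docker image
- **gpt-oss** by OpenAI: read our RL blog, Flex Attention blog and guide.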

More news

- **Quantization-Aware Training**: We collaborated with PyTorch, recovering ~70% of the accuracy lost to quantization. [Read blog](https://unsloth.ai/docs/blog/quantization-aware-training-qat)
- **Memory-efficient RL**: We're introducing even better RL. Our new kernels and algorithms allow faster RL with 50% less VRAM and 10x more context. [Read blog](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/memory-efficient-rl)
- **Mistral 3**: Run Ministral 3 or Devstral 2 and fine-tune with the vision/RL sudoku notebooks. [Guide](https://unsloth.ai/docs/models/tutorials/ministral-3) • [Notebooks](https://unsloth.ai/docs/models/ministral-3#fine-tuning-ministral-3)
- **Gemma 3n** by Google: [Read blog](https://unsloth.ai/docs/models/gemma-3-how-to-run-and-fine-tune/gemma-3n-how-to-run-and-fine-tune). We [uploaded GGUFs and 4-bit models](https://huggingface.co/collections/unsloth/gemma-3n-685d3874830e49e1c93f9339).
- **[Text-to-Speech (TTS)](https://unsloth.ai/docs/basics/text-to-speech-tts-fine-tuning)** is now supported, including `sesame/csm-1b` and STT `openai/whisper-large-v3`.
- **[Qwen3](https://unsloth.ai/docs/models/qwen3-how-to-run-and-fine-tune)** is now supported. Qwen3-30B-A3B fits in 17.5GB of VRAM.
- Introducing **[Dynamic 2.0](https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs)** quants, which set new benchmarks on 5-shot MMLU and Aider Polyglot.
- [**EVERYTHING** is now supported](https://unsloth.ai/blog/gemma3#everything) - all models (TTS, BERT, Mamba), FFT, etc. [Multi-GPU](https://unsloth.ai/docs/basics/multi-gpu-training-with-unsloth) is now supported. Enable FFT with `full_finetuning = True` and 8-bit with `load_in_8bit = True`.
- 📣 [DeepSeek-R1](https://unsloth.ai/blog/deepseek-r1) - [run or fine-tune them](https://unsloth.ai/blog/deepseek-r1) with our guide. All model uploads: [here](https://huggingface.co/collections/unsloth/deepseek-r1-all-versions-678e1c48f5d2fce87892ace5).
- 📣 Introducing long-context [Reasoning (GRPO)](https://unsloth.ai/blog/grpo) in Unsloth. Train your own reasoning model with just 5GB of VRAM. Transform Llama, Phi, Mistral and others into reasoning LLMs!
- 📣 Introducing Unsloth [Dynamic 4-bit Quantization](https://unsloth.ai/blog/dynamic-4bit)! We dynamically opt not to quantize certain parameters, which greatly improves accuracy while using only <10% more VRAM than BnB 4-bit. See our collection on [Hugging Face](https://huggingface.co/collections/unsloth/unsloth-4-bit-dynamic-quants-67503bb873f89e15276c44e7).
- 📣 **[Llama 4](https://unsloth.ai/blog/llama4)** by Meta, including Scout and Maverick, is now supported.
- 📣 [Phi-4](https://unsloth.ai/blog/phi4) by Microsoft: We also [fixed bugs in Phi-4](https://unsloth.ai/blog/phi4) and [uploaded GGUFs and 4-bit models](https://huggingface.co/collections/unsloth/phi-4-all-versions-677eecf93784e61afe762afa).
- 📣 [Vision models](https://unsloth.ai/blog/vision) are now supported! [Llama 3.2 Vision (11B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb), [Qwen 2.5 VL (7B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb) and [Pixtral (12B) 2409](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Pixtral_(12B)-Vision.ipynb)
- 📣 [Llama 3.3 (70B)](https://huggingface.co/collections/unsloth/llama-33-all-versions-67535d7d994794b9d7cf5e9f), Meta's latest model, is supported.
- 📣 We worked with Apple to add [Cut Cross Entropy](https://arxiv.org/abs/2411.09009). Unsloth now supports 89K context for Meta's Llama 3.3 (70B) on an 80GB GPU - 13x longer than HF+FA2. For Llama 3.1 (8B), Unsloth enables 342K context, surpassing its native 128K support.
- 📣 We found and helped fix a [gradient accumulation bug](https://unsloth.ai/blog/gradient)! Please update Unsloth and transformers.
- 📣 We [cut memory usage by a further 30%](https://unsloth.ai/blog/long-context) and now support [4x longer context windows](https://unsloth.ai/blog/long-context)!
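
The news list above mentions `full_finetuning = True` and `load_in_8bit = True`. Here is a hedged sketch of how those flags might be passed when loading a model. It assumes they are keyword arguments of `FastLanguageModel.from_pretrained`, alongside `load_in_4bit` and `max_seq_length`. The helper names `build_load_kwargs` and `load_model` are hypothetical and are not part of Unsloth.

```python
def build_load_kwargs(mode: str = "4bit") -> dict:
    """Map a precision mode to the loading flags (assumed to be mutually exclusive)."""
    modes = {
        "4bit": dict(load_in_4bit=True, load_in_8bit=False, full_finetuning=False),
        "8bit": dict(load_in_4bit=False, load_in_8bit=True, full_finetuning=False),
        "full": dict(load_in_4bit=False, load_in_8bit=False, full_finetuning=True),
    }
    if mode not in modes:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(modes)}")
    return dict(modes[mode])


def load_model(model_name: str, mode: str = "4bit", max_seq_length: int = 2048):
    """Load a model and tokenizer with Unsloth using the chosen mode."""
    # Imported lazily so this file can be read and tested without Unsloth or a GPU.
    from unsloth import FastLanguageModel

    return FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=max_seq_length,
        **build_load_kwargs(mode),
    )
```

On a machine with Unsloth installed and a supported GPU, a call might look like `model, tokenizer = load_model("unsloth/Llama-3.2-1B-Instruct", mode="8bit")`. The model name is only an example. Check the Unsloth documentation for the arguments your installed version accepts.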

🔗 Links and Resources

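| Type | Links |
|---|---|
| r/unsloth Reddit | Join the Reddit community |
| 📚 Documentation & Wiki | Read our docs |
| Twitter (aka X) | Follow us on X |
| 💾 Installation | Pip & Docker install |
| 🔮 Our Models | Unsloth model catalog |
| ✍️ Blog | Read our blogs |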

⭐ Key Features

Project: https://github.com/unslothai/unsloth
