SWIFT（可扩展轻量级微调基础设施）

## 📖 目录 - [交流群](#-交流群) - [简介](#-简介) - [新闻](#-新闻) - [安装](#%EF%B8%8F-安装) - [快速开始](#-快速开始) - [用法](#-用法) - [许可证](#-许可证) - [引用](#-引用) ## ☎ 交流群您可以通过添加我们的群组与我们联系和交流： [Discord 群组](https://discord.gg/yeN59wxjwe) | 微信群 :-------------------------:|:-------------------------:

## 📝 简介 🍲 **ms-swift** 是魔搭社区提供的大模型与多模态大模型微调与部署框架。现已支持 600+ 纯文本大模型和 400+ 多模态大模型的训练（预训练、微调、人类对齐）、推理、评估、量化与部署。支持的纯文本大模型包括：Qwen3、Qwen3.5、InternLM3、GLM4.5、Mistral、DeepSeek-R1、Llama4 等。多模态大模型包括：Qwen3-VL、Qwen3-Omni、Llava、InternVL3.5、MiniCPM-V-4、Ovis2.5、GLM4.5-V、DeepSeek-VL2 等。 🍔 此外，ms-swift 集成了最新的训练技术，包括 TP、PP、CP、EP 等 Megatron 并行技术以加速训练，以及丰富的 GRPO 算法族强化学习算法，如 GRPO、DAPO、GSPO、SAPO、CISPO、RLOO、Reinforce++ 等，以提升模型智能。ms-swift 支持广泛的训练任务，包括 DPO、KTO、RM、CPO、SimPO、ORPO 等偏好学习算法，以及 Embedding、Reranker 和序列分类任务。ms-swift 为大模型训练提供全链路支持，包括使用 vLLM、SGLang 和 LMDeploy 加速推理、评估和部署模块，以及使用 GPTQ、AWQ、BNB 和 FP8 技术进行模型量化。 **为什么选择 ms-swift？** - 🍎 **模型类型**：支持 **600+ 纯文本大模型**、**400+ 多模态大模型**以及全模态 All-to-All 模型从训练到部署的完整流程，热门模型 Day-0 支持。 - **数据集类型**：内置 150+ 个预训练、微调、人类对齐、多模态等各类数据集，支持自定义数据集。用户只需准备好数据集即可一键训练。 - **硬件支持**：支持 A10/A100/H100、RTX 系列、T4/V100、CPU、MPS 以及国产硬件昇腾 NPU 等。 - **轻量训练**：支持 LoRA、QLoRA、DoRA、LoRA+、LLaMAPro、LongLoRA、LoRA-GA、ReFT、RS-LoRA、Adapter、LISA 等轻量微调方法。 - **量化训练**：支持在 BNB、AWQ、GPTQ、AQLM、HQQ、EETQ 量化模型上进行训练，7B 模型仅需 9GB 训练资源。 - **内存优化**：支持 GaLore、Q-Galore、UnSloth、Liger-Kernel、Flash-Attention 2/3 以及 **Ulysses 和 Ring-Attention 序列并行技术**，降低长文本训练的内存消耗。 - **分布式训练**：支持分布式数据并行 (DDP)、device_map 简单模型并行、DeepSpeed ZeRO2/ZeRO3、FSDP/FSDP2 以及 Megatron 分布式训练技术。 - 🍓 **多模态训练**：支持多模态打包技术，训练速度提升 100%+，支持文本、图像、视频和音频混合模态数据训练，支持独立控制 vit/aligner/llm。 - **智能体训练**：支持 Agent 模板，一份数据可用于训练不同模型。 - 🍊 **训练任务**：支持预训练和指令微调，以及 DPO、GKD、KTO、RM、CPO、SimPO、ORPO 等训练任务，并支持 **Embedding/Reranker** 和序列分类任务。 - 🥥 **Megatron 并行**：提供 TP/PP/SP/CP/ETP/EP/VPP 并行策略，显著提升 **MoE 模型训练速度**。支持 300+ 纯文本大模型和 100+ 多模态大模型的全参数和 LoRA 训练方式。支持 CPT/SFT/GRPO/DPO/KTO/RM 训练任务。 - 🍉 **强化学习**：内置**丰富的 GRPO 算法族**，包括 GRPO、DAPO、GSPO、SAPO、CISPO、CHORD、RLOO、Reinforce++ 等。支持同步和异步 vLLM 引擎推理加速，通过插件可实现可扩展的奖励函数、多轮推理 Scheduler 和环境。 - **全链路能力**：覆盖训练、推理、评估、量化和部署的完整工作流。 - **UI 训练**：提供 Web-UI 界面用于训练、推理、评估和量化，完成大模型全流程。 - **推理加速**：支持 Transformers、vLLM、SGLang 和 LmDeploy 推理加速引擎，为推理加速、部署和评估模块提供 OpenAI 接口。 - **模型评估**：使用 EvalScope 作为评估后端，支持 100+ 评估数据集，用于评估纯文本和多模态模型。 - **模型量化**：支持 AWQ、GPTQ、FP8 和 BNB 的量化导出。导出的模型可使用 vLLM/SGLang/LmDeploy 进行推理加速。 ## 🎉 新闻 - 🎁 2026.06.10：Megatron-Ray 现已支持 GRPO 和 GKD 训练。详见 [文档](./docs/source_en/Instruction/Ray.md) 和 [示例](examples/ray)。 - 🎁 2026.03.03：**ms-swift v4.0** 大版本正式发布。发布说明请参考 [此处](https://github.com/modelscope/ms-swift/releases/tag/v4.0.0)。您可以在 [此问题](https://github.com/modelscope/ms-swift/issues/7250) 中向我们提出建议。感谢您的支持。 - 🎁 2025.11.14：Megatron GRPO 现已可用！查看 [文档](./docs/source_en/Megatron-SWIFT/GRPO.md) 和 [示例](examples/megatron/grpo)。 - 🎁 2025.11.04：支持 [Mcore-Bridge](docs/source_en/Megatron-SWIFT/Mcore-Bridge.md)，使 Megatron 训练像 transformers 一样简单易用。 - 🎁 2025.10.28：Ray [文档](docs/source_en/Instruction/Ray.md) 已上线。 - 🎁 2025.09.07：新增 CHORD 训练算法支持。详见 [文档](./docs/source_en/Instruction/GRPO/AdvancedResearch/CHORD.md)。 - 🎁 2025.09.06：Ulysses 现在可以与 ring-attention 结合使用，允许将序列分片为任意数量的块（不再受头数限制）。参数仍为 `--sequence_parallel_size N`。 - 🎁 2025.09.02：Megatron-SWIFT 现在支持多模态模型训练。文档请见 [此处](./docs/source_en/Megatron-SWIFT/Multimodal-Model.md)。 - 🎁 2025.08.12：在 SFT 训练中支持 [Dynamic Fine-Tuning](https://arxiv.org/abs/2508.05629)（DFT），使用参数 `--enable_dft_loss true`。训练脚本见 [此处](https://github.com/modelscope/ms-swift/blob/main/examples/train/full/dft.sh)。 - 🎁 2025.07.09：Megatron-SWIFT 支持 LoRA 训练。与 ms-swift 相比，在 MoE 模型上实现了显著加速。训练脚本见 [此处](https://github.com/modelscope/ms-swift/blob/main/examples/megatron/lora)。 - 🎁 2025.06.23：支持 Reranker 模型微调。训练脚本见此处：[Reranker](https://github.com/modelscope/ms-swift/blob/main/examples/train/reranker/train_reranker.sh)。 - 🎁 2025.06.15：支持纯文本大模型和多模态模型的 GKD 训练。训练脚本见此处：[纯文本](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd)，[多模态](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd)。

- 🎁 2025.06.11：支持使用 Megatron 并行技术进行 RLHF 训练。训练脚本见 [此处](https://github.com/modelscope/ms-swift/tree/main/examples/megatron/rlhf)。 - 🎁 2025.05.29：在预训练、SFT、DPO 和 GRPO 中支持序列并行，脚本见 [此处](https://github.com/modelscope/ms-swift/tree/main/examples/train/sequence_parallel)。 - 🎁 2025.05.11：GRPO 现在支持奖励模型的自定义处理逻辑。详见 GenRM 示例 [此处](./docs/source_en/Instruction/GRPO/DeveloperGuide/reward_model.md)。 - 🎁 2025.04.15：ms-swift 论文已被 AAAI 2025 接收。论文见 [此链接](https://ojs.aaai.org/index.php/AAAI/article/view/35383)。 - 🎁 2025.03.23：多轮 GRPO 现已被支持，用于训练多轮对话场景（例如，Agent 工具调用）。请参考 [文档](./docs/source_en/Instruction/GRPO/DeveloperGuide/multi_turn.md)。 - 🎁 2025.03.16：支持 Megatron 的并行训练技术。请参见 [Megatron-SWIFT 训练文档](https://swift.readthedocs.io/en/latest/Megatron-SWIFT/Quick-start.html)。 - 🎁 2025.03.15：支持纯文本和多模态模型的 Embedding 模型微调。请查看 [训练脚本](examples/train/embedding)。 - 🎁 2025.03.05：支持 GRPO 的混合模式，在 4 张 GPU（4*80G）上训练 72B 模型的脚本见 [此处](examples/train/grpo/internal/vllm_72b_4gpu.sh)。还支持 vllm 的张量并行，训练脚本见 [此处](examples/train/grpo/internal)。 - 🎁 2025.02.21：GRPO 算法现在支持 LMDeploy，训练脚本见 [此处](examples/train/grpo/internal/full_lmdeploy.sh)。此外，GRPO 算法的性能已得到测试，使用各种技巧训练速度最高提升 300%。请查看 WanDB 表格 [此处](https://wandb.ai/tastelikefeet/grpo_perf_test?nw=nwuseryuzezyz)。 - 🎁 2025.02.21：现在支持 `swift sample` 命令。强化微调脚本见 [此处](docs/source_en/Instruction/Reinforced-Fine-tuning.md)，大模型 API 蒸馏采样脚本见 [此处](examples/sampler/distill/distill.sh)。 - 🔥 2025.02.12：新增 GRPO（组相对策略优化）训练算法支持。文档见 [此处](docs/source_en/Instruction/GRPO/GetStarted/GRPO.md)。 - 🎁 2024.12.04：**ms-swift 3.0** 重大更新。请参考 [发布说明与变更](docs/source_en/Instruction/ReleaseNote3.0.md)。 - 🎉 2024.08.12：ms-swift 论文已在 arXiv 上发表，可 [在此阅读](https://arxiv.org/abs/2408.05517)。 - 🔥 2024.08.05：支持使用 [evalscope](https://github.com/modelscope/evalscope/) 作为后端评估大模型和多模态大模型。 - 🔥 2024.07.29：支持使用 [vllm](https://github.com/vllm-project/vllm) 和 [lmdeploy](https://github.com/InternLM/lmdeploy) 加速大模型和多模态大模型的推理。在执行 infer/deploy/eval 时，可以指定 `--infer_backend vllm/lmdeploy`。 - 🔥 2024.07.24：支持多模态大模型的人类偏好对齐训练，包括 DPO/ORPO/SimPO/CPO/KTO/RM/PPO。 - 🔥 2024.02.01：支持智能体训练！训练算法源自 [此论文](https://arxiv.org/pdf/2309.00986.pdf)。

## 🛠️ 安装使用 pip 安装：

pip install ms-swift -U

# 使用 uv
pip install uv
uv pip install ms-swift -U --torch-backend=auto

从源码安装：

# pip install git+https://github.com/modelscope/ms-swift.git

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
# main 分支用于 swift 4.x。要安装 swift 3.x，请运行以下命令：
# git checkout release/3.12
pip install -e .

# 使用 uv
uv pip install -e . --torch-backend=auto

运行环境： | | 范围 | 推荐 | 备注 | |----------|--------------|---------------------|-----------------------------------| | python | >=3.10 | 3.12 | | | cuda | | cuda12.8/13.0 | 如使用 CPU、NPU、MPS 则无需安装 | | torch | >=2.0 | 2.8.0/2.11.0 | | | transformers | >=4.33 | 4.57.6/5.8.1 | | | modelscope | >=1.23 | | | | datasets | >=3.0,<4.8.5 | 3.6.0/4.8.4 | | | peft | >=0.11,<0.20 | | | | flash_attn | | 2.8.3/4.0.0b15 | | | trl | >=0.15,<1.0 | 0.29.1 | RLHF | | deepspeed| >=0.14 | 0.18.9 | 训练 | | vllm | >=0.5.1 | 0.11.0/0.21.0 | 推理/部署 | | sglang | >=0.4.6 | | 推理/部署 | | evalscope | >=1.0 | | 评估 | | gradio | | 5.32.1 | Web-UI/App | 更多可选依赖项，请参考 [此处](https://github.com/modelscope/ms-swift/blob/main/requirements/install_all.sh)。 ## 🚀 快速开始在单张 3090 GPU 上进行 10 分钟的 Qwen3-4B-Instruct-2507 自我认知微调： ### 命令行界面（推荐）

# 13GB
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model Qwen/Qwen3-4B-Instruct-2507 \
    --tuner_type lora \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
              'AI-ModelScope/alpaca-gpt4-data-en#500' \
              'swift/self-cognition#500' \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --gradient_accumulation_steps 16 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --model_author swift \
    --model_name swift-robot

提示： - 如果要使用自定义数据集训练，可以参考 [此指南](https://swift.readthedocs.io/en/latest/Customization/Custom-dataset.html) 整理数据集格式，并指定 `--dataset `。 - `--model_author` 和 `--model_name` 参数仅在数据集包含 `swift/self-cognition` 时有效。 - 要训练其他模型，只需修改 `--model `。 - 默认使用 **ModelScope** 下载模型和数据集。如果要用 HuggingFace，只需指定 `--use_hf true`。训练完成后，使用以下命令用训练好的权重进行推理： - 此处 `--adapters` 应替换为训练过程中生成的最后一个 checkpoint 文件夹。由于 adapters 文件夹包含训练参数文件 `args.json`，因此无需单独指定 `--model`、`--system`；Swift 将自动读取这些参数。要禁用此行为，可设置 `--load_args false`。

# 交互式命令行推理
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --stream true \
    --temperature 0 \
    --max_new_tokens 2048

# 合并 LoRA 并使用 vLLM 进行推理加速
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --stream true \
    --merge_lora true \
    --infer_backend vllm \
    --vllm_max_model_len 8192 \
    --temperature 0 \
    --max_new_tokens 2048

最后，使用以下命令将模型推送到 ModelScope：

CUDA_VISIBLE_DEVICES=0 \
swift export \
    --adapters output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<your-model-id>' \
    --hub_token '<your-sdk-token>' \
    --use_hf false

### Web-UI Web-UI 是基于 Gradio 界面技术的**零门槛**训练和部署界面解决方案。更多细节可查看 [此处](https://swift.readthedocs.io/en/latest/GetStarted/Web-UI.html)。

SWIFT_UI_LANG=en swift web-ui

![image.png](./docs/resources/web-ui-en.jpg) ### 使用 Python ms-swift 也支持使用 Python 进行训练和推理。以下是用于训练和推理的伪代码。更多细节可参考 [此处](https://github.com/modelscope/ms-swift/blob/main/examples/notebook/qwen2_5-self-cognition/self-cognition-sft.ipynb)。训练：

from peft import LoraConfig, get_peft_model
from swift import get_model_processor, get_template, load_dataset, EncodePreprocessor
from swift.trainers import Seq2SeqTrainer, Seq2SeqTrainingArguments
# 获取模型和模板，并添加可训练的 LoRA 模块
model, tokenizer = get_model_processor(model_id_or_path, ...)
template = get_template(tokenizer, ...)
lora_config = LoraConfig(...)
model = get_peft_model(model, lora_config)

# 下载并加载数据集，并将文本编码为 tokens
train_dataset, val_dataset = load_dataset(dataset_id_or_path, ...)
train_dataset = EncodePreprocessor(template=template)(train_dataset, num_proc=num_proc)
val_dataset = EncodePreprocessor(template=template)(val_dataset, num_proc=num_proc)

# 训练模型
training_args = Seq2SeqTrainingArguments(...)
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    template=template,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)
trainer.train()

推理：

from swift import TransformersEngine, InferRequest, RequestConfig
# 使用原生 Transformers 引擎进行推理
engine = TransformersEngine(model_id_or_path, adapters=[lora_checkpoint])
infer_request = InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}])
request_config = RequestConfig(max_tokens=max_new_tokens, temperature=temperature)

resp_list = engine.infer([infer_request], request_config)
print(f'response: {resp_list[0].choices[0].message.content}')

## ✨ 用法以下是使用 ms-swift 从训练到部署的最小示例。更多细节可查看 [示例](https://github.com/modelscope/ms-swift/tree/main/examples)。 - 如果要使用其他模型或数据集（包括多模态模型和数据集），只需修改 `--model` 以指定相应模型的 ID 或路径，并修改 `--dataset` 以指定相应数据集的 ID 或路径。 - 默认使用 ModelScope 下载模型和数据集。如果要使用 HuggingFace，只需指定 `--use_hf true`。 | 实用链接 | | ------ | | [🔥命令行参数](https://swift.readthedocs.io/en/latest/Instruction/Command-line-parameters.html) | | [Megatron-SWIFT](https://swift.readthedocs.io/en/latest/Megatron-SWIFT/Quick-start.html) | | [GRPO](https://swift.readthedocs.io/en/latest/Instruction/GRPO/GetStarted/GRPO.html) | | [支持的模型和数据集](https://swift.readthedocs.io/en/latest/Instruction/Supported-models-and-datasets.html) | | [自定义模型](https://swift.readthedocs.io/en/latest/Customization/Custom-model.html)，[🔥自定义数据集](https://swift.readthedocs.io/en/latest/Customization/Custom-dataset.html) | | [LLM 教程](https://github.com/modelscope/modelscope-classroom/tree/main/LLM-tutorial) | ### 训练支持的训练方法： | 方法 | 全参数 | LoRA | QLoRA | Deepspeed | 多机 | 多模态 | | ------------------------------------------------------------ | ------ | ---- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | | [预训练](https://github.com/modelscope/ms-swift/blob/main/examples/train/pretrain) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | [有监督微调](https://github.com/modelscope/ms-swift/blob/main/examples/train/lora_sft.sh) | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/full/train.sh) | ✅ | [✅](https://github.com/modelscope/ms-swift/tree/main/examples/train/qlora) | [✅](https://github.com/modelscope/ms-swift/tree/main/examples/train/multi-gpu/deepspeed) | [✅](https://github.com/modelscope/ms-swift/tree/main/examples/train/multi-node) | [✅](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal) | | [GRPO](https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | [GKD](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd) | ✅ | ✅ | ✅ | ✅ | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd) | | [PPO](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/ppo) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | | [DPO](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/dpo) | ✅ | ✅ | ✅ | ✅ | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/dpo) | | [KTO](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/kto.sh) | ✅ | ✅ | ✅ | ✅ | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/kto.sh) | | [奖励模型](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/rm.sh) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | [CPO](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/cpo.sh) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | [SimPO](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/simpo.sh) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | [ORPO](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/orpo.sh) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | [Embedding](https://github.com/modelscope/ms-swift/blob/main/examples/train/embedding) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | [Reranker](https://github.com/modelscope/ms-swift/tree/main/examples/train/reranker) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | [序列分类](https://github.com/modelscope/ms-swift/blob/main/examples/train/seq_cls) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 预训练：

# 8*A100
NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift pt \
    --model Qwen/Qwen3-4B-Base \
    --dataset swift/chinese-c4 \
    --streaming true \
    --tuner_type full \
    --deepspeed zero2 \
    --output_dir output \
    --max_steps 10000 \
    ...

微调：

CUDA_VISIBLE_DEVICES=0 swift sft \
    --model Qwen/Qwen3-4B-Instruct-2507 \
    --dataset AI-ModelScope/alpaca-gpt4-data-en \
    --tuner_type lora \
    --output_dir output \
    ...

RLHF：

CUDA_VISIBLE_DEVICES=0 swift rlhf \
    --rlhf_type dpo \
    --model Qwen/Qwen3-4B-Instruct-2507 \
    --dataset hjh0119/shareAI-Llama3-DPO-zh-en-emoji \
    --tuner_type lora \
    --output_dir output \
    ...

### Megatron-SWIFT ms-swift 支持使用 Megatron 并行技术加速训练，包括大规模集群训练和 MoE 模型训练。支持以下训练方法： | 方法 | 全参数 | LoRA | MoE | 多模态 | FP8 | | ---------------------------| ------ | ---- | --- | ------ | --- | | 预训练 | ✅ | ✅ | ✅ | ✅ | ✅ | | [有监督微调](https://github.com/modelscope/ms-swift/tree/main/examples/megatron) | ✅ | ✅ | ✅ | ✅ | ✅ | | [GRPO](https://github.com/modelscope/ms-swift/tree/main/examples/megatron/grpo) | ✅ | ✅ | ✅ | ✅ | ✅ | | [GKD](https://github.com/modelscope/ms-swift/tree/main/examples/megatron/rlhf/gkd) | ✅ | ✅ | ✅ | ✅ | ✅ | | [DPO](https://github.com/modelscope/ms-swift/tree/main/examples/megatron/rlhf/dpo) | ✅ | ✅ | ✅ | ✅ | ✅ | | [KTO](https://github.com/modelscope/ms-swift/tree/main/examples/megatron/rlhf/kto) | ✅ | ✅ | ✅ | ✅ | ✅ | | [RM](https://github.com/modelscope/ms-swift/tree/main/examples/megatron/rlhf/rm) | ✅ | ✅ | ✅ | ✅ | ✅ | | [Embedding](https://github.com/modelscope/ms-swift/tree/main/examples/megatron/embedding) | ✅ | ✅ | ✅ | ✅ | ✅ | | [Reranker](https://github.com/modelscope/ms-swift/tree/main/examples/megatron/reranker) | ✅ | ✅ | ✅ | ✅ | ✅ | | [序列分类](https://github.com/modelscope/ms-swift/tree/main/examples/megatron/seq_cls) | ✅ | ✅ | ✅ | ✅ | ✅ |

NPROC_PER_NODE=2 CUDA_VISIBLE_DEVICES=0,1 megatron sft \
    --model Qwen/Qwen3-4B-Instruct-2507 \
    --save_safetensors true \
    --dataset AI-ModelScope/alpaca-gpt4-data-zh \
    --tuner_type lora \
    --output_dir output \
    ...

### 强化学习 ms-swift 支持丰富的 GRPO 算法族： | 方法 | 全参数 | LoRA | 多模态 | 多机 | | ------------------------------------------------------------ | ------ | ---- | ------ | ------ | | [GRPO](https://swift.readthedocs.io/en/latest/Instruction/GRPO/GetStarted/GRPO.html) | ✅ | ✅ | ✅ | ✅ | | [DAPO](https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/DAPO.html) | ✅ | ✅ | ✅ | ✅ | | [GSPO](https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/GSPO.html) | ✅ | ✅ | ✅ | ✅ | | [SAPO](https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/SAPO.html) | ✅ | ✅ | ✅ | ✅ | | [CISPO](https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/CISPO.html) | ✅ | ✅ | ✅ | ✅ | | [CHORD](https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/CHORD.html) | ✅ | ✅ | ✅ | ✅ | | [RLOO](https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/RLOO.html) | ✅ | ✅ | ✅ | ✅ | | [Reinforce++](https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/REINFORCEPP.html) | ✅ | ✅ | ✅ | ✅ |

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen3-4B-Instruct-2507 \
    --tuner_type lora \
    --use_vllm true \
    --vllm_mode colocate \
    --dataset AI-MO/NuminaMath-TIR#10000 \
    --output_dir output \
    ...

### 推理

CUDA_VISIBLE_DEVICES=0 swift infer \
    --model Qwen/Qwen3-4B-Instruct-2507 \
    --stream true \
    --infer_backend transformers \
    --max_new_tokens 2048

### 界面推理

CUDA_VISIBLE_DEVICES=0 swift app \
    --model Qwen/Qwen3-4B-Instruct-2507 \
    --stream true \
    --infer_backend transformers \
    --max_new_tokens 2048

### 部署

CUDA_VISIBLE_DEVICES=0 swift deploy \
    --model Qwen/Qwen3-4B-Instruct-2507 \
    --infer_backend vllm

### 采样

项目地址：https://github.com/modelscope/ms-swift

26 次点击 ∙ 0 人收藏

登录后收藏

0 条回复

BishengJDK miniCPM-V? no

SWIFT（可扩展轻量级微调基础设施）