Transformers: the Hugging Face model library

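State-of-the-art pretrained models for inference and training

Transformers is a model-definition framework for state-of-the-art machine learning models across text, computer vision, audio, video, and multimodal domains, for both inference and training.

It centralizes the model definition so that the whole ecosystem agrees on it. transformers is the pivot across frameworks: if a model definition is supported, it will be compatible with most training frameworks (Axolotl, Unsloth, DeepSpeed, FSDP, PyTorch-Lightning, and others), inference engines (vLLM, SGLang, TGI, and others), and adjacent modeling libraries (llama.cpp, mlx, and others) that build on the transformers model definition.

We pledge to help support new state-of-the-art models and to democratize their use by keeping their model definitions simple, customizable, and efficient.

There are over 1M Transformers model checkpoints available on the Hugging Face Hub.

Explore the Hub today to find a suitable model, and use Transformers to get started right away.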

Installation

Transformers requires Python 3.10+ and PyTorch 2.4+.

Create and activate a virtual environment with venv or uv, a fast Rust-based Python package and project manager.

# venv
python -m venv .my-env
source .my-env/bin/activate
# uv
uv venv .my-env
source .my-env/bin/activate

Install Transformers in your virtual environment.

# pip
pip install "transformers[torch]"

# uv
uv pip install "transformers[torch]"

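To get the latest changes in the library or to contribute, install Transformers from source. Note that the latest version may be unstable. If you run into an error, feel free to open an issue.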

git clone https://github.com/huggingface/transformers.git
cd transformers

# pip
pip install '.[torch]'

# uv
uv pip install '.[torch]'
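
After installing, it can help to confirm that the environment meets the minimums stated above (Python 3.10+ and PyTorch 2.4+). The following is a minimal sketch using only the standard library. The helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical and not part of Transformers, and the version comparison is deliberately loose: it ignores pre-release and local suffixes.

```python
import re
import sys
from importlib import metadata


def version_tuple(text):
    """Turn '2.4.1+cu121' into (2, 4, 1); non-numeric suffixes are dropped."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Print one line per requirement and return True if all are met."""
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    python_ok = meets_minimum(python_version, "3.10")
    print(f"python {python_version}: {'ok' if python_ok else 'needs 3.10+'}")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch: not installed")
        return False
    torch_ok = meets_minimum(torch_version, "2.4")
    print(f"torch {torch_version}: {'ok' if torch_ok else 'needs 2.4+'}")
    return python_ok and torch_ok


if __name__ == "__main__":
    check_environment()
```

Reading the installed version through `importlib.metadata` avoids importing PyTorch itself, so the check stays fast even in a large environment.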

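Quickstart

Get started with Transformers right away using the Pipeline API. Pipeline is a high-level inference class that supports text, audio, vision, and multimodal tasks. It preprocesses the input and returns the corresponding output.

Instantiate a pipeline and specify the model to use for text generation. The model is downloaded and cached so it can be reused easily. Finally, pass some text as a prompt.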

from transformers import pipeline

pipeline = pipeline(task="text-generation", model="Qwen/Qwen2.5-1.5B")
pipeline("the secret to baking a really good cake is ")
[{'generated_text': 'the secret to baking a really good cake is 1) to use the right ingredients and 2) to follow the recipe exactly. the recipe for the cake is as follows: 1 cup of sugar, 1 cup of flour, 1 cup of milk, 1 cup of butter, 1 cup of eggs, 1 cup of chocolate chips. if you want to make 2 cakes, how much sugar do you need? To make 2 cakes, you will need 2 cups of sugar.'}]
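
In the output above, `generated_text` begins with the prompt itself. If only the continuation is wanted, the prompt can be removed afterwards. This is a minimal sketch operating on a list shaped like that output; `strip_prompt` is a hypothetical helper, not a Transformers function, and the sample data is shortened by hand rather than produced by a model.

```python
def strip_prompt(prompt, outputs):
    """Return each generated text without the leading prompt."""
    completions = []
    for item in outputs:
        text = item["generated_text"]
        # Only cut when the text really starts with the prompt.
        if text.startswith(prompt):
            text = text[len(prompt):]
        completions.append(text)
    return completions


# Hand-written sample in the same shape as the pipeline output above.
prompt = "the secret to baking a really good cake is "
outputs = [{"generated_text": prompt + "1) to use the right ingredients"}]

print(strip_prompt(prompt, outputs))
```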

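Chatting with a model follows the same usage pattern. The only difference is that you build a chat history between you and the system and pass it as the input to Pipeline.

[!TIP]
You can also chat with a model directly from the command line, as long as transformers serve is running.
transformers chat Qwen/Qwen2.5-0.5B-Instruct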

import torch
from transformers import pipeline

chat = [
    {"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."},
    {"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"}
]

pipeline = pipeline(task="text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct", dtype=torch.bfloat16, device_map="auto")
response = pipeline(chat, max_new_tokens=512)
print(response[0]["generated_text"][-1]["content"])
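
To continue the conversation past one turn, the chat history has to carry the assistant's reply forward. This is a minimal sketch, assuming the generator returns the same structure indexed in the example above (`response[0]["generated_text"][-1]["content"]`). `chat_turn` and `fake_generator` are hypothetical names; the fake generator only echoes the user message so the sketch runs without downloading a model.

```python
def chat_turn(generator, history, user_message, **generate_kwargs):
    """Send one user message and return (updated_history, reply_text)."""
    messages = history + [{"role": "user", "content": user_message}]
    response = generator(messages, **generate_kwargs)
    # The last entry of generated_text is assumed to be the new assistant message.
    reply = response[0]["generated_text"][-1]
    return messages + [reply], reply["content"]


def fake_generator(messages, **generate_kwargs):
    """Stand-in for a chat pipeline: echoes the last user message."""
    reply = {"role": "assistant", "content": "echo: " + messages[-1]["content"]}
    return [{"generated_text": messages + [reply]}]


history = [{"role": "system", "content": "You are a helpful assistant."}]
history, text = chat_turn(fake_generator, history, "Hello!", max_new_tokens=512)
print(text)
history, text = chat_turn(fake_generator, history, "And again?")
print(len(history))
```

Passing a real text-generation pipeline in place of `fake_generator` should work the same way, provided its output matches the shape shown above.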

Expand the examples below to see how Pipeline works for different modalities and tasks.

Automatic speech recognition

from transformers import pipeline

pipeline = pipeline(task="automatic-speech-recognition", model="openai/whisper-large-v3")
pipeline("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}

Image classification

from transformers import pipeline

pipeline = pipeline(task="image-classification", model="facebook/dinov2-small-imagenet1k-1-layer")
pipeline("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
[{'label': 'macaw', 'score': 0.997848391532898},
 {'label': 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',
  'score': 0.0016551691805943847},
 {'label': 'lorikeet', 'score': 0.00018523589824326336},
 {'label': 'African grey, African gray, Psittacus erithacus',
  'score': 7.85409429227002e-05},
 {'label': 'quail', 'score': 5.502637941390276e-05}]
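
The classification result is a plain list of label and score dictionaries, so it can be filtered with ordinary Python. This is a minimal sketch using the scores printed above; `confident_labels` and the 0.5 threshold are illustrative assumptions, not part of the Pipeline API.

```python
def confident_labels(predictions, threshold=0.5):
    """Keep only the labels whose score reaches the threshold."""
    return [p["label"] for p in predictions if p["score"] >= threshold]


# Copied from the example output above.
predictions = [
    {"label": "macaw", "score": 0.997848391532898},
    {"label": "sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita",
     "score": 0.0016551691805943847},
    {"label": "lorikeet", "score": 0.00018523589824326336},
    {"label": "African grey, African gray, Psittacus erithacus",
     "score": 7.85409429227002e-05},
    {"label": "quail", "score": 5.502637941390276e-05},
]

# Highest-scoring prediction, without relying on the list order.
best = max(predictions, key=lambda p: p["score"])
print(best["label"])
print(confident_labels(predictions))
```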

Visual question answering

from transformers import pipeline

pipeline = pipeline(task="visual-question-answering", model="Salesforce/blip-vqa-base")
pipeline(
    image="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-few-shot.jpg",
    question="What is in the image?",
)
[{'answer': 'statue of liberty'}]

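Why should I use Transformers?

Easy-to-use state-of-the-art models:
- High performance on natural language understanding and generation, computer vision, audio, video, and multimodal tasks.
- Low barrier to entry for researchers, engineers, and developers.
- Few user-facing abstractions, with only three core classes to learn.
- A unified API for using all pretrained models.
Lower compute costs, smaller carbon footprint:
- Share trained models instead of training from scratch.
- Reduce compute time and production costs.
- Dozens of model architectures across all modalities, with over 1M pretrained checkpoints.
Choose the right framework for every stage of a model's lifetime:
- Train state-of-the-art models in 3 lines of code.
- Move a single model between PyTorch/JAX/TF2.0 frameworks at will.
- Pick the right framework for training, evaluation, and production.
Easily customize a model or an example to your needs:
- We provide examples for each architecture to reproduce the results published by its original authors.
- Model internals are exposed as consistently as possible.
- Model files can be used independently of the library for quick experiments.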

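Why shouldn't I use Transformers?

This library is not a modular toolbox of building blocks for neural networks. The code in the model files is deliberately not refactored with additional abstractions, so that researchers can iterate quickly on each model without diving into extra abstraction layers or files.
The training API is optimized for the PyTorch models provided by Transformers. For generic machine learning loops, use another library such as Accelerate.
The example scripts are only examples. They may not work out of the box for your specific use case, and you will need to adapt the code to make it work.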

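100 projects using Transformers

Transformers is more than a toolkit for using pretrained models. It is also a community of projects built around it and the Hugging Face Hub. We want Transformers to enable developers, researchers, students, professors, engineers, and anyone else to build their dream projects.

To celebrate Transformers reaching 100,000 stars, we wanted to put the spotlight on the community with the awesome-transformers page, which lists 100 outstanding projects built with Transformers.

If you own or use a project that you believe should be on the list, please open a PR to add it!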

Example models

You can test most of our models directly on their Hub model pages.

Expand each modality below to see a few example models for various use cases.

Audio

* Audio classification: [CLAP](https://huggingface.co/laion/clap-htsat-fused)
* Automatic speech recognition: [Parakeet](https://huggingface.co/nvidia/parakeet-ctc-1.1b#transcribing-using-transformers-%F0%9F%A4%97), [Whisper](https://huggingface.co/openai/whisper-large-v3-turbo), [GLM-ASR](https://huggingface.co/zai-org/GLM-ASR-Nano-2512), and [Moonshine-Streaming](https://huggingface.co/UsefulSensors/moonshine-streaming-medium)
* Keyword spotting: [Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks)
* Speech-to-speech generation: [Moshi](https://huggingface.co/kyutai/moshiko-pytorch-bf16)
* Text to audio: [MusicGen](https://huggingface.co/facebook/musicgen-large)
* Text to speech: [CSM](https://huggingface.co/sesame/csm-1b)

Computer vision

* Automatic mask generation: [SAM](https://huggingface.co/facebook/sam-vit-base)
* Depth estimation: [DepthPro](https://huggingface.co/apple/DepthPro-hf)
* Image classification: [DINO v2](https://huggingface.co/facebook/dinov2-base)
* Keypoint detection: [SuperPoint](https://huggingface.co/magic-leap-community/superpoint)
* Keypoint matching: [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)
* Object detection: [RT-DETRv2](https://huggingface.co/PekingU/rtdetr_v2_r50vd)
* Pose estimation: [VitPose](https://huggingface.co/usyd-community/vitpose-base-simple)
* Universal segmentation: [OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_swin_large)
* Video classification: [VideoMAE](https://huggingface.co/MCG-NJU/videomae-large)

Multimodal

* Audio or text to text: [Voxtral](https://huggingface.co/mistralai/Voxtral-Mini-

Project URL: https://github.com/huggingface/transformers
