Petals: Distributed LLM Inference Network

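Run large language models at home, BitTorrent-style.
Fine-tuning and inference up to 10x faster than offloading

Generate text with distributed Llama 3.1 (up to 405B), Mixtral (8x22B), Falcon (40B+) or BLOOM (176B) and fine-tune them for your own tasks, right from your desktop computer or Google Colab: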

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Choose any model available at https://health.petals.dev
model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"

# Connect to a distributed network hosting model layers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Run the model as if it were on your computer
inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))  # A cat sat on a mat...

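🚀 Try now in Colab

🦙 Want to run Llama? Request access to its weights, then run huggingface-cli login in the terminal before loading the model. Or just try it in our chatbot app.

🔏 Privacy. Your data will be processed with the help of other people in the public swarm. Learn more about privacy here. For sensitive data, you can set up a private swarm among people you trust.

💬 Any questions? Ping us in our Discord!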

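Connect your GPU and increase Petals capacity

Petals is a community-run system: we rely on people sharing their GPUs. You can help serve one of the available models or host a new model from 🤗 Model Hub!

As an example, here is how to host a part of Llama 3.1 (405B) Instruct on your GPU:

🦙 Want to host Llama? Request access to its weights, then run huggingface-cli login in the terminal before loading the model.

🐧 Linux + Anaconda. Run these commands for NVIDIA GPUs (or follow this guide for AMD GPUs):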

conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
pip install git+https://github.com/bigscience-workshop/petals
python -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct

🪟 Windows + WSL. Follow this guide on our Wiki.

🐋 Docker. Run our Docker image for NVIDIA GPUs (or follow this guide for AMD GPUs):

sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \
    learningathome/petals:main \
    python -m petals.cli.run_server --port 31330 meta-llama/Meta-Llama-3.1-405B-Instruct

🍏 macOS + Apple M1/M2 GPU. Install Homebrew, then run these commands:

brew install python
python3 -m pip install git+https://github.com/bigscience-workshop/petals
python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct

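📚 Learn more (how to use multiple GPUs, start the server on boot, etc.)

🔒 Security. Hosting a server does not allow others to run custom code on your computer. Learn more here.

💬 Any questions? Ping us in our Discord!

🏆 Thank you! Once you load and host 10+ blocks, we can show your name or link on the swarm monitor as a way to say thanks. You can specify them with --public_name YOUR_NAME.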
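If you would rather start a server from a Python script (for example, from your own process supervisor), the sketch below assembles the same python -m petals.cli.run_server command used above. It is a minimal illustration, not part of Petals: build_server_command is a hypothetical helper, and it only uses the --port and --public_name flags that appear in this README.

```python
import shlex
from typing import List, Optional


def build_server_command(model_name: str,
                         port: Optional[int] = None,
                         public_name: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build argv for `python -m petals.cli.run_server`.

    Only flags shown in this README (--port, --public_name) are supported.
    """
    cmd = ["python", "-m", "petals.cli.run_server"]
    if port is not None:
        cmd += ["--port", str(port)]
    if public_name is not None:
        cmd += ["--public_name", public_name]
    cmd.append(model_name)  # the model to serve goes last, as in the examples above
    return cmd


cmd = build_server_command(
    "meta-llama/Meta-Llama-3.1-405B-Instruct",
    port=31330,
    public_name="YOUR_NAME",  # placeholder, as in the README
)
# shlex.join quotes each argument safely for a POSIX shell.
print(shlex.join(cmd))
```

To actually launch the server once Petals is installed, you could pass the list to subprocess.run(cmd, check=True); the sketch stops at printing so it runs anywhere.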

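How it works

You load a small part of the model, then join a network of people serving the other parts. Single-batch inference runs at up to 6 tokens/sec for Llama 2 (70B) and up to 4 tokens/sec for Falcon (180B), which is enough for chatbots and interactive apps.
You can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states. You get the comforts of an API with the flexibility of PyTorch and 🤗 Transformers.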
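The idea of "each peer serves a part of the model" can be illustrated with a toy, single-process sketch. Each hypothetical ToyServer holds a contiguous span of blocks, the client greedily chains peers until every block is covered exactly once, and the hidden state is passed from peer to peer in order. The greedy route choice, the class and function names, and the block math are all illustrative assumptions, not Petals' actual routing algorithm or networking code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Hidden = List[float]
Block = Callable[[Hidden], Hidden]

NUM_BLOCKS = 6


def make_block(i: int) -> Block:
    # Stand-in for transformer block i: adds (i + 1) to every hidden value.
    return lambda h: [x + (i + 1) for x in h]


BLOCKS: List[Block] = [make_block(i) for i in range(NUM_BLOCKS)]


@dataclass
class ToyServer:
    """Hypothetical peer that has loaded blocks [start, end) of the model."""
    name: str
    start: int
    end: int

    def forward(self, hidden: Hidden, first: int, last: int) -> Hidden:
        # A peer can only run blocks it actually holds.
        assert self.start <= first < last <= self.end
        for i in range(first, last):
            hidden = BLOCKS[i](hidden)
        return hidden


Span = Tuple[ToyServer, int, int]


def build_route(servers: List[ToyServer], num_blocks: int) -> List[Span]:
    """Greedily chain peers so that blocks 0..num_blocks-1 each run once."""
    route: List[Span] = []
    pos = 0
    while pos < num_blocks:
        candidates = [s for s in servers if s.start <= pos < s.end]
        if not candidates:
            raise RuntimeError(f"no peer serves block {pos}")
        best = max(candidates, key=lambda s: s.end)  # cover as many blocks as possible
        route.append((best, pos, best.end))
        pos = best.end
    return route


def run_model(route: List[Span], hidden: Hidden) -> Hidden:
    # The client sends the hidden state along the route, peer by peer.
    for server, first, last in route:
        hidden = server.forward(hidden, first, last)
    return hidden


def generation_time_s(new_tokens: int, tokens_per_s: float) -> float:
    # Back-of-envelope latency for single-batch generation.
    return new_tokens / tokens_per_s


servers = [ToyServer("A", 0, 3), ToyServer("B", 2, 5), ToyServer("C", 4, 6)]
route = build_route(servers, NUM_BLOCKS)
print([(s.name, first, last) for s, first, last in route])  # [('A', 0, 3), ('B', 3, 5), ('C', 5, 6)]
print(run_model(route, [0.0, 1.0]))  # [21.0, 22.0]
print(generation_time_s(60, 6.0))  # 10.0
```

Peer B also holds block 2, but the route starts it at block 3 so no block runs twice. The last line is plain arithmetic on the best-case rates quoted above: at 6 tokens/sec (Llama 2, 70B), 60 new tokens take about 10 seconds; at 4 tokens/sec (Falcon, 180B), about 15 seconds.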

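📜 Read paper 📚 See FAQ

📚 Tutorials, examples, and more

Basic tutorials:

Getting started: tutorial
Prompt-tune Llama-65B for text semantic classification: tutorial
Prompt-tune BLOOM to create a personified chatbot: tutorial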
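Prompt tuning keeps the pretrained model frozen and trains only a few continuous "soft prompt" vectors that are prepended to the input embeddings. In a distributed setting, this lets the trainable part stay on the client while peers only run forward and backward passes through frozen blocks. The stdlib-only toy below sketches that split under loudly simplified assumptions: a two-block "model" whose blocks just scale each vector elementwise, a mean-pooled linear readout, and a squared-error loss. remote_forward, remote_backward and train_prompt are hypothetical names, not Petals functions.

```python
from typing import List

Vec = List[float]

# Hypothetical frozen "remote" blocks: each scales a token vector elementwise.
FROZEN_SCALES: List[Vec] = [[0.5, 2.0], [1.5, 1.0]]
# Hypothetical frozen readout applied after mean-pooling over the sequence.
HEAD: Vec = [1.0, -1.0]


def remote_forward(seq: List[Vec]) -> List[Vec]:
    # Peers run the frozen blocks on every token vector.
    for scale in FROZEN_SCALES:
        seq = [[x * s for x, s in zip(tok, scale)] for tok in seq]
    return seq


def remote_backward(grad_seq: List[Vec]) -> List[Vec]:
    # Peers backpropagate through the same frozen blocks in reverse order and
    # return input gradients; their own weights are never updated.
    for scale in reversed(FROZEN_SCALES):
        grad_seq = [[g * s for g, s in zip(tok, scale)] for tok in grad_seq]
    return grad_seq


def score(hidden: List[Vec]) -> float:
    # Mean-pool over tokens, then take the dot product with the frozen head.
    n = len(hidden)
    return sum(sum(tok[i] for tok in hidden) / n * HEAD[i] for i in range(len(HEAD)))


def train_prompt(prompt: List[Vec], inputs: List[Vec], target: float,
                 lr: float = 0.1, steps: int = 200) -> List[Vec]:
    prompt = [p[:] for p in prompt]  # trainable soft prompt, kept on the client
    for _ in range(steps):
        seq = prompt + inputs  # prepend the soft prompt to the input embeddings
        out = score(remote_forward(seq))
        d_out = 2.0 * (out - target)  # derivative of the loss (out - target) ** 2
        n = len(seq)
        grad_hidden = [[d_out * h / n for h in HEAD] for _ in seq]
        grad_seq = remote_backward(grad_hidden)
        # Only the prompt positions are updated; inputs and blocks stay frozen.
        for p, g in zip(prompt, grad_seq[: len(prompt)]):
            for i in range(len(p)):
                p[i] -= lr * g[i]
    return prompt


inputs = [[1.0, 0.0], [0.0, 1.0]]  # frozen input embeddings: 2 tokens, dim 2
prompt = train_prompt([[0.0, 0.0]], inputs, target=1.0)
print(score(remote_forward(prompt + inputs)))  # close to 1.0 after training
```

The design choice the toy highlights is that gradients flow back through the frozen blocks to the client, but only the client's prompt vectors change, so peers can serve many users' fine-tuning jobs without storing per-user weights.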

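Useful tools:

Chatbot web app (connects to Petals via an HTTP/WebSocket endpoint): source code
Monitor for the public swarm: source code

Advanced guides:

Launch a private swarm: guide
Run a custom model: guide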

Benchmarks

Please see Section 3.3 of our paper.

🛠️ Contributing

Please see our FAQ on contributing.

📜 Citations

Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, and Colin Raffel.
Petals: Collaborative Inference and Fine-tuning of Large Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). 2023.

@inproceedings{borzunov2023petals,
  title = {Petals: Collaborative Inference and Fine-tuning of Large Models},
  author = {Borzunov, Alexander and Baranchuk, Dmitry and Dettmers, Tim and Riabinin, Maksim and Belkada, Younes and Chumachenko, Artem and Samygin, Pavel and Raffel, Colin},
  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  pages = {558--568},
  year = {2023},
  url = {https://arxiv.org/abs/2209.01188}
}

Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, and Colin Raffel.
Distributed inference and fine-tuning of large language models over the Internet.
Advances in Neural Information Processing Systems 36 (2023).

@inproceedings{borzunov2023distributed,
  title = {Distributed inference and fine-tuning of large language models over the {I}nternet},
  author = {Borzunov, Alexander and Ryabinin, Max and Chumachenko, Artem and Baranchuk, Dmitry and Dettmers, Tim and Belkada, Younes and Samygin, Pavel and Raffel, Colin},
  booktitle = {Advances in Neural Information Processing Systems},
  volume = {36},
  pages = {12312--12331},
  year = {2023},
  url = {https://arxiv.org/abs/2312.08361}
}

This project is a part of the BigScience research workshop.

Project repository: https://github.com/bigscience-workshop/petals
