LightLLM: A High-Performance LLM Inference Framework

[![docs](https://img.shields.io/badge/docs-latest-blue)](https://lightllm-en.readthedocs.io/en/latest/) [![Docker](https://github.com/ModelTC/lightllm/actions/workflows/docker-publish.yml/badge.svg)](https://github.com/ModelTC/lightllm/actions/workflows/docker-publish.yml) [![stars](https://img.shields.io/github/stars/ModelTC/lightllm?style=social)](https://github.com/ModelTC/lightllm) ![visitors](https://komarev.com/ghpvc/?username=lightllm&label=visitors) [![Discord Banner](https://img.shields.io/discord/1139835312592392214?logo=discord&logoColor=white)](https://discord.gg/WzzfwVSguU) [![license](https://img.shields.io/github/license/ModelTC/lightllm)](https://github.com/ModelTC/lightllm/blob/main/LICENSE)

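LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, known for its lightweight design, easy extensibility, and high performance. LightLLM draws on the strengths of several well-regarded open-source implementations, including but not limited to FasterTransformer, TGI, vLLM, and FlashAttention.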

English Docs | Chinese Docs | Blog

Tech Blogs

[2025/11] 🚀 Prefix KV Cache transfer between DP rankers is now supported! See our blog post for a technical deep dive.

Get Started

Performance

To learn more, see the release blog: v1.1.0 blog.

FAQ

Please refer to the FAQ for more information.

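Projects Using LightLLM

We welcome any form of cooperation and contribution. If a project needs support from LightLLM, please contact us by email or create a Pull Request.

Projects based on LightLLM or referencing LightLLM components:
- LoongServe, Peking University
- vLLM (uses some LightLLM kernels)
- SGLang (uses some LightLLM kernels)
- ParrotServe, Microsoft
- Aphrodite (uses some LightLLM kernels)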
- S-LoRA
- OmniKV, Ant Group
- Lab4AI LightLLM+LlamaIndex, Lab4AI LightLLM+Qwen3-8B
- LazyLLM

In addition, LightLLM's pure-Python design and token-level KV Cache management make it easy to use as the basis for research projects.
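To make the idea of token-level KV Cache management more concrete, here is a minimal sketch of per-token slot bookkeeping. It is an illustration only and not LightLLM's actual API: the class `TokenKVAllocator`, its methods, and the request ids are hypothetical names, and each "slot" is assumed to stand for the KV storage of exactly one token.

```python
from typing import Dict, List


class TokenKVAllocator:
    """Toy token-granularity KV cache bookkeeping (hypothetical, not LightLLM's API)."""

    def __init__(self, num_slots: int) -> None:
        # Each slot index stands for the KV storage of exactly one token.
        self.free_slots: List[int] = list(range(num_slots))
        # Maps a request id to the slots it currently holds.
        self.owned: Dict[str, List[int]] = {}

    def alloc(self, req_id: str, num_tokens: int) -> List[int]:
        """Reserve one slot per token; slots need not be contiguous."""
        if num_tokens > len(self.free_slots):
            raise MemoryError("not enough free token slots")
        slots = [self.free_slots.pop() for _ in range(num_tokens)]
        self.owned.setdefault(req_id, []).extend(slots)
        return slots

    def free(self, req_id: str) -> None:
        """Return every slot held by a finished request to the free pool."""
        self.free_slots.extend(self.owned.pop(req_id, []))

    def num_free(self) -> int:
        return len(self.free_slots)


allocator = TokenKVAllocator(num_slots=8)
allocator.alloc("req-a", 5)  # prompt of 5 tokens
allocator.alloc("req-a", 1)  # one newly decoded token
allocator.alloc("req-b", 2)
remaining_after_alloc = allocator.num_free()
allocator.free("req-a")      # req-a finishes, its 6 slots are reusable
remaining_after_free = allocator.num_free()
```

Because slots are tracked per token rather than per fixed-size block, a request in this sketch holds exactly as many slots as it has tokens, which is the property that makes such a scheme convenient to experiment with.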

Academic work based on or using parts of LightLLM:
- ParrotServe (OSDI’24)
- S-LoRA (MLSys’24)
- LoongServe (SOSP’24)
- ByteDance’s CXL (EuroSys’24)
- VTC (OSDI’24)
- OmniKV (ICLR’25)
- CaraServe, LoRATEE, FastSwitch ...

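Community

For more information and to join the discussion, please join our Discord server. You are welcome to become part of the community, and we look forward to your contributions!

License

This repository is released under the Apache-2.0 license.

Acknowledgement

We learned a lot from the following projects while developing LightLLM.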
- Faster Transformer
- Text Generation Inference
- vLLM
- SGLang
- flashinfer
- Flash Attention 1&2
- OpenAI Triton

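Citation

We have published a number of papers on components or features of LightLLM. If you use LightLLM in your work, please consider citing the relevant papers.

Constrained decoding: accepted by ACL 2025, where it received an Outstanding Paper Award.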

@inproceedings{
anonymous2025pre,
title={Pre\${\textasciicircum}3\$: Enabling Deterministic Pushdown Automata for Faster Structured {LLM} Generation},
author={Anonymous},
booktitle={Submitted to ACL Rolling Review - February 2025},
year={2025},
url={https://openreview.net/forum?id=g1aBeiyZEi},
note={under review}
}

Request scheduler: accepted by ASPLOS’25:

@inproceedings{gong2025past,
  title={Past-Future Scheduler for LLM Serving under SLA Guarantees},
  author={Gong, Ruihao and Bai, Shihao and Wu, Siyu and Fan, Yunqian and Wang, Zaijun and Li, Xiuhong and Yang, Hailong and Liu, Xianglong},
  booktitle={Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
  pages={798--813},
  year={2025}
}

Project page: https://github.com/ModelTC/lightllm
