LLMLingua: An Efficient Prompt Compression Tool That Cuts Context Costs

network · 2026-02-19 09:19:04

The LLMLingua Series | Effectively Deliver Information to LLMs via Prompt Compression

https://github.com/microsoft/LLMLingua/assets/30883354/eb0ea70d-6d4c-4aa7-8977-61f94bb87438

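Summary

LLMLingua uses a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens from prompts. This enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models (EMNLP 2023)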

Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang and Lili Qiu

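LongLLMLingua mitigates the "lost in the middle" issue in LLMs and improves long-context information processing. It uses prompt compression to reduce cost and increase efficiency, improving RAG performance by up to 21.4% while using only 1/4 of the tokens.

LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression (ACL 2024 and ICLR ME-FoMo 2024)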

Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu

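LLMLingua-2 is a small yet powerful prompt compression method. It trains a BERT-level encoder for token classification via data distillation from GPT-4, and it excels at task-agnostic compression. It surpasses LLMLingua on out-of-domain data and runs 3x-6x faster.

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression (ACL 2024 Findings)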

Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Ruhle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang

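SecurityLingua is a safety guardrail model that uses security-aware prompt compression to reveal the malicious intent behind jailbreak attacks, enabling LLMs to detect attacks and generate safe responses. Because the prompt compression is highly efficient, the defense adds negligible overhead and costs 100x fewer tokens than state-of-the-art LLM guardrail approaches.

SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression (CoLM 2025)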

Yucheng Li, Surin Ahn, Huiqiang Jiang, Amir H. Abdi, Yuqing Yang and Lili Qiu

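🎥 Overview

Ever hit the token limit when asking ChatGPT to summarize a long text?
Frustrated with ChatGPT forgetting previous instructions after extensive fine-tuning?
Run up high costs using the GPT-3.5/4 API for experiments, despite excellent results?

Large language models such as ChatGPT and GPT-4 excel at generalization and reasoning, but they often face challenges such as prompt length limits and prompt-based pricing schemes.

Motivation for LLMLingua

Now you can use LLMLingua, LongLLMLingua, and LLMLingua-2!

These tools offer an efficient way to compress prompts by up to 20x, making LLMs more practical to use.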

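💰 Cost savings: reduces both prompt and generation lengths with minimal overhead.
📝 Extended context support: improves support for longer contexts, mitigates the "lost in the middle" issue, and boosts overall performance.
⚖️ Robustness: no additional training of the LLM is required.
🕵️ Knowledge retention: preserves the original prompt information, such as ICL and reasoning chains.
📜 KV-cache compression: accelerates the inference process.
🪃 Comprehensive recovery: GPT-4 can recover all key information from the compressed prompt.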

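LLMLingua framework

LongLLMLingua framework

LLMLingua-2 framework

PS: This demo is based on the alt-gpt project. Special thanks to @Livshitz for the valuable contribution.

If you find this repo helpful, please cite the following papers: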

@inproceedings{jiang-etal-2023-llmlingua,
    title = "{LLML}ingua: Compressing Prompts for Accelerated Inference of Large Language Models",
    author = "Huiqiang Jiang and Qianhui Wu and Chin-Yew Lin and Yuqing Yang and Lili Qiu",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.825",
    doi = "10.18653/v1/2023.emnlp-main.825",
    pages = "13358--13376",
}

@inproceedings{jiang-etal-2024-longllmlingua,
    title = "{L}ong{LLML}ingua: Accelerating and Enhancing {LLM}s in Long Context Scenarios via Prompt Compression",
    author = "Huiqiang Jiang and Qianhui Wu and Xufang Luo and Dongsheng Li and Chin-Yew Lin and Yuqing Yang and Lili Qiu",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.91",
    pages = "1658--1677",
}

@inproceedings{pan-etal-2024-llmlingua,
    title = "{LLML}ingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression",
    author = "Zhuoshi Pan and Qianhui Wu and Huiqiang Jiang and Menglin Xia and Xufang Luo and Jue Zhang and Qingwei Lin and Victor Ruhle and Yuqing Yang and Chin-Yew Lin and H. Vicky Zhao and Lili Qiu and Dongmei Zhang",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.57",
    pages = "963--981",
}

@inproceedings{li2025securitylingua,
  title={{S}ecurity{L}ingua: Efficient Defense of {LLM} Jailbreak Attacks via Security-Aware Prompt Compression},
  author={Yucheng Li and Surin Ahn and Huiqiang Jiang and Amir H. Abdi and Yuqing Yang and Lili Qiu},
  booktitle={Second Conference on Language Modeling},
  year={2025},
  url={https://openreview.net/forum?id=tybbSo6wba}
}

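🎯 Quick Start

1. Installing LLMLingua:

To get started with LLMLingua, install it with pip:

pip install llmlingua

2. Using the LLMLingua series of methods for prompt compression:

With LLMLingua, compressing a prompt takes only a few lines: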

from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()
compressed_prompt = llm_lingua.compress_prompt(prompt, instruction="", question="", target_token=200)

# > {'compressed_prompt': 'Question: Sam bought a dozen boxes, each with 30 highlighter pens inside, for $10 each box. He reanged five of boxes into packages of sixlters each and sold them $3 per. He sold the rest theters separately at the of three pens $2. How much did make in total, dollars?\nLets think step step\nSam bought 1 boxes x00 oflters.\nHe bought 12 * 300ters in total\nSam then took 5 boxes 6ters0ters.\nHe sold these boxes for 5 *5\nAfterelling these  boxes there were 3030 highlighters remaining.\nThese form 330 / 3 = 110 groups of three pens.\nHe sold each of these groups for $2 each, so made 110 * 2 = $220 from them.\nIn total, then, he earned $220 + $15 = $235.\nSince his original cost was $120, he earned $235 - $120 = $115 in profit.\nThe answer is 115',
#  'origin_tokens': 2365,
#  'compressed_tokens': 211,
#  'ratio': '11.2x',
#  'saving': ', Saving $0.1 in GPT-4.'}

## Or use the phi-2 model,
llm_lingua = PromptCompressor("microsoft/phi-2")

## Or use a quantized model such as TheBloke/Llama-2-7b-Chat-GPTQ, which needs only <8GB of GPU memory.
## Before that, you need to pip install optimum auto-gptq
llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ", model_config={"revision": "main"})
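The result dictionary shown in the comment above already carries the token counts, so the savings can be checked without calling the model again. The following is a minimal sketch; `summarize_compression` is a hypothetical helper (not part of llmlingua), and it assumes only the `origin_tokens` and `compressed_tokens` keys from the sample output above.

```python
def summarize_compression(result: dict) -> dict:
    """Derive simple statistics from a compress_prompt result dictionary.

    Assumes the 'origin_tokens' and 'compressed_tokens' keys shown in the
    sample output above; this helper is illustrative, not a library API.
    """
    origin = result["origin_tokens"]
    compressed = result["compressed_tokens"]
    return {
        "tokens_saved": origin - compressed,
        "ratio": round(origin / compressed, 1),
        "kept_fraction": round(compressed / origin, 3),
    }


# Token counts copied from the sample output above.
sample_result = {"origin_tokens": 2365, "compressed_tokens": 211, "ratio": "11.2x"}
summary = summarize_compression(sample_result)
print(summary)
```

For the sample above, the computed ratio of 11.2 agrees with the '11.2x' string reported by the library.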

To try LongLLMLingua in your own scenario, you can use:

from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()
compressed_prompt = llm_lingua.compress_prompt(
    prompt_list,
    question=question,
    rate=0.55,
    # Set the special parameters for LongLLMLingua
    condition_in_question="after_condition",
    reorder_context="sort",
    dynamic_context_compression_ratio=0.3, # or 0.4
    condition_compare=True,
    context_budget="+100",
    rank_method="longllmlingua",
)
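When the same LongLLMLingua settings are reused across many questions, it may be convenient to keep them in one place. A minimal sketch follows, assuming the parameter names and values from the call above; `longllmlingua_options` is a hypothetical helper, and the example question is made up for illustration.

```python
def longllmlingua_options(question: str, rate: float = 0.55) -> dict:
    """Collect the LongLLMLingua-specific keyword arguments used above.

    Hypothetical convenience helper; the keys and values mirror the
    compress_prompt call shown in this post.
    """
    return {
        "question": question,
        "rate": rate,
        "condition_in_question": "after_condition",
        "reorder_context": "sort",
        "dynamic_context_compression_ratio": 0.3,  # or 0.4
        "condition_compare": True,
        "context_budget": "+100",
        "rank_method": "longllmlingua",
    }


options = longllmlingua_options("Who chaired the meeting?")  # example question, made up
print(sorted(options))

# With a PromptCompressor instance and a list of context strings:
# compressed_prompt = llm_lingua.compress_prompt(prompt_list, **options)
```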

To try LLMLingua-2 in your own scenario, you can use:

from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True, # Whether to use llmlingua-2
)
compressed_prompt = llm_lingua.compress_prompt(prompt, rate=0.33, force_tokens=['\n', '?'])

## Or use the LLMLingua-2-small model
llm_lingua = PromptCompressor(
    model_name="microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
    use_llmlingua2=True, # Whether to use llmlingua-2
)
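The `force_tokens` argument above lists '\n' and '?'. Assuming it is meant to name tokens that should always survive compression, a quick sanity check on the output could look like the sketch below. `missing_forced_tokens` is a hypothetical helper, and the compressed text here is a hand-written stand-in rather than real model output.

```python
from typing import List


def missing_forced_tokens(original: str, compressed: str, force_tokens: List[str]) -> List[str]:
    """Return forced tokens that occur in the original text but not in the compressed text."""
    return [tok for tok in force_tokens if tok in original and tok not in compressed]


original = "Where is the meeting?\nIt is in room 4, on the second floor."
# Hand-written stand-in for a compressed prompt, not real model output.
compressed = "Where meeting?\nroom 4, second floor."

print(missing_forced_tokens(original, compressed, ["\n", "?"]))
```

In practice the compressed text would presumably be read from the `compressed_prompt` key of the returned dictionary, as in the LLMLingua sample output earlier in this post.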

To try SecurityLingua in your own scenario, you can use:

from llmlingua import PromptCompressor

securitylingua = PromptCompressor(
    model_name="SecurityLingua/securitylingua-xlm-s2s",
    use_slingua=True
)
intention = securitylingua.compress_prompt(malicious_prompt)

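For more details about SecurityLingua, please refer to the securitylingua README.

3. Advanced usage - Structured Prompt Compression:

Split the text into sections, and decide for each one whether to compress it and at what rate. Use <llmlingua></llmlingua> tags to segment the context; the optional parameters are rate and compress.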
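As a rough illustration of the tagging described above, the sketch below builds a tagged prompt from plain segments. The attribute syntax inside the opening tag (`<llmlingua, rate=0.4>`, `<llmlingua, compress=False>`) is an assumption based on the upstream project's README and is not shown in this post; `tag_segment` is a hypothetical helper, not a library function.

```python
from typing import Optional


def tag_segment(text: str, rate: Optional[float] = None, compress: Optional[bool] = None) -> str:
    """Wrap one segment in <llmlingua> tags with optional rate/compress attributes.

    The comma-separated attribute syntax is an assumption taken from the
    upstream README, not from this post.
    """
    attrs = ""
    if rate is not None:
        attrs += f", rate={rate}"
    if compress is not None:
        attrs += f", compress={compress}"
    return f"<llmlingua{attrs}>{text}</llmlingua>"


# Keep the speaker label verbatim and compress the utterance itself.
structured_prompt = (
    tag_segment("Speaker 4:", compress=False)
    + tag_segment(" Thank you. And can we do the functions for", rate=0.4)
)
print(structured_prompt)
```

The resulting string would then be passed to the library's structured compression entry point (upstream, this appears to be `structured_compress_prompt`; treat that name as an assumption here).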

```python
structured_prompt = """Speaker 4: Thank you. And can we do the functions for
```

Project URL: https://github.com/microsoft/LLMLingua
