| Project Page | LLMLingua | LongLLMLingua | LLMLingua-2 | LLMLingua Demo | LLMLingua-2 Demo |
https://github.com/microsoft/LLMLingua/assets/30883354/eb0ea70d-6d4c-4aa7-8977-61f94bb87438
LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.
LongLLMLingua mitigates the "lost in the middle" issue in LLMs, enhancing long-context information processing. It reduces costs and boosts efficiency through prompt compression, improving RAG performance by up to 21.4% while using only 1/4 of the tokens.
LLMLingua-2, a small yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in task-agnostic compression. It surpasses LLMLingua on out-of-domain data and is 3x-6x faster.
SecurityLingua is a guardrail model that uses security-aware prompt compression to reveal the malicious intent behind jailbreak attacks, enabling LLMs to detect attacks and generate safe responses. Thanks to efficient prompt compression, it incurs negligible defense overhead and a 100x reduction in token cost compared to state-of-the-art LLM guardrail methods.

Large language models such as ChatGPT and GPT-4 excel at generalization and reasoning, but they often face challenges such as prompt length limits and prompt-based pricing schemes.

Now you can use LLMLingua, LongLLMLingua, and LLMLingua-2!
These tools offer an efficient solution to compress prompts by up to 20x, enhancing the utility of LLMs.



PS: This demo is based on the alt-gpt project. Special thanks to @Livshitz for the valuable contribution.
If you find this repository helpful, please cite the following papers:
```bibtex
@inproceedings{jiang-etal-2023-llmlingua,
    title = "{LLML}ingua: Compressing Prompts for Accelerated Inference of Large Language Models",
    author = "Huiqiang Jiang and Qianhui Wu and Chin-Yew Lin and Yuqing Yang and Lili Qiu",
    editor = "Bouamor, Houda and
      Pino, Juan and
      Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.825",
    doi = "10.18653/v1/2023.emnlp-main.825",
    pages = "13358--13376",
}
@inproceedings{jiang-etal-2024-longllmlingua,
    title = "{L}ong{LLML}ingua: Accelerating and Enhancing {LLM}s in Long Context Scenarios via Prompt Compression",
    author = "Huiqiang Jiang and Qianhui Wu and Xufang Luo and Dongsheng Li and Chin-Yew Lin and Yuqing Yang and Lili Qiu",
    editor = "Ku, Lun-Wei and
      Martins, Andre and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.91",
    pages = "1658--1677",
}
@inproceedings{pan-etal-2024-llmlingua,
    title = "{LLML}ingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression",
    author = "Zhuoshi Pan and Qianhui Wu and Huiqiang Jiang and Menglin Xia and Xufang Luo and Jue Zhang and Qingwei Lin and Victor Ruhle and Yuqing Yang and Chin-Yew Lin and H. Vicky Zhao and Lili Qiu and Dongmei Zhang",
    editor = "Ku, Lun-Wei and
      Martins, Andre and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.57",
    pages = "963--981",
}
@inproceedings{li2025securitylingua,
    title = {{S}ecurity{L}ingua: Efficient Defense of {LLM} Jailbreak Attacks via Security-Aware Prompt Compression},
    author = {Yucheng Li and Surin Ahn and Huiqiang Jiang and Amir H. Abdi and Yuqing Yang and Lili Qiu},
    booktitle = {Second Conference on Language Modeling},
    year = {2025},
    url = {https://openreview.net/forum?id=tybbSo6wba}
}
```
To get started with LLMLingua, simply install it with pip:
```bash
pip install llmlingua
```
With LLMLingua, you can easily compress your prompts. Here's how:
```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()
compressed_prompt = llm_lingua.compress_prompt(prompt, instruction="", question="", target_token=200)

# > {'compressed_prompt': 'Question: Sam bought a dozen boxes, each with 30 highlighter pens inside, for $10 each box. He reanged five of boxes into packages of sixlters each and sold them $3 per. He sold the rest theters separately at the of three pens $2. How much did make in total, dollars?\nLets think step step\nSam bought 1 boxes x00 oflters.\nHe bought 12 * 300ters in total\nSam then took 5 boxes 6ters0ters.\nHe sold these boxes for 5 *5\nAfterelling these boxes there were 3030 highlighters remaining.\nThese form 330 / 3 = 110 groups of three pens.\nHe sold each of these groups for $2 each, so made 110 * 2 = $220 from them.\nIn total, then, he earned $220 + $15 = $235.\nSince his original cost was $120, he earned $235 - $120 = $115 in profit.\nThe answer is 115',
#  'origin_tokens': 2365,
#  'compressed_tokens': 211,
#  'ratio': '11.2x',
#  'saving': ', Saving $0.1 in GPT-4.'}

## Or use the phi-2 model,
llm_lingua = PromptCompressor("microsoft/phi-2")

## Or use a quantized model, such as TheBloke/Llama-2-7b-Chat-GPTQ, which requires only <8GB of GPU memory.
## Before that, you need to pip install optimum auto-gptq
llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ", model_config={"revision": "main"})
```
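The result dictionary reports the original and compressed token counts, from which the compression ratio follows directly. A minimal sketch, using only the example numbers above (pure Python arithmetic, not an LLMLingua API call):

```python
# Recompute the compression ratio from the token counts reported in the
# example result above; the dict here is copied from that output.
result = {"origin_tokens": 2365, "compressed_tokens": 211}

ratio = result["origin_tokens"] / result["compressed_tokens"]
print(f"{ratio:.1f}x")  # matches the '11.2x' ratio in the example output
```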
To try LongLLMLingua in your scenario, you can use:
```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()
compressed_prompt = llm_lingua.compress_prompt(
    prompt_list,
    question=question,
    rate=0.55,
    # Set the special parameters for LongLLMLingua
    condition_in_question="after_condition",
    reorder_context="sort",
    dynamic_context_compression_ratio=0.3,  # or 0.4
    condition_compare=True,
    context_budget="+100",
    rank_method="longllmlingua",
)
```
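The `reorder_context="sort"` option rearranges documents by their question relevance so that important content avoids the "lost in the middle" positions. A hedged, pure-Python illustration of that idea, with made-up documents and scores (not LLMLingua's actual ranking internals):

```python
# Illustrative sketch of relevance-based context reordering (the idea behind
# reorder_context="sort"); the relevance scores here are hypothetical.
contexts = ["doc_a", "doc_b", "doc_c"]
relevance = {"doc_a": 0.2, "doc_b": 0.9, "doc_c": 0.5}

# Put the most question-relevant documents first.
reordered = sorted(contexts, key=lambda c: relevance[c], reverse=True)
print(reordered)  # ['doc_b', 'doc_c', 'doc_a']
```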
To try LLMLingua-2 in your scenario, you can use:
```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,  # Whether to use llmlingua-2
)
compressed_prompt = llm_lingua.compress_prompt(prompt, rate=0.33, force_tokens=['\n', '?'])

## Or use the LLMLingua-2-small model
llm_lingua = PromptCompressor(
    model_name="microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
    use_llmlingua2=True,  # Whether to use llmlingua-2
)
```
To try SecurityLingua in your scenario, you can use:
```python
from llmlingua import PromptCompressor

securitylingua = PromptCompressor(
    model_name="SecurityLingua/securitylingua-xlm-s2s",
    use_slingua=True
)
intention = securitylingua.compress_prompt(malicious_prompt)
```
For more details on SecurityLingua, please refer to the securitylingua README.
Split the text into sections, deciding whether each should be compressed and at what rate. Use <llmlingua></llmlingua> tags for context segmentation, with optional rate and compress parameters.
```python
structured_prompt = """Speaker 4: Thank you. And can we do the functions for