Agnostic: a lightweight approach to LLM orchestration across multiple models and backends

anchor · 2026-05-09 11:00:19 · 59 views · 0 comments

🥤 RAGLite

RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) built on DuckDB or PostgreSQL.

Features

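Configurable

🧠 Choose any LLM provider with LiteLLM, including local llama-cpp-python models
💾 Choose either DuckDB or PostgreSQL as the keyword and vector search database
🥇 Choose any reranker with rerankers, with multilingual FlashRank as the default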

Fast and permissive

❤️ Only lightweight and permissively licensed open source dependencies (e.g., no PyTorch or LangChain)
🚀 Acceleration with Metal on macOS, and with CUDA on Linux and Windows

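Unhobbled

📖 PDF to Markdown conversion on top of pdftext and pypdfium2
🧬 Multi-vector chunk embedding with late chunking and contextual chunk headings
✏️ Optimal sentence splitting with wtpsplit-lite by solving a binary integer programming problem
✂️ Optimal semantic chunking by solving a binary integer programming problem
🔍 Hybrid search with the database's native keyword and vector search (FTS+VSS; tsvector+pgvector)
💭 Adaptive retrieval, where the LLM decides from the query whether to retrieve and what to retrieve
💰 Better cost efficiency and lower latency with a prompt caching-aware message array structure
🍰 Better output quality with Anthropic's long-context prompt format
🌀 Optimal closed-form linear query adapter by solving an orthogonal Procrustes problem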

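Extensible

🔌 A built-in Model Context Protocol (MCP) server that any MCP client, such as Claude desktop, can connect to
💬 Optional customizable ChatGPT-like frontend for web, Slack, and Teams with Chainlit
✍️ Optional conversion of any input document to Markdown with Pandoc
🔎 Optional high-quality document processing with Mistral OCR for PDF, images, DOCX, and PPTX, with automatic image descriptions
✅ Optional evaluation of retrieval and generation performance with Ragas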

Installation

[!TIP]
🚀 If you want to use local models, it is recommended to install an accelerated llama-cpp-python precompiled binary with:
```sh
# Configure which llama-cpp-python precompiled binary to install (⚠️ not every combination is available).
# The `|` separates alternatives: set each variable to exactly one of the listed values.
LLAMA_CPP_PYTHON_VERSION=0.3.9
PYTHON_VERSION=310|311|312
ACCELERATOR=metal|cu121|cu122|cu123|cu124
PLATFORM=macosx_11_0_arm64|linux_x86_64|win_amd64

# Install llama-cpp-python:
pip install "https://github.com/abetlen/llama-cpp-python/releases/download/v$LLAMA_CPP_PYTHON_VERSION-$ACCELERATOR/llama_cpp_python-$LLAMA_CPP_PYTHON_VERSION-cp$PYTHON_VERSION-cp$PYTHON_VERSION-$PLATFORM.whl"
```
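
To make the URL pattern in the command above easier to read, here is a minimal Python sketch that assembles the same wheel URL from the four variables. The helper name `llama_cpp_wheel_url` is hypothetical (it is not part of llama-cpp-python or RAGLite), and the sketch does not check whether the chosen combination is actually published.

```python
def llama_cpp_wheel_url(version: str, python_tag: str, accelerator: str, platform: str) -> str:
    """Build a release-wheel URL with the same pattern as the pip command above.

    Hypothetical helper for illustration only; not every combination exists.
    """
    return (
        "https://github.com/abetlen/llama-cpp-python/releases/download/"
        f"v{version}-{accelerator}/"
        f"llama_cpp_python-{version}-cp{python_tag}-cp{python_tag}-{platform}.whl"
    )


# Example: Python 3.12 with CUDA 12.4 on Linux.
url = llama_cpp_wheel_url("0.3.9", "312", "cu124", "linux_x86_64")
print(url)
```

The resulting string can be passed to `pip install` in the same way as the shell version.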

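Install RAGLite with:

pip install raglite

To add support for a customizable ChatGPT-like frontend, use the chainlit extra:

pip install raglite[chainlit]

To add support for file types other than PDF, use the pandoc extra:

pip install raglite[pandoc]

To add support for high-quality document processing with Mistral OCR, use the mistral-ocr extra:

pip install raglite[mistral-ocr]

To add support for evaluation, use the ragas extra:

pip install raglite[ragas]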

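Usage

1. Configuring RAGLite

[!TIP]
🧠 RAGLite extends LiteLLM with support for llama.cpp models through llama-cpp-python. To select a llama.cpp model (e.g., from Unsloth's collection), use a model identifier of the form "llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>", where n_ctx is an optional parameter that specifies the model's context size.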
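
As a rough illustration of how such an identifier breaks down, the sketch below splits it into its parts. `parse_llama_cpp_model_id` and `LlamaCppModelId` are hypothetical names, not RAGLite's API, and the sketch assumes the Hugging Face repo id always has the form `<owner>/<name>`.

```python
from typing import NamedTuple, Optional


class LlamaCppModelId(NamedTuple):
    repo_id: str
    filename: str
    n_ctx: Optional[int]


def parse_llama_cpp_model_id(model: str) -> LlamaCppModelId:
    """Split "llama-cpp-python/<repo_id>/<filename>@<n_ctx>" into its parts (illustrative only)."""
    prefix = "llama-cpp-python/"
    if not model.startswith(prefix):
        raise ValueError(f"Expected an identifier starting with {prefix!r}, got {model!r}")
    rest = model[len(prefix):]
    # The optional context size follows the last "@".
    path, separator, n_ctx_text = rest.rpartition("@")
    if separator:
        n_ctx: Optional[int] = int(n_ctx_text)
    else:
        path, n_ctx = rest, None
    # Assumption: the repo id is "<owner>/<name>"; everything after it is the filename or glob.
    owner, name, filename = path.split("/", 2)
    return LlamaCppModelId(f"{owner}/{name}", filename, n_ctx)


parsed = parse_llama_cpp_model_id("llama-cpp-python/unsloth/Qwen3-8B-GGUF/*Q4_K_M.gguf@8192")
print(parsed)
```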

[!TIP]
💾 You can create a PostgreSQL database in a few clicks at neon.tech.

First, configure RAGLite with your preferred DuckDB or PostgreSQL database and any LLM supported by LiteLLM:

from raglite import RAGLiteConfig

# Example 'remote' config with a PostgreSQL database and an OpenAI LLM:
my_config = RAGLiteConfig(
    db_url="postgresql://my_username:my_password@my_host:5432/my_database",
    llm="gpt-4o-mini",  # Or any LLM supported by LiteLLM
    embedder="text-embedding-3-large",  # Or any embedder supported by LiteLLM
)

# Example 'local' config with a DuckDB database and a llama.cpp LLM:
my_config = RAGLiteConfig(
    db_url="duckdb:///raglite.db",
    llm="llama-cpp-python/unsloth/Qwen3-8B-GGUF/*Q4_K_M.gguf@8192",
    embedder="llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf@512",  # More than 512 tokens degrades bge-m3's performance
)
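
The two `db_url` values above differ mainly in their URL scheme. As a small sketch of how a caller could tell the backends apart, the hypothetical helper `database_backend` below reads the scheme with the standard library; it makes no claim about how RAGLite itself interprets the URL.

```python
from urllib.parse import urlparse


def database_backend(db_url: str) -> str:
    """Return the backend name encoded in a database URL's scheme (illustrative helper)."""
    scheme = urlparse(db_url).scheme
    # SQLAlchemy-style URLs may carry a driver suffix such as "postgresql+psycopg".
    return scheme.split("+", 1)[0]


print(database_backend("duckdb:///raglite.db"))
print(database_backend("postgresql://my_username:my_password@my_host:5432/my_database"))
```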

You can also configure any reranker supported by rerankers:

from rerankers import Reranker

# Example remote API-based reranker:
my_config = RAGLiteConfig(
    db_url="postgresql://my_username:my_password@my_host:5432/my_database",
    reranker=Reranker("rerank-v3.5", model_type="cohere", api_key=COHERE_API_KEY, verbose=0)  # Multilingual
)

# Example local cross-encoder reranker per language (this is the default):
my_config = RAGLiteConfig(
    db_url="duckdb:///raglite.db",
    reranker={
        "en": Reranker("ms-marco-MiniLM-L-12-v2", model_type="flashrank", verbose=0),  # English
        "other": Reranker("ms-marco-MultiBERT-L-12", model_type="flashrank", verbose=0),  # Other languages
    }
)

Self-query is also supported: it lets the LLM automatically generate and apply metadata filters from the user's input to refine the search results. To enable it, set self_query=True in RAGLiteConfig:

my_config = RAGLiteConfig(
    db_url="duckdb:///raglite.db",
    llm="gpt-4o-mini",
    embedder="text-embedding-3-large",
    self_query=True,  # Enable self-query
)

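2. Inserting documents

[!TIP]
✍️ To insert documents other than PDF, install the pandoc extra with pip install raglite[pandoc].

[!TIP]
🔎 For higher-quality document processing with automatic image descriptions, install the mistral-ocr extra with pip install raglite[mistral-ocr] and configure it as follows: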
```python
from raglite import RAGLiteConfig, MistralOCRConfig

my_config = RAGLiteConfig(
    document_processor=MistralOCRConfig(
        include_image_descriptions=True,  # Describe images, charts, and diagrams as text
        image_types=frozenset({"chart", "diagram", "photo", "table", "logo", "icon"}),  # Custom image categories
        exclude_image_types=frozenset({"logo", "icon"}),  # Filter specific types out of the output
    ),
)
```
The `image_types` parameter defines the categories Mistral uses to classify each image: you can use the defaults or provide your own domain-specific types. Use `exclude_image_types` to filter out categories that are not useful for retrieval.

Next, insert some documents into the database. RAGLite will take care of the conversion to Markdown, optimal level 4 semantic chunking, and multi-vector embedding with late chunking:

# Insert documents given their file path
from pathlib import Path
from raglite import Document, insert_documents

documents = [
    Document.from_path(Path("On the Measure of Intelligence.pdf")),
    Document.from_path(Path("Special Relativity.pdf")),
]
insert_documents(documents, config=my_config)

# Insert documents given their text/plain or text/markdown content
content = """
# On the Electrodynamics of Moving Bodies
## By A. Einstein  June 30, 1905
It is known that Maxwell...
"""
documents = [
    Document.from_text(content, author="Einstein", topic="physics", year=1905)
]
insert_documents(documents, config=my_config)

[!TIP]
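📝 Documents can carry metadata, passed as keyword arguments to Document.from_text() or Document.from_path(). This metadata can later be used for filtering during retrieval.
List values are stored as-is (e.g. domain=["open", "music"]).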
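
To make the filtering behaviour more concrete, here is a minimal sketch of how a metadata filter could be matched against stored metadata. The OR semantics for several values of the same field follow the search example further down in this post; combining different fields with AND is an assumption of this sketch, and `matches_filter` is a hypothetical helper rather than a RAGLite function.

```python
from typing import Any, Mapping


def matches_filter(metadata: Mapping[str, Any], metadata_filter: Mapping[str, Any]) -> bool:
    """Return True if the stored metadata satisfies every field of the filter."""
    for field, wanted in metadata_filter.items():
        if field not in metadata:
            return False
        stored = metadata[field]
        # Treat scalars as one-element lists so both sides can be compared uniformly.
        wanted_values = wanted if isinstance(wanted, list) else [wanted]
        stored_values = stored if isinstance(stored, list) else [stored]
        # OR within a field: a single shared value is enough.
        if not any(value in stored_values for value in wanted_values):
            return False
    # AND across fields (assumption): every field in the filter had to match.
    return True


paper = {"author": "Einstein", "topic": "physics", "year": 1905}
song = {"domain": ["open", "music"]}
print(matches_filter(paper, {"author": "Einstein"}))
print(matches_filter(song, {"domain": ["music", "film"]}))
```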

You may also want to expand the document metadata before insertion:

from typing import Annotated, Literal
from pydantic import Field
from raglite import expand_document_metadata

# Expand the documents' metadata.
metadata_fields = {
    "title": Annotated[str, Field(..., description="Document title.")],
    "author": Annotated[str, Field(..., description="Primary author.")],
    "topics": Annotated[list[Literal["A", "B", "C"]], Field(..., description="Key topics.")],
}
documents = list(expand_document_metadata(documents, metadata_fields, config=my_config))

# Insert the documents with their expanded metadata
insert_documents(documents, config=my_config)

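3. Retrieval-Augmented Generation (RAG)

3.1 Adaptive RAG

Now you can run an adaptive RAG pipeline that consists of adding the user prompt to the message history and streaming the LLM response:

from raglite import rag

# Create a user message
messages = []  # Or start with an existing message history
messages.append({
    "role": "user",
    "content": "How is intelligence measured?"
})

# Adaptively decide whether to retrieve, then stream the response
chunk_spans = []
stream = rag(messages, on_retrieval=lambda x: chunk_spans.extend(x), config=my_config)
for update in stream:
    print(update, end="")

# Access the documents referenced in the RAG context
documents = [chunk_span.document for chunk_span in chunk_spans]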

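The LLM adaptively decides whether to retrieve information based on the complexity of the user prompt. If retrieval is needed, the LLM generates the search queries and RAGLite applies hybrid search and reranking to retrieve the most relevant chunk spans (each chunk span is a list of consecutive chunks). The retrieval results are passed to the on_retrieval callback and appended to the message history as the output of a tool call. Finally, the assistant response is streamed and appended to the message history.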
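
The notion of a chunk span as a run of consecutive chunks can be sketched in a few lines. The example below assumes each retrieved chunk is identified by its integer position within a single document; `group_into_spans` is a hypothetical helper and not RAGLite's implementation.

```python
def group_into_spans(chunk_indexes: list[int]) -> list[list[int]]:
    """Group chunk positions into runs of consecutive positions."""
    spans: list[list[int]] = []
    for index in sorted(set(chunk_indexes)):
        if spans and index == spans[-1][-1] + 1:
            # This chunk directly follows the previous one: extend the current span.
            spans[-1].append(index)
        else:
            # There is a gap: start a new span.
            spans.append([index])
    return spans


spans = group_into_spans([7, 3, 4, 12, 5, 8])
print(spans)  # [[3, 4, 5], [7, 8], [12]]
```

Grouping neighbours this way lets the LLM see contiguous passages instead of isolated fragments.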

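3.2 Programmable RAG

If you need manual control over the RAG pipeline, you can run a basic but powerful pipeline that retrieves the most relevant chunk spans with hybrid search and reranking, converts the user prompt into a RAG instruction and appends it to the message history, and finally generates the RAG response:

from raglite import add_context, rag, retrieve_context, vector_search

# Choose a search method
from dataclasses import replace
my_config = replace(my_config, search_method=vector_search)  # Or `hybrid_search`, `search_and_rerank_chunks`, ...

# Retrieve relevant chunk spans with the configured search method
user_prompt = "How is intelligence measured?"
chunk_spans = retrieve_context(
    query=user_prompt,
    num_chunks=5,
    metadata_filter={"author": "Einstein"},  # Optional: filter by metadata
    config=my_config
)

# Append a RAG instruction based on the user prompt and context to the message history
messages = []  # Or start with an existing message history
messages.append(add_context(user_prompt=user_prompt, context=chunk_spans, config=my_config))

# Stream the RAG response and append it to the message history
stream = rag(messages, config=my_config)
for update in stream:
    print(update, end="")

# Access the documents referenced in the RAG context
documents = [chunk_span.document for chunk_span in chunk_spans]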

[!TIP]
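🥇 Reranking can significantly improve the output quality of a RAG application. To add reranking to your application: first search for a larger set of 20 relevant chunks, then rerank them with a rerankers reranker, and finally keep the top 5 chunks.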
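
The search-wide-then-rerank pattern described in this tip can be sketched independently of any library. In the example below, `toy_search` and `toy_score` are deliberately simple stand-ins (word matching and Jaccard overlap) for a real first-stage search and a real reranker; all names are hypothetical.

```python
from typing import Callable


def search_then_rerank(
    query: str,
    search: Callable[[str, int], list[str]],
    score: Callable[[str, str], float],
    num_candidates: int = 20,
    num_results: int = 5,
) -> list[str]:
    """Fetch a wide candidate set cheaply, then keep the best few by a finer score."""
    candidates = search(query, num_candidates)
    reranked = sorted(candidates, key=lambda chunk: score(query, chunk), reverse=True)
    return reranked[:num_results]


corpus = [
    "measuring intelligence requires a benchmark",
    "special relativity and the speed of light",
    "intelligence is skill-acquisition efficiency",
    "how to measure the speed of light",
    "a benchmark to measure intelligence",
]


def toy_search(query: str, limit: int) -> list[str]:
    # Cheap first stage: any chunk sharing at least one word with the query.
    words = set(query.lower().split())
    return [chunk for chunk in corpus if words & set(chunk.split())][:limit]


def toy_score(query: str, chunk: str) -> float:
    # Finer second stage: Jaccard overlap between query words and chunk words.
    query_words, chunk_words = set(query.lower().split()), set(chunk.split())
    return len(query_words & chunk_words) / len(query_words | chunk_words)


top = search_then_rerank("how to measure intelligence", toy_search, toy_score, num_results=2)
print(top)
```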

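RAGLite also offers more fine-grained control over the individual steps of a full RAG pipeline:

1. Searching for relevant chunks with keyword, vector, or hybrid search
2. Retrieving the chunks from the database
3. Reranking the chunks and selecting the top 5 results
4. Extending the chunks with their neighbors and grouping them into chunk spans
5. Converting the user prompt to a RAG instruction and appending it to the message history
6. Streaming an LLM response to the message history
7. Accessing the cited documents from the chunk spans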
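
The first step in the list above mentions hybrid search, which has to merge a keyword ranking and a vector ranking into one. One common way to do that is reciprocal rank fusion (RRF), sketched below. This post does not say which fusion method RAGLite uses, so treat the technique, the function name, and the constant `k=60` as assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk ids into one list using RRF."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); ids missing from a list contribute nothing.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep the result deterministic.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


keyword_ranking = ["a", "b", "c"]
vector_ranking = ["c", "a", "d"]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)  # ['a', 'c', 'b', 'd']
```

Chunks that appear near the top of both rankings end up first, while chunks found by only one method are still kept further down.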

Implementing a full RAG pipeline with RAGLite is straightforward:

# Search for chunks
from raglite import hybrid_search, keyword_search, vector_search

user_prompt = "How is intelligence measured?"
chunk_ids_vector, _ = vector_search(user_prompt, num_results=20, config=my_config)
chunk_ids_keyword, _ = keyword_search(user_prompt, num_results=20, config=my_config)
chunk_ids_hybrid, _ = hybrid_search(
    user_prompt, num_results=20, metadata_filter={"topic": "physics"}, config=my_config
)  # Keep only chunks from documents with topic="physics" (works with any search method)

# Multiple values for the same field use OR semantics:
chunk_ids_or, _ = hybrid_search(
    user_prompt,
    num_results=20,
    metadata_filter={"domain": ["open", "music"]},
    config=my_config,
)  # Returns chunks whose domain contains "open" OR "music".

# Retrieve the chunks
from raglite import retrieve_chunks

chunks_hybrid = retrieve_chunks(chunk_ids_hybrid, config=my_config)

# Rerank the chunks and keep the top 5 (optional, but recommended)
from raglite import rerank_chunks

chunks_reranked = rerank_chunks(user_prompt, chunks_hybrid, config=my_config)
chunks_reranked = chunks_reranked[:5]

# Extend the chunks with their neighbors and group them into chunk spans
from raglite import retrieve_chunk_spans

chunk_spans = retrieve_chunk_spans(chunks_reranked, config=my_config)

# Append a RAG instruction based on the user prompt and context to the message history
from raglite import add_context

messages = []  # Or start with an existing message history
messages.append(add_context(user_prompt=user_prompt, context=chunk_spans, config=my_config))

# Stream the RAG response and append it to the message history
from raglite import rag

stream = rag(messages, config=my_config)
for update in stream:
    print(update, end="")

# Access the documents referenced in the RAG context
documents = [chunk_span.document for chunk_span in chunk_spans]

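4. Computing and using an optimal query adapter

RAGLite can compute an optimal closed-form query adapter and apply it to the prompt embedding to improve the output quality of RAG. To benefit from this, first generate a set of evals with insert_evals, then compute and store the optimal query adapter with update_query_adapter:

# Improve RAG with an optimal query adapter
from raglite import insert_evals, update_query_adapter

insert_evals(num_evals=100, config=my_config)
update_query_adapter(config=my_config)  # From here on, every vector search uses the query adapter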
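
To give a feel for what "closed-form" means for an orthogonal Procrustes problem, here is a toy sketch restricted to two dimensions and to pure rotations, where the best rotation angle is a single `atan2`. The general problem is usually solved with a singular value decomposition, and this post does not describe how RAGLite builds its target vectors, so everything below (names, data, the 2D restriction) is an illustrative assumption.

```python
import math

Vector = tuple[float, float]


def rotate(vector: Vector, angle: float) -> Vector:
    """Rotate a 2D vector counter-clockwise by `angle` radians."""
    x, y = vector
    return (x * math.cos(angle) - y * math.sin(angle), x * math.sin(angle) + y * math.cos(angle))


def fit_rotation(queries: list[Vector], targets: list[Vector]) -> float:
    """Angle of the rotation R minimising sum ||R q_i - t_i||^2 (2D, rotations only)."""
    # Minimising the squared error is equivalent to maximising
    # sum t_i . (R q_i) = cos(a) * dot + sin(a) * cross, which peaks at a = atan2(cross, dot).
    dot = sum(qx * tx + qy * ty for (qx, qy), (tx, ty) in zip(queries, targets))
    cross = sum(qx * ty - qy * tx for (qx, qy), (tx, ty) in zip(queries, targets))
    return math.atan2(cross, dot)


# Toy data: the "ideal" embeddings are the query embeddings rotated by 30 degrees.
queries = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]
targets = [rotate(query, math.radians(30)) for query in queries]

angle = fit_rotation(queries, targets)
adapted = [rotate(query, angle) for query in queries]
print(round(math.degrees(angle), 6))
```

No iterative training is involved: the adapter is obtained directly from sums over the paired vectors, which is what makes a closed-form solution cheap to recompute.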

5. Evaluation of retrieval and generation

If you installed the ragas extra, you can use RAGLite to answer the evals and then evaluate the quality of both the retrieval and generation steps of RAG using Ragas:

# Evaluate retrieval and generation
from raglite import answer_evals, evaluate, insert_evals

insert_evals(num_evals=100, config=my_config)
answered_evals_df = answer_evals(num_evals=10, config=my_config)
evaluation_df = evaluate(answered_evals_df, config=my_config)

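6. Running a Model Context Protocol (MCP) server

RAGLite comes with an MCP server, implemented with FastMCP, that exposes a search_knowledge_base tool. To use the server:

1. Install Claude desktop
2. Install uv so that Claude desktop can start the server
3. Configure Claude desktop to use uv to start the MCP server with: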

raglite \
    --db-url duckdb:///raglite.db \
    --llm llama-cpp-python/unsloth/Qwen3-4B-GGUF/*Q4_K_M.gguf@8192 \
    --embedder llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf@512 \
    mcp install

To use an API-based LLM, make sure to include your credentials in a .env file or supply them inline:

export OPENAI_API_KEY=sk-...
raglite \
    --llm gpt-4o-mini \
    --embedder text-embedding-3-large \
    mcp install

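Now, when you start Claude desktop, you should see a 🔨 icon at the bottom right of the prompt, indicating that Claude has successfully connected to the MCP server.

When relevant, Claude will suggest using the search_knowledge_base tool that the MCP server provides. You can also explicitly ask Claude to search the knowledge base if you want to be certain that it does.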

7. Serving a customizable ChatGPT-like frontend

If you installed the chainlit extra, you can serve a customizable ChatGPT-like frontend with:

raglite chainlit

The application can also be deployed to web, Slack, and Teams.

You can specify the database URL, LLM, and embedder directly in the Chainlit frontend, or with the CLI as follows:

raglite \
    --db-url duckdb:///raglite.db \
    --llm llama-cpp-python/unsloth/Qwen3-4B-GGUF/*Q4_K_M.gguf@8192 \
    --embedder llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf@512 \
    chainlit

To use an API-based LLM, make sure to include your credentials in a .env file or supply them inline:

OPENAI_API_KEY=sk-... raglite --llm gpt-4o-mini --embedder text-embedding-3-large chainlit

Contributing

Prerequisites

1. [Generate an SSH key](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent#generating-a-new-ssh-key) and [add the SSH key to your GitHub account](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account).
1. Configure SSH to automatically load your SSH keys:
    ```sh
    cat << EOF >> ~/.ssh/config

    Host *
      AddKeysToAgent yes
      IgnoreUnknown UseKeychain
      UseKeychain yes
      ForwardAgent yes
    EOF
    ```
1. [Install Docker Desktop](https://www.docker.com/get-started).
1. [Install VS Code](https://code.visualstudio.com/) and [VS Code's Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers). Alternatively, install [PyCharm](https://www.jetbrains.com/pycharm/download/).
1. _Optional:_ install a [Nerd Font](https://www.nerdfonts.com/font-downloads) such as [FiraCode Nerd Font](https://github.com/ryanoasis/nerd-fonts/tree/master/patched-fonts/FiraCode) and [configure VS Code](https://github.com/tonsky/FiraCode/wiki/VS-Code-Instructions) or [PyCharm](https://github.com/tonsky/FiraCode/wiki/Intellij-products-instructions) to use it.

Development environments

The following development environments are supported:

1. ⭐️ _GitHub Codespaces_: click on [Open in GitHub Codespaces](https://github.com/codespaces/new/superlinear-ai/raglite) to start developing in your browser.
1. ⭐️ _VS Code Dev Container (with container volume)_: click on [Open in Dev Containers](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/superlinear-ai/raglite) to clone this repository into a container volume and create a Dev Container with VS Code.
1. ⭐️ _uv_: clone this repository and run the following from the root of the repository:
    ```sh
    # Create and install a virtual environment
    uv sync --python 3.10 --all-extras

    # Activate the virtual environment
    source .venv/bin/activate

    # Install the pre-commit hooks
    pre-commit install --install-hooks
    ```
1. _VS Code Dev Container_: clone this repository, open it with VS Code, and press Ctrl/⌘ + ⇧ + P → _Dev Containers: Reopen in Container_.
1. _PyCharm Dev Container_: clone this repository, open it with PyCharm, [create a Dev Container with mounted sources](https://www.jetbrains.com/help/pycharm/start-dev-container-inside-ide.html), and [configure an existing Python interpreter](https://www.jetbrains.com/help/pycharm/configuring-python-interpreter.html#widget) at `/opt/venv/bin/python`.

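Developing

- This project follows the [Conventional Commits](https://www.conventionalcommits.org/) standard to automate [Semantic Versioning](https://semver.org/) and [Keep A Changelog](https://keepachangelog.com/) with [Commitizen](https://github.com/commitizen-tools/commitizen).
- Run `poe` from within the development environment to print a list of the [Poe the Poet](https://github.com/nat-n/poethepoet) tasks available for this project.
- Run `uv add {package}` from within the development environment to install a runtime dependency and add it to `pyproject.toml` and `uv.lock`. Add `--dev` to install a development dependency.
- Run `uv sync --upgrade` from within the development environment to upgrade all dependencies to the latest versions allowed by `pyproject.toml`. Add `--only-dev` to upgrade the development dependencies only.
- Run `cz bump` to bump the package's version, update `CHANGELOG.md`, and create a git tag. Then push the changes and the git tag with `git push origin main --tags`.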

Project URL: https://github.com/superlinear-ai/raglite
