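⚡️ What is FastEmbed?

FastEmbed is a lightweight, fast Python library built for embedding generation. We support popular text models. Please open a GitHub issue if you want us to add a new model.

The default text embedding (TextEmbedding) model is Flag Embedding, presented in the MTEB leaderboard. It supports "query" and "passage" prefixes for the input text. The project also provides an example of retrieval embedding generation and of how to use FastEmbed with Qdrant.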

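📈 Why FastEmbed?

Light: FastEmbed is a lightweight library with few external dependencies. We don't require a GPU and don't download GBs of PyTorch dependencies; instead we use the ONNX Runtime. This makes it a great candidate for serverless runtimes like AWS Lambda.
Fast: FastEmbed is designed for speed. We use the ONNX Runtime, which is faster than PyTorch. We also use data parallelism for encoding large datasets.
Accurate: FastEmbed is better than OpenAI Ada-002. We also support an ever-expanding set of models, including a few multilingual models.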
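
The data-parallel idea mentioned under "Fast" can be pictured with a small generic sketch. This is not FastEmbed's implementation: `fake_encode` is a hypothetical stand-in for a real model call, and `batched` and `encode_in_parallel` are illustrative names. It only shows the pattern of splitting a dataset into batches and encoding the batches concurrently while keeping the input order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_encode(batch: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for a model: one 2-dimensional 'vector' per text."""
    return [[float(len(text)), float(text.count(" "))] for text in batch]


def encode_in_parallel(texts: list[str], batch_size: int = 2, workers: int = 2) -> list[list[float]]:
    """Encode batches on a thread pool; `map` returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(fake_encode, batched(texts, batch_size))
        return [vector for batch in results for vector in batch]


toy_vectors = encode_in_parallel(["a b", "cc", "d e f"])
print(toy_vectors)
```

Because the results come back in input order, each vector can still be matched to its source text by position.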

🚀 Installation

To install the FastEmbed library, pip works best. You can install it with or without GPU support:

pip install fastembed

# or with GPU support

pip install fastembed-gpu

📖 Quickstart

from fastembed import TextEmbedding


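# Example list of documents
documents: list[str] = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]

# This will trigger the model download and initialization
embedding_model = TextEmbedding()
print("The model BAAI/bge-small-en-v1.5 is ready to use.")

embeddings_generator = embedding_model.embed(documents)  # reminder: this is a generator
# You can also convert the generator to a list, and that to a numpy array
embeddings_list = list(embedding_model.embed(documents))
len(embeddings_list[0])  # vector of 384 dimensions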
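
Since `embed` returns a generator, it can be consumed only once. The sketch below illustrates that behaviour and the list-to-numpy conversion mentioned in the comment above. To stay runnable without downloading a model, it uses `toy_embed`, a hypothetical stand-in generator that yields small float32 vectors; it is not part of FastEmbed.

```python
import numpy as np


def toy_embed(texts):
    """Hypothetical stand-in for an embedding generator: yields one 4-dim vector per text."""
    for text in texts:
        yield np.full(4, len(text), dtype=np.float32)


generator = toy_embed(["ab", "cde"])
vectors = list(generator)      # first pass consumes the generator
second_pass = list(generator)  # an exhausted generator yields nothing

matrix = np.stack(vectors)     # one row per document
print(matrix.shape, len(second_pass))
```

Materializing the generator once and reusing the resulting list or matrix avoids an accidental empty second pass.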

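FastEmbed supports a variety of models for different tasks and modalities.
A list of all available models is linked from the project page.

🎒 Dense text embeddings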

from fastembed import TextEmbedding

model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
embeddings = list(model.embed(documents))

# [
#   array([-0.1115,  0.0097,  0.0052,  0.0195, ...], dtype=float32),
#   array([-0.1019,  0.0635, -0.0332,  0.0522, ...], dtype=float32)
# ]
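
Dense vectors like these are typically compared with cosine similarity. The following is a minimal sketch with made-up 2-dimensional vectors standing in for real embeddings; `cosine_rank` is an illustrative helper, not a FastEmbed function.

```python
import numpy as np


def cosine_rank(query_vec: np.ndarray, doc_matrix: np.ndarray):
    """Return document indices sorted by descending cosine similarity, plus the scores."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q                  # one cosine similarity per document row
    order = np.argsort(-scores)     # highest score first
    return order.tolist(), scores


# Toy vectors (assumed values, for illustration only)
toy_docs = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
toy_query = np.array([0.0, 1.0])

order, scores = cosine_rank(toy_query, toy_docs)
print(order, scores)
```

With real embeddings, `toy_docs` would be the stacked output of `model.embed(documents)` and `toy_query` the embedding of the query text.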

Dense text embedding can also be extended with models that are not in the list of supported models.

from fastembed import TextEmbedding
from fastembed.common.model_description import PoolingType, ModelSource

TextEmbedding.add_custom_model(
    model="intfloat/multilingual-e5-small",
    pooling=PoolingType.MEAN,
    normalization=True,
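    sources=ModelSource(hf="intfloat/multilingual-e5-small"),  # can be used with a `url` to load files from private storage
    dim=384,
    model_file="onnx/model.onnx",  # can be used to load an already supported model with another optimization or quantization, e.g. onnx/model_O4.onnx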
)
model = TextEmbedding(model_name="intfloat/multilingual-e5-small")
embeddings = list(model.embed(documents))

🔱 Sparse text embeddings

SPLADE++

from fastembed import SparseTextEmbedding

model = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
embeddings = list(model.embed(documents))

# [
#   SparseEmbedding(indices=[ 17, 123, 919, ... ], values=[0.71, 0.22, 0.39, ...]),
#   SparseEmbedding(indices=[ 38,  12,  91, ... ], values=[0.11, 0.22, 0.39, ...])
# ]
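
A sparse embedding stores only the non-zero token indices and their weights, so two of them are usually compared by a dot product over the shared indices. The sketch below assumes index and value arrays shaped like the `indices` and `values` shown in the output above; the numbers are made up and `sparse_dot` is an illustrative helper.

```python
import numpy as np


def sparse_dot(indices_a, values_a, indices_b, values_b) -> float:
    """Dot product of two sparse vectors given as parallel index/value arrays."""
    weights_b = dict(zip(indices_b.tolist(), values_b.tolist()))
    return float(
        sum(value * weights_b.get(index, 0.0)
            for index, value in zip(indices_a.tolist(), values_a.tolist()))
    )


# Toy data (assumed values): indices 17 and 919 overlap, 123 and 2048 do not
query_indices = np.array([17, 123, 919])
query_values = np.array([0.5, 0.25, 1.0])
doc_indices = np.array([17, 919, 2048])
doc_values = np.array([2.0, 0.5, 4.0])

sparse_score = sparse_dot(query_indices, query_values, doc_indices, doc_values)
print(sparse_score)  # 0.5 * 2.0 + 1.0 * 0.5
```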

🦥 Late interaction models (aka ColBERT)

from fastembed import LateInteractionTextEmbedding

model = LateInteractionTextEmbedding(model_name="colbert-ir/colbertv2.0")
embeddings = list(model.embed(documents))

# [
#   array([
#       [-0.1115,  0.0097,  0.0052,  0.0195, ...],
#       [-0.1019,  0.0635, -0.0332,  0.0522, ...],
#   ]),
#   array([
#       [-0.9019,  0.0335, -0.0032,  0.0991, ...],
#       [-0.2115,  0.8097,  0.1052,  0.0195, ...],
#   ]),
# ]
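
Late interaction models return one vector per token, so a query and a document are compared token by token. A common scoring rule for ColBERT-style models is MaxSim: for each query token, take the best-matching document token and sum those maxima. The sketch below uses made-up 2-dimensional token vectors; `maxsim` is an illustrative helper, not a FastEmbed function.

```python
import numpy as np


def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Sum over query tokens of the maximum dot product with any document token."""
    similarity = query_tokens @ doc_tokens.T   # shape: (query tokens, document tokens)
    return float(similarity.max(axis=1).sum())


# Toy token embeddings (assumed values, for illustration only)
toy_query_tokens = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_doc_tokens = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, -1.0]])

late_score = maxsim(toy_query_tokens, toy_doc_tokens)
print(late_score)  # best matches are 1.0 and 0.5
```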

🖼️ Image embeddings

from fastembed import ImageEmbedding

images = [
    "./path/to/image1.jpg",
    "./path/to/image2.jpg",
]

model = ImageEmbedding(model_name="Qdrant/clip-ViT-B-32-vision")
embeddings = list(model.embed(images))

# [
#   array([-0.1115,  0.0097,  0.0052,  0.0195, ...], dtype=float32),
#   array([-0.1019,  0.0635, -0.0332,  0.0522, ...], dtype=float32)
# ]

Late interaction multimodal models (ColPali)

from fastembed import LateInteractionMultimodalEmbedding

doc_images = [
    "./path/to/qdrant_pdf_doc_1_screenshot.jpg",
    "./path/to/colpali_pdf_doc_2_screenshot.jpg",
]

query = "What is Qdrant?"

model = LateInteractionMultimodalEmbedding(model_name="Qdrant/colpali-v1.3-fp16")
doc_images_embeddings = list(model.embed_image(doc_images))
# shape (2, 1030, 128)
# [array([[-0.03353882, -0.02090454, ..., -0.15576172, -0.07678223]], dtype=float32)]
query_embedding = list(model.embed_text(query))  # materialize the generator, as above
# shape (1, 20, 128)
# [array([[-0.00218201,  0.14758301, ...,  -0.02207947,  0.16833496]], dtype=float32)]

🔄 Rerankers

from fastembed.rerank.cross_encoder import TextCrossEncoder

query = "Who is maintaining Qdrant?"
documents: list[str] = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]
encoder = TextCrossEncoder(model_name="Xenova/ms-marco-MiniLM-L-6-v2")
scores = list(encoder.rerank(query, documents))

# [-11.48061752319336, 5.472434997558594]
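
The scores come back in the same order as the input documents, with a higher score meaning a more relevant document. A small sketch of turning them into a ranking, using the two scores shown above; `rank_by_score` is an illustrative helper, not part of FastEmbed.

```python
def rank_by_score(documents: list[str], scores: list[float]) -> list[tuple[str, float]]:
    """Pair each document with its score and sort from most to least relevant."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)


example_documents = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]
example_scores = [-11.48061752319336, 5.472434997558594]

ranked = rank_by_score(example_documents, example_scores)
for document, score in ranked:
    print(f"{score:8.3f}  {document}")
```

Here the second document, which actually mentions who maintains the library, ends up first.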

Text cross encoders can also be extended with models that are not in the list of supported models.

from fastembed.rerank.cross_encoder import TextCrossEncoder
from fastembed.common.model_description import ModelSource

TextCrossEncoder.add_custom_model(
    model="Xenova/ms-marco-MiniLM-L-4-v2",
    model_file="onnx/model.onnx",
    sources=ModelSource(hf="Xenova/ms-marco-MiniLM-L-4-v2"),
)
model = TextCrossEncoder(model_name="Xenova/ms-marco-MiniLM-L-4-v2")
scores = list(model.rerank_pairs(
    [("What is AI?", "Artificial intelligence is..."), ("What is ML?", "Machine learning is...")]
))

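⚡️ FastEmbed on a GPU

FastEmbed supports running on GPU devices.
It requires installation of the fastembed-gpu package.

pip install fastembed-gpu

Check our example for detailed instructions, CUDA 12.x support, and troubleshooting of common issues.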

from fastembed import TextEmbedding

embedding_model = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    providers=["CUDAExecutionProvider"]
)
print("The model BAAI/bge-small-en-v1.5 is ready to use on a GPU.")
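
When the same script has to run on machines with and without a GPU, one option is to choose the provider list at runtime. The sketch below is an assumption about how one might do this, not something the project prescribes: `pick_providers` is a hypothetical helper, and `onnxruntime.get_available_providers()` is queried only if ONNX Runtime is importable.

```python
def pick_providers(available: list[str]) -> list[str]:
    """Prefer CUDA when ONNX Runtime reports it, and always fall back to the CPU provider."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [name for name in preferred if name in available]
    return chosen or ["CPUExecutionProvider"]


try:
    import onnxruntime as ort

    available_providers = ort.get_available_providers()
except ImportError:
    # Assumption for this sketch: without ONNX Runtime installed, pretend only the CPU exists
    available_providers = ["CPUExecutionProvider"]

providers = pick_providers(available_providers)
print(providers)
```

The resulting list could then be passed as the `providers` argument shown above.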

Usage with Qdrant

Installation with Qdrant Client in Python:

pip install qdrant-client[fastembed]

or

pip install qdrant-client[fastembed-gpu]

On zsh you might need to use quotes: pip install 'qdrant-client[fastembed]'.

from qdrant_client import QdrantClient, models

# Initialize the client
client = QdrantClient("localhost", port=6333)  # for production
# client = QdrantClient(":memory:")  # for experiments

model_name = "sentence-transformers/all-MiniLM-L6-v2"
payload = [
    {"document": "Qdrant has Langchain integrations", "source": "Langchain-docs"},
    {"document": "Qdrant also has Llama Index integrations", "source": "LlamaIndex-docs"},
]
docs = [models.Document(text=data["document"], model=model_name) for data in payload]
ids = [42, 2]

client.create_collection(
    "demo_collection",
    vectors_config=models.VectorParams(
        size=client.get_embedding_size(model_name), distance=models.Distance.COSINE)
)

client.upload_collection(
    collection_name="demo_collection",
    vectors=docs,
    ids=ids,
    payload=payload,
)

search_result = client.query_points(
    collection_name="demo_collection",
    query=models.Document(text="This is a query document", model=model_name)
).points
print(search_result)

Project page: https://github.com/qdrant/fastembed
