Marqo: an AI search engine combining vector search and multimodal retrieval

act · 2026-05-28 11:00:25 · 56 views · 0 comments

Marqo

Neural search for humans.

An open-source search engine based on deep learning that integrates seamlessly into your applications, websites, and workflows.

Getting started

Marqo requires Docker. To install Docker, go to https://docs.docker.com/get-docker/
Run Opensearch with Docker:

docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" opensearchproject/opensearch:2.1.0

Install the Marqo client:

pip install marqo

Start indexing and searching! Here is a simple example:

import marqo

mq = marqo.Client(url='https://localhost:9200', main_user="admin", main_password="admin")

mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    }, 
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, "
                       "mobility, life support, and communications for astronauts",
        "_id": "article_591"
    }]
)

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?"
)

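mq is the client that wraps the marqo API
add_documents() takes a list of documents, represented as Python dicts, for indexing
If the index does not exist, add_documents() creates it with default settings; if it already exists, the documents are added to it
You can set a document's ID with the special _id field; otherwise Marqo generates one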

View the results:

# Print the results:
import pprint
pprint.pprint(results)

{
    'hits': [
        {   
            'Title': 'Extravehicular Mobility Unit (EMU)',
            'Description': 'The EMU is a spacesuit that provides environmental protection, mobility, life support, and '
                           'communications for astronauts',
            '_highlights': {
                'Description': 'The EMU is a spacesuit that provides environmental protection, '
                               'mobility, life support, and communications for astronauts'
            },
            '_id': 'article_591',
            '_score': 1.2387788
        }, 
        {   
            'Title': 'The Travels of Marco Polo',
            'Description': "A 13th-century travelogue describing Polo's travels",
            '_highlights': {'Title': 'The Travels of Marco Polo'},
            '_id': 'e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a',
            '_score': 1.2047464
        }
    ],
    'limit': 10,
    'processingTimeMs': 49,
    'query': 'What is the best outfit to wear on the moon?'
}

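Each hit corresponds to a document that matched the search query
Hits are ordered from most to least matching
limit is the maximum number of hits to return; it can be set as a parameter during search
Each hit has a _highlights field, the part of the document that best matched the query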
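As a minimal sketch of reading this response, the helper below walks the hits list and pulls out the _id, the _score, and the name of the highlighted field for each hit. It assumes the response has the shape shown above. summarize_hits is a hypothetical helper name, not part of the Marqo client, and the sample dictionary is abbreviated from the output above.

```python
def summarize_hits(results):
    """Return (doc_id, score, highlighted_field) tuples, preserving hit order."""
    summary = []
    for hit in results.get("hits", []):
        highlights = hit.get("_highlights", {})
        # _highlights maps a field name to the best-matching text;
        # take the first field name, or None if there are no highlights.
        field = next(iter(highlights), None)
        summary.append((hit["_id"], hit["_score"], field))
    return summary


# Abbreviated copy of the response shown above.
sample = {
    "hits": [
        {"_id": "article_591", "_score": 1.2387788,
         "_highlights": {"Description": "The EMU is a spacesuit ..."}},
        {"_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a", "_score": 1.2047464,
         "_highlights": {"Title": "The Travels of Marco Polo"}},
    ],
    "limit": 10,
}

for doc_id, score, field in summarize_hits(sample):
    print(f"{doc_id}  score={score:.4f}  best match in: {field}")
```

Because the hits arrive already sorted by score, the helper keeps their order rather than re-sorting.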

Other basic operations

Get document

Retrieve a document by ID.

result = mq.index("my-first-index").get_document(document_id="article_591")

Note: calling add_documents again with the same _id updates the document.
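Since a repeated _id updates rather than duplicates a document, one possible approach is to derive the _id from the document's content so that re-running an indexing script stays idempotent. The sketch below is an illustration under that assumption: with_stable_id and its key_field parameter are hypothetical names, and hashing the Title is only one of many ways to choose an ID.

```python
import hashlib


def with_stable_id(doc, key_field="Title"):
    """Return a copy of doc with an _id derived from key_field, unless _id is already set."""
    if "_id" in doc:
        return dict(doc)
    digest = hashlib.sha1(doc[key_field].encode("utf-8")).hexdigest()
    # Shorten the hex digest to keep IDs readable; 16 hex characters is an arbitrary choice.
    return {**doc, "_id": digest[:16]}


docs = [
    {"Title": "The Travels of Marco Polo",
     "Description": "A 13th-century travelogue describing Polo's travels"},
    {"Title": "Extravehicular Mobility Unit (EMU)", "_id": "article_591"},
]
prepared = [with_stable_id(d) for d in docs]
# prepared could then be passed to mq.index("my-first-index").add_documents(prepared)
```

The original dictionaries are left unmodified, and an explicitly supplied _id always wins over the derived one.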

Get index stats

Get information about an index.

results = mq.index("my-first-index").get_stats()

Lexical search

Perform a keyword search.

result = mq.index("my-first-index").search('marco polo', search_method=marqo.SearchMethods.LEXICAL)

Search specific fields

This uses the default neural search method.

result = mq.index("my-first-index").search('adventure', searchable_attributes=['Title'])
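To see how the lexical and neural methods differ on the same query, it can help to compare which document IDs each one returns. The sketch below is a hypothetical helper (compare_hits is not part of the Marqo client) that assumes both responses have the 'hits' list shape shown earlier. The two sample responses are hand-written stand-ins, not real Marqo output.

```python
def compare_hits(neural_results, lexical_results):
    """Split document IDs into those found by both methods and by only one."""
    neural_ids = [hit["_id"] for hit in neural_results["hits"]]
    lexical_ids = [hit["_id"] for hit in lexical_results["hits"]]
    neural_set = set(neural_ids)
    lexical_set = set(lexical_ids)
    return {
        # Each list keeps the ranking order of the response it came from.
        "both": [i for i in neural_ids if i in lexical_set],
        "neural_only": [i for i in neural_ids if i not in lexical_set],
        "lexical_only": [i for i in lexical_ids if i not in neural_set],
    }


# Hand-written stand-ins for two search responses.
neural = {"hits": [{"_id": "article_591"}, {"_id": "doc_a"}]}
lexical = {"hits": [{"_id": "doc_a"}]}
print(compare_hits(neural, lexical))
```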

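Multimodal and cross-modal search

To power image and text search, Marqo lets users plug in CLIP models from HuggingFace. Note that if you do not configure multimodal search, image URLs are treated as strings. To start indexing and searching with images, first create an index with a CLIP configuration, as below: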

settings = {
  "treat_urls_and_pointers_as_images":True,   # lets Marqo find the image file and index it
  "model":"ViT-B/32"
}
response = mq.create_index("my-multimodal-index", **settings)

Images can then be added within documents as follows. You can use URLs from the internet (for example, S3) or files on the machine's disk:

response = mq.index("my-multimodal-index").add_documents([{
    "My Image": "https://upload.wikimedia.org/wikipedia/commons/thumb/b/b3/Hipop%C3%B3tamo_%28Hippopotamus_amphibius%29%2C_parque_nacional_de_Chobe%2C_Botsuana%2C_2018-07-28%2C_DD_82.jpg/640px-Hipop%C3%B3tamo_%28Hippopotamus_amphibius%29%2C_parque_nacional_de_Chobe%2C_Botsuana%2C_2018-07-28%2C_DD_82.jpg",
    "Description": "The hippopotamus, also called the common hippopotamus or river hippopotamus, is a large semiaquatic mammal native to sub-Saharan Africa",
    "_id": "hippo-facts"
}])

You can then search using text as usual. Both text and image fields are searched:

results = mq.index("my-multimodal-index").search('animal')

Setting searchable_attributes to the image field ['My Image'] ensures that only images are searched in this index:

results = mq.index("my-multimodal-index").search('animal', searchable_attributes=['My Image'])

Searching using an image

You can search with an image by providing its link.

results = mq.index("my-multimodal-index").search('https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Standing_Hippopotamus_MET_DP248993.jpg/1920px-Standing_Hippopotamus_MET_DP248993.jpg')

Delete index

Delete an index.

results = mq.index("my-first-index").delete()

Delete documents

Delete documents by ID.

results = mq.index("my-first-index").delete_documents(ids=["article_591", "article_602"])

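Notes on using a GPU

Depending on the GPU model, you may need a PyTorch build compiled against a recent CUDA version (>11.3).
For example, if you see an error like the following: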

NVIDIA #### with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA #### GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

install a suitable PyTorch build. For example, to install PyTorch 1.12 with CUDA 11.6, run:

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116 --upgrade

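Note that the CUDA version supported by the current driver can be found by running the following command in a terminal:

nvidia-smi

The PyTorch installation should use a CUDA version no higher than the one reported. PyTorch installation instructions are at https://pytorch.org/get-started/locally/, and previous versions with other CUDA options are at https://pytorch.org/get-started/previous-versions/.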
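The version check described above can be sketched in a few lines: read the "CUDA Version" value from the nvidia-smi header, then pick the newest CUDA build that does not exceed it. This is an illustration only. The function names are hypothetical, the sample header line is hand-written, and the list of candidate builds is an assumption rather than a statement of which builds PyTorch actually publishes.

```python
import re


def parse_cuda_version(smi_output):
    """Extract 'CUDA Version: X.Y' from nvidia-smi output as a (major, minor) tuple."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def newest_compatible(driver_cuda, build_versions):
    """Pick the highest build version that does not exceed the driver's CUDA version."""
    candidates = [v for v in build_versions if v <= driver_cuda]
    return max(candidates) if candidates else None


# Illustrative header line; real text comes from running nvidia-smi.
sample_header = "| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |"
driver_cuda = parse_cuda_version(sample_header)

# Example CUDA builds to choose from (an assumption for illustration).
builds = [(10, 2), (11, 3), (11, 6), (11, 8)]
print(driver_cuda, newest_compatible(driver_cuda, builds))
```

Versions are compared as (major, minor) tuples rather than strings so that, for example, 11.10 would sort above 11.6.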

Warning

Do not run other applications on Marqo's Opensearch cluster, because Marqo automatically changes and adapts the cluster's settings.

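Contributors

Marqo is a community project that aims to make neural search accessible to the wider developer community. We are glad that you are interested in helping out! Please read this document to start contributing.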

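Dev setup

Create a virtual environment: python -m venv ./venv
Activate the virtual environment: source ./venv/bin/activate
Install dependencies from the requirements file: pip install -r requirements.txt
Run the tests via the tox file: run tox in this directory
If you update dependencies, make sure to delete the .tox directory and rerun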

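Merge instructions

Run the full test suite (using the command tox in this directory).
Create a pull request with an attached GitHub issue.

The large data test builds Marqo from the main branch, fills indices with data, and runs test queries against that data. See https://github.com/S2Search/NeuralSearchLargeDataTest.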

Support

Join our Slack community to discuss ideas with other community members.
Marqo community meetings (coming soon!)

Project URL: https://github.com/marqo-ai/marqo
