MiniChain: a small, elegant experimental framework for building LLM applications

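A tiny library for coding with large language models. See the MiniChain Zoo for examples of how it works.

Coding

Code (math_demo.py): Annotate Python functions that call language models.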

from minichain import OpenAI, Python, prompt, show

@prompt(OpenAI(), template_file="math.pmpt.tpl")
def math_prompt(model, question):
    "Prompt to call GPT with a Jinja template"
    return model(dict(question=question))

@prompt(Python(), template="import math\n{{code}}")
def python(model, code):
    "Prompt to call the Python interpreter"
    # Drop the first and last lines of the model output before running it.
    code = "\n".join(code.strip().split("\n")[1:-1])
    return model(dict(code=code))

def math_demo(question):
    "Chain them together"
    return python(math_prompt(question))
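
The python prompt above discards the first and last lines of the model's reply before executing it. The following is a minimal sketch of that one step. It assumes, without confirmation from the text, that the model wraps its code in a Markdown fence; the sample reply and the helper name strip_outer_lines are invented for illustration.

```python
# Hypothetical model reply: code wrapped in a Markdown code fence.
FENCE = "`" * 3
reply = f"{FENCE}python\nprint(2 + 2 / 2)\n{FENCE}\n"

def strip_outer_lines(text: str) -> str:
    """Drop the first and last lines, mirroring the expression in python()."""
    return "\n".join(text.strip().split("\n")[1:-1])

code = strip_outer_lines(reply)
print(code)
```

A reply that is not wrapped this way would lose its first and last real lines, so the step depends on the prompt reliably producing that shape.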

Chains (Space): MiniChain builds a graph of the calls you make (similar to PyTorch), which it uses for debugging and error handling.

show(math_demo,
     examples=["What is the sum of the powers of 3 (3^i) that are smaller than 100?",
               "What is the sum of the 10 first positive integers?"],
     subprompts=[math_prompt, python],
     out_type="markdown").queue().launch()

Templates (math.pmpt.tpl): Prompts are kept separate from code.

...
Question:
A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts in total does it take?
Code:
2 + 2/2

Question:
{{question}}
Code:

Installation

pip install minichain
export OPENAI_API_KEY="sk-***"

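Examples

This library lets us implement several popular approaches in a few lines of code.

It supports the following backends.

OpenAI (Completions / Embeddings)
Hugging Face 🤗
Google Search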
Python
Manifest-ML (AI21, Cohere, Together)
Bash

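Why Mini-Chain?

There are several very popular libraries for prompt chaining, notably LangChain, Promptify, and GPTIndex. These libraries are useful, but they are extremely large and complex. MiniChain aims to implement the core prompt chaining functionality in a tiny, easy-to-understand library.

Tutorial

Mini-chain is based on annotating functions as prompts.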

@prompt(OpenAI())
def color_prompt(model, input):
    return model(f"Answer 'Yes' if this is a color, {input}. Answer:")

Prompt functions act like Python functions, except they are lazy: to access the result you need to call run().

if color_prompt("blue").run() == "Yes":
    print("It's a color")

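Alternatively, you can chain prompts together. Prompts are lazy, so if you want to manipulate their results you need to add @transform() to your function. For example: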

@transform()
def said_yes(input):
    return input == "Yes"

@prompt(OpenAI())
def adjective_prompt(model, input):
    return model(f"Give an adjective to describe {input}. Answer:")

adjective = adjective_prompt("rainbow")
if said_yes(color_prompt(adjective)).run():
    print("It's a color")
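
To make the lazy behavior concrete, here is a minimal sketch of deferred calls in plain Python. It is not MiniChain's implementation: the Lazy class, the lazy decorator, and fake_color_prompt are hypothetical names, and a set lookup stands in for the model call.

```python
from typing import Any, Callable

class Lazy:
    """A deferred call: stores a function and its arguments until run()."""

    def __init__(self, fn: Callable[..., Any], *args: Any) -> None:
        self.fn = fn
        self.args = args

    def run(self) -> Any:
        # Resolve any Lazy arguments first, then call the function.
        resolved = [a.run() if isinstance(a, Lazy) else a for a in self.args]
        return self.fn(*resolved)

def lazy(fn: Callable[..., Any]) -> Callable[..., Lazy]:
    """Hypothetical decorator: calling the function only builds a Lazy node."""
    def wrapper(*args: Any) -> Lazy:
        return Lazy(fn, *args)
    return wrapper

calls = []  # records when the underlying functions actually execute

@lazy
def fake_color_prompt(text: str) -> str:
    calls.append("color")
    return "Yes" if text in {"blue", "red"} else "No"  # stand-in for a model call

@lazy
def said_yes(answer: str) -> bool:
    calls.append("said_yes")
    return answer == "Yes"

node = said_yes(fake_color_prompt("blue"))
assert calls == []      # building the chain has executed nothing
result = node.run()     # evaluates the inner call, then the outer one
print(result, calls)
```

Because each call returns a node rather than a value, a plain comparison such as `node == "Yes"` would compare the node itself, which is why the manipulation has to live inside a wrapped function.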

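We also include a template_file argument, which assumes the model uses templates written in the Jinja language. This lets us separate the prompt text from the Python code.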

@prompt(OpenAI(), template_file="math.pmpt.tpl")
def math_prompt(model, question):
    return model(dict(question=question))
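
The sketch below illustrates how the dictionary passed to model fills the `{{question}}` placeholder. It deliberately swaps Jinja for a tiny regular-expression substitute so that it runs with the standard library alone; the render helper and the sample question are assumptions for illustration, and real Jinja supports far more than this.

```python
import re

# Template text mirroring the tail of math.pmpt.tpl shown earlier.
TEMPLATE = "Question:\n{{question}}\nCode:\n"

def render(template: str, values: dict) -> str:
    """Replace each {{name}} with values[name]; a tiny stand-in for Jinja."""
    def substitute(match: re.Match) -> str:
        return str(values[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

prompt_text = render(TEMPLATE, dict(question="What is 2 + 2?"))
print(prompt_text)
```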

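Visualization

MiniChain has a built-in prompt visualization system that uses Gradio. If you construct a function that calls a prompt chain, you can visualize it by calling show and launch. This can also be done directly in a notebook.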

show(math_demo,
     examples=["What is the sum of the powers of 3 (3^i) that are smaller than 100?",
               "What is the sum of the 10 first positive integers?"],
     subprompts=[math_prompt, python],
     out_type="markdown").queue().launch()

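Memory

MiniChain does not build in an explicit stateful memory class. We recommend implementing it as a queue.

Here is a class you might find useful for keeping track of responses.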

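from dataclasses import dataclass
from typing import List, Tuple

MEMORY_LIMIT = 2  # the chat example keeps the last two exchanges

@dataclass
class State:
    memory: List[Tuple[str, str]]
    human_input: str = ""

    def push(self, response: str) -> "State":
        # Evict the oldest pair once the limit is reached, then append the new one.
        memory = self.memory if len(self.memory) < MEMORY_LIMIT else self.memory[1:]
        return State(memory + [(self.human_input, response)])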

See the full Chat example. It keeps track of the last two responses it has seen.
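
As a quick illustration of the queue behavior, the sketch below pushes three exchanges through the State class and shows that only the last two remain. It repeats the class so that it runs on its own; MEMORY_LIMIT = 2, the default empty memory, and the sample conversation are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MEMORY_LIMIT = 2  # assumption: matches "the last two responses"

@dataclass
class State:
    memory: List[Tuple[str, str]] = field(default_factory=list)
    human_input: str = ""

    def push(self, response: str) -> "State":
        # Drop the oldest pair once the limit is reached, then append.
        memory = self.memory if len(self.memory) < MEMORY_LIMIT else self.memory[1:]
        return State(memory + [(self.human_input, response)])

state = State()
for question, answer in [("hi", "hello"), ("2+2?", "4"), ("bye", "goodbye")]:
    # Attach the new human input, then record the model's response.
    state = State(state.memory, question).push(answer)

print(state.memory)
```

Each push returns a new State instead of mutating the old one, so earlier states remain valid if a chain holds on to them.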

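Tools and agents

MiniChain does not provide agents or tools. If you want that functionality, you can use the tool_num argument of model, which lets you select among several possible backends. It is easy to add your own new backends (see the GradioExample).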

@prompt([Python(), Bash()])
def math_prompt(model, input, lang):
    return model(input, tool_num=0 if lang == "python" else 1)
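
The idea of selecting a backend by index can be sketched without MiniChain at all. In the example below, MultiBackend, fake_python, and fake_bash are hypothetical stand-ins that only label their input; nothing is executed, and this is not how the library itself is implemented.

```python
from typing import Callable, List

def fake_python(source: str) -> str:
    return f"[python] {source}"  # stand-in, does not execute anything

def fake_bash(source: str) -> str:
    return f"[bash] {source}"    # stand-in, does not execute anything

class MultiBackend:
    """Hypothetical wrapper that dispatches a request by index."""

    def __init__(self, backends: List[Callable[[str], str]]) -> None:
        self.backends = backends

    def __call__(self, request: str, tool_num: int = 0) -> str:
        return self.backends[tool_num](request)

model = MultiBackend([fake_python, fake_bash])

def run_snippet(source: str, lang: str) -> str:
    # Same selection rule as the prompt above: index 0 for Python, 1 otherwise.
    return model(source, tool_num=0 if lang == "python" else 1)

print(run_snippet("1 + 1", "python"))
print(run_snippet("ls", "bash"))
```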

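Documents and Embeddings

MiniChain does not manage documents and embeddings. We recommend using the Hugging Face Datasets library with its built-in FAISS indexing.

Here is the implementation.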

import datasets
import numpy as np
from minichain import OpenAIEmbed, prompt

# Load and index a dataset
olympics = datasets.load_from_disk("olympics.data")
olympics.add_faiss_index("embeddings")

@prompt(OpenAIEmbed())
def get_neighbors(model, inp, k):
    embedding = model(inp)
    res = olympics.get_nearest_examples("embeddings", np.array(embedding), k)
    return res.examples["content"]

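This creates a K-nearest neighbors (KNN) prompt that looks up the 3 closest documents (k = 3) based on the embedding of the question asked. See the full Retrieval-Augmented QA example.

We recommend creating these embeddings offline using the batch map functionality of the datasets library.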
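
For intuition about what the index lookup returns, here is a brute-force nearest-neighbor sketch in plain Python. The toy two-dimensional vectors and document names are invented, and squared Euclidean distance is an assumption for illustration; a FAISS index answers the same kind of query far more efficiently.

```python
from typing import List, Sequence

def nearest(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k vectors closest to query (squared L2 distance)."""
    def dist(v: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(query, v))
    order = sorted(range(len(vectors)), key=lambda i: dist(vectors[i]))
    return order[:k]

# Toy 2-D "embeddings" and their documents, invented for illustration.
docs = ["swimming", "cycling", "fencing", "rowing"]
embeddings = [[0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [0.1, 0.9]]

top = nearest([0.0, 0.8], embeddings, k=3)
print([docs[i] for i in top])
```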

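import numpy as np
import openai

# EMBEDDING_MODEL and BATCH_SIZE are constants you define yourself.
# Note: openai.Embedding.create is the pre-1.0 openai client interface.
def embed(x):
    emb = openai.Embedding.create(input=x["content"], engine=EMBEDDING_MODEL)
    return {"embeddings": [np.array(emb['data'][i]['embedding'])
                           for i in range(len(emb["data"]))]}
x = dataset.map(embed, batch_size=BATCH_SIZE, batched=True)
x.save_to_disk("olympics.data")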
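
The shape of a batched map function can be sketched without any network calls. In the example below, fake_embed is a hypothetical stand-in for the embedding API, and batched_map is a simplified imitation of a batched map written for illustration only: the function receives a column as a list and returns a new column with one entry per input row.

```python
from typing import Dict, List

def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for an embedding API call."""
    return [float(len(text)), float(text.count(" "))]

def embed_batch(batch: Dict[str, List[str]]) -> Dict[str, List[List[float]]]:
    # A batched function receives columns as lists and returns new columns
    # with one entry per input row.
    return {"embeddings": [fake_embed(t) for t in batch["content"]]}

def batched_map(rows: List[str], batch_size: int) -> List[List[float]]:
    """Apply embed_batch over fixed-size slices, as a batched map would."""
    out: List[List[float]] = []
    for start in range(0, len(rows), batch_size):
        chunk = {"content": rows[start:start + batch_size]}
        out.extend(embed_batch(chunk)["embeddings"])
    return out

vectors = batched_map(["a b", "hello", "one two three"], batch_size=2)
print(vectors)
```

Batching matters here because one API request can embed many rows at once, which cuts the number of round trips.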

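There are other ways to do this, such as SQLite or Weaviate.

Typed Prompts

MiniChain can automatically generate a prompt header for you that aims to ensure the output follows a given typed specification. For example, if you run the following code, MiniChain will produce a prompt that returns a list of Player objects.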

class StatType(Enum):
    POINTS = 1
    REBOUNDS = 2
    ASSISTS = 3

@dataclass
class Stat:
    value: int
    stat: StatType

@dataclass
class Player:
    player: str
    stats: List[Stat]


@prompt(OpenAI(), template_file="stats.pmpt.tpl", parser="json")
def stats(model, passage):
    out = model(dict(passage=passage, typ=type_to_prompt(Player)))
    return [Player(**j) for j in out]

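Specifically, it provides your template with a string typ that you can use. The string takes the following form (the sample shown uses the keys color, object, and explanation):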

You are a highly intelligent and accurate information extraction system. You take passage as input and your task is to find parts of the passage to answer questions.

You need to output a list of JSON encoded values

You need to classify in to the following types for key: "color":

RED
GREEN
BLUE


Only select from the above list, or "Other".


You need to classify in to the following types for key: "object":

String



You need to classify in to the following types for key: "explanation":

String

[{ "color" : "color" ,  "object" : "object" ,  "explanation" : "explanation"}, ...]

Make sure every output is exactly seen in the document. Find as many as you can.

The output is then converted into an object for you automatically.

Project: https://github.com/srush/MiniChain
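
One detail worth seeing in code: a plain dataclass constructor does not convert nested dictionaries, so `Player(**j)` on raw JSON would leave each entry of stats as a dict. The sketch below shows one way to do the nested conversion explicitly. The to_player helper, the sample reply, and the exact JSON shape (enum members given by name) are assumptions for illustration, not MiniChain's own parser.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import List

class StatType(Enum):
    POINTS = 1
    REBOUNDS = 2
    ASSISTS = 3

@dataclass
class Stat:
    value: int
    stat: StatType

@dataclass
class Player:
    player: str
    stats: List[Stat]

def to_player(raw: dict) -> Player:
    """Build a Player, converting nested dicts and enum names explicitly."""
    stats = [Stat(value=int(s["value"]), stat=StatType[s["stat"]])
             for s in raw["stats"]]
    return Player(player=raw["player"], stats=stats)

# Hypothetical model output; the exact JSON shape is an assumption.
reply = '[{"player": "A. Example", "stats": [{"value": 12, "stat": "POINTS"}]}]'
players = [to_player(item) for item in json.loads(reply)]
print(players[0])
```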
