LiveKit Agents: A Framework for Real-Time Voice and Multimodal Interaction Agents

Looking for the JS/TS library? Check out AgentsJS

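What is Agents?

The Agent Framework is designed for building real-time, programmable participants that run on servers. Use it to create conversational, multimodal voice agents that can see, hear, and understand.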

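Features

Flexible integrations: A comprehensive ecosystem that lets you mix and match the STT, LLM, TTS, and Realtime API that suit your use case.
Integrated job scheduling: Built-in task scheduling and distribution, with dispatch APIs to connect end users to agents.
Extensive WebRTC clients: Build client applications using LiveKit's open-source SDK ecosystem, which supports all major platforms.
Telephony integration: Works seamlessly with LiveKit's telephony stack, allowing your agent to place or receive phone calls.
Exchange data with clients: Use RPCs and other Data APIs to exchange data with clients.
Semantic turn detection: Uses a transformer model to detect when a user has finished speaking, which helps reduce interruptions.
MCP support: Native support for MCP. Integrate tools provided by MCP servers with one line of code.
Built-in test framework: Write tests and use judges to ensure your agent behaves as expected.
Open source: Fully open source, allowing you to run the entire stack on your own servers, including LiveKit server, one of the most widely used WebRTC media servers.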

Installation

To install the core Agents library along with plugins for popular model providers:

pip install "livekit-agents[openai,silero,deepgram,cartesia,turn-detector]~=1.4"
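The `~=1.4` specifier in the command above is a PEP 440 compatible-release clause: it accepts any 1.x release from 1.4 onward and rejects 2.0. The following is a minimal sketch of that rule for plain numeric versions. The helper names `parse_version` and `is_compatible` are hypothetical, and the sketch ignores pre-release and post-release tags, which real tools such as pip do handle.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a plain dotted version such as "1.4.2" into integers."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(installed: str, spec: str) -> bool:
    """Return True if `installed` satisfies the clause `~=spec`."""
    want = parse_version(spec)
    if len(want) < 2:
        raise ValueError("a compatible-release clause needs at least two components")
    have = parse_version(installed)
    # Pad the installed version with zeros so that "1" compares like "1.0".
    have = have + (0,) * max(0, len(want) - len(have))
    # All components except the last one in the spec must match exactly,
    # and the installed version must not be older than the spec.
    prefix = want[:-1]
    return have[: len(prefix)] == prefix and have >= want


print(is_compatible("1.4.2", "1.4"))
print(is_compatible("2.0.0", "1.4"))
```

With a three-part clause such as `~=1.4.5` the same rule pins the first two components, so 1.4.9 would match and 1.5.0 would not.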

Docs and guides

Documentation on the framework and how to use it is available in the LiveKit Agents docs.

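Developing with AI coding assistants

If you are using an AI coding assistant to build with LiveKit Agents, we recommend the following setup for the best results:

Install the LiveKit Docs MCP server: gives your coding assistant access to up-to-date LiveKit documentation, code search across LiveKit repositories, and working examples.
Install the LiveKit Agent Skill: gives your coding assistant architectural guidance and best practices for building voice AI applications, including workflow design, handoffs, tasks, and testing patterns.

npx skills add livekit/agent-skills --skill livekit-agents

The Agent Skill works best alongside the MCP server: the skill teaches your assistant how to approach building with LiveKit, while the MCP server supplies the current API details needed to implement it correctly.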

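Core concepts

Agent: An LLM-based application with defined instructions.
AgentSession: A container that manages an agent's interactions with end users.
entrypoint: The starting point of an interactive session, similar to a request handler in a web server.
AgentServer: The main process that coordinates job scheduling and launches agents for user sessions.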
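The relationship between the entrypoint and the server can be illustrated without the framework. The sketch below is a toy model, not LiveKit's implementation: `ToyServer`, `ToyJob`, and `dispatch` are hypothetical names. It shows only the registration pattern, in which a decorator records an entrypoint and the server calls it once per incoming job, much as a web server calls a request handler.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class ToyJob:
    """Hypothetical stand-in for the per-session job context."""

    room: str


class ToyServer:
    """Hypothetical stand-in for a server that hands jobs to one entrypoint."""

    def __init__(self) -> None:
        self._entrypoint: Optional[Callable[[ToyJob], Awaitable[str]]] = None

    def rtc_session(self):
        # Return a decorator that records the function as the session entrypoint.
        def register(fn):
            self._entrypoint = fn
            return fn

        return register

    async def dispatch(self, room: str) -> str:
        # Each incoming job gets its own call to the registered entrypoint.
        if self._entrypoint is None:
            raise RuntimeError("no entrypoint registered")
        return await self._entrypoint(ToyJob(room=room))


server = ToyServer()


@server.rtc_session()
async def entrypoint(job: ToyJob) -> str:
    return f"session started in {job.room}"


async def main() -> list:
    # Two jobs are handled concurrently, each by its own entrypoint call.
    return await asyncio.gather(server.dispatch("room-a"), server.dispatch("room-b"))


results = asyncio.run(main())
print(results)
```

In the real framework the entrypoint would start an AgentSession for the room instead of returning a string.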

Usage

Simple voice agent

from livekit.agents import (
    Agent,
    AgentServer,
    AgentSession,
    JobContext,
    RunContext,
    cli,
    function_tool,
    inference,
)
from livekit.plugins import silero


@function_tool
async def lookup_weather(
    context: RunContext,
    location: str,
):
    """Used to look up weather information."""

    return {"weather": "sunny", "temperature": 70}


server = AgentServer()


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    session = AgentSession(
        vad=silero.VAD.load(),
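        # any combination of STT, LLM, TTS, or realtime API can be used
        # this example uses LiveKit Inference, a unified API for accessing different models via LiveKit Cloud
        # to use model provider keys directly, replace with the following: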
        # from livekit.plugins import deepgram, openai, cartesia
        # stt=deepgram.STT(model="nova-3"),
        # llm=openai.LLM(model="gpt-4.1-mini"),
        # tts=cartesia.TTS(model="sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
        stt=inference.STT("deepgram/nova-3", language="multi"),
        llm=inference.LLM("openai/gpt-4.1-mini"),
        tts=inference.TTS("cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
    )

    agent = Agent(
        instructions="You are a friendly voice assistant built by LiveKit.",
        tools=[lookup_weather],
    )

    await session.start(agent=agent, room=ctx.room)
    await session.generate_reply(instructions="greet the user and ask about their day")


if __name__ == "__main__":
    cli.run_app(server)

This example requires the following environment variables:

LIVEKIT_URL
LIVEKIT_API_KEY
LIVEKIT_API_SECRET
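A quick pre-flight check can catch a missing variable before the agent starts. The following is a minimal sketch using only the standard library; the function name `missing_livekit_vars` is hypothetical and is not part of the LiveKit API.

```python
import os
from typing import Mapping

# The three variables named above.
REQUIRED_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_livekit_vars(env: Mapping[str, str] = os.environ) -> list:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_livekit_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.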

Multi-agent handoff

This code snippet is abbreviated. For the full example, see multi_agent.py

...
class IntroAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a story teller. Your goal is to gather a few pieces of information from the user to make the story personalized and engaging. "
            "Ask the user for their name and where they are from."
        )

    async def on_enter(self):
        self.session.generate_reply(instructions="greet the user and gather information")

    @function_tool
    async def information_gathered(
        self,
        context: RunContext,
        name: str,
        location: str,
    ):
        """Called when the user has provided the information needed to make the story personalized and engaging.

        Args:
            name: The name of the user
            location: The location of the user
        """

        context.userdata.name = name
        context.userdata.location = location

        story_agent = StoryAgent(name, location)
        return story_agent, "Let's start the story!"


class StoryAgent(Agent):
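    def __init__(self, name: str, location: str, *, chat_ctx=None) -> None:
        # chat_ctx is an optional chat context to carry over; the caller decides whether to pass one
        super().__init__(
            instructions="You are a storyteller. Use the user's information in order to make the story personalized. "
            f"The user's name is {name}, from {location}.",
            # override the default model, switching from a standard LLM to the Realtime API
            llm=openai.realtime.RealtimeModel(voice="echo"),
            chat_ctx=chat_ctx,
        )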

    async def on_enter(self):
        self.session.generate_reply()


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    userdata = StoryData()
    session = AgentSession[StoryData](
        vad=silero.VAD.load(),
        stt="deepgram/nova-3",
        llm="openai/gpt-4.1-mini",
        tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        userdata=userdata,
    )

    await session.start(
        agent=IntroAgent(),
        room=ctx.room,
    )
...
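The snippet refers to a `StoryData` type whose definition is elided. Judging from the attributes the tool assigns (`context.userdata.name` and `context.userdata.location`), one plausible shape is a small dataclass with optional fields. This is an assumption for illustration; the definition in multi_agent.py may differ, and the sample values below are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoryData:
    """Assumed shape of the per-session user data shared between agents."""

    name: Optional[str] = None
    location: Optional[str] = None


# Both fields start empty and are filled in once the user provides them,
# mirroring the assignments made inside information_gathered.
userdata = StoryData()
userdata.name = "Ada"
userdata.location = "London"
print(userdata)
```

Keeping this state on the session rather than on an agent means it remains available after control passes from one agent to the next.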

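Testing

Automated tests are essential for building reliable agents, especially given the non-deterministic behavior of LLMs. LiveKit Agents includes native test integration to help you create dependable agents.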

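import pytest

from livekit.agents import AgentSession
from livekit.plugins import google

# MyAgent is the agent under test, defined elsewhere in your project


@pytest.mark.asyncio
async def test_no_availability() -> None:
    llm = google.LLM()
    async with AgentSession(llm=llm) as sess: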
        await sess.start(MyAgent())
        result = await sess.run(
            user_input="Hello, I need to place an order."
        )
        result.expect.skip_next_event_if(type="message", role="assistant")
        result.expect.next_event().is_function_call(name="start_order")
        result.expect.next_event().is_function_call_output()
        await (
            result.expect.next_event()
            .is_message(role="assistant")
            .judge(llm, intent="assistant should be asking the user what they would like")
        )

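Examples

For more examples and detailed setup instructions, see the examples directory. For even more examples, see the python-agents-examples repository.

🎙️ Starter Agent

A starter agent optimized for voice conversations.

Code

🔄 Multi-user push to talk

Responds to multiple users in the room via push-to-talk.

Code

🎵 Background audio

Background ambient and thinking audio to improve realism.

Code

🛠️ Dynamic tool creation

Creates function tools dynamically.

Code

☎️ Outbound caller

An agent that makes outbound phone calls.

Code

📋 Structured output

Uses structured output from the LLM to guide TTS tone.

Code

🔌 MCP support

Uses tools from MCP servers.

Code

💬 Text-only agent

Skips voice altogether and uses the same code for text-only integrations.

Code

📝 Multi-user transcriber

Produces transcriptions for all users in the room.

Code

🎥 Video avatars

Adds an AI avatar with Tavus, Hedra, Bithuman, LemonSlice, and more.

Code

🍽️ Restaurant ordering and reservations

A full example of an agent that handles calls for a restaurant.

Code

👁️ Gemini Live vision

A full example (including an iOS app) of a Gemini Live agent that can see.

Code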

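Running your agent

Testing in the terminal

python myagent.py console

Runs your agent in terminal mode, with local audio input and output for testing.
This mode requires no external servers or dependencies and is useful for quickly validating behavior.

Developing with LiveKit clients

python myagent.py dev

Starts the agent server with hot reloading enabled when files change. This mode allows each process to host multiple concurrent agents efficiently.

The agent connects to LiveKit Cloud or your self-hosted server. Set the following environment variables:
- LIVEKIT_URL
- LIVEKIT_API_KEY
- LIVEKIT_API_SECRET

You can connect using any LiveKit client SDK or telephony integration.
To get started quickly, try the Agents Playground.

Running for production

python myagent.py start

Runs the agent with production-ready optimizations.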

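Contributing

The Agents framework is under active development in a rapidly evolving field. We welcome and appreciate contributions of any kind, whether feedback, bug fixes, features, new plugins and tools, or better documentation. You can file issues under this repository, open a PR, or chat with us in the LiveKit community.

Development setup

This project uses uv for package management. To install dependencies for development: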

uv sync --all-extras --dev

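Examples

This project includes many examples in the examples directory. To run them, create the file examples/.env containing credentials for LiveKit Server and any necessary model providers (see examples/.env.example), then run: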

uv run examples/voice_agents/basic_agent.py dev

For more information, see the examples README.
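The examples/.env file mentioned above holds one KEY=VALUE pair per line. As a rough illustration of that format, the sketch below parses such text with the standard library only. The function name `parse_env` and the sample values are hypothetical. The parser handles only plain pairs, blank lines, and `#` comments, and it is not how the examples themselves load the file.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of optional quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


sample = """\
# LiveKit server credentials (placeholder values)
LIVEKIT_URL=wss://example.invalid
LIVEKIT_API_KEY="placeholder-key"
LIVEKIT_API_SECRET=placeholder-secret
"""

config = parse_env(sample)
print(sorted(config))
```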

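Tests

Unit tests are located in the tests directory and can be run with:

uv run pytest tests/test_tools.py

Integration tests for each plugin require various API credentials and run automatically in GitHub CI for PRs submitted by project maintainers. See the tests workflow for details.

Formatting

This project uses ruff for formatting and linting: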

uv run ruff format
uv run ruff check --fix

Documentation

To generate docs locally using pdoc:

uv sync --all-extras --group docs
uv run --active pdoc --skip-errors --html --output-dir=docs livekit

LiveKit Ecosystem
Agents SDKs	Python · Node.js
LiveKit SDKs	Browser · Swift · Android · Flutter · React Native · Rust · Node.js · Python · Unity · Unity (WebGL) · ESP32 · C++
Starter Apps	Python Agent ·

Project repository: https://github.com/livekit/agents
