doubao-api-open-tts: Text-to-Speech with the Doubao (Volcano Engine) Service

zero · 2026-02-05 21:29:55 · 19 views · 0 comments

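name: doubao-open-tts
description: Text-to-speech service built on the Doubao (Volcano Engine) API, with 200+ voices, interactive voice selection, and multilingual support
license: MIT
compatibility: opencode
metadata: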
category: audio
language: python
provider: volcano-engine

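Doubao Open TTS

A text-to-speech (TTS) service that converts text into natural-sounding speech through the V1 interface of the Doubao (Volcano Engine) API.

Features

🎙️ 200+ voices - the default voice for general use is 灿灿 (Shiny)
🔊 Multiple audio formats - mp3, pcm, and wav
⚡ Adjustable parameters - speed and volume control
📦 Two interfaces - command-line tool + Python API
🎯 Voice categories - browse voices by category
🔐 Mandatory API configuration - API credentials are required; if they are missing, the agent stops and prompts the user (the default voice is available only after the user explicitly confirms)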

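Agent Quick Start

Important: when no API configuration is detected, the agent must stop immediately and ask the user whether to configure the API. It must not generate audio with the default voice unless the user explicitly insists on it.

When using this skill, the agent should follow this workflow:

Step 1: Check the API configuration

First, check whether API credentials are configured: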

from scripts.tts import check_api_config, setup_api_config

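# Check whether the API is configured
config = check_api_config()
if not config:
    # The API is not configured: stop and ask the user.
    # Do not continue with the default voice.
    print("⚠️ No API credentials found. Cannot continue without API configuration.")
    # ... ask the user to provide credentials or to confirm the default voice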

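Step 2: Handle a missing API configuration

If API credentials are missing, the agent must stop and ask the user. The user can either configure the API or explicitly insist on the default voice (allowed only on explicit request).

Agent: I need API credentials to use the Doubao TTS service.
       Please provide the following:

1. App ID (APPID)
2. Access Token
3. Secret Key

You can obtain them at: https://console.volcengine.com/

Please enter your App ID:

User: your_app_id

Agent: Please enter your Access Token:

User: your_access_token

Agent: Please enter your Secret Key:

User: your_secret_key

Agent: [saves the credentials to the .env file]
       ✅ API configuration saved!

User choice: if the user does not want to configure the API and insists on the default voice:

User: I don't want to configure the API, just use the default voice

Agent: ⚠️ Confirm using the default voice? Audio will be generated with the built-in default voice.
       Enter 'yes' to confirm the default voice, or provide API credentials for a better experience.

User: yes

Agent: [continues with the default voice]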
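
The gating rule from steps 1 and 2 can be sketched as a small pure function, which makes the rule easy to check without any interactive prompts. The name `next_action` and the action labels it returns are hypothetical and are not part of `scripts.tts`.

```python
from typing import Optional


def next_action(config: Optional[dict], user_reply: Optional[str] = None) -> str:
    """Decide what the agent may do next (hypothetical helper, not part of scripts.tts)."""
    if config:
        # Credentials are present, so synthesis may proceed normally.
        return "synthesize"
    if user_reply is None:
        # No credentials and no answer yet: the agent must stop and ask.
        return "ask_user"
    if user_reply.strip().lower() == "yes":
        # Only an explicit 'yes' unlocks the default voice.
        return "synthesize_default_voice"
    # Any other reply keeps the agent blocked until credentials are provided.
    return "ask_user"
```

Keeping the decision separate from `input()` calls means the "never fall back silently" rule lives in one place.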

Step 3: Use the service

Once the API is configured, or the user has explicitly confirmed the default voice:

from scripts.tts import VolcanoTTS

tts = VolcanoTTS()
output = tts.synthesize("Hello world", output_file="output.mp3")

API Configuration Detection

Function: `check_api_config()`

Checks whether API credentials are available. Returns a configuration dictionary, or None if they are missing.

from scripts.tts import check_api_config

config = check_api_config()
if config:
    print(f"App ID: {config['app_id']}")
    print(f"Access Token: {config['access_token'][:10]}...")
    print(f"Secret Key: {config['secret_key'][:10]}...")
else:
    print("API not configured")
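
The skill's source is not shown here, so the following is only a sketch of how such a check might work, assuming credentials come from the three environment variables named in the configuration section. The function name `check_api_config_sketch` is hypothetical; the dictionary keys mirror those used in the example above.

```python
import os
from typing import Mapping, Optional

# Environment variable names taken from the configuration section of this document.
ENV_KEYS = {
    "app_id": "VOLCANO_TTS_APPID",
    "access_token": "VOLCANO_TTS_ACCESS_TOKEN",
    "secret_key": "VOLCANO_TTS_SECRET_KEY",
}


def check_api_config_sketch(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return a config dict if all three credentials are set, otherwise None."""
    env = os.environ if env is None else env
    config = {key: env.get(var, "").strip() for key, var in ENV_KEYS.items()}
    # A single missing or empty value means the API counts as unconfigured.
    if not all(config.values()):
        return None
    return config
```

Passing the environment as an argument is a design choice for testability; the real function may also read the .env file.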

Function: `setup_api_config(app_id, access_token, secret_key, voice_type=None)`

Saves the API credentials to a .env file in the skill directory.

from scripts.tts import setup_api_config

# Save the credentials (replace the placeholders with your own values)
setup_api_config(
    app_id="your_app_id",
    access_token="your_access_token",
    secret_key="your_secret_key",
    voice_type="zh_female_cancan_mars_bigtts"  # optional
)

print("✅ Configuration saved to the .env file")
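
As an illustration of what saving to a .env file could involve, here is a minimal sketch. It assumes the file uses the same key names as the environment variables listed in the configuration section; `write_env_file` is a hypothetical stand-in, not the skill's actual implementation.

```python
from pathlib import Path
from typing import Optional


def write_env_file(path, app_id: str, access_token: str, secret_key: str,
                   voice_type: Optional[str] = None) -> Path:
    """Write credentials as KEY=value lines (hypothetical stand-in for setup_api_config)."""
    lines = [
        f"VOLCANO_TTS_APPID={app_id}",
        f"VOLCANO_TTS_ACCESS_TOKEN={access_token}",
        f"VOLCANO_TTS_SECRET_KEY={secret_key}",
    ]
    if voice_type:
        # The default voice is optional, so it is written only when given.
        lines.append(f"VOLCANO_TTS_VOICE_TYPE={voice_type}")
    env_path = Path(path)
    env_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # Restrict the file to the current user where the platform supports it.
    env_path.chmod(0o600)
    return env_path
```

Because the file holds secrets, it should also be excluded from version control.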

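Complete Agent Workflow Example

from scripts.tts import check_api_config, setup_api_config, VolcanoTTS

def synthesize_with_auto_config(text, output_file="output.mp3", use_default_voice=False):
    """
    Synthesize speech, configuring the API interactively if needed.

    Important: if the API is not configured, this function stops and asks the user.
    It never falls back to the default voice unless the user explicitly confirms.
    """
    # Step 1: check whether the API is configured
    config = check_api_config()

    if not config:
        # Step 2: stop and ask the user - continuing automatically is not allowed
        print("🔐 API configuration required")
        print("=" * 50)
        print("\n⚠️ No API credentials found. You have two options:")
        print("\nOption 1: Configure the API (recommended)")
        print("  Visit https://console.volcengine.com/ to obtain your credentials")
        print("\nOption 2: Use the default voice")
        print("  ⚠️ Available only after your explicit confirmation")

        # Ask the user what to do
        choice = input("\nEnter '1' to configure the API, or '2' to use the default voice: ").strip()

        if choice == '1':
            # Configure the API
            print("\nRequired information:")
            app_id = input("1. Enter your App ID: ").strip()
            access_token = input("2. Enter your Access Token: ").strip()
            secret_key = input("3. Enter your Secret Key: ").strip()

            # Optional: ask for a preferred voice
            print("\n🎙️ Optional: choose a default voice (press Enter to use 灿灿)")
            voice_type = input("Voice type (or voice name): ").strip()

            # Save the configuration
            setup_api_config(app_id, access_token, secret_key, voice_type or None)
            print("\n✅ Configuration saved!")

        elif choice == '2':
            # The user explicitly chose the default voice
            confirm = input("\n⚠️ Confirm using the default voice? (yes/no): ").strip().lower()
            if confirm != 'yes':
                print("❌ Cancelled. Please configure the API to continue.")
                return None
            use_default_voice = True
            print("\n⚠️ Using the default voice as requested...")
        else:
            print("❌ Invalid choice. Please configure the API to continue.")
            return None

    # Step 3: use the service
    if use_default_voice:
        # Use the default voice (only after explicit user confirmation)
        tts = VolcanoTTS(use_default=True)
    else:
        tts = VolcanoTTS()

    output_path = tts.synthesize(text, output_file=output_file)
    return output_path

# Usage example
output = synthesize_with_auto_config("Hello, this is a test")
if output:
    print(f"Audio saved to: {output}")
else:
    print("Operation cancelled - API configuration required")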

Configuration Methods

Installation

cd skills/volcano-tts
pip install -r requirements.txt

Configuration

Method 1: Environment variables

export VOLCANO_TTS_APPID="your_app_id"
export VOLCANO_TTS_ACCESS_TOKEN="your_access_token"
export VOLCANO_TTS_SECRET_KEY="your_secret_key"
export VOLCANO_TTS_VOICE_TYPE="zh_female_cancan_mars_bigtts"  # optional: set the default voice

Method 2: .env file

Copy .env.example to .env and fill in your credentials:

cp .env.example .env
# Edit the .env file with your credentials
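
For readers curious how a .env file is typically read back, the sketch below parses simple KEY=value lines. It is a simplified subset of the common dotenv format and an assumption about the file layout; the skill itself may rely on a library instead. The helper name `parse_env_text` is hypothetical.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments (simplified dotenv subset)."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```

This sketch does not handle inline comments or an `export` prefix.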

Usage

Command Line

# Basic usage (default voice: 灿灿)
python scripts/tts.py "Hello, this is a test of Doubao text-to-speech service"

# Specify the output file and format
python scripts/tts.py "Welcome to use TTS" -o output.mp3 -e mp3

# Read text from a file
python scripts/tts.py -f input.txt -o output.mp3

# Adjust parameters
python scripts/tts.py "Custom voice" --speed 1.2 --volume 0.8 -v zh_female_cancan_mars_bigtts

# List all available voices
python scripts/tts.py --list-voices

# List voices by category
python scripts/tts.py --list-voices --category "General-Multilingual"

# Use a different cluster
python scripts/tts.py "Hello" --cluster volcano_tts

# Enable debug mode
python scripts/tts.py "Test" --debug
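
The flags used in the commands above could be declared with `argparse` roughly as follows. This is a sketch, not the script's actual parser: the long option names `--file`, `--output`, `--encoding`, and `--voice`, as well as the default output path, are assumptions, since only the short forms appear in the examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags used in the examples above (a sketch)."""
    parser = argparse.ArgumentParser(description="Doubao TTS command-line sketch")
    parser.add_argument("text", nargs="?", help="text to synthesize")
    parser.add_argument("-f", "--file", help="read the text from a file")  # long name assumed
    parser.add_argument("-o", "--output", default="output.mp3")  # long name and default assumed
    parser.add_argument("-e", "--encoding", default="mp3", choices=["mp3", "pcm", "wav"])
    parser.add_argument("-v", "--voice", default="zh_female_cancan_mars_bigtts")  # long name assumed
    parser.add_argument("--speed", type=float, default=1.0)
    parser.add_argument("--volume", type=float, default=1.0)
    parser.add_argument("--cluster", default="volcano_tts")
    parser.add_argument("--list-voices", action="store_true")
    parser.add_argument("--category")
    parser.add_argument("--debug", action="store_true")
    return parser
```

Making the positional `text` optional lets `-f input.txt` and `--list-voices` run without a text argument.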

Python API

from scripts.tts import VolcanoTTS, VOICE_TYPES, VOICE_CATEGORIES

# Initialize the client
tts = VolcanoTTS(
    app_id="your_app_id",
    access_token="your_access_token",
    secret_key="your_secret_key",
    voice_type="zh_female_cancan_mars_bigtts"  # optional: set the default voice
)

# List available voices
print("All voices:", tts.list_voices())
print("General voices:", tts.list_voices("General-Normal"))

# Change the voice
tts.set_voice("zh_male_xudong_conversation_wvae_bigtts")  # switch to 快乐小东

# Synthesize speech
output_path = tts.synthesize(
    text="Hello, this is Doubao text-to-speech",
    voice_type="zh_female_cancan_mars_bigtts",  # optional: overrides the default
    encoding="mp3",
    cluster="volcano_tts",
    speed=1.0,
    volume=1.0,
    output_file="output.mp3"
)

print(f"Audio saved to: {output_path}")
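
The source does not show what `synthesize` sends over the wire. As a rough illustration only, the sketch below assembles a request in the shape commonly associated with the Volcano Engine V1 HTTP interface. Every field name, the `Bearer;` header form, and the `uid` value are assumptions that should be verified against the official documentation; `build_v1_request` is a hypothetical helper and performs no network call.

```python
import json
import uuid


def build_v1_request(app_id, access_token, text,
                     voice_type="zh_female_cancan_mars_bigtts",
                     encoding="mp3", cluster="volcano_tts",
                     speed=1.0, volume=1.0):
    """Assemble headers and a JSON body for one synthesis call (field names are assumptions)."""
    headers = {"Authorization": f"Bearer;{access_token}"}  # header form assumed
    body = {
        "app": {"appid": app_id, "token": access_token, "cluster": cluster},
        "user": {"uid": "demo-user"},  # placeholder user ID
        "audio": {
            "voice_type": voice_type,
            "encoding": encoding,
            "speed_ratio": speed,
            "volume_ratio": volume,
        },
        "request": {
            "reqid": str(uuid.uuid4()),  # unique ID per request
            "text": text,
            "text_type": "plain",
            "operation": "query",
        },
    }
    return headers, body


headers, body = build_v1_request("your_app_id", "your_access_token", "Hello")
print(json.dumps(body, indent=2))
```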

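Interactive Voice Selection

This skill supports an interactive voice-selection workflow in which the agent and the user work together:

Workflow

1. Agent prompts the user - the agent asks the user to choose a voice
2. Voice options are shown - recommended voices are displayed by category
3. User chooses - the user tells the agent which voice they prefer
4. Agent calls the skill - the agent generates audio with the selected voice

Python API for Interactive Selection

Important: check the API configuration before using the code below. If it is missing, stop and ask the user.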

from scripts.tts import (
    get_voice_selection_prompt,
    find_voice_by_name,
    get_voice_info,
    check_api_config,
    VolcanoTTS
)

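# Step 0: check the API configuration first
config = check_api_config()
if not config:
    print("⚠️ No API credentials found. Please configure the API first.")
    print("Visit: https://console.volcengine.com/")
    # Stop here and ask the user to configure the API.
    # Voice selection must not continue until the API is configured
    # or the user explicitly confirms the default voice.
    raise SystemExit(1)

# Step 1: get the selection prompt to show to the user
prompt = get_voice_selection_prompt()
print(prompt)
# The agent shows this to the user and waits for a reply

# Step 2: the user replies with a choice (for example, "Shiny" or "灿灿")
user_input = "Shiny"  # input from the user

# Step 3: look up the voice_type from the user input
voice_type, voice_name = find_voice_by_name(user_input)
if voice_type:
    print(f"Selected voice: {voice_name} ({voice_type})")

    # Get detailed information
    info = get_voice_info(voice_type)
    print(f"Category: {info['category_display']}")

    # Step 4: synthesize with that voice (API already verified)
    tts = VolcanoTTS(
        app_id="your_app_id",
        access_token="your_access_token",
        secret_key="your_secret_key"
    )

    output_path = tts.synthesize(
        text="Hello, this is the selected voice",
        voice_type=voice_type,
        output_file="output.mp3"
    )
    print(f"Audio saved to: {output_path}")
else:
    print("Voice not found. Please choose a valid voice.")
    # Do not fall back to the default voice automatically - ask the user instead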

Agent-User Dialogue Example

Agent: 🎙️ Please choose a voice for text-to-speech synthesis:

Recommended voices by category:

[General - Normal]
  • 灿灿/Shiny [default] (Chinese) -> voice_type: zh_female_cancan_mars_bigtts
  • 快乐小东 (Chinese) -> voice_type: zh_male_xudong_conversation_wvae_bigtts
  • 亲切女声 (Chinese) -> voice_type: zh_female_qinqienvsheng_moon_bigtts

[Role Play]
  • 纯真少女 (Chinese) -> voice_type: ICL_zh_female_chunzhenshaonv_e588402fb8ad_tob
  • 霸道总裁 (Chinese) -> voice_type: ICL_zh_male_badaozongcai_v1_tob
  • 撒娇男友 (Chinese) -> voice_type: ICL_zh_male_sajiaonanyou_tob

[Video Dubbing]
  • 猴哥 (Chinese) -> voice_type: zh_male_sunwukong_mars_bigtts
  • 熊二 (Chinese) -> voice_type: zh_male_xionger_mars_bigtts
  • 佩奇猪 (Chinese) -> voice_type: zh_female_peiqi_mars_bigtts

💡 Tips:
  • You can say a voice name (for example, 'Shiny', '猴哥', '霸道总裁')
  • Or provide the voice_type directly
  • Enter 'list all' to see all 200+ available voices
  • Press Enter to use the default voice (Shiny) - **available only when the API is configured**

⚠️ **Note**: Voice selection requires API credentials. If they are not configured, you must first configure the API or explicitly confirm the default voice.

Which voice would you like to use?

User: I want to use 猴哥

Agent: [calls the skill with voice_type="zh_male_sunwukong_mars_bigtts"]
       ✅ Audio generated with voice: 猴哥

Supported Input Formats

The find_voice_by_name() function accepts:
- A voice_type directly: zh_female_cancan_mars_bigtts
- A Chinese name: 灿灿, 猴哥, 霸道总裁
- An English alias: Shiny, Skye, Alvin
- A partial match: 灿灿 matches 灿灿/Shiny
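
The lookup order described above can be sketched as follows. The table is a small excerpt of voices named in this document, and `find_voice_sketch` is a hypothetical stand-in; the real `find_voice_by_name()` may match differently, for example by holding aliases in a separate table.

```python
from typing import Optional, Tuple

# A small excerpt of voices listed in this document; the real table has 200+ entries.
SAMPLE_VOICES = {
    "zh_female_cancan_mars_bigtts": "灿灿/Shiny",
    "zh_male_xudong_conversation_wvae_bigtts": "快乐小东",
    "zh_male_sunwukong_mars_bigtts": "猴哥",
    "ICL_zh_male_badaozongcai_v1_tob": "霸道总裁",
}


def find_voice_sketch(query: str) -> Tuple[Optional[str], Optional[str]]:
    """Resolve user input to (voice_type, voice_name), or (None, None) if nothing matches."""
    needle = query.strip()
    if not needle:
        return None, None
    # 1. Direct voice_type match.
    if needle in SAMPLE_VOICES:
        return needle, SAMPLE_VOICES[needle]
    lowered = needle.lower()
    # 2. Exact match on one part of the display name (Chinese name or English alias).
    for voice_type, name in SAMPLE_VOICES.items():
        if lowered in (part.lower() for part in name.split("/")):
            return voice_type, name
    # 3. Partial match anywhere in the display name.
    for voice_type, name in SAMPLE_VOICES.items():
        if lowered in name.lower():
            return voice_type, name
    return None, None
```

Returning `(None, None)` rather than a default voice keeps the caller responsible for asking the user again.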

Parameters

Parameter	Description	Default	Options
voice_type	Voice type	zh_female_cancan_mars_bigtts	See the voice list below
encoding	Audio format	mp3	mp3, pcm, wav
sample_rate	Sample rate (Hz)	24000	8000, 16000, 24000
speed	Speech speed	1.0	0.5 - 2.0
volume	Volume	1.0	0.5 - 2.0
cluster	Cluster name	volcano_tts	volcano_tts
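
The limits in the table can be checked before a request is sent. The sketch below takes its allowed values directly from the table; the function name `validate_params` is hypothetical, and whether the skill performs such validation itself is not stated in the source.

```python
ALLOWED_ENCODINGS = ("mp3", "pcm", "wav")
ALLOWED_SAMPLE_RATES = (8000, 16000, 24000)


def validate_params(encoding="mp3", sample_rate=24000, speed=1.0, volume=1.0):
    """Return a list of problems; an empty list means every value is within the table's limits."""
    problems = []
    if encoding not in ALLOWED_ENCODINGS:
        problems.append(f"encoding must be one of {ALLOWED_ENCODINGS}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        problems.append(f"sample_rate must be one of {ALLOWED_SAMPLE_RATES}")
    for name, value in (("speed", speed), ("volume", volume)):
        # Both ratios share the same documented range of 0.5 to 2.0.
        if not 0.5 <= value <= 2.0:
            problems.append(f"{name} must be between 0.5 and 2.0")
    return problems
```

Collecting all problems instead of raising on the first lets an agent report every invalid value to the user in a single message.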

Voice Categories

General - Multilingual (with emotion support)

Supported emotions: happy, sad, angry, surprised, fear, hate, excited, coldness, neutral, depressed, lovey-dovey, shy, comfort, tension, tender, storytelling, radio, magnetic, advertising, vocal-fry, ASMR, news, entertainment, dialect

voice_type	Voice name	Language
zh_male_lengkugege_emo_v2_mars_bigtts	冷酷哥哥 (emotional)	Chinese
zh_female_tianxinxiaomei_emo_v2_mars_bigtts	甜心小妹 (emotional)	Chinese
zh_female_gaolengyujie_emo_v2_mars_bigtts	高冷御姐 (emotional)	Chinese
zh_male_aojiaobazong_emo_v2_mars_bigtts	傲娇霸总 (emotional)	Chinese
zh_male_guangzhoudege_emo_mars_bigtts	广州的哥 (emotional)	Chinese
zh_male_jingqiangkanye_emo_mars_bigtts	京腔侃爷 (emotional)	Chinese
zh_female_linjuayi_emo_v2_mars_bigtts	邻居阿姨 (emotional)	Chinese
zh_male_yourougongzi_emo_v2_mars_bigtts	温柔公子 (emotional)	Chinese
zh_male_ruyayichen_emo_v2_mars_bigtts	儒雅男友 (emotional)	Chinese
zh_m

Skill package URL: https://github.com/openclaw/skills/tree/main/skills/xdrshjr/doubao-api-open-tts/SKILL.md
