eachlabs-voice-audio: speech synthesis, transcription, and voice conversion with ElevenLabs, Whisper, and RVC

canyon · 2026-02-05 21:34:46 · 28 views · 0 comments

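name: eachlabs-voice-audio
description: Text-to-speech, speech-to-text, voice conversion, and audio processing using EachLabs AI models. Supports ElevenLabs TTS, Whisper transcription with speaker diarization, and RVC voice conversion. Use when the user needs TTS, transcription, or voice conversion.
metadata:
  author: eachlabs
  version: "1.0"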

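EachLabs Voice & Audio

Text-to-speech, speech-to-text transcription, voice conversion, and audio utilities via the EachLabs Predictions API.

Authentication

Header: X-API-Key: <your-api-key>

Set the EACHLABS_API_KEY environment variable. Get your key at eachlabs.ai.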
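
As a minimal sketch of the setup above, the snippet below reads the key from EACHLABS_API_KEY and builds the request headers. The helper name `eachlabs_headers` is hypothetical and not part of any EachLabs SDK; only the header name and the environment variable come from this page.

```python
import os


def eachlabs_headers(env=os.environ):
    """Build request headers from the EACHLABS_API_KEY environment variable.

    `eachlabs_headers` is a hypothetical helper name used for illustration.
    """
    api_key = env.get("EACHLABS_API_KEY", "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError("EACHLABS_API_KEY is not set")
    return {"X-API-Key": api_key, "Content-Type": "application/json"}
```

Failing fast on a missing key tends to give a clearer error than an authentication failure returned by the server.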

Available Models

Text-to-Speech

Model	Slug	Best for
ElevenLabs TTS	`elevenlabs-text-to-speech`	High-quality TTS
ElevenLabs TTS with Timestamps	`elevenlabs-text-to-speech-with-timestamp`	TTS with word-level timestamps
ElevenLabs Text to Dialogue	`elevenlabs-text-to-dialogue`	Multi-speaker dialogue generation
ElevenLabs Sound Effects	`elevenlabs-sound-effects`	Sound effect generation
ElevenLabs Voice Design v2	`elevenlabs-voice-design-v2`	Custom voice design
Kling V1 TTS	`kling-v1-tts`	Kling text-to-speech
Kokoro 82M	`kokoro-82m`	Lightweight TTS
Play AI Dialog	`play-ai-text-to-speech-dialog`	Dialogue TTS
Stable Audio 2.5	`stable-audio-2-5-text-to-audio`	Text-to-audio

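Speech-to-Text

Model	Slug	Best for
ElevenLabs Scribe v2	`elevenlabs-speech-to-text-scribe-v2`	Highest-quality transcription
ElevenLabs STT	`elevenlabs-speech-to-text`	Standard transcription
Wizper with Timestamps	`wizper-with-timestamp`	Timestamped transcription
Wizper	`wizper`	Basic transcription
Whisper	`whisper`	Open-source transcription
Whisper Diarization	`whisper-diarization`	Speaker identification
Incredibly Fast Whisper	`incredibly-fast-whisper`	Fastest transcription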

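Voice Conversion & Cloning

Model	Slug	Best for
RVC v2	`rvc-v2`	Voice conversion
Train RVC	`train-rvc`	Training a custom voice model
ElevenLabs Voice Clone	`elevenlabs-voice-clone`	Voice cloning
ElevenLabs Voice Changer	`elevenlabs-voice-changer`	Voice transformation
ElevenLabs Voice Design v3	`elevenlabs-voice-design-v3`	Advanced voice design
ElevenLabs Dubbing	`elevenlabs-dubbing`	Video dubbing
Chatterbox Speech-to-Speech	`chatterbox-speech-to-speech`	Speech-to-speech conversion
Open Voice	`openvoice`	Open-source voice cloning
XTTS v2	`xtts-v2`	Multilingual voice cloning
Stable Audio 2.5 Inpaint	`stable-audio-2-5-inpaint`	Audio inpainting
Stable Audio 2.5 Audio-to-Audio	`stable-audio-2-5-audio-to-audio`	Audio-to-audio conversion
Audio Trimmer with Fade	`audio-trimmer-with-fade`	Audio trimming with fade in/out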

Audio Utilities

Model	Slug	Best for
FFmpeg Merge Audio Video	`ffmpeg-api-merge-audio-video`	Merging audio with video
Toolkit Video Convert	`toolkit`	Video/audio format conversion

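Prediction Flow

1. Check the model: GET https://api.eachlabs.ai/v1/model?slug=<slug> confirms that the model exists and returns a request_schema with the exact input parameters. Always do this before creating a prediction so the inputs are correct.
2. POST https://api.eachlabs.ai/v1/prediction with the model slug, version "0.0.1", and input that matches the schema.
3. Poll GET https://api.eachlabs.ai/v1/prediction/{id} until the status is "success" or "failed".
4. Extract the output from the response.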
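
The four steps above can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not an official client: the function names, the `predictionID` key read from the create response, and the `status` and `output` fields read from the poll response are assumptions that this page does not confirm, so check them against a real response before relying on them.

```python
import json
import os
import time
import urllib.parse
import urllib.request

BASE_URL = "https://api.eachlabs.ai/v1"


def http_json(method, url, body=None):
    """Send one JSON request with the X-API-Key header and decode the reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("X-API-Key", os.environ["EACHLABS_API_KEY"])
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_prediction(slug, inputs, http=http_json, id_key="predictionID",
                   interval=2.0, max_polls=150):
    """Check the model, create a prediction, then poll until it finishes.

    Assumptions (not confirmed by this page): the create response carries
    the prediction id under `id_key`, and the poll response exposes
    "status" and "output" fields.
    """
    # Step 1: confirm the model exists; its request_schema lists the inputs.
    http("GET", f"{BASE_URL}/model?slug={urllib.parse.quote(slug)}")
    # Step 2: create the prediction.
    created = http("POST", f"{BASE_URL}/prediction",
                   {"model": slug, "version": "0.0.1", "input": inputs})
    prediction_id = created[id_key]
    # Step 3: poll until the status is "success" or "failed".
    for _ in range(max_polls):
        result = http("GET", f"{BASE_URL}/prediction/{prediction_id}")
        status = result.get("status")
        if status == "success":
            return result.get("output")  # Step 4: extract the output.
        if status == "failed":
            raise RuntimeError(f"prediction {prediction_id} failed: {result}")
        time.sleep(interval)
    raise TimeoutError(
        f"prediction {prediction_id} still running after {max_polls} polls")
```

The `http` parameter exists so the flow can be exercised with a stub instead of the network. The poll cap keeps a stuck prediction from looping forever; the interval and cap values are arbitrary defaults.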

Examples

Text-to-Speech with ElevenLabs

curl -X POST https://api.eachlabs.ai/v1/prediction \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -d '{
    "model": "elevenlabs-text-to-speech",
    "version": "0.0.1",
    "input": {
      "text": "Welcome to our product demo. Today we will walk through the key features.",
      "voice_id": "EXAVITQu4vr4xnSDxMaL",
      "model_id": "eleven_v3",
      "stability": 0.5,
      "similarity_boost": 0.7
    }
  }'

Transcription with ElevenLabs Scribe

curl -X POST https://api.eachlabs.ai/v1/prediction \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -d '{
    "model": "elevenlabs-speech-to-text-scribe-v2",
    "version": "0.0.1",
    "input": {
      "media_url": "https://example.com/recording.mp3",
      "diarize": true,
      "timestamps_granularity": "word"
    }
  }'

Transcription with Wizper (Whisper)

curl -X POST https://api.eachlabs.ai/v1/prediction \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -d '{
    "model": "wizper-with-timestamp",
    "version": "0.0.1",
    "input": {
      "audio_url": "https://example.com/audio.mp3",
      "language": "en",
      "task": "transcribe",
      "chunk_level": "segment"
    }
  }'

Speaker Diarization with Whisper

curl -X POST https://api.eachlabs.ai/v1/prediction \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -d '{
    "model": "whisper-diarization",
    "version": "0.0.1",
    "input": {
      "file_url": "https://example.com/meeting.mp3",
      "num_speakers": 3,
      "language": "en",
      "group_segments": true
    }
  }'

Voice Conversion with RVC v2

curl -X POST https://api.eachlabs.ai/v1/prediction \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -d '{
    "model": "rvc-v2",
    "version": "0.0.1",
    "input": {
      "input_audio": "https://example.com/vocals.wav",
      "rvc_model": "CUSTOM",
      "custom_rvc_model_download_url": "https://example.com/my-voice-model.zip",
      "pitch_change": 0,
      "output_format": "wav"
    }
  }'

Merging Audio with Video

curl -X POST https://api.eachlabs.ai/v1/prediction \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -d '{
    "model": "ffmpeg-api-merge-audio-video",
    "version": "0.0.1",
    "input": {
      "video_url": "https://example.com/video.mp4",
      "audio_url": "https://example.com/narration.mp3",
      "start_offset": 0
    }
  }'

ElevenLabs Voice IDs

The elevenlabs-text-to-speech model supports the following voice IDs. Pass the raw ID string:

Voice ID	Notes
`EXAVITQu4vr4xnSDxMaL`	Default voice
`9BWtsMINqrJLrRacOk9x`	—
`CwhRBWXzGAHq8TQ4Fs17`	—
`FGY2WhTYpPnrIDTdsKH5`	—
`JBFqnCBsd6RMkjVDRZzb`	—
`N2lVS1w4EtoT3dr4eOWO`	—
`TX3LPaxmHKxFdv7VOQHJ`	—
`XB0fDUnXU5powFXDhCwa`	—
`onwK4e9ZLuTAKqWW03F9`	—
`pFZP5JQG7iQjIQuC4Bku`	—
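
A small client-side check can catch a mistyped voice ID before a request is sent. The sketch below copies the IDs from the table above; the names `ELEVENLABS_VOICE_IDS` and `resolve_voice_id` are hypothetical, and treating this table as the complete set of accepted IDs is an assumption, since the API may accept others.

```python
# Voice IDs copied from the table above; the first entry is the default voice.
ELEVENLABS_VOICE_IDS = (
    "EXAVITQu4vr4xnSDxMaL",
    "9BWtsMINqrJLrRacOk9x",
    "CwhRBWXzGAHq8TQ4Fs17",
    "FGY2WhTYpPnrIDTdsKH5",
    "JBFqnCBsd6RMkjVDRZzb",
    "N2lVS1w4EtoT3dr4eOWO",
    "TX3LPaxmHKxFdv7VOQHJ",
    "XB0fDUnXU5powFXDhCwa",
    "onwK4e9ZLuTAKqWW03F9",
    "pFZP5JQG7iQjIQuC4Bku",
)
DEFAULT_VOICE_ID = ELEVENLABS_VOICE_IDS[0]


def resolve_voice_id(voice_id=None):
    """Return a listed voice ID, falling back to the default when none is given.

    Hypothetical helper: it only checks membership in the table above.
    """
    if voice_id is None:
        return DEFAULT_VOICE_ID
    if voice_id not in ELEVENLABS_VOICE_IDS:
        raise ValueError(f"unknown ElevenLabs voice ID: {voice_id!r}")
    return voice_id
```

The comparison is case-sensitive on purpose, because the IDs are passed to the API as raw strings.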

Parameter Reference

For complete parameter details for each model, see references/MODELS.md.

Skill source: https://github.com/openclaw/skills/tree/main/skills/eftalyurtseven/eachlabs-voice-audio/SKILL.md
