OA0

OA0 is a community for exploring AI.


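Name: upstage-document-parse
Description: Parse documents (PDF, images, DOCX, PPTX, XLSX, HWP) with the Upstage Document Parse API. Extracts text, tables, images, and layout elements with bounding boxes. Use it when the user asks to parse, extract, or analyze document content, convert documents to Markdown/HTML, or extract structured data from PDFs and images.
Homepage: https://console.upstage.ai/api/document-digitization/document-parsing
Metadata: {"openclaw":{"emoji":"📑","requires":{"bins":["curl"],"env":["UPSTAGE_API_KEY"]},"primaryEnv":"UPSTAGE_API_KEY"}}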

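Upstage Document Parse

Extract structured content from documents with the Upstage Document Parse API.

Supported Formats

PDF (up to 1000 pages via async processing), PNG, JPG, JPEG, TIFF, BMP, GIF, WEBP, DOCX, PPTX, XLSX, HWP

Installation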

clawhub install upstage-document-parse

API Key Setup

Get your API key from the Upstage Console.
Configure the API key:

openclaw config set skills.entries.upstage-document-parse.apiKey "your-api-key"

Or add it to the ~/.openclaw/openclaw.json file:

{
  "skills": {
    "entries": {
      "upstage-document-parse": {
        "apiKey": "your-api-key"
      }
    }
  }
}
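A script running outside the agent may need the key from either place. The sketch below assumes the JSON layout shown above and falls back to the UPSTAGE_API_KEY environment variable. The helper name `resolve_api_key` and the lookup order are illustrative assumptions, not how OpenClaw itself resolves keys.

```python
import json
import os
from pathlib import Path


def resolve_api_key(config_path=Path.home() / ".openclaw" / "openclaw.json"):
    """Hypothetical helper: return the Upstage API key, or None if not configured.

    Assumed lookup order: skills.entries.upstage-document-parse.apiKey in the
    OpenClaw config file first, then the UPSTAGE_API_KEY environment variable.
    """
    path = Path(config_path)
    if path.is_file():
        with path.open(encoding="utf-8") as f:
            config = json.load(f)
        key = (
            config.get("skills", {})
            .get("entries", {})
            .get("upstage-document-parse", {})
            .get("apiKey")
        )
        if key:
            return key
    # No usable config entry: fall back to the environment variable
    return os.environ.get("UPSTAGE_API_KEY")
```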

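Usage Examples

Ask the agent to parse your document directly:

"Parse this PDF: ~/Documents/report.pdf"
"Parse: ~/Documents/report.jpg"

Synchronous API (Small Documents)

For small documents (fewer than 20 pages recommended).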

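Parameters

Parameter	Type	Default	Description
`model`	string	required	`document-parse` (latest) or `document-parse-nightly`
`document`	file	required	The document file to parse
`mode`	string	`standard`	`standard` (text-focused), `enhanced` (complex tables and images), or `auto`
`ocr`	string	`auto`	`auto` (OCR only for image inputs) or `force` (always run OCR)
`output_formats`	string	`['html']`	Any of `text`, `html`, `markdown`, given as an array
`coordinates`	boolean	`true`	Include bounding-box coordinates
`base64_encoding`	string	`[]`	Element categories to return as base64-encoded images, e.g. `["table"]`, `["figure"]`
`chart_recognition`	boolean	`true`	Convert charts to tables (beta)
`merge_multipage_tables`	boolean	`false`	Merge tables that span pages (beta; when enabled, limited to 20 pages)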
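The table maps onto multipart form fields. The sketch below uses a hypothetical `build_form_data` helper. It checks the enumerated values listed above and serializes list-valued options into the same `['markdown']` string form that the curl examples in this document send. It covers only a subset of the parameters. Sending booleans as lowercase strings is an assumption, and this is not an official client.

```python
VALID_MODES = {"standard", "enhanced", "auto"}
VALID_OCR = {"auto", "force"}
VALID_OUTPUT_FORMATS = {"text", "html", "markdown"}


def build_form_data(model="document-parse", mode="standard", ocr="auto",
                    output_formats=("html",), base64_encoding=(),
                    coordinates=True):
    """Hypothetical helper: build the non-file form fields for a parse request."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    if ocr not in VALID_OCR:
        raise ValueError(f"unsupported ocr: {ocr!r}")
    unknown = set(output_formats) - VALID_OUTPUT_FORMATS
    if unknown:
        raise ValueError(f"unsupported output_formats: {sorted(unknown)}")
    fields = {
        "model": model,
        "mode": mode,
        "ocr": ocr,
        # str(list) yields e.g. "['html', 'markdown']", matching the curl examples
        "output_formats": str(list(output_formats)),
        # Assumption: booleans are sent as lowercase strings
        "coordinates": str(coordinates).lower(),
    }
    if base64_encoding:
        # Only send base64_encoding when some categories were requested
        fields["base64_encoding"] = str(list(base64_encoding))
    return fields
```

The resulting dict can be passed as the `data=` argument of `requests.post`, together with `files={"document": f}`.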

Basic Parsing

curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@/path/to/file.pdf" \
  -F "model=document-parse"

Extract Markdown

curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@report.pdf" \
  -F "model=document-parse" \
  -F "output_formats=['markdown']"

Enhanced Mode for Complex Documents

curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@complex.pdf" \
  -F "model=document-parse" \
  -F "mode=enhanced" \
  -F "output_formats=['html', 'markdown']"

Force OCR for Scanned Documents

curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@scan.pdf" \
  -F "model=document-parse" \
  -F "ocr=force"

Extract Table Images as Base64

curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@invoice.pdf" \
  -F "model=document-parse" \
  -F "base64_encoding=['table']"
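When `base64_encoding` is set, the response carries the cropped element images as base64 strings. The sketch below assumes each matching element in `elements` stores that data in a `base64_encoding` field. This document does not confirm that field name, so check it against a real response. `extract_base64_images` is a hypothetical helper.

```python
import base64


def extract_base64_images(response, category="table"):
    """Return {element_id: image_bytes} for elements of the given category.

    Assumes the encoded image sits in each element's "base64_encoding" field
    (not confirmed by this document); elements without it are skipped.
    """
    images = {}
    for element in response.get("elements", []):
        if element.get("category") != category:
            continue
        encoded = element.get("base64_encoding")
        if encoded:
            images[element["id"]] = base64.b64decode(encoded)
    return images
```

To save the results, `pathlib.Path(f"table_{element_id}.png").write_bytes(data)` would work if the images are PNG, which is also an assumption.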

Response Structure

{
  "api": "2.0",
  "model": "document-parse-251217",
  "content": {
    "html": "<h1>...</h1>",
    "markdown": "# ...",
    "text": "..."
  },
  "elements": [
    {
      "id": 0,
      "category": "heading1",
      "content": { "html": "...", "markdown": "...", "text": "..." },
      "page": 1,
      "coordinates": [{"x": 0.06, "y": 0.05}, ...]
    }
  ],
  "usage": { "pages": 1 }
}
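Each element records its `page` and a polygon of `coordinates`. The sample values (such as 0.06) suggest the coordinates are normalized to the page size. Assuming that, the sketch below groups elements by page and converts each polygon into a pixel bounding box for a page rendered at a given width and height. `group_by_page` and `to_pixel_bbox` are hypothetical helpers.

```python
from collections import defaultdict


def group_by_page(elements):
    """Map page number -> list of elements on that page, in response order."""
    pages = defaultdict(list)
    for element in elements:
        pages[element["page"]].append(element)
    return dict(pages)


def to_pixel_bbox(coordinates, page_width, page_height):
    """Convert a normalized polygon into a (left, top, right, bottom) pixel box.

    Assumes x and y are fractions of the page width and height.
    """
    xs = [point["x"] * page_width for point in coordinates]
    ys = [point["y"] * page_height for point in coordinates]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))
```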

Element Categories

paragraph, heading1, heading2, heading3, list, table, figure, chart, equation, caption, header, footer, index, footnote
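Headings are tagged heading1 to heading3, so a table of contents can be built directly from the element list. The sketch below assumes each element's `content.text` holds the heading text, as in the response structure above. `build_outline` is a hypothetical name.

```python
HEADING_LEVELS = {"heading1": 1, "heading2": 2, "heading3": 3}


def build_outline(elements):
    """Return Markdown list lines for heading elements, indented by level."""
    lines = []
    for element in elements:
        level = HEADING_LEVELS.get(element.get("category"))
        if level is None:
            continue  # skip paragraphs, tables, headers/footers, etc.
        text = element.get("content", {}).get("text", "").strip()
        lines.append("  " * (level - 1) + f"- {text} (p. {element.get('page')})")
    return lines
```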

Asynchronous API (Large Documents)

For documents of up to 1000 pages. Documents are processed in batches of 10 pages.
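The batching is simple arithmetic: a 95-page PDF becomes ten batches, and the last batch holds pages 91 to 95. The sketch assumes batches are consecutive, 1-based page ranges. The actual split is decided server-side, and `batch_ranges` is a hypothetical helper.

```python
def batch_ranges(total_pages, batch_size=10):
    """Return inclusive (start_page, end_page) tuples, 1-based."""
    if total_pages < 1:
        raise ValueError("total_pages must be positive")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


print(len(batch_ranges(95)), batch_ranges(95)[-1])  # 10 (91, 95)
```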

Submit a Request

curl -X POST "https://api.upstage.ai/v1/document-digitization/async" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@large.pdf" \
  -F "model=document-parse" \
  -F "output_formats=['markdown']"

Response:

{"request_id": "uuid-here"}

Check Status and Retrieve Results

curl "https://api.upstage.ai/v1/document-digitization/requests/{request_id}" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY"

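The response includes a download_url for each batch. Results are kept for 30 days, but each download_url expires after 15 minutes; fetch the status again to get fresh links.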
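The sketch below pulls the links out of a status response. This document does not show the full response shape, so the `batches` list and its `status`, `start_page`, `end_page`, and `download_url` fields are assumptions to check against a live response. Download links expire after 15 minutes, so call it on a freshly fetched status rather than caching its output. `batch_download_urls` is a hypothetical helper.

```python
def batch_download_urls(status):
    """Return [(start_page, end_page, url)] for completed batches, ordered by page.

    Assumes a status payload shaped like
    {"status": ..., "batches": [{"status": ..., "start_page": ...,
     "end_page": ..., "download_url": ...}, ...]}.
    """
    urls = []
    for batch in status.get("batches", []):
        if batch.get("status") == "completed" and batch.get("download_url"):
            urls.append((batch.get("start_page"), batch.get("end_page"), batch["download_url"]))
    # Missing start pages sort first instead of raising a TypeError
    return sorted(urls, key=lambda item: item[0] or 0)
```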

List All Requests

curl "https://api.upstage.ai/v1/document-digitization/requests" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY"

Status Values

submitted: request received
started: processing
completed: ready for download
failed: an error occurred (check failure_message)
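Only completed and failed are terminal, so a client keeps polling while the status is submitted or started. The sketch below encodes that rule. `is_terminal` and `check_status` are hypothetical helpers. Treating an unknown status value as an error is a design choice, not documented API behavior.

```python
PENDING = {"submitted", "started"}
TERMINAL = {"completed", "failed"}


def is_terminal(status_value):
    """True once the request will no longer change state."""
    if status_value not in PENDING | TERMINAL:
        raise ValueError(f"unexpected status: {status_value!r}")
    return status_value in TERMINAL


def check_status(status):
    """Raise on failure; return True if results are ready, False if still pending."""
    value = status["status"]
    if value == "failed":
        raise RuntimeError(status.get("failure_message") or "request failed")
    return is_terminal(value)
```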

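Notes

Results are stored for 30 days
Download links expire after 15 minutes (fetch the status again to get new links)
Documents are split into batches of at most 10 pages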
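Each link expires 15 minutes after it is issued, while the results themselves remain for 30 days. A downloader can therefore re-fetch the status whenever its links are close to expiring. The sketch below assumes the 15 minutes count from when the status was fetched (this document does not say exactly) and keeps a one-minute safety margin. `links_need_refresh` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

LINK_LIFETIME = timedelta(minutes=15)
SAFETY_MARGIN = timedelta(minutes=1)


def links_need_refresh(fetched_at, now=None):
    """True if links from a status fetched at `fetched_at` should be re-fetched.

    Both datetimes must be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    return now - fetched_at >= LINK_LIFETIME - SAFETY_MARGIN
```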

Python Usage Example

import os
import time

import requests

# Read the API key from the UPSTAGE_API_KEY environment variable
api_key = os.environ["UPSTAGE_API_KEY"]
headers = {"Authorization": f"Bearer {api_key}"}

# Synchronous API
with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://api.upstage.ai/v1/document-digitization",
        headers=headers,
        files={"document": f},
        data={"model": "document-parse", "output_formats": "['markdown']"},
    )
response.raise_for_status()
print(response.json()["content"]["markdown"])

# Asynchronous API (for large documents)
with open("large.pdf", "rb") as f:
    r = requests.post(
        "https://api.upstage.ai/v1/document-digitization/async",
        headers=headers,
        files={"document": f},
        data={"model": "document-parse"},
    )
r.raise_for_status()
request_id = r.json()["request_id"]

# Poll until the request completes; stop with an error if it fails
while True:
    status = requests.get(
        f"https://api.upstage.ai/v1/document-digitization/requests/{request_id}",
        headers=headers,
    ).json()
    if status["status"] == "completed":
        break
    if status["status"] == "failed":
        raise RuntimeError(status.get("failure_message") or "async parsing failed")
    time.sleep(5)
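The polling loop above has no overall time limit and always waits 5 seconds between checks. The sketch below is a variant with a deadline and capped exponential backoff. The status fetch is passed in as a function, so the loop can be tried without network access. The name `poll_until_done` and the timing defaults are illustrative assumptions.

```python
import time


def poll_until_done(fetch_status, timeout=600.0, initial_delay=2.0,
                    max_delay=30.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the request completes, fails, or times out.

    fetch_status: zero-argument callable returning the status JSON as a dict,
    for example a lambda wrapping the requests.get(...).json() call above.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if status["status"] == "completed":
            return status
        if status["status"] == "failed":
            raise RuntimeError(status.get("failure_message") or "request failed")
        if clock() + delay > deadline:
            raise TimeoutError(f"request not finished after {timeout} s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off, but never beyond max_delay
```

Injecting `sleep` and `clock` keeps the function testable. In normal use, the defaults apply.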

LangChain Integration

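from langchain_upstage import UpstageDocumentParseLoader

# The loader reads the API key from the UPSTAGE_API_KEY environment variable
loader = UpstageDocumentParseLoader(
    file_path="document.pdf",
    output_format="markdown",
    ocr="auto"
)
docs = loader.load()  # returns a list of LangChain Document objects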

Environment Variable (Alternative)

You can also set the API key as an environment variable:

export UPSTAGE_API_KEY="your-api-key"

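Tips

Use mode=enhanced for complex tables, charts, and images
Use mode=auto to let the API pick the mode for each page
Use the async API for documents longer than 20 pages
Use ocr=force for scanned PDFs or images
merge_multipage_tables=true merges tables that span pages (up to 20 pages in enhanced mode)
Async API results are kept for 30 days
Server-side timeout: 5 minutes per synchronous API request
A standard document takes about 3 seconds to process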
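The page-count, mode, and OCR tips above can be combined into one decision. The sketch below takes the 20-page threshold from the tips. `has_complex_layout` and `scanned` are hypothetical flags that the caller sets, for example after inspecting the document. Neither the helper `choose_request` nor its return shape is part of the API.

```python
ASYNC_PAGE_THRESHOLD = 20  # from the tips: use the async API above 20 pages


def choose_request(page_count, has_complex_layout=False, scanned=False):
    """Return (endpoint_path, form_fields) following the tips in this document."""
    if page_count > ASYNC_PAGE_THRESHOLD:
        endpoint = "/v1/document-digitization/async"
    else:
        endpoint = "/v1/document-digitization"
    fields = {
        "model": "document-parse",
        "mode": "enhanced" if has_complex_layout else "standard",
    }
    if scanned:
        fields["ocr"] = "force"  # scanned PDFs or images
    return endpoint, fields
```

Routing anything over 20 pages to the async endpoint also keeps synchronous requests well inside the 5-minute server-side timeout.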

Skill package: https://github.com/openclaw/skills/tree/main/skills/upstage-deployment/upstage-document-parse/SKILL.md


upstage-document-parse: An Efficient Document Parsing Tool