atl-mobile: Mobile browser and native app (iOS) automation via ATL

comet · 2026-02-08 20:23:15 · 17 views · 0 comments

Name: atl-browser
Description: Mobile browser and native app automation via ATL (iOS Simulator). Navigate, tap, take screenshots, and automate web and native app tasks on iPhone/iPad simulators.
Metadata:
  openclaw:
    emoji: "📱"
    requires:
      bins: ["xcrun", "xcodebuild", "curl"]
    install:
      - id: "atl-clone"
        kind: "shell"
        command: "git clone https://github.com/JordanCoin/Atl ~/Atl"
        label: "Clone the ATL repository"
      - id: "atl-setup"
        kind: "shell"
        command: "~/.openclaw/skills/atl-browser/scripts/setup.sh"
        label: "Build ATL and install it on the simulator"

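ATL: Agent Touch Layer

The automation layer between AI agents and iOS

ATL provides HTTP-based automation for the iOS Simulator, covering both the browser (Mobile Safari) and native apps. Think of it as Playwright for mobile.

🔀 Dual-server architecture: browser and native

ATL uses two separate servers, one for browser automation and one for native app automation:

Server	Port	Purpose	Key commands
Browser	`9222`	Web automation in Mobile Safari	`goto`, `markElements`, `clickMark`, `evaluate`
Native	`9223`	iOS app automation (Settings, Contacts, or any other app)	`openApp`, `snapshot`, `tapRef`, `find`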

┌─────────────────────────────────────────────────────────────┐
│  Browser server (9222)      │  Native server (9223)         │
│  (Mobile Safari/WebView)    │  (iOS apps via XCTest)        │
│                             │                               │
│  markElements + clickMark   │  snapshot + tapRef            │
│  CSS selectors              │  Accessibility tree           │
│  DOM evaluation             │  Element references           │
│  Tap, swipe, screenshot     │  Tap, swipe, screenshot       │
└─────────────────────────────────────────────────────────────┘

Why two ports? Native app automation needs the XCTest APIs (XCUIApplication, XCUIElement), which are only available inside a UI test bundle. The native server therefore runs as a UI test and exposes an HTTP API.

Starting the servers

# Browser server (starts automatically with the AtlBrowser app)
xcrun simctl launch booted com.atl.browser
curl http://localhost:9222/ping  # → {"status":"ok"}

# Native server (runs as a UI test)
cd ~/Atl/core/AtlBrowser
xcodebuild test -workspace AtlBrowser.xcworkspace \
  -scheme AtlBrowser \
  -destination 'id=<SIMULATOR_UDID>' \
  -only-testing:AtlBrowserUITests/NativeServer/testNativeServer &

# Wait for it to start, then:
curl http://localhost:9223/ping  # → {"status":"ok","mode":"native"}
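
Because the native server starts in the background and may take a while to build, a script usually has to wait for `/ping` to answer before sending commands. The following is a minimal sketch of such a wait loop; `wait_for_atl` is a hypothetical helper name, not something ATL ships, and the attempt count and delay are arbitrary defaults.

```bash
# Poll an ATL /ping endpoint until it answers or the attempts run out.
# wait_for_atl is a hypothetical helper, not part of ATL.
#   $1 = ping URL, $2 = max attempts (default 60), $3 = delay in seconds (default 1)
wait_for_atl() {
  local url="$1" attempts="${2:-60}" delay="${3:-1}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; -s keeps it quiet
    if curl -sf --max-time 2 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "ATL did not respond at $url after $attempts attempts" >&2
  return 1
}

# Usage (requires a booted simulator with the native server starting):
#   wait_for_atl http://localhost:9223/ping 120 1 || exit 1
```

The same function works for the browser server by passing `http://localhost:9222/ping`.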

Port quick reference

Task	Port	Example
Browse a website	9222	`curl localhost:9222/command -d '{"method":"goto",...}'`
Open a native app	9223	`curl localhost:9223/command -d '{"method":"openApp",...}'`
Screenshot (browser)	9222	`curl localhost:9222/command -d '{"method":"screenshot"}'`
Screenshot (native)	9223	`curl localhost:9223/command -d '{"method":"screenshot"}'`
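
Since both servers accept the same `POST /command` shape and differ only in port, a small wrapper can keep the port choice in one place. This is a sketch under that assumption; `atl_send` is a hypothetical name and is not taken from the ATL helper script.

```bash
# Hypothetical wrapper: send one JSON command to an ATL server.
#   $1 = target: "browser" (port 9222) or "native" (port 9223)
#   $2 = JSON body, e.g. '{"method":"screenshot"}'
atl_send() {
  local target="$1" body="$2" port
  case "$target" in
    browser) port=9222 ;;
    native)  port=9223 ;;
    *) echo "unknown target: $target (use browser or native)" >&2; return 2 ;;
  esac
  curl -s -X POST "http://localhost:${port}/command" -d "$body"
}

# Usage (requires the servers to be running):
#   atl_send native  '{"method":"appState"}'
#   atl_send browser '{"method":"screenshot"}'
```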

📱 Native app automation (port 9223)

Native automation uses port 9223 and drives any iOS app through the accessibility tree. No DOM or JavaScript is involved; you interact with elements directly.

Opening and closing apps

# Open an app by bundle ID
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"openApp","params":{"bundleId":"com.apple.Preferences"}}'
# → {"success":true,"result":{"bundleId":"com.apple.Preferences","mode":"native","state":"running"}}

# Check the current app state
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"appState"}'
# → {"success":true,"result":{"mode":"native","bundleId":"com.apple.Preferences","state":"running"}}

# Close the current app
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"closeApp"}'
# → {"success":true,"result":{"closed":true}}

Common bundle IDs

App	Bundle ID
Settings	`com.apple.Preferences`
Contacts	`com.apple.MobileAddressBook`
Calculator	`com.apple.calculator`
Calendar	`com.apple.mobilecal`
Photos	`com.apple.mobileslideshow`
Notes	`com.apple.mobilenotes`
Reminders	`com.apple.reminders`
Clock	`com.apple.mobiletimer`
Maps	`com.apple.Maps`
Safari	`com.apple.mobilesafari`

`snapshot` command

snapshot returns the accessibility tree: every visible element with its properties and a tappable reference.

curl -s -X POST http://localhost:9223/command \
  -d '{"method":"snapshot","params":{"interactiveOnly":true}}' | jq '.result'

Example output:

{
  "count": 12,
  "elements": [
    {
      "ref": "e0",
      "type": "cell",
      "label": "Wi-Fi",
      "value": "MyNetwork",
      "identifier": "",
      "x": 0,
      "y": 142,
      "width": 393,
      "height": 44,
      "isHittable": true,
      "isEnabled": true
    },
    {
      "ref": "e1",
      "type": "cell",
      "label": "Bluetooth",
      "value": "On",
      "identifier": "",
      "x": 0,
      "y": 186,
      "width": 393,
      "height": 44,
      "isHittable": true,
      "isEnabled": true
    },
    {
      "ref": "e2",
      "type": "button",
      "label": "Back",
      "value": null,
      "identifier": "Back",
      "x": 0,
      "y": 44,
      "width": 80,
      "height": 44,
      "isHittable": true,
      "isEnabled": true
    }
  ]
}

Parameters:
- interactiveOnly (boolean, default: false): return only interactive (tappable) elements
- maxDepth (integer, optional): limit the tree traversal depth
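
An agent rarely needs the full element objects; the ref, type, and label are usually enough to pick the next `tapRef` target. The sketch below reduces a snapshot response to one line per usable element. It assumes the response wraps the tree in `.result.elements` with the fields shown in the output example above; `snapshot_refs` is a hypothetical helper name.

```bash
# Hypothetical helper: read a snapshot response on stdin and print one
# "ref<TAB>type<TAB>label" line per element that is hittable and enabled.
snapshot_refs() {
  jq -r '.result.elements[]
         | select(.isHittable and .isEnabled)
         | [.ref, .type, .label] | @tsv'
}

# Usage (requires the native server to be running):
#   curl -s -X POST http://localhost:9223/command \
#     -d '{"method":"snapshot","params":{"interactiveOnly":true}}' | snapshot_refs
```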

`tapRef` command

Tap an element by its reference from the most recent snapshot:

# Take a snapshot first
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"snapshot","params":{"interactiveOnly":true}}'

# Tap element e0 (the Wi-Fi cell from the example above)
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"tapRef","params":{"ref":"e0"}}'
# → {"success":true}

`find` command

Find an element by its text and interact with it, without parsing the snapshot by hand:

# Find and tap "Wi-Fi"
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Wi-Fi","action":"tap"}}'
# → {"success":true,"result":{"found":true,"ref":"e0"}}

# Check whether an element exists
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Bluetooth","action":"exists"}}'
# → {"success":true,"result":{"found":true,"ref":"e1"}}

# Find a text field and fill it in
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"First name","action":"fill","value":"John"}}'

# Get element info without interacting
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Cancel","action":"get"}}'
# → {"success":true,"result":{"found":true,"ref":"e5","element":{...}}}

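Parameters:
- text (string): the text to search for (matched against label, value, or identifier)
- action (string): one of tap, fill, exists, get
- value (string, optional): the text to enter (required when action is "fill")
- by (string, optional): narrows the search to label, value, identifier, type, or any (default)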
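
The `by` parameter does not appear in the examples above, so here is a small sketch of how the four parameters might be combined into a request body. `find_payload` is a hypothetical helper; it only builds the JSON string and does no escaping, so it assumes the arguments contain no double quotes or backslashes.

```bash
# Hypothetical helper: build the JSON body for a `find` command.
#   $1 = text, $2 = action (tap|fill|exists|get),
#   $3 = by (optional; label|value|identifier|type|any), $4 = value (for fill)
# Assumes the arguments contain no double quotes or backslashes,
# since nothing is JSON-escaped here.
find_payload() {
  local text="$1" action="$2" by="${3:-any}" value="${4:-}"
  if [ "$action" = "fill" ] && [ -z "$value" ]; then
    echo "action fill requires a value" >&2
    return 1
  fi
  if [ -n "$value" ]; then
    printf '{"method":"find","params":{"text":"%s","action":"%s","by":"%s","value":"%s"}}\n' \
      "$text" "$action" "$by" "$value"
  else
    printf '{"method":"find","params":{"text":"%s","action":"%s","by":"%s"}}\n' \
      "$text" "$action" "$by"
  fi
}

# Usage (requires the native server to be running):
#   curl -s -X POST http://localhost:9223/command -d "$(find_payload Back tap identifier)"
```

Restricting the match with `by` can help when a label and a value share the same text, for example a "Wi-Fi" cell label and a network that is also named "Wi-Fi".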

🔄 Native app workflow example

This is a complete flow: open the Settings app, navigate to Wi-Fi, and take a screenshot:

# 1. Open the Settings app
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"openApp","params":{"bundleId":"com.apple.Preferences"}}'

# 2. Wait for the app to launch
sleep 1

# 3. Take a snapshot to see the available elements
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"snapshot","params":{"interactiveOnly":true}}' | jq '.result.elements[:5]'

# 4. Find and tap Wi-Fi
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Wi-Fi","action":"tap"}}'

# 5. Wait for the navigation to finish
sleep 0.5

# 6. Take a screenshot of the Wi-Fi settings page
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"screenshot"}' | jq -r '.result.data' | base64 -d > /tmp/wifi-settings.png

# 7. Go back (swipe right from the left edge)
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"swipe","params":{"direction":"right"}}'

# 8. Close the app
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"closeApp"}'

Helper script version

source ~/.openclaw/skills/atl-browser/scripts/atl-helper.sh

atl_openapp "com.apple.Preferences"
sleep 1
atl_find "Wi-Fi" tap
sleep 0.5
atl_screenshot /tmp/wifi-settings.png
atl_swipe right
atl_closeapp
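
The fixed `sleep` calls in the workflow above can be too short on a slow simulator and wastefully long on a fast one. One alternative is to poll `find` with `action: "exists"` until the expected text shows up. The sketch below assumes the response contains `"found":true` on success, as in the `find` examples; `wait_for_text` is a hypothetical helper, and the text argument is not JSON-escaped.

```bash
# Hypothetical helper: poll the native server until an element whose text
# matches $1 exists, instead of sleeping for a fixed time.
#   $1 = text, $2 = max attempts (default 10), $3 = delay in seconds (default 1)
wait_for_text() {
  local text="$1" attempts="${2:-10}" delay="${3:-1}"
  local i=0 reply
  while [ "$i" -lt "$attempts" ]; do
    # Strip whitespace so both compact and pretty-printed JSON match below.
    reply=$(curl -s -X POST http://localhost:9223/command \
      -d "{\"method\":\"find\",\"params\":{\"text\":\"$text\",\"action\":\"exists\"}}" \
      | tr -d '[:space:]')
    case "$reply" in
      *'"found":true'*) return 0 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (requires the native server to be running):
#   wait_for_text "Wi-Fi" 10 1 || echo "Settings did not load" >&2
```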

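💡 Core idea: automation without a vision model

ATL's standout feature is spatial understanding without a vision model:

┌─────────────────────────────────────────────────────────────┐
│  markElements + captureForVision = full page awareness      │
└─────────────────────────────────────────────────────────────┘

1. markElements  → numbers every interactive element [1] [2] [3]
2. captureForVision → produces a PDF with a text layer and element coordinates
3. tap x=234 y=567 → taps at the exact pixel position

Why it matters:
- No vision API calls: the token cost of "seeing" the page is zero
- Faster: no round trip to GPT-4V/Claude Vision
- Deterministic: the same page yields the same coordinates every time
- Reliable: pixel-precise coordinates rather than a vision model's interpretation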
Vision-free workflow

# 1. Mark elements (adds numbered labels and stores their coordinates)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"1","method":"markElements","params":{}}'

# 2. Capture a PDF with a text layer (machine-readable, includes coordinates)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"2","method":"captureForVision","params":{"savePath":"/tmp","name":"page"}}' \
  | jq -r '.result.path'
# → /tmp/page.pdf (selectable text, includes element positions)

# 3. Get the position of a specific element by its mark label
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"3","method":"getMarkInfo","params":{"label":5}}' | jq '.result'
# → {"label":5, "tag":"button", "text":"Add to Cart", "x":187, "y":432, "width":120, "height":44}

# 4. Tap at the exact coordinates
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"4","method":"tap","params":{"x":187,"y":432}}'

Marks tell you where elements are. The PDF tells you what they are. Together they give a complete understanding of the page.
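
Steps 3 and 4 above can be chained so that the coordinates never have to be copied by hand. The sketch below assumes `getMarkInfo` returns `x` and `y` under `.result`, as in the example output, and that those values can be passed to `tap` unchanged; `tap_mark` is a hypothetical helper name.

```bash
# Hypothetical helper: look up a mark's stored position and tap it.
#   $1 = numeric mark label, e.g. 5
tap_mark() {
  local label="$1" info x y
  info=$(curl -s -X POST http://localhost:9222/command \
    -d "{\"method\":\"getMarkInfo\",\"params\":{\"label\":$label}}")
  # Read the coordinates from .result; "// empty" yields nothing if a field is missing.
  x=$(printf '%s' "$info" | jq -r '.result.x // empty')
  y=$(printf '%s' "$info" | jq -r '.result.y // empty')
  if [ -z "$x" ] || [ -z "$y" ]; then
    echo "no position for mark $label" >&2
    return 1
  fi
  curl -s -X POST http://localhost:9222/command \
    -d "{\"method\":\"tap\",\"params\":{\"x\":$x,\"y\":$y}}"
}

# Usage (requires the browser server and a prior markElements call):
#   tap_mark 5
```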

🎯 Escalation ladder

When automation gets stuck, escalate through these tiers in order:

┌─────────────────────────────────────────────────────────────┐
│  Tier 1: Coordinates (fast, cheap, no API calls)            │
│  markElements → getMarkInfo → tap x,y                       │
│                                                             │
│  ↓ If still stuck after 2-3 attempts...                     │
│                                                             │
│  Tier 2: Vision fallback (screenshot to read the state)     │
│  screenshot → analyze UI → identify blocker (modal, etc.)   │
│                                                             │
│  ↓ If still stuck...                                        │
│                                                             │
│  Tier 3: JS injection (direct DOM manipulation)             │
│  evaluate → dispatchEvent → force the interaction           │
└─────────────────────────────────────────────────────────────┘

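When to escalate

Symptom	Likely cause	Action
Tap succeeds but nothing changes	A modal/overlay opened	Screenshot → look for the new button
Cart count does not update	The site requires login or uses bot detection	Try a JS click that dispatches events
Element not found after scrolling	Marks are relative to the page, not the viewport	Use `getBoundingClientRect` via `evaluate`
Same error repeats 3 or more times	The UI state changed unexpectedly	Take a screenshot to see the actual state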
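
The "retry a few times, then escalate" rule can be captured in a small control-flow helper. The sketch below is generic: it takes the names of two shell functions, retries the first up to a limit, and runs the second only if every attempt fails. `try_then_escalate` is a hypothetical name, and what counts as success is left to the primary function.

```bash
# Hypothetical helper: run a cheap action up to N times; if it never
# succeeds, run a fallback (for example, take a screenshot for inspection).
#   $1 = max attempts, $2 = primary command name, $3 = fallback command name
try_then_escalate() {
  local attempts="$1" primary="$2" fallback="$3"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$primary"; then
      return 0
    fi
    echo "attempt $i/$attempts failed" >&2
    i=$((i + 1))
  done
  echo "escalating to fallback: $fallback" >&2
  "$fallback"
}

# Usage, with two functions you define yourself:
#   click_and_verify() { ...tap, then check that the page changed...; }
#   capture_state()    { ...save a screenshot for inspection...; }
#   try_then_escalate 3 click_and_verify capture_state
```

The important part is that the primary function verifies its own effect; a tap that returns success without changing the page should count as a failed attempt.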

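Real-world pattern: e-commerce checkout

# 1. Search for and find the product
atl_goto "https://store.com/search?q=headphones"
atl_mark

# 2. First, dismiss any modal/banner (always do this)
# Look for: close, dismiss, continue, accept, no thanks, got it
CLOSE=$(atl_find "close")
[ -n "$CLOSE" ] && atl_click "$CLOSE" && sleep 1

# 3. Find and click "Add to cart"
ATC=$(atl_find "Add to cart")
atl_click "$ATC"

# 4. Wait, then check whether it worked
sleep 2
atl_screenshot /tmp/after-click.png

# 5. If the cart did not update, inspect the screenshot
# A "Choose options" modal may have opened; look for a new "Add to cart" button
# This is the vision fallback: you need to "see" what happened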

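Key insight: modals change everything

When you click "Add to cart" on sites such as Target or Amazon, they usually:
1. Open a "Choose options" modal (size, color, quantity)
2. Show upsells (protection plans, accessories)
3. Show a confirmation with "View cart" or "Continue shopping"

Your initial click succeeded; you just cannot see the result without a screenshot.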

🚀 Quick start (30 seconds)

# 1. Setup (boots the simulator and installs ATL)
~/.openclaw/skills/atl-browser/scripts/setup.sh

# 2. Navigate somewhere
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"1","method":"goto","params":{"url":"https://example.com"}}'

# 3. Mark elements (shows [1], [2], [3] labels)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"2","method":"markElements","params":{}}'

# 4. Take a screenshot
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"3","method":"screenshot","params":{}}' | jq -r '.result.data' | base64 -d > /tmp/page.png

# 5. Click element [1]
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"4","method":"clickMark","params":{"label":1}}'

Or use the helper functions:

source ~/.openclaw/skills/atl-browser/scripts/atl-helper.sh
atl_goto "https://example.com"
atl_mark
atl_screenshot /tmp/page.png
atl_click 1

Quick reference

Base URL: http://localhost:9222

Common commands

```bash
# Check whether ATL is running
curl -s http://localhost:9222/ping
```

Skill package: https://github.com/openclaw/skills/tree/main/skills/jordancoin/atl-mobile/SKILL.md
