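Name: regex-patterns
Description: Practical regular expression patterns across languages and use cases. Use them to validate input (email, URL, IP), parse log lines, extract data from text, refactor code with search-and-replace, or debug why a regex does not match.
Metadata: {"clawdbot":{"emoji":"🔤","requires":{"anyBins":["grep","python3","node"]},"os":["linux","darwin","win32"]}}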

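Regex Patterns

A practical regular expression cheat sheet. Covers patterns for validation, parsing, extraction, and refactoring in JavaScript, Python, Go, and command-line tools.

When to Use

Validating user input (email, URL, IP, phone numbers, dates)
Parsing log lines or structured text
Extracting data from strings (IDs, numbers, tokens)
Search-and-replace in code (renaming variables, updating imports)
Filtering lines in files or command output
Debugging a regex that does not match as expected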

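Quick Reference

Metacharacters

Pattern	Matches	Example
`.`	Any character except newline	`a.c` matches `abc`, `a1c`
`\d`	Digit `[0-9]`	`\d{3}` matches `123`
`\w`	Word character `[a-zA-Z0-9_]`	`\w+` matches `hello_123`
`\s`	Whitespace `[ \t\n\r\f]`	`\s+` matches spaces/tabs
`\b`	Word boundary	`\bcat\b` matches `cat` but not `scatter`
`^`	Start of input; start of each line in multiline mode or in line-based tools such as grep	`^Error` matches text beginning with Error
`$`	End of input; end of each line in multiline mode or in line-based tools such as grep	`\.js$` matches text ending in .js
`\D`, `\W`, `\S`	Negations: non-digit, non-word, non-whitespace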
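
As a quick check of the anchors and the word boundary in the table above, here is a minimal Python sketch. The sample strings are made up for illustration; they are not from any real log.

```python
import re

log = "Error: disk full\nWarning: low memory\nError: timeout"

# Without re.MULTILINE, ^ only matches at the very start of the string.
first_only = re.findall(r"^Error", log)

# With re.MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r"^Error", log, re.MULTILINE)

# \b keeps "cat" from matching inside "scatter".
whole_words = re.findall(r"\bcat\b", "cat scatter cat.")

print(first_only, every_line, whole_words)
```

The difference between the first two results is usually the reason a `^` pattern that works in grep finds only one hit when moved into a script.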

Quantifiers

Pattern	Meaning
`*`	0 or more (greedy)
`+`	1 or more (greedy)
`?`	0 or 1 (optional)
`{3}`	Exactly 3
`{2,5}`	Between 2 and 5
`{3,}`	3 or more
`*?`, `+?`	Lazy (match as few as possible)
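
The greedy versus lazy distinction is easiest to see side by side. A minimal Python sketch, using an invented HTML fragment as input:

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the last </b>, so both tags end up in one match.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? stops at the first </b> it can, giving one match per tag pair.
lazy = re.findall(r"<b>.*?</b>", html)

# Bounded repetition: {2,5} rejects a run of six digits when the whole string must match.
too_long = re.fullmatch(r"\d{2,5}", "123456")

print(greedy, lazy, too_long)
```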

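Groups and Alternation

Pattern	Meaning
`(abc)`	Capturing group
`(?:abc)`	Non-capturing group
`(?P<name>abc)`	Named group (Python, Go)
`(?<name>abc)`	Named group (JS; also accepted by Go 1.22+)
`a|b`	Alternation (a or b)
`[abc]`	Character class (a, b, or c)
`[^abc]`	Negated character class (anything but a, b, c)
`[a-z]`	Range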
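
To illustrate how the three group forms differ in what they return, here is a small Python sketch. The URL string and the group names `scheme` and `host` are chosen for the example only.

```python
import re

capturing = re.compile(r"(https?)://(\w+)")
non_capturing = re.compile(r"(?:https?)://(\w+)")
named = re.compile(r"(?P<scheme>https?)://(?P<host>\w+)")

url = "https://example"

# Two capturing groups: both parts are returned.
print(capturing.search(url).groups())

# The scheme is grouped for alternation only, so just the host is captured.
print(non_capturing.search(url).groups())

# Named groups can be read by name instead of by position.
print(named.search(url).groupdict())
```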

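Lookaround

Pattern	Meaning
`(?=abc)`	Positive lookahead (followed by abc)
`(?!abc)`	Negative lookahead (not followed by abc)
`(?<=abc)`	Positive lookbehind (preceded by abc)
`(?<!abc)`	Negative lookbehind (not preceded by abc)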
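
Lookaround assertions check context without consuming it. The sketch below, on an invented price string, shows a lookbehind, a lookahead, and a negative lookahead. The `\b` anchors in the last pattern are an addition for this example: without them the engine can backtrack to a shorter run of digits and still satisfy the negative assertion.

```python
import re

text = "price: 100 USD, cost: 250 USD, price: 75 EUR"

# Lookbehind: digits preceded by "price: " (the prefix is not part of the match).
prices = re.findall(r"(?<=price: )\d+", text)

# Lookahead: digits followed by " USD".
usd = re.findall(r"\d+(?= USD)", text)

# Negative lookahead: whole numbers NOT followed by " USD".
# \b...\b stops the engine from matching "10" out of "100".
not_usd = re.findall(r"\b\d+\b(?! USD)", text)

print(prices, usd, not_usd)
```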

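Validation Patterns

Email

# Basic (covers most real-world addresses)
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

# Stricter (local part and first domain label must start and end with an alphanumeric; consecutive dots inside the local part are still accepted)
^[a-zA-Z0-9]([a-zA-Z0-9._%+-]*[a-zA-Z0-9])?@[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z]{2,})+$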
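
To see where the two email patterns disagree, the following Python sketch runs both against a few hand-picked addresses. The helper `check` and the sample addresses are illustrative assumptions, not part of the original skill.

```python
import re

BASIC = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
STRICT = re.compile(
    r"^[a-zA-Z0-9]([a-zA-Z0-9._%+-]*[a-zA-Z0-9])?"
    r"@[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z]{2,})+$"
)

def check(address):
    """Return (basic_ok, strict_ok) for one address."""
    return (bool(BASIC.match(address)), bool(STRICT.match(address)))

for sample in ["user@example.com", ".user@example.com",
               "a..b@example.com", "user@-example.com"]:
    print(sample, check(sample))
```

Both patterns accept `a..b@example.com`: the stricter one only constrains the first and last character of the local part. Neither pattern is a full RFC 5322 validator, so a confirmation email remains the only reliable check.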

URL

# HTTP/HTTPS URL
https?://[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?)*(/[^\s]*)?

# With optional port, query string, and fragment
https?://[^\s/]+(/[^\s?]*)?(\?[^\s#]*)?(#[^\s]*)?
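
The second URL pattern can double as a rough splitter once its pieces are captured. The sketch below adds named groups (`scheme`, `authority`, `path`, `query`, `fragment`), which are an adaptation for illustration and not part of the original pattern; `split_url` is a hypothetical helper.

```python
import re

URL_PARTS = re.compile(
    r"(?P<scheme>https?)://(?P<authority>[^\s/]+)"
    r"(?P<path>/[^\s?]*)?(?P<query>\?[^\s#]*)?(?P<fragment>#[^\s]*)?"
)

def split_url(url):
    """Return the named parts of an http(s) URL, or None if it does not match."""
    m = URL_PARTS.fullmatch(url)
    return m.groupdict() if m else None

print(split_url("https://example.com:8080/a/b?x=1&y=2#top"))
print(split_url("https://e.com/a#top"))
```

The second call shows a quirk worth knowing: the path class `[^\s?]` does not exclude `#`, so a fragment that is not preceded by a query string stays inside the path. For anything beyond quick extraction, `urllib.parse.urlsplit` from the standard library is the safer choice.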

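IP Addresses

# IPv4
\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b

# IPv4 (simple; also accepts invalid addresses such as 999.999.999.999)
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

# IPv6 (simplified: full eight-group form only, no `::` compression)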
(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}
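
The sketch below compares the strict and simple IPv4 patterns and shows the limit of the simplified IPv6 one. It cross-checks against Python's standard `ipaddress` module; the helper names `is_ipv4` and `parses_as_ip` are made up for this example.

```python
import ipaddress
import re

STRICT_V4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b"
)
SIMPLE_V4 = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")
IPV6_FULL = re.compile(r"(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}")

def is_ipv4(s):
    """True if the whole string is a dotted quad with every octet in 0-255."""
    return STRICT_V4.fullmatch(s) is not None

def parses_as_ip(s):
    """True if the standard library accepts s as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(s)
        return True
    except ValueError:
        return False

print(is_ipv4("192.168.1.1"), is_ipv4("999.999.999.999"))
print(SIMPLE_V4.fullmatch("999.999.999.999") is not None)
print(IPV6_FULL.fullmatch("::1") is not None, parses_as_ip("::1"))
```

When the goal is validation and not extraction from free text, parsing with `ipaddress` is usually more robust than either regex.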

Phone Numbers

# US phone (multiple formats)
(?:\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
# Matches: +1 (555) 123-4567, 555.123.4567, 5551234567

# International (E.164 format)
\+[1-9]\d{6,14}
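
A common follow-up to matching a US number is normalizing it to E.164. The sketch below adds three capture groups to the US pattern for that purpose; the groups and the helper `to_e164` are assumptions made for the example.

```python
import re

# Same US pattern as above, with capture groups around the three digit runs.
US_PHONE = re.compile(r"(?:\+1[-.\s]?)?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})")
E164 = re.compile(r"\+[1-9]\d{6,14}")

def to_e164(raw):
    """Normalize a US phone number to +1XXXXXXXXXX, or return None if it does not match."""
    m = US_PHONE.fullmatch(raw.strip())
    if m is None:
        return None
    return "+1" + "".join(m.groups())

for raw in ["+1 (555) 123-4567", "555.123.4567", "5551234567", "12345"]:
    print(raw, "->", to_e164(raw))
```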

Dates and Times

# ISO 8601 date
\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])

# ISO 8601 date-time
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})

# US date format (MM/DD/YYYY)
(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}

# Time (HH:MM:SS, 24-hour)
(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d
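
The date patterns check shape, not the calendar: `2026-02-31` passes the ISO pattern. One way to close that gap, sketched here in Python, is to let the regex extract the fields and let `datetime.date` decide whether the day exists. The capture groups and the helper `parse_iso_date` are additions for this example.

```python
import re
from datetime import date

# ISO date pattern from above, with capture groups around year, month, and day.
ISO_DATE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

def parse_iso_date(s):
    """Return a date for a valid YYYY-MM-DD string, otherwise None."""
    m = ISO_DATE.fullmatch(s)
    if m is None:
        return None
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Shape was fine, but the calendar says no (for example February 31).
        return None

print(parse_iso_date("2026-02-03"))
print(parse_iso_date("2026-02-31"))
```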

Passwords (strength check)

# At least 8 characters, with at least 1 lowercase letter, 1 uppercase letter, 1 digit, and 1 special character
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+=-]).{8,}$
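
Each lookahead in the password pattern is an independent requirement checked from the start of the string. When a user needs to know which requirement failed, the same rules can be tested one at a time. This is a sketch; the `RULES` table and `failed_rules` helper are hypothetical names.

```python
import re

STRONG = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+=-]).{8,}$")

# One entry per lookahead in STRONG, plus the length requirement.
RULES = {
    "lowercase": r"[a-z]",
    "uppercase": r"[A-Z]",
    "digit": r"\d",
    "special": r"[!@#$%^&*()_+=-]",
    "length": r"^.{8,}$",
}

def failed_rules(password):
    """Return the names of the rules the password does not satisfy."""
    return [name for name, pattern in RULES.items()
            if not re.search(pattern, password)]

print(bool(STRONG.match("Passw0rd!")), failed_rules("Passw0rd!"))
print(bool(STRONG.match("password")), failed_rules("password"))
```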

UUID

[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}

Semantic Versioning

\bv?(\d+)\.(\d+)\.(\d+)(?:-([\w.]+))?(?:\+([\w.]+))?\b
# Captures: major, minor, patch, pre-release tag, build metadata
# Matches: 1.2.3, v1.0.0-beta.1, 2.0.0+build.123
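
The five capture groups make it straightforward to turn a version string into a tuple that sorts numerically. A minimal Python sketch, with `parse_version` as a hypothetical helper:

```python
import re

SEMVER = re.compile(r"\bv?(\d+)\.(\d+)\.(\d+)(?:-([\w.]+))?(?:\+([\w.]+))?\b")

def parse_version(s):
    """Return (major, minor, patch, prerelease, build) for the first version in s, or None."""
    m = SEMVER.search(s)
    if m is None:
        return None
    major, minor, patch = (int(x) for x in m.group(1, 2, 3))
    return (major, minor, patch, m.group(4), m.group(5))

print(parse_version("v1.0.0-beta.1"))
print(parse_version("2.0.0+build.123"))

# Sorting on the numeric triple avoids the string-sort trap where "1.10.0" < "1.2.0".
ordered = sorted(["1.10.0", "1.2.0", "1.9.3"], key=lambda v: parse_version(v)[:3])
print(ordered)
```

The sort key here ignores pre-release precedence, which the SemVer specification defines separately.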

Parsing Patterns

Log Lines

# Apache/Nginx access log
# Format: IP - - [date] "METHOD /path HTTP/version" status size
grep -oP '(\S+) - - \[([^\]]+)\] "(\w+) (\S+) \S+" (\d+) (\d+)' access.log

# Extract IP and status code
grep -oP '^\S+|"\s\K\d{3}' access.log

# Syslog format
# Format: Mon DD HH:MM:SS hostname process[PID]: message
grep -oP '^\w+\s+\d+\s[\d:]+\s(\S+)\s(\S+)\[(\d+)\]:\s(.*)' syslog

# JSON logs: extract a field
grep -oP '"level"\s*:\s*"\K[^"]+' app.log
grep -oP '"message"\s*:\s*"\K[^"]+' app.log
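
`grep -o` prints the whole match and ignores the capture groups, so when the individual fields are needed a script is a better fit. The Python sketch below reuses the access-log pattern with named groups; the group names, the sample line, and `parse_access_line` are assumptions made for illustration.

```python
import re

ACCESS_LINE = re.compile(
    r'(?P<ip>\S+) - - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<size>\d+)'
)

def parse_access_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    m = ACCESS_LINE.match(line)
    if m is None:
        return None
    record = m.groupdict()
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    return record

sample = '203.0.113.7 - - [03/Feb/2026:12:30:00 +0000] "GET /index.html HTTP/1.1" 200 1024'
print(parse_access_line(sample))
```

Like the grep pattern it is based on, this expects a numeric size; lines where the server logs `-` for the size will not match.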

Code Patterns

# Find function definitions (JavaScript/TypeScript)
grep -nP '(?:function\s+\w+|(?:const|let|var)\s+\w+\s*=\s*(?:async\s*)?\([^)]*\)\s*=>|(?:async\s+)?function\s*\()' src/*.ts

# Find class definitions
grep -nP 'class\s+\w+(?:\s+extends\s+\w+)?' src/*.ts

# Find import statements
grep -nP '^import\s+.*\s+from\s+' src/*.ts

# Find TODO/FIXME/HACK comments
grep -rnP '(?:TODO|FIXME|HACK|XXX|WARN)(?:\([^)]+\))?:?\s+' src/

# Find console.log calls left in the code
grep -rnP 'console\.(log|debug|info|warn|error)\(' src/ --include='*.ts' --include='*.js'

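Data Extraction

# Extract all email addresses from a file
grep -oP '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt

# Extract all URLs
grep -oP 'https?://[^\s<>"]+' file.html

# Extract all double-quoted strings
grep -oP '"[^"\\]*(?:\\.[^"\\]*)*"' file.json

# Extract numbers (integers and decimals)
# The -- is required: the pattern starts with "-" and grep would otherwise parse it as an option
grep -oP -- '-?\d+(?:\.\d+)?' data.txt

# Extract key-value pairs (key=value)
grep -oP '\b(\w+)=([^\s&]+)' query.txt

# Extract hashtags
grep -oP '#\w+' posts.txt

# Extract hex color values
grep -oP '#[0-9a-fA-F]{3,8}\b' styles.css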
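
When the extracted values feed further processing, the same patterns can return structured data directly. A Python sketch of the key-value and number extractions follows; the number pattern is a slightly tightened variant that requires digits after a decimal point, and both helper names are hypothetical.

```python
import re

def extract_pairs(query):
    """Collect key=value pairs into a dict (a later duplicate key overwrites an earlier one)."""
    return dict(re.findall(r"\b(\w+)=([^\s&]+)", query))

def extract_numbers(text):
    """Return every integer or decimal in text as a float."""
    return [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", text)]

print(extract_pairs("id=42&name=ada&debug=true"))
print(extract_numbers("t=-3.5 max 12 min 0.25"))
```

With two capture groups, `re.findall` returns a list of `(key, value)` tuples, which is exactly the input `dict` accepts.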

Language-Specific Usage

JavaScript

// Test whether a string matches
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
emailRegex.test('user@example.com'); // true

// Extract with capture groups
const match = '2026-02-03T12:30:00Z'.match(/(\d{4})-(\d{2})-(\d{2})/);
// match[1] = '2026', match[2] = '02', match[3] = '03'

// Named groups
const m = 'John Doe, age 30'.match(/(?<name>[A-Za-z ]+), age (?<age>\d+)/);
// m.groups.name = 'John Doe', m.groups.age = '30'

// Find all matches (matchAll returns an iterator and requires the g flag)
const text = 'Call 555-1234 or 555-5678';
const matches = [...text.matchAll(/\d{3}-\d{4}/g)];
// [{0: '555-1234', index: 5}, {0: '555-5678', index: 17}]

// Replace with a callback
'hello world'.replace(/\b\w/g, c => c.toUpperCase());
// 'Hello World'

// Replace using named groups
'2026-02-03'.replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, '$<m>/$<d>/$<y>');
// '02/03/2026'

// Split on a regex
'one, two;  three'.split(/[,;]\s*/);
// ['one', 'two', 'three']

Python

import re

# match (anchored at the start of the string)
m = re.match(r'^(\w+)@(\w+)\.(\w+)$', 'user@example.com')
if m:
    print(m.group(1))  # 'user'

# search (first match anywhere in the string)
m = re.search(r'\d{3}-\d{4}', 'Call 555-1234 today')
print(m.group())  # '555-1234'

# Find all matches (text is any input string)
emails = re.findall(r'[\w.+-]+@[\w.-]+\.\w{2,}', text)

# Named groups
m = re.match(r'(?P<name>\w+)\s+(?P<age>\d+)', 'Alice 30')
print(m.group('name'))  # 'Alice'

# Substitute
result = re.sub(r'\bfoo\b', 'bar', 'foo foobar foo')
# 'bar foobar bar'

# Substitute with a callback
result = re.sub(r'\b\w', lambda m: m.group().upper(), 'hello world')
# 'Hello World'

# Compile for reuse (clearer, and slightly faster in loops)
pattern = re.compile(r'\d{4}-\d{2}-\d{2}')
dates = pattern.findall(log_text)

# MULTILINE and DOTALL flags
re.findall(r'^ERROR.*$', text, re.MULTILINE)  # ^ and $ match at line boundaries
re.search(r'start.*end', text, re.DOTALL)      # . also matches newlines

# Verbose mode (readable layout for complex patterns)
pattern = re.compile(r'''
    ^                   # start of string
    (?P<year>\d{4})     # year
    -(?P<month>\d{2})   # month
    -(?P<day>\d{2})     # day
    $                   # end of string
''', re.VERBOSE)

Go

import (
    "fmt"
    "regexp"
    "strings"
)

// Compile a pattern (MustCompile panics on an invalid regex)
dateRe := regexp.MustCompile(`\d{4}-\d{2}-\d{2}`)

// Test for a match
dateRe.MatchString("2026-02-03")  // true

// Find the first match
dateRe.FindString("Date: 2026-02-03 and 2026-03-01")  // "2026-02-03"

// Find all matches
dateRe.FindAllString(text, -1)  // []string containing every match

// Capture groups
emailRe := regexp.MustCompile(`(\w+)@(\w+)\.(\w+)`)
parts := emailRe.FindStringSubmatch("user@example.com")
// parts[0] = "user@example.com", parts[1] = "user", parts[2] = "example", parts[3] = "com"

// Named groups
namedRe := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
match := namedRe.FindStringSubmatch("2026-02-03")
for i, name := range namedRe.SubexpNames() {
    if name != "" {
        fmt.Printf("%s: %s\n", name, match[i])
    }
}

// Replace
digitsRe := regexp.MustCompile(`\d+`)
digitsRe.ReplaceAllString("foo123bar", "NUM")  // "fooNUMbar"

// Replace with a function (upper-cases every matched address in text)
emailRe.ReplaceAllStringFunc(text, strings.ToUpper)

// Note: Go uses RE2 syntax, which does not support lookaround or backreferences
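
When a pattern has to run on an engine without lookaround, a lookbehind can often be rewritten as a capture group that is put back in the replacement. The sketch below is written in Python so the two forms can be compared directly on the same engine; the second pattern uses only constructs that RE2 also accepts. The input string is invented.

```python
import re

text = "price: 100, qty: 3, price: 75"

# Lookbehind version: fine in Python, Perl, and PCRE, rejected by RE2-based engines.
with_lookbehind = re.sub(r"(?<=price:\s)\d+", "0", text)

# Lookaround-free version: capture the prefix and re-insert it.
# \g<1> is used because a bare \10 would be read as group 10.
without_lookaround = re.sub(r"(price:\s)\d+", r"\g<1>0", text)

print(with_lookbehind)
print(without_lookaround)
```

In Go, the replacement template for the second form would be written `${1}0` for the same reason.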

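Command Line (grep/sed)

# grep -P uses PCRE (Perl-compatible, full-featured); this is a GNU grep option, and the BSD grep shipped with macOS does not have it
# grep -E uses extended regular expressions (no lookaround, no \d)

# Find lines matching a pattern
grep -P '\d{3}-\d{4}' file.txt

# Print only the matching part
grep -oP '\d{3}-\d{4}' file.txt

# Inverted match (lines that do not match)
grep -vP 'DEBUG|TRACE' app.log

# sed substitution
sed 's/oldPattern/newText/g' file.txt         # basic
sed -E 's/foo_([a-z]+)/bar_\1/g' file.txt     # extended syntax with a capture group

# Perl one-liner (the most capable option)
perl -pe 's/(?<=price:\s)\d+/0/g' file.txt    # Perl supports lookbehind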

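Search-and-Replace Patterns

Code Refactoring

# Rename a variable across files (GNU sed syntax; BSD/macOS sed needs -i '' and does not understand \b)
grep -rlP '\boldName\b' src/ | xargs sed -i 's/\boldName\b/newName/g'

# Convert var to const (JavaScript); review the result, since reassigned variables need let
sed -i -E 's/\bvar\b/const/g' src/*.js

# Convert single quotes to double quotes
sed -i "s/'/\"/g" src/*.ts

# Add trailing commas to object properties (JavaScript object literals; JSON does not allow trailing commas)
# Lines ending in , { or [ are left alone
sed -i -E 's/^(\s+\w+:.+[^,{[])$/\1,/' config.js

# Update import paths
sed -i 's|from '\''../old-path/|from '\''../new-path/|g' src/*.ts

# Convert snake_case to camelCase (Python to JavaScript naming)
perl -pe 's/_([a-z])/uc($1)/ge' file.txt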
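
For refactors that need a dry run or a review step before files are touched, the same substitutions can be applied to strings in a script first. A minimal Python sketch of the rename and the snake_case conversion; both function names are hypothetical and nothing here writes to disk.

```python
import re

def rename_identifier(source, old, new):
    """Replace whole-word occurrences of old with new."""
    # re.escape guards against regex metacharacters in the old name;
    # the lambda keeps backslashes in the new name from being read as escapes.
    return re.sub(rf"\b{re.escape(old)}\b", lambda _: new, source)

def snake_to_camel(source):
    """Turn every _x into X, mirroring the perl one-liner above."""
    return re.sub(r"_([a-z])", lambda m: m.group(1).upper(), source)

print(rename_identifier("oldName = oldNameX + oldName", "oldName", "newName"))
print(snake_to_camel("user_id = get_user_name()"))
```

Diffing the returned text against the original before writing it back gives a cheap preview that `sed -i` does not offer.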

Skill package: https://github.com/openclaw/skills/tree/main/skills/gitgoodordietrying/regex-patterns/SKILL.md
