OA0
OA0 is a community for exploring AI
Community status
Registered members: 1032
Topics: 361
Models: 2962
Skill packs: 6701
Datasets: 1026
Papers: 236
Open-source projects: 319
| Model | Vendor | Overall average | Reasoning avg | Coding avg | Agentic coding avg | Mathematics avg | Data analysis avg | Language avg | Instruction following avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-5.4 Thinking xHigh Effort | OpenAI | 80.28 | 88.12 | 77.54 | 70.0 | 94.15 | 79.31 | 82.63 | 70.22 |
| Gemini 3.1 Pro Preview High* | Google | 79.93 | 84.0 | 76.45 | 65.0 | 91.04 | 78.54 | 85.38 | 79.1 |
| Claude 4.6 Opus Thinking High Effort | Anthropic | 76.33 | 88.67 | 78.18 | 61.67 | 89.32 | 69.89 | 83.27 | 63.31 |
| Claude 4.5 Opus Thinking High Effort | Anthropic | 75.96 | 80.09 | 79.65 | 63.33 | 90.39 | 74.44 | 81.26 | 62.55 |
| Claude 4.6 Sonnet Thinking Medium Effort | Anthropic | 75.47 | 84.77 | 79.27 | 60.0 | 86.99 | 77.95 | 76.1 | 63.22 |
| GPT-5.2 High | OpenAI | 74.84 | 83.21 | 76.07 | 51.67 | 93.17 | 78.16 | 79.81 | 61.77 |
| GPT-5.2 Codex | OpenAI | 74.3 | 77.71 | 83.62 | 51.67 | 88.77 | 78.2 | 73.68 | 66.45 |
| GPT-5.1 Codex Max High | OpenAI | 73.98 | 83.65 | 80.68 | 53.33 | 83.22 | 70.12 | 76.48 | 70.38 |
| Gemini 3 Pro Preview High | Google | 73.39 | 77.42 | 74.6 | 55.0 | 81.84 | 74.39 | 84.62 | 65.85 |
| GPT-5.3 Codex High | OpenAI | 72.76 | 80.15 | 78.18 | 55.0 | 87.84 | 62.69 | 80.09 | 65.38 |
| Gemini 3 Flash Preview High | Google | 72.4 | 74.55 | 73.9 | 40.0 | 84.17 | 74.77 | 84.56 | 74.86 |
| GPT-5.1 High | OpenAI | 72.04 | 78.79 | 72.49 | 53.33 | 86.9 | 69.61 | 79.26 | 63.9 |
| GPT-5 Pro | OpenAI | 70.48 | 81.69 | 72.11 | 51.67 | 86.17 | 57.04 | 80.69 | 63.96 |
| Kimi K2.5 Thinking | Moonshot AI | 69.07 | 75.96 | 77.86 | 48.33 | 84.87 | 61.36 | 77.67 | 57.41 |
| GLM 5 | Z.AI | 68.85 | 69.11 | 73.64 | 55.0 | 83.46 | 67.9 | 77.53 | 55.33 |
| GPT-5.1 Codex | OpenAI | 68.61 | 81.98 | 71.78 | 53.33 | 79.58 | 60.75 | 69.48 | 63.39 |
| Claude Sonnet 4.5 Thinking | Anthropic | 68.19 | 77.59 | 80.36 | 53.33 | 79.31 | 56.97 | 76.45 | 53.35 |
| Grok 4.20 Beta | xAI | 67.96 | 75.28 | 66.09 | 43.33 | 87.06 | 62.86 | 77.72 | 63.39 |
| GPT-5 Mini High | OpenAI | 65.91 | 68.32 | 68.2 | 46.67 | 82.2 | 55.2 | 75.52 | 65.27 |
| DeepSeek V3.2 Thinking | DeepSeek | 62.2 | 77.17 | 64.62 | 40.0 | 85.03 | 50.0 | 70.41 | 48.19 |
| Grok 4 | xAI | 62.02 | 79.13 | 73.13 | 30.0 | 83.02 | 63.38 | 76.39 | 29.07 |
| Claude 4.1 Opus Thinking | Anthropic | 61.81 | 72.33 | 74.66 | 48.33 | 73.19 | 48.98 | 72.76 | 42.4 |
| Gemini 3.1 Flash Lite Preview High | Google | 61.68 | 59.66 | 68.52 | 33.33 | 73.56 | 54.9 | 73.18 | 68.62 |
| Kimi K2 Thinking | Moonshot AI | 61.59 | 63.49 | 67.44 | 38.33 | 81.1 | 52.29 | 66.45 | 62.03 |
| Claude Haiku 4.5 Thinking | Anthropic | 61.32 | 61.68 | 72.81 | 41.67 | 77.53 | 59.3 | 66.45 | 49.78 |
| Claude 4 Sonnet Thinking | Anthropic | 61.27 | 69.01 | 77.48 | 40.0 | 70.5 | 54.63 | 72.91 | 44.34 |
| GPT-5.1 Codex Mini | OpenAI | 60.38 | 64.71 | 69.93 | 40.0 | 76.26 | 49.7 | 63.01 | 59.02 |
| Minimax M2.5 | Minimax | 60.14 | 59.3 | 70.7 | 51.67 | 77.41 | 49.6 | 55.1 | 57.23 |
| GPT-5.3 Instant | OpenAI | 59.99 | 63.12 | 78.63 | 28.33 | 72.41 | 48.02 | 70.0 | 59.4 |
| Grok 4.1 Fast | xAI | 59.99 | 80.2 | 69.61 | 31.67 | 83.72 | 52.24 | 74.33 | 28.2 |
| Claude 4.5 Opus Medium Effort | Anthropic | 59.1 | 53.21 | 78.51 | 63.33 | 66.32 | 45.54 | 78.66 | 28.11 |
| DeepSeek V3.2 Exp Thinking | DeepSeek | 58.9 | 64.37 | 70.06 | 31.67 | 82.4 | 51.5 | 71.06 | 41.27 |
| Gemini 2.5 Pro (Max Thinking) | Google | 58.33 | 70.81 | 75.69 | 33.33 | 68.32 | 51.62 | 75.5 | 33.07 |
| GLM 4.7 | Z.AI | 58.09 | 59.73 | 73.13 | 41.67 | 76.02 | 55.17 | 65.23 | 35.66 |
| GLM 4.6 | Z.AI | 55.19 | 62.06 | 71.02 | 35.0 | 81.13 | 51.95 | 58.99 | 26.19 |
| Claude 4.1 Opus | Anthropic | 54.45 | 40.89 | 76.07 | 53.33 | 62.83 | 45.38 | 76.75 | 25.92 |
| Claude Sonnet 4.5 | Anthropic | 53.69 | 42.29 | 76.07 | 48.33 | 62.62 | 47.0 | 76.0 | 23.52 |
| Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 53.09 | 51.45 | 67.5 | 23.33 | 75.35 | 60.98 | 65.34 | 27.68 |
| Qwen 3 235B A22B Thinking 2507 | Alibaba | 52.97 | 59.4 | 68.97 | 6.67 | 73.39 | 52.18 | 69.52 | 40.64 |
| DeepSeek V3.2 | DeepSeek | 51.84 | 44.25 | 75.69 | 46.67 | 63.95 | 45.03 | 64.24 | 23.06 |
| Claude 4 Sonnet | Anthropic | 50.98 | 39.67 | 80.74 | 38.33 | 60.36 | 44.07 | 71.01 | 22.68 |
| Qwen 3 Next 80B A3B Thinking | Alibaba | 50.41 | 58.16 | 60.66 | 8.33 | 74.26 | 53.58 | 56.31 | 41.54 |
| DeepSeek V3.2 Exp | DeepSeek | 49.85 | 45.5 | 73.19 | 36.67 | 64.38 | 44.26 | 65.6 | 19.33 |
| GPT-5.2 No Thinking | OpenAI | 48.91 | 42.8 | 76.45 | 40.0 | 58.25 | 47.68 | 49.97 | 27.2 |
| Qwen 3 235B A22B Instruct 2507 | Alibaba | 48.84 | 58.43 | 69.61 | 13.33 | 68.03 | 44.72 | 66.07 | 21.72 |
| GPT-5 Nano High | OpenAI | 48.62 | 40.29 | 62.39 | 23.33 | 68.41 | 43.41 | 46.84 | 55.7 |
| Qwen 3 Next 80B A3B Instruct | Alibaba | 48.35 | 54.75 | 68.2 | 10.0 | 70.18 | 49.78 | 66.34 | 19.19 |
| Kimi K2 Instruct | Moonshot AI | 48.1 | 42.23 | 74.28 | 31.67 | 58.15 | 43.34 | 66.69 | 20.36 |
| Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 47.74 | 44.64 | 66.03 | 16.67 | 68.75 | 47.31 | 62.27 | 28.5 |
| GPT OSS 120b | OpenAI | 46.09 | 39.21 | 60.21 | 16.67 | 68.87 | 38.8 | 48.59 | 50.29 |
| Claude Haiku 4.5 | Anthropic | 45.33 | 33.94 | 72.17 | 33.33 | 57.97 | 45.13 | 57.05 | 17.75 |
| Grok Code Fast | xAI | 45.13 | 42.3 | 64.44 | 33.33 | 56.01 | 48.99 | 48.56 | 22.27 |
| Qwen 3 32B | Alibaba | 43.56 | 48.25 | 66.03 | 3.33 | 67.44 | 46.54 | 55.54 | 17.77 |
| GPT-5.1 No Thinking | OpenAI | 42.65 | 26.81 | 77.48 | 28.33 | 44.51 | 44.07 | 53.84 | 23.5 |
| Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 42.56 | 43.34 | 66.41 | 5.0 | 61.04 | 47.04 | 51.98 | 23.08 |
| Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 42.39 | 36.16 | 65.39 | 1.67 | 64.9 | 47.88 | 52.6 | 28.11 |
| Devstral 2 | Mistral | 41.24 | 27.74 | 66.79 | 43.33 | 52.52 | 39.14 | 45.67 | 13.5 |
| GLM 4.6V | Z.AI | 40.07 | 37.22 | 64.24 | 3.33 | 62.5 | 46.41 | 49.74 | 17.06 |
| Grok 4.20 Beta (Non-Reasoning) | xAI | 39.7 | 25.63 | 58.54 | 38.33 | 45.52 | 43.48 | 42.04 | 24.35 |
| Qwen 3 30B A3B | Alibaba | 39.01 | 36.68 | 48.88 | 1.67 | 65.35 | 44.92 | 54.47 | 21.11 |
| Grok 4.1 Fast (Non-Reasoning) | xAI | 33.45 | 23.35 | 54.26 | 10.0 | 38.92 | 40.61 | 50.01 | 16.98 |
| Trinity Large Preview | Arcee | 32.74 | 20.61 | 65.65 | 3.33 | 44.93 | 40.33 | 42.15 | 12.19 |
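
The page does not say how the overall average is derived. In the rows checked here, it matches the unweighted mean of the seven category averages, rounded to two decimals. The sketch below assumes that relationship; it is an inference from the published numbers, not a documented formula. It recomputes the value for the first two leaderboard entries. The function name `overall_average` and the `rows` layout are illustrative only.

```python
from statistics import mean

# Scores copied from the first two leaderboard rows.
# Each entry: (published overall average, seven category averages in column order:
# reasoning, coding, agentic coding, mathematics, data analysis, language,
# instruction following).
rows = {
    "GPT-5.4 Thinking xHigh Effort": (
        80.28, [88.12, 77.54, 70.0, 94.15, 79.31, 82.63, 70.22]),
    "Gemini 3.1 Pro Preview High": (
        79.93, [84.0, 76.45, 65.0, 91.04, 78.54, 85.38, 79.1]),
}


def overall_average(category_scores):
    """Unweighted mean of the category averages, rounded to two decimals.

    Assumption: equal weight per category, inferred from the table itself.
    """
    return round(mean(category_scores), 2)


for model, (published, categories) in rows.items():
    computed = overall_average(categories)
    print(f"{model}: published={published}, computed={computed}")
```

Because every category carries equal weight under this assumption, a low score in one column (agentic coding, for many models) pulls the overall average down as much as a low score in any other.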
Data source: LiveBench
Release notes: LiveBench release notes
OA0 - Omni AI 0, a community for exploring AI