OA0
OA0 is a community for exploring AI
Community status
Registered members: 1100
Topics: 846
Models: 3026
Skill packs: 13874
Datasets: 1047
Papers: 331
Open-source projects: 532
| Model | Vendor | Overall average | Reasoning average | Coding average | Agentic coding average | Math average | Data analysis average | Language average | Instruction following average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-5.4 Thinking xHigh Effort | OpenAI | 80.28 | 88.12 | 77.54 | 70.0 | 94.15 | 79.31 | 82.63 | 70.22 |
| Gemini 3.1 Pro Preview High* | Google | 79.93 | 84.0 | 76.45 | 65.0 | 91.04 | 78.54 | 85.38 | 79.1 |
| Claude 4.7 Opus Thinking xHigh Effort | Anthropic | 76.91 | 87.69 | 82.09 | 60.0 | 93.1 | 78.26 | 77.91 | 59.34 |
| Claude 4.6 Opus Thinking High Effort | Anthropic | 76.33 | 88.67 | 78.18 | 61.67 | 89.32 | 69.89 | 83.27 | 63.31 |
| Claude 4.5 Opus Thinking High Effort | Anthropic | 75.96 | 80.09 | 79.65 | 63.33 | 90.39 | 74.44 | 81.26 | 62.55 |
| Claude 4.6 Sonnet Thinking Medium Effort | Anthropic | 75.47 | 84.77 | 79.27 | 60.0 | 86.99 | 77.95 | 76.1 | 63.22 |
| GPT-5.2 High | OpenAI | 74.84 | 83.21 | 76.07 | 51.67 | 93.17 | 78.16 | 79.81 | 61.77 |
| GPT-5.2 Codex | OpenAI | 74.3 | 77.71 | 83.62 | 51.67 | 88.77 | 78.2 | 73.68 | 66.45 |
| GPT-5.1 Codex Max High | OpenAI | 73.98 | 83.65 | 80.68 | 53.33 | 83.22 | 70.12 | 76.48 | 70.38 |
| Gemini 3 Pro Preview High | Google | 73.39 | 77.42 | 74.6 | 55.0 | 81.84 | 74.39 | 84.62 | 65.85 |
| GPT-5.3 Codex High | OpenAI | 72.76 | 80.15 | 78.18 | 55.0 | 87.84 | 62.69 | 80.09 | 65.38 |
| Gemini 3 Flash Preview High | Google | 72.4 | 74.55 | 73.9 | 40.0 | 84.17 | 74.77 | 84.56 | 74.86 |
| GPT-5.1 High | OpenAI | 72.04 | 78.79 | 72.49 | 53.33 | 86.9 | 69.61 | 79.26 | 63.9 |
| Qwen 3.6 Plus | Alibaba | 70.85 | 75.83 | 78.18 | 55.0 | 83.72 | 69.91 | 74.99 | 58.34 |
| GPT-5 Pro | OpenAI | 70.48 | 81.69 | 72.11 | 51.67 | 86.17 | 57.04 | 80.69 | 63.96 |
| GPT-5.4 Nano xHigh | OpenAI | 70.13 | 81.05 | 72.14 | 49.12 | 91.27 | 67.64 | 62.47 | 67.2 |
| Kimi K2.5 Thinking | Moonshot AI | 69.07 | 75.96 | 77.86 | 48.33 | 84.87 | 61.36 | 77.67 | 57.41 |
| GLM 5 | Z.AI | 68.85 | 69.11 | 73.64 | 55.0 | 83.46 | 67.9 | 77.53 | 55.33 |
| GPT-5.1 Codex | OpenAI | 68.61 | 81.98 | 71.78 | 53.33 | 79.58 | 60.75 | 69.48 | 63.39 |
| Claude Sonnet 4.5 Thinking | Anthropic | 68.19 | 77.59 | 80.36 | 53.33 | 79.31 | 56.97 | 76.45 | 53.35 |
| Grok 4.20 Beta | xAI | 67.96 | 75.28 | 66.09 | 43.33 | 87.06 | 62.86 | 77.72 | 63.39 |
| GPT-5.4 Mini xHigh | OpenAI | 67.54 | 72.5 | 71.62 | 47.46 | 78.56 | 70.95 | 71.46 | 60.27 |
| GPT-5 Mini High | OpenAI | 65.91 | 68.32 | 68.2 | 46.67 | 82.2 | 55.2 | 75.52 | 65.27 |
| Minimax M2.7 | Minimax | 63.49 | 74.79 | 54.9 | 50.0 | 80.54 | 56.34 | 66.78 | 61.12 |
| DeepSeek V3.2 Thinking | DeepSeek | 62.2 | 77.17 | 64.62 | 40.0 | 85.03 | 50.0 | 70.41 | 48.19 |
| Grok 4 | xAI | 62.02 | 79.13 | 73.13 | 30.0 | 83.02 | 63.38 | 76.39 | 29.07 |
| Claude 4.1 Opus Thinking | Anthropic | 61.81 | 72.33 | 74.66 | 48.33 | 73.19 | 48.98 | 72.76 | 42.4 |
| Gemini 3.1 Flash Lite Preview High | Google | 61.68 | 59.66 | 68.52 | 33.33 | 73.56 | 54.9 | 73.18 | 68.62 |
| Gemma 4 31B | Google | 61.62 | 59.42 | 60.33 | 40.0 | 73.94 | 58.76 | 71.34 | 67.58 |
| Kimi K2 Thinking | Moonshot AI | 61.59 | 63.49 | 67.44 | 38.33 | 81.1 | 52.29 | 66.45 | 62.03 |
| Claude Haiku 4.5 Thinking | Anthropic | 61.32 | 61.68 | 72.81 | 41.67 | 77.53 | 59.3 | 66.45 | 49.78 |
| Claude 4 Sonnet Thinking | Anthropic | 61.27 | 69.01 | 77.48 | 40.0 | 70.5 | 54.63 | 72.91 | 44.34 |
| GPT-5.1 Codex Mini | OpenAI | 60.38 | 64.71 | 69.93 | 40.0 | 76.26 | 49.7 | 63.01 | 59.02 |
| Minimax M2.5 | Minimax | 60.14 | 59.3 | 70.7 | 51.67 | 77.41 | 49.6 | 55.1 | 57.23 |
| GPT-5.3 Instant | OpenAI | 59.99 | 63.12 | 78.63 | 28.33 | 72.41 | 48.02 | 70.0 | 59.4 |
| Grok 4.1 Fast | xAI | 59.99 | 80.2 | 69.61 | 31.67 | 83.72 | 52.24 | 74.33 | 28.2 |
| Claude 4.5 Opus Medium Effort | Anthropic | 59.1 | 53.21 | 78.51 | 63.33 | 66.32 | 45.54 | 78.66 | 28.11 |
| DeepSeek V3.2 Exp Thinking | DeepSeek | 58.9 | 64.37 | 70.06 | 31.67 | 82.4 | 51.5 | 71.06 | 41.27 |
| Gemini 2.5 Pro (Max Thinking) | Google | 58.33 | 70.81 | 75.69 | 33.33 | 68.32 | 51.62 | 75.5 | 33.07 |
| MiMo V2 Pro | Xiaomi | 58.14 | 69.7 | 68.85 | 30.0 | 76.96 | 49.21 | 69.07 | 43.22 |
| GLM 4.7 | Z.AI | 58.09 | 59.73 | 73.13 | 41.67 | 76.02 | 55.17 | 65.23 | 35.66 |
| GLM 4.6 | Z.AI | 55.19 | 62.06 | 71.02 | 35.0 | 81.13 | 51.95 | 58.99 | 26.19 |
| Claude 4.1 Opus | Anthropic | 54.45 | 40.89 | 76.07 | 53.33 | 62.83 | 45.38 | 76.75 | 25.92 |
| Claude Sonnet 4.5 | Anthropic | 53.69 | 42.29 | 76.07 | 48.33 | 62.62 | 47.0 | 76.0 | 23.52 |
| Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 53.09 | 51.45 | 67.5 | 23.33 | 75.35 | 60.98 | 65.34 | 27.68 |
| Qwen 3 235B A22B Thinking 2507 | Alibaba | 52.97 | 59.4 | 68.97 | 6.67 | 73.39 | 52.18 | 69.52 | 40.64 |
| DeepSeek V3.2 | DeepSeek | 51.84 | 44.25 | 75.69 | 46.67 | 63.95 | 45.03 | 64.24 | 23.06 |
| Claude 4 Sonnet | Anthropic | 50.98 | 39.67 | 80.74 | 38.33 | 60.36 | 44.07 | 71.01 | 22.68 |
| Qwen 3 Next 80B A3B Thinking | Alibaba | 50.41 | 58.16 | 60.66 | 8.33 | 74.26 | 53.58 | 56.31 | 41.54 |
| DeepSeek V3.2 Exp | DeepSeek | 49.85 | 45.5 | 73.19 | 36.67 | 64.38 | 44.26 | 65.6 | 19.33 |
| GLM 5V Turbo | Z.AI | 49.62 | 56.11 | 73.9 | 3.33 | 70.41 | 54.13 | 62.28 | 27.2 |
| GPT-5.2 No Thinking | OpenAI | 48.91 | 42.8 | 76.45 | 40.0 | 58.25 | 47.68 | 49.97 | 27.2 |
| Qwen 3 235B A22B Instruct 2507 | Alibaba | 48.84 | 58.43 | 69.61 | 13.33 | 68.03 | 44.72 | 66.07 | 21.72 |
| GPT-5 Nano High | OpenAI | 48.62 | 40.29 | 62.39 | 23.33 | 68.41 | 43.41 | 46.84 | 55.7 |
| Qwen 3 Next 80B A3B Instruct | Alibaba | 48.35 | 54.75 | 68.2 | 10.0 | 70.18 | 49.78 | 66.34 | 19.19 |
| Kimi K2 Instruct | Moonshot AI | 48.1 | 42.23 | 74.28 | 31.67 | 58.15 | 43.34 | 66.69 | 20.36 |
| Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 47.74 | 44.64 | 66.03 | 16.67 | 68.75 | 47.31 | 62.27 | 28.5 |
| GPT OSS 120b | OpenAI | 46.09 | 39.21 | 60.21 | 16.67 | 68.87 | 38.8 | 48.59 | 50.29 |
| Claude Haiku 4.5 | Anthropic | 45.33 | 33.94 | 72.17 | 33.33 | 57.97 | 45.13 | 57.05 | 17.75 |
| Grok Code Fast | xAI | 45.13 | 42.3 | 64.44 | 33.33 | 56.01 | 48.99 | 48.56 | 22.27 |
| Qwen 3 32B | Alibaba | 43.56 | 48.25 | 66.03 | 3.33 | 67.44 | 46.54 | 55.54 | 17.77 |
| GPT-5.1 No Thinking | OpenAI | 42.65 | 26.81 | 77.48 | 28.33 | 44.51 | 44.07 | 53.84 | 23.5 |
| Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 42.56 | 43.34 | 66.41 | 5.0 | 61.04 | 47.04 | 51.98 | 23.08 |
| Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 42.39 | 36.16 | 65.39 | 1.67 | 64.9 | 47.88 | 52.6 | 28.11 |
| Devstral 2 | Mistral | 41.24 | 27.74 | 66.79 | 43.33 | 52.52 | 39.14 | 45.67 | 13.5 |
| GLM 4.6V | Z.AI | 40.07 | 37.22 | 64.24 | 3.33 | 62.5 | 46.41 | 49.74 | 17.06 |
| Grok 4.20 Beta (Non-Reasoning) | xAI | 39.7 | 25.63 | 58.54 | 38.33 | 45.52 | 43.48 | 42.04 | 24.35 |
| Qwen 3 30B A3B | Alibaba | 39.01 | 36.68 | 48.88 | 1.67 | 65.35 | 44.92 | 54.47 | 21.11 |
| Elephant Alpha | OpenRouter | 35.97 | 39.98 | 56.69 | 1.67 | 57.5 | 38.53 | 27.75 | 29.65 |
| Grok 4.1 Fast (Non-Reasoning) | xAI | 33.45 | 23.35 | 54.26 | 10.0 | 38.92 | 40.61 | 50.01 | 16.98 |
| Trinity Large Preview | Arcee | 32.74 | 20.61 | 65.65 | 3.33 | 44.93 | 40.33 | 42.15 | 12.19 |
| Nemotron 3 Super 120B A12B | NVIDIA | 32.51 | 34.39 | 54.07 | 23.0 | 36.43 | 21.23 | 30.04 | 28.41 |

*5th rank in unseen questions across all categories
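The page does not say how the overall average is derived. For at least the first two leaderboard entries, it matches the unweighted mean of the seven category averages, rounded to two decimals. The sketch below checks that observation. The function name `overall_average` and the averaging rule are assumptions for illustration, not something the source states.

```python
# Scores copied from the first two leaderboard entries.
# Category order: reasoning, coding, agentic coding, math,
# data analysis, language, instruction following.
rows = {
    "GPT-5.4 Thinking xHigh Effort": (
        80.28, [88.12, 77.54, 70.0, 94.15, 79.31, 82.63, 70.22]),
    "Gemini 3.1 Pro Preview High": (
        79.93, [84.0, 76.45, 65.0, 91.04, 78.54, 85.38, 79.1]),
}


def overall_average(category_scores):
    """Unweighted mean of the category averages (assumed rule, hypothetical helper)."""
    return round(sum(category_scores) / len(category_scores), 2)


for model, (listed, scores) in rows.items():
    computed = overall_average(scores)
    # Compare with a small tolerance to avoid floating-point surprises.
    matches = abs(computed - listed) < 0.005
    print(f"{model}: listed={listed}, computed={computed}, matches={matches}")
```

If the rule holds for the other entries as well, a model's overall rank is as sensitive to a weak category (for example agentic coding) as to a strong one, because every category carries the same weight.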
Data source: LiveBench
Release notes: LiveBench release notes
OA0 - Omni AI 0, a community for exploring AI