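| Model | Vendor | Overall Average | Reasoning Average | Coding Average | Agentic Coding Average | Mathematics Average | Data Analysis Average | Language Average | Instruction Following Average |
|---|---|---|---|---|---|---|---|---|---|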
| GPT-5.4 Thinking xHigh Effort | | 80.28 | 88.12 | 77.54 | 70.0 | 94.15 | 79.31 | 82.63 | 70.22 |
| Gemini 3.1 Pro Preview High* | | 79.93 | 84.0 | 76.45 | 65.0 | 91.04 | 78.54 | 85.38 | 79.1 |
| Claude 4.6 Opus Thinking High Effort | | 76.33 | 88.67 | 78.18 | 61.67 | 89.32 | 69.89 | 83.27 | 63.31 |
| Claude 4.5 Opus Thinking High Effort | | 75.96 | 80.09 | 79.65 | 63.33 | 90.39 | 74.44 | 81.26 | 62.55 |
| Claude 4.6 Sonnet Thinking Medium Effort | | 75.47 | 84.77 | 79.27 | 60.0 | 86.99 | 77.95 | 76.1 | 63.22 |
| GPT-5.2 High | | 74.84 | 83.21 | 76.07 | 51.67 | 93.17 | 78.16 | 79.81 | 61.77 |
| GPT-5.2 Codex | | 74.3 | 77.71 | 83.62 | 51.67 | 88.77 | 78.2 | 73.68 | 66.45 |
| GPT-5.1 Codex Max High | | 73.98 | 83.65 | 80.68 | 53.33 | 83.22 | 70.12 | 76.48 | 70.38 |
| Gemini 3 Pro Preview High | | 73.39 | 77.42 | 74.6 | 55.0 | 81.84 | 74.39 | 84.62 | 65.85 |
| GPT-5.3 Codex High | | 72.76 | 80.15 | 78.18 | 55.0 | 87.84 | 62.69 | 80.09 | 65.38 |
| Gemini 3 Flash Preview High | | 72.4 | 74.55 | 73.9 | 40.0 | 84.17 | 74.77 | 84.56 | 74.86 |
| GPT-5.1 High | | 72.04 | 78.79 | 72.49 | 53.33 | 86.9 | 69.61 | 79.26 | 63.9 |
| GPT-5 Pro | | 70.48 | 81.69 | 72.11 | 51.67 | 86.17 | 57.04 | 80.69 | 63.96 |
| Kimi K2.5 Thinking | | 69.07 | 75.96 | 77.86 | 48.33 | 84.87 | 61.36 | 77.67 | 57.41 |
| GLM 5 | | 68.85 | 69.11 | 73.64 | 55.0 | 83.46 | 67.9 | 77.53 | 55.33 |
| GPT-5.1 Codex | | 68.61 | 81.98 | 71.78 | 53.33 | 79.58 | 60.75 | 69.48 | 63.39 |
| Claude Sonnet 4.5 Thinking | | 68.19 | 77.59 | 80.36 | 53.33 | 79.31 | 56.97 | 76.45 | 53.35 |
| Grok 4.20 Beta | | 67.96 | 75.28 | 66.09 | 43.33 | 87.06 | 62.86 | 77.72 | 63.39 |
| GPT-5 Mini High | | 65.91 | 68.32 | 68.2 | 46.67 | 82.2 | 55.2 | 75.52 | 65.27 |
| DeepSeek V3.2 Thinking | | 62.2 | 77.17 | 64.62 | 40.0 | 85.03 | 50.0 | 70.41 | 48.19 |
| Grok 4 | | 62.02 | 79.13 | 73.13 | 30.0 | 83.02 | 63.38 | 76.39 | 29.07 |
| Claude 4.1 Opus Thinking | | 61.81 | 72.33 | 74.66 | 48.33 | 73.19 | 48.98 | 72.76 | 42.4 |
| Gemini 3.1 Flash Lite Preview High | | 61.68 | 59.66 | 68.52 | 33.33 | 73.56 | 54.9 | 73.18 | 68.62 |
| Kimi K2 Thinking | | 61.59 | 63.49 | 67.44 | 38.33 | 81.1 | 52.29 | 66.45 | 62.03 |
| Claude Haiku 4.5 Thinking | | 61.32 | 61.68 | 72.81 | 41.67 | 77.53 | 59.3 | 66.45 | 49.78 |
| Claude 4 Sonnet Thinking | | 61.27 | 69.01 | 77.48 | 40.0 | 70.5 | 54.63 | 72.91 | 44.34 |
| GPT-5.1 Codex Mini | | 60.38 | 64.71 | 69.93 | 40.0 | 76.26 | 49.7 | 63.01 | 59.02 |
| Minimax M2.5 | | 60.14 | 59.3 | 70.7 | 51.67 | 77.41 | 49.6 | 55.1 | 57.23 |
| GPT-5.3 Instant | | 59.99 | 63.12 | 78.63 | 28.33 | 72.41 | 48.02 | 70.0 | 59.4 |
| Grok 4.1 Fast | | 59.99 | 80.2 | 69.61 | 31.67 | 83.72 | 52.24 | 74.33 | 28.2 |
| Claude 4.5 Opus Medium Effort | | 59.1 | 53.21 | 78.51 | 63.33 | 66.32 | 45.54 | 78.66 | 28.11 |
| DeepSeek V3.2 Exp Thinking | | 58.9 | 64.37 | 70.06 | 31.67 | 82.4 | 51.5 | 71.06 | 41.27 |
| Gemini 2.5 Pro (Max Thinking) | | 58.33 | 70.81 | 75.69 | 33.33 | 68.32 | 51.62 | 75.5 | 33.07 |
| GLM 4.7 | | 58.09 | 59.73 | 73.13 | 41.67 | 76.02 | 55.17 | 65.23 | 35.66 |
| GLM 4.6 | | 55.19 | 62.06 | 71.02 | 35.0 | 81.13 | 51.95 | 58.99 | 26.19 |
| Claude 4.1 Opus | | 54.45 | 40.89 | 76.07 | 53.33 | 62.83 | 45.38 | 76.75 | 25.92 |
| Claude Sonnet 4.5 | | 53.69 | 42.29 | 76.07 | 48.33 | 62.62 | 47.0 | 76.0 | 23.52 |
| Gemini 2.5 Flash (Max Thinking) (2025-09-25) | | 53.09 | 51.45 | 67.5 | 23.33 | 75.35 | 60.98 | 65.34 | 27.68 |
| Qwen 3 235B A22B Thinking 2507 | | 52.97 | 59.4 | 68.97 | 6.67 | 73.39 | 52.18 | 69.52 | 40.64 |
| DeepSeek V3.2 | | 51.84 | 44.25 | 75.69 | 46.67 | 63.95 | 45.03 | 64.24 | 23.06 |
| Claude 4 Sonnet | | 50.98 | 39.67 | 80.74 | 38.33 | 60.36 | 44.07 | 71.01 | 22.68 |
| Qwen 3 Next 80B A3B Thinking | | 50.41 | 58.16 | 60.66 | 8.33 | 74.26 | 53.58 | 56.31 | 41.54 |
| DeepSeek V3.2 Exp | | 49.85 | 45.5 | 73.19 | 36.67 | 64.38 | 44.26 | 65.6 | 19.33 |
| GPT-5.2 No Thinking | | 48.91 | 42.8 | 76.45 | 40.0 | 58.25 | 47.68 | 49.97 | 27.2 |
| Qwen 3 235B A22B Instruct 2507 | | 48.84 | 58.43 | 69.61 | 13.33 | 68.03 | 44.72 | 66.07 | 21.72 |
| GPT-5 Nano High | | 48.62 | 40.29 | 62.39 | 23.33 | 68.41 | 43.41 | 46.84 | 55.7 |
| Qwen 3 Next 80B A3B Instruct | | 48.35 | 54.75 | 68.2 | 10.0 | 70.18 | 49.78 | 66.34 | 19.19 |
| Kimi K2 Instruct | | 48.1 | 42.23 | 74.28 | 31.67 | 58.15 | 43.34 | 66.69 | 20.36 |
| Gemini 2.5 Flash (Max Thinking) (2025-06-05) | | 47.74 | 44.64 | 66.03 | 16.67 | 68.75 | 47.31 | 62.27 | 28.5 |
| GPT OSS 120b | | 46.09 | 39.21 | 60.21 | 16.67 | 68.87 | 38.8 | 48.59 | 50.29 |
| Claude Haiku 4.5 | | 45.33 | 33.94 | 72.17 | 33.33 | 57.97 | 45.13 | 57.05 | 17.75 |
| Grok Code Fast | | 45.13 | 42.3 | 64.44 | 33.33 | 56.01 | 48.99 | 48.56 | 22.27 |
| Qwen 3 32B | | 43.56 | 48.25 | 66.03 | 3.33 | 67.44 | 46.54 | 55.54 | 17.77 |
| GPT-5.1 No Thinking | | 42.65 | 26.81 | 77.48 | 28.33 | 44.51 | 44.07 | 53.84 | 23.5 |
| Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | | 42.56 | 43.34 | 66.41 | 5.0 | 61.04 | 47.04 | 51.98 | 23.08 |
| Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | | 42.39 | 36.16 | 65.39 | 1.67 | 64.9 | 47.88 | 52.6 | 28.11 |
| Devstral 2 | | 41.24 | 27.74 | 66.79 | 43.33 | 52.52 | 39.14 | 45.67 | 13.5 |
| GLM 4.6V | | 40.07 | 37.22 | 64.24 | 3.33 | 62.5 | 46.41 | 49.74 | 17.06 |
| Grok 4.20 Beta (Non-Reasoning) | | 39.7 | 25.63 | 58.54 | 38.33 | 45.52 | 43.48 | 42.04 | 24.35 |
| Qwen 3 30B A3B | | 39.01 | 36.68 | 48.88 | 1.67 | 65.35 | 44.92 | 54.47 | 21.11 |
| Grok 4.1 Fast (Non-Reasoning) | | 33.45 | 23.35 | 54.26 | 10.0 | 38.92 | 40.61 | 50.01 | 16.98 |
| Trinity Large Preview | Arcee | 32.74 | 20.61 | 65.65 | 3.33 | 44.93 | 40.33 | 42.15 | 12.19 |