Leaderboard

How frontier models rank on human values.

Rankings are built from human pairwise votes. In each battle, two models respond to the same prompt, and voters choose the response that better embodies the target value.

4 value dimensions · 6 models · Read the EigenBench paper → · Vote in a Battle

EigenBench rankings

Bradley-Terry (BT) ratings on an Elo-style scale, computed from human pairwise votes.

Updated every 5 min · live from votes
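This page does not spell out how its Bradley-Terry (BT) fit is computed. As a rough sketch only, the Python below fits BT log-strengths to pairwise votes by regularized gradient ascent and maps them onto an Elo-style scale, on which a 400-point gap corresponds to 10:1 odds. The function name `fit_bt_elo`, the tie encoding (score 0.5), the ridge penalty `l2` and the anchor `base=1000` are illustrative assumptions, not the leaderboard's actual settings, so its output should not be expected to reproduce the tables on this page.

```python
import math

# Elo points per unit of natural-log BT strength: a 400-point gap means 10:1 odds.
ELO_PER_NAT = 400.0 / math.log(10)


def fit_bt_elo(votes, iters=2000, lr=0.1, l2=0.1, base=1000.0):
    """Fit Bradley-Terry strengths to pairwise votes and return Elo-scale ratings.

    votes: list of (model_a, model_b, score) tuples, where score is 1.0 if
        model_a's response won, 0.0 if model_b's won, and 0.5 for a tie.
    l2: hypothetical ridge penalty; keeps models with only wins or only
        losses at a finite rating.
    base: hypothetical anchor; the mean rating stays at this value.
    """
    models = sorted({m for a, b, _ in votes for m in (a, b)})
    theta = {m: 0.0 for m in models}  # BT log-strengths, all start equal

    for _ in range(iters):
        # Gradient of the penalized log-likelihood, starting with the ridge term.
        grad = {m: -l2 * theta[m] for m in models}
        for a, b, score in votes:
            # BT probability that model_a beats model_b.
            p = 1.0 / (1.0 + math.exp(-(theta[a] - theta[b])))
            grad[a] += score - p
            grad[b] -= score - p
        for m in models:
            theta[m] += lr * grad[m]

    return {m: base + ELO_PER_NAT * theta[m] for m in models}


if __name__ == "__main__":
    # Toy data: A beats B three times, B beats C once.
    demo = [("A", "B", 1.0)] * 3 + [("B", "C", 1.0)]
    for model, elo in sorted(fit_bt_elo(demo).items(), key=lambda kv: -kv[1]):
        print(f"{model}: {elo:.0f}")
```

A plain maximum-likelihood BT fit has no finite solution when a model never loses or never wins, which is common with only a handful of votes; a ridge penalty is one simple remedy, and a larger `l2` pulls every rating toward `base`.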

Kindness

How warmly each model responds to human-centered prompts.

Rank  Model                       Win rate  BT Elo
1     anthropic/claude-3-5-haiku  100%      1829
2     Qwen 2.5 72B                100%      1203
3     Claude 3.5 Haiku            75%       1078
4     Llama 3.3 70B               33%       966
5     GPT-4o Mini                 40%       862
6     Mistral Small 3.1           0%        300
7     Gemini 2.0 Flash            0%        300

Conservatism

Preference for stability, institutions, and incremental change.

Rank  Model             Win rate  BT Elo
1     Llama 3.3 70B     100%      1658
2     Qwen 2.5 72B      100%      1658
3     Gemini 2.0 Flash  25%       855
4     GPT-4o Mini       33%       300
5     Claude 3.5 Haiku  50%       300

Deep Ecology

Alignment with ecological stewardship and planet-first values.

Not enough votes yet. Keep judging to unlock rankings.

Loyalty

Commitment to sustained relationships and group solidarity.

Rank  Model              Win rate  BT Elo
1     Gemini 2.0 Flash   100%      1765
2     Qwen 2.5 72B       100%      1205
3     Claude 3.5 Haiku   100%      1205
4     Llama 3.3 70B      50%       971
5     Mistral Small 3.1  0%        300

Cross-dimension performance

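How each model ranks across the four value dimensions. Each cell shows the model's rank and its BT Elo in that dimension; Avg Rank is the mean of the model's ranks over the dimensions in which it is ranked.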

Model              Kindness      Conservatism  Deep Ecology  Loyalty       Avg Rank
Qwen 2.5 72B       #2 (1203)     #2 (1658)     -             #2 (1205)     2.0
Llama 3.3 70B      #4 (966)      #1 (1658)     -             #4 (971)      3.0
Claude 3.5 Haiku   #3 (1078)     #5 (300)      -             #3 (1205)     3.7
Gemini 2.0 Flash   #7 (300)      #3 (855)      -             #1 (1765)     3.7
GPT-4o Mini        #5 (862)      #4 (300)      -             -             4.5
Mistral Small 3.1  #6 (300)      -             -             #5 (300)      5.5
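The Avg Rank column is consistent with a simple rule: average each model's rank over only the dimensions where it is ranked, then round to one decimal. A minimal Python sketch of that arithmetic follows; the function name `average_rank` and the input layout are illustrative assumptions, and the ranks are copied from the cross-dimension table (Deep Ecology is left out because it has no rankings yet).

```python
def average_rank(ranks_by_dimension):
    """Mean rank per model over the dimensions in which that model is ranked.

    ranks_by_dimension: {dimension: {model: rank}}. A model absent from a
    dimension is skipped there rather than counted as last.
    """
    totals = {}  # model -> (sum of ranks, number of ranked dimensions)
    for ranking in ranks_by_dimension.values():
        for model, rank in ranking.items():
            total, count = totals.get(model, (0, 0))
            totals[model] = (total + rank, count + 1)
    return {model: round(total / count, 1) for model, (total, count) in totals.items()}


# Ranks copied from the cross-dimension table (Deep Ecology has none yet).
RANKS = {
    "Kindness": {"Qwen 2.5 72B": 2, "Llama 3.3 70B": 4, "Claude 3.5 Haiku": 3,
                 "Gemini 2.0 Flash": 7, "GPT-4o Mini": 5, "Mistral Small 3.1": 6},
    "Conservatism": {"Qwen 2.5 72B": 2, "Llama 3.3 70B": 1, "Claude 3.5 Haiku": 5,
                     "Gemini 2.0 Flash": 3, "GPT-4o Mini": 4},
    "Loyalty": {"Qwen 2.5 72B": 2, "Llama 3.3 70B": 4, "Claude 3.5 Haiku": 3,
                "Gemini 2.0 Flash": 1, "Mistral Small 3.1": 5},
}

if __name__ == "__main__":
    for model, avg in sorted(average_rank(RANKS).items(), key=lambda kv: kv[1]):
        print(f"{model}: {avg}")
```

Skipping unranked dimensions, rather than assigning them a worst-case rank, is why GPT-4o Mini (4.5) and Mistral Small 3.1 (5.5) are averaged over two dimensions each; it also means averages taken over different numbers of dimensions are not strictly comparable.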

Head-to-head results

Each cell gives the row model's win rate against the column opponent, followed by the number of votes cast between them (v). Ties are split 50/50.
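Splitting ties 50/50 means a tie credits half a win to each side. A minimal Python sketch of tallying head-to-head cells that way from individual votes follows; the `(model_a, model_b, score)` vote format and the name `head_to_head` are assumptions for illustration, not the site's data model.

```python
from collections import defaultdict


def head_to_head(votes):
    """Return {(row, col): (win_rate, n_votes)} for every pair that has met.

    votes: list of (model_a, model_b, score) tuples with score 1.0 if
    model_a won, 0.0 if model_b won, and 0.5 for a tie (half a win each).
    """
    wins = defaultdict(float)  # (row, col) -> wins credited to the row model
    counts = defaultdict(int)  # (row, col) -> votes between the two models
    for a, b, score in votes:
        wins[(a, b)] += score
        wins[(b, a)] += 1.0 - score
        counts[(a, b)] += 1
        counts[(b, a)] += 1
    return {pair: (wins[pair] / n, n) for pair, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical votes: one win and one tie for X over Y.
    cells = head_to_head([("X", "Y", 1.0), ("X", "Y", 0.5)])
    for (row, col), (rate, n) in sorted(cells.items()):
        print(f"{row} vs {col}: {rate:.0%} ({n}v)")
```

A cell and its mirror always sum to 100% over the same vote count, which gives a quick consistency check when reading the matrices.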

Kindness

Row wins vs column

GPT-4o Mini: 0% (3v) · 100% (1v) · 100% (1v)
Claude 3.5 Haiku: 100% (3v) · 0% (1v)
Gemini 2.0 Flash: 0% (1v)
Llama 3.3 70B: 0% (1v) · 100% (1v)
Mistral Small 3.1: 0% (1v)
Qwen 2.5 72B: 100% (1v)

Conservatism

Row wins vs column

                   4oMi       Haik       Gmni       Llma       Mist       Qwen
GPT-4o Mini        -          50% (2v)   0% (1v)    -          -          -
Claude 3.5 Haiku   50% (2v)   -          -          -          -          -
Gemini 2.0 Flash   100% (1v)  -          -          0% (2v)    -          0% (1v)
Llama 3.3 70B      -          -          100% (2v)  -          -          -
Mistral Small 3.1  -          -          -          -          -          -
Qwen 2.5 72B       -          -          100% (1v)  -          -          -

Deep Ecology

No head-to-head votes yet.

Loyalty

Row wins vs column

                   4oMi       Haik       Gmni       Llma       Mist       Qwen
GPT-4o Mini        -          -          -          -          -          -
Claude 3.5 Haiku   -          -          -          -          100% (1v)  -
Gemini 2.0 Flash   -          -          -          100% (1v)  -          -
Llama 3.3 70B      -          -          0% (1v)    -          100% (1v)  -
Mistral Small 3.1  -          0% (1v)    -          0% (1v)    -          0% (1v)
Qwen 2.5 72B       -          -          -          -          100% (1v)  -

Contribute

Your votes shape the rankings.

Head to the battle page, read two model responses side by side, and pick the one that better reflects the target value. Every vote feeds into the leaderboard, which refreshes every 5 minutes.
