Leaderboard

How frontier models rank on human values.

Rankings are built from human pairwise votes. In each battle, two models respond to the same prompt, and voters choose the response that better embodies the target value. A Bradley-Terry model turns those votes into Elo-style ratings, fitted separately for each of the four value dimensions.
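The page does not publish how its Bradley-Terry ratings are fitted. The Python sketch below shows one common approach under stated assumptions. It estimates strengths with the minorize-maximize (MM) iteration for the Bradley-Terry likelihood and maps them to an Elo-style scale where 400 points correspond to 10:1 odds. The function names, the base rating of 1500, and the `prior` pseudo-games against a virtual reference model are illustrative choices, not EigenBench's settings, so the output will not reproduce the ratings shown on this page.

```python
import math
from collections import defaultdict


def bradley_terry_elo(votes, prior=1.0, iters=1000, base=1500.0, scale=400.0):
    """Fit Bradley-Terry strengths from pairwise votes; return Elo-style ratings.

    votes: iterable of (winner, loser) or (winner, loser, weight) tuples.
           A tie can be entered as (a, b, 0.5) plus (b, a, 0.5).
    prior: hypothetical pseudo-games per model against a virtual reference of
           strength 1.0 (half won, half lost). Must be > 0 so that undefeated
           or winless models still get finite ratings.
    """
    if prior <= 0:
        raise ValueError("prior must be positive")
    wins = defaultdict(float)   # weighted wins per model
    games = defaultdict(float)  # weighted comparisons per unordered pair
    for vote in votes:
        winner, loser = vote[0], vote[1]
        weight = vote[2] if len(vote) > 2 else 1.0
        if winner == loser:
            continue  # a model cannot be compared with itself
        wins[winner] += weight
        wins[loser] += 0.0  # registers models that never won
        games[frozenset((winner, loser))] += weight

    strength = {m: 1.0 for m in wins}
    for _ in range(iters):
        updated = {}
        for i, p_i in strength.items():
            # MM update: p_i = W_i / sum_j n_ij / (p_i + p_j), with the prior
            # counted as extra games against the fixed reference model.
            denom = prior / (p_i + 1.0)
            for pair, n_ij in games.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n_ij / (p_i + strength[j])
            updated[i] = (wins[i] + prior / 2) / denom
        strength = updated
    # Elo convention: a gap of `scale` points means 10:1 odds of winning.
    return {m: base + scale * math.log10(s) for m, s in strength.items()}


def win_probability(r_a, r_b, scale=400.0):
    """Win probability of A over B implied by two Elo-style ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))
```

The prior matters when most pairs have only one or two votes. Without it, the maximum-likelihood strength of an undefeated model grows without bound and that of a winless model shrinks to zero. `win_probability` reads two ratings back as the head-to-head odds the model implies.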

4 value dimensions · 6 models

EigenBench rankings

Bradley-Terry Elo computed from human pairwise votes.

Updated every 5 min · live from votes

Kindness

How warmly each model responds to human-centered prompts.

Rank  Model              Win Rate  BT Elo
1     Claude 3.5 Haiku   100%      1620
2     GPT-4o Mini        100%      1620
3     Llama 3.3 70B      0%        300
4     Mistral Small 3.1  0%        300

Conservatism

Preference for stability, institutions, and incremental change.

Rank  Model              Win Rate  BT Elo
1     Llama 3.3 70B      200%      1741
2     Gemini 2.0 Flash   50%       300
3     GPT-4o Mini        50%       300
4     Claude 3.5 Haiku   100%      300

Deep Ecology

Alignment with ecological stewardship and planet-first values.

Not enough votes yet — keep judging to unlock rankings.

Loyalty

Commitment to sustained relationships and group solidarity.

Rank  Model              Win Rate  BT Elo
1     Qwen 2.5 72B       100%      1570
2     Claude 3.5 Haiku   100%      1570
3     Mistral Small 3.1  0%        300

Cross-dimension performance

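How each model ranks in each of the four value dimensions. Each cell shows the model's rank in that dimension followed by its BT Elo in parentheses; a dot marks a dimension where the model is unranked. Avg Rank averages only over the dimensions in which the model is ranked.

Model              Kindness    Conservatism  Deep Ecology  Loyalty     Avg Rank
Qwen 2.5 72B       ·           ·             ·             #1 (1570)   1.0
Gemini 2.0 Flash   ·           #2 (300)      ·             ·           2.0
Llama 3.3 70B      #3 (300)    #1 (1741)     ·             ·           2.0
GPT-4o Mini        #2 (1620)   #3 (300)      ·             ·           2.5
Claude 3.5 Haiku   ·           #4 (300)      ·             #2 (1570)   3.0
Mistral Small 3.1  #4 (300)    ·             ·             #3 (300)    3.5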
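The Avg Rank values in this table match a simple rule: average a model's rank over only the dimensions in which it is ranked. Under that rule Llama 3.3 70B gets (3 + 1) / 2 = 2.0, and dimensions with no ranking, such as Deep Ecology, are ignored. Below is a minimal Python sketch of that rule. The dictionary layout is an assumption, and the ranks are copied from the table.

```python
def average_rank(ranks_by_dimension):
    """Mean rank per model over the dimensions in which the model is ranked.

    ranks_by_dimension maps dimension -> {model: rank}. Dimensions without
    enough votes, and models absent from a dimension, are simply left out.
    """
    totals, counts = {}, {}
    for ranks in ranks_by_dimension.values():
        for model, rank in ranks.items():
            totals[model] = totals.get(model, 0) + rank
            counts[model] = counts.get(model, 0) + 1
    return {model: totals[model] / counts[model] for model in totals}


# Ranks as listed in the cross-dimension table (Deep Ecology has none yet).
cross_dimension = {
    "Kindness": {"Llama 3.3 70B": 3, "GPT-4o Mini": 2, "Mistral Small 3.1": 4},
    "Conservatism": {"Llama 3.3 70B": 1, "Gemini 2.0 Flash": 2,
                     "GPT-4o Mini": 3, "Claude 3.5 Haiku": 4},
    "Deep Ecology": {},
    "Loyalty": {"Qwen 2.5 72B": 1, "Claude 3.5 Haiku": 2, "Mistral Small 3.1": 3},
}

avg = average_rank(cross_dimension)
for model, value in sorted(avg.items(), key=lambda kv: kv[1]):
    print(f"{model:<18} {value:.1f}")
```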

Head-to-head results

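Win rate of the row model against each column opponent, followed by the number of votes for that pair (for example, 100% 1v means the row model won the only vote). Ties are split 50/50. Columns use abbreviated model names in the same order as the rows.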

Kindness

Row wins vs column

                   4oMi     Haik     Gmni     Llma     Mist     Qwen
GPT-4o Mini        ·        ·        ·        100% 1v  100% 1v  ·
Claude 3.5 Haiku   ·        ·        ·        ·        ·        ·
Gemini 2.0 Flash   ·        ·        ·        ·        ·        ·
Llama 3.3 70B      0% 1v    ·        ·        ·        ·        ·
Mistral Small 3.1  0% 1v    ·        ·        ·        ·        ·
Qwen 2.5 72B       ·        ·        ·        ·        ·        ·

Conservatism

Row wins vs column

                   4oMi     Haik     Gmni     Llma     Mist     Qwen
GPT-4o Mini        ·        50% 2v   0% 1v    ·        ·        ·
Claude 3.5 Haiku   50% 2v   ·        ·        ·        ·        ·
Gemini 2.0 Flash   100% 1v  ·        ·        0% 2v    ·        ·
Llama 3.3 70B      ·        ·        100% 2v  ·        ·        ·
Mistral Small 3.1  ·        ·        ·        ·        ·        ·
Qwen 2.5 72B       ·        ·        ·        ·        ·        ·

Deep Ecology

Row wins vs column

No head-to-head votes recorded yet.

Loyalty

Row wins vs column

                   4oMi     Haik     Gmni     Llma     Mist     Qwen
GPT-4o Mini        ·        ·        ·        ·        ·        ·
Claude 3.5 Haiku   ·        ·        ·        ·        100% 1v  ·
Gemini 2.0 Flash   ·        ·        ·        ·        ·        ·
Llama 3.3 70B      ·        ·        ·        ·        ·        ·
Mistral Small 3.1  ·        0% 1v    ·        ·        ·        0% 1v
Qwen 2.5 72B       ·        ·        ·        ·        100% 1v  ·
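Both the head-to-head grids and the Win Rate columns can be derived from the raw votes. The page does not publish its vote schema or define Win Rate. The Python sketch below therefore assumes a hypothetical record `(model_a, model_b, outcome)`, where outcome is `"a"`, `"b"`, or `"tie"`, and splits ties 50/50 as stated above. It also assumes that a model's overall win rate is its total win points divided by its total votes.

```python
from collections import defaultdict

# Points earned by model_a for each recorded outcome; a tie is split 50/50.
SCORE = {"a": 1.0, "b": 0.0, "tie": 0.5}


def head_to_head(votes):
    """Row-vs-column win rates: {(row, col): (win_rate, n_votes)}.

    votes: iterable of (model_a, model_b, outcome), outcome in SCORE.
    Only pairs that have at least one vote appear in the result.
    """
    points = defaultdict(float)  # win points the row model earned vs col
    counts = defaultdict(int)    # votes cast between row and col
    for a, b, outcome in votes:
        s = SCORE[outcome]
        points[(a, b)] += s
        points[(b, a)] += 1.0 - s
        counts[(a, b)] += 1
        counts[(b, a)] += 1
    return {pair: (points[pair] / n, n) for pair, n in counts.items()}


def overall_win_rate(votes):
    """Per-model win rate: total win points divided by total votes."""
    points = defaultdict(float)
    counts = defaultdict(int)
    for a, b, outcome in votes:
        s = SCORE[outcome]
        points[a] += s
        points[b] += 1.0 - s
        counts[a] += 1
        counts[b] += 1
    return {model: points[model] / n for model, n in counts.items()}
```

Each vote hands out exactly one point between its two models. As a result, every rate computed this way falls between 0% and 100%.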

Contribute

Your votes shape the rankings.

Head to the battle page, read two model responses side by side, and pick the one that better reflects the target value. Every vote feeds into the next leaderboard update.
