Leaderboard
Rankings are built from human pairwise votes. In each battle, two models respond to the same prompt — voters choose which response better embodies the target value. Bradley-Terry scoring converts those votes into Elo ratings across four dimensions.
Bradley-Terry Elo computed from human pairwise votes.
Updated every 5 min · live from votes
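The Bradley-Terry conversion described above can be sketched in a few lines. The minorization-maximization fit below is the standard estimator for pairwise-comparison data; the anchoring constants (base rating 1000, Elo scale 400) and the small smoothing prior are illustrative assumptions, not the site's published parameters.

```python
import math

def bradley_terry_elo(wins, models, iters=200, prior=0.1):
    """Fit Bradley-Terry strengths from pairwise vote counts and map
    them onto an Elo-like scale.

    wins[(a, b)] = number of votes where model a beat model b.
    prior adds a phantom fraction of a win in each direction so that
    undefeated or winless models stay finite (an assumption, not
    necessarily the site's exact regularization).
    """
    pairs = [(a, b) for a in models for b in models if a != b]
    w = {(a, b): wins.get((a, b), 0) + prior for (a, b) in pairs}
    p = {m: 1.0 for m in models}  # BT strength parameters
    for _ in range(iters):
        new_p = {}
        for m in models:
            num = sum(w[(m, o)] for o in models if o != m)
            den = sum((w[(m, o)] + w[(o, m)]) / (p[m] + p[o])
                      for o in models if o != m)
            new_p[m] = num / den  # MM update step
        # Normalize so the geometric mean of strengths stays at 1.
        g = math.exp(sum(math.log(v) for v in new_p.values()) / len(new_p))
        p = {m: v / g for m, v in new_p.items()}
    # Strength -> Elo-style rating (400-point scale centered on 1000).
    return {m: round(1000 + 400 * math.log10(v)) for m, v in p.items()}
```

With two models and two votes both won by the same model, the winner lands symmetrically above 1000 and the loser below it; more lopsided records push the gap wider.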
Kindness
How warmly each model responds to human-centered prompts.
| # | Model | Win Rate | BT Elo |
|---|---|---|---|
| 1 | Claude 3.5 Haiku | 100% | 1620 |
| 2 | GPT-4o Mini | 100% | 1620 |
| 3 | Llama 3.3 70B | 0% | 300 |
| 4 | Mistral Small 3.1 | 0% | 300 |
Conservatism
Preference for stability, institutions, and incremental change.
| # | Model | Win Rate | BT Elo |
|---|---|---|---|
| 1 | Llama 3.3 70B | 100% | 1741 |
| 2 | Gemini 2.0 Flash | 50% | 300 |
| 3 | GPT-4o Mini | 50% | 300 |
| 4 | Claude 3.5 Haiku | 50% | 300 |
Deep Ecology
Alignment with ecological stewardship and planet-first values.
Not enough votes yet — keep judging to unlock rankings.
Loyalty
Commitment to sustained relationships and group solidarity.
| # | Model | Win Rate | BT Elo |
|---|---|---|---|
| 1 | Qwen 2.5 72B | 100% | 1570 |
| 2 | Claude 3.5 Haiku | 100% | 1570 |
| 3 | Mistral Small 3.1 | 0% | 300 |
How each model ranks across all four value dimensions. Each cell shows the model's rank and BT Elo in that dimension.
| Model | Kindness | Conservatism | Deep Ecology | Loyalty | Avg Rank |
|---|---|---|---|---|---|
| Qwen 2.5 72B | — | — | — | #1 (1570) | 1.0 |
| Gemini 2.0 Flash | — | #2 (300) | — | — | 2.0 |
| Llama 3.3 70B | #3 (300) | #1 (1741) | — | — | 2.0 |
| GPT-4o Mini | #2 (1620) | #3 (300) | — | — | 2.5 |
| Claude 3.5 Haiku | — | #4 (300) | — | #2 (1570) | 3.0 |
| Mistral Small 3.1 | #4 (300) | — | — | #3 (300) | 3.5 |
Win rate of the row model vs each column opponent; ties are split 50/50. Each cell shows the win rate and the number of votes (e.g. 100% 1v means one vote, one win).
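The tie-splitting rule above amounts to counting each tie as half a win. A minimal sketch (the function name and signature are illustrative, not the site's code):

```python
def win_rate(wins, losses, ties=0):
    """Head-to-head win rate with ties split 50/50.

    Returns None when there are no votes, which the matrix renders
    as an em-dash.
    """
    votes = wins + losses + ties
    if votes == 0:
        return None
    return (wins + 0.5 * ties) / votes

# A 1-1 record yields 0.5, shown in the matrix as "50% 2v".
win_rate(1, 1)
# No votes yields None, shown as "—".
win_rate(0, 0)
```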
Kindness: row wins vs column
| | 4oMi | Haik | Gmni | Llma | Mist | Qwen |
|---|---|---|---|---|---|---|
| GPT-4o Mini | — | — | — | 100% 1v | 100% 1v | — |
| Claude 3.5 Haiku | — | — | — | — | — | — |
| Gemini 2.0 Flash | — | — | — | — | — | — |
| Llama 3.3 70B | 0% 1v | — | — | — | — | — |
| Mistral Small 3.1 | 0% 1v | — | — | — | — | — |
| Qwen 2.5 72B | — | — | — | — | — | — |
Conservatism: row wins vs column
| | 4oMi | Haik | Gmni | Llma | Mist | Qwen |
|---|---|---|---|---|---|---|
| GPT-4o Mini | — | 50% 2v | 0% 1v | — | — | — |
| Claude 3.5 Haiku | 50% 2v | — | — | — | — | — |
| Gemini 2.0 Flash | 100% 1v | — | — | 0% 2v | — | — |
| Llama 3.3 70B | — | — | 100% 2v | — | — | — |
| Mistral Small 3.1 | — | — | — | — | — | — |
| Qwen 2.5 72B | — | — | — | — | — | — |
Deep Ecology: row wins vs column
| | 4oMi | Haik | Gmni | Llma | Mist | Qwen |
|---|---|---|---|---|---|---|
| GPT-4o Mini | — | — | — | — | — | — |
| Claude 3.5 Haiku | — | — | — | — | — | — |
| Gemini 2.0 Flash | — | — | — | — | — | — |
| Llama 3.3 70B | — | — | — | — | — | — |
| Mistral Small 3.1 | — | — | — | — | — | — |
| Qwen 2.5 72B | — | — | — | — | — | — |
Loyalty: row wins vs column
| | 4oMi | Haik | Gmni | Llma | Mist | Qwen |
|---|---|---|---|---|---|---|
| GPT-4o Mini | — | — | — | — | — | — |
| Claude 3.5 Haiku | — | — | — | — | 100% 1v | — |
| Gemini 2.0 Flash | — | — | — | — | — | — |
| Llama 3.3 70B | — | — | — | — | — | — |
| Mistral Small 3.1 | — | 0% 1v | — | — | — | 0% 1v |
| Qwen 2.5 72B | — | — | — | — | 100% 1v | — |
Contribute
Head to the battle page, read two model responses side by side, and pick which one better reflects the target value. Every vote updates the leaderboard.