Kindness
How warmly each model responds to human-centered prompts.
| Rank | Model | Win Rate | BT Elo |
|---|---|---|---|
| 1 | anthropic/claude-3-5-haiku | 100% | 1829 |
| 2 | Qwen 2.5 72B | 100% | 1203 |
| 3 | Claude 3.5 Haiku | 75% | 1078 |
| 4 | Llama 3.3 70B | 33% | 966 |
| 5 | GPT-4o Mini | 40% | 862 |
| 6 | Mistral Small 3.1 | 0% | 300 |
| 7 | Gemini 2.0 Flash | 0% | 300 |
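
For readers unfamiliar with the "BT Elo" column, the sketch below shows one common way such a score can be derived from pairwise comparisons, assuming "BT" stands for Bradley-Terry. The leaderboard does not state how its ratings are fitted, so everything here is an assumption for illustration: the function name `bradley_terry_elo`, the base of 1000, the scale of 400, and the `prior` smoothing term that keeps ratings finite for models with 0% or 100% win rates.

```python
import math
from collections import defaultdict


def bradley_terry_elo(matches, base=1000.0, scale=400.0, iters=200, prior=0.5):
    """Fit Bradley-Terry strengths from (winner, loser) pairs and map them to an Elo-like scale.

    Hypothetical helper: base, scale, and prior are illustrative defaults,
    not values taken from the leaderboard.
    """
    models = sorted({m for pair in matches for m in pair})
    wins = defaultdict(float)
    games = defaultdict(float)  # comparisons per unordered pair of models
    for winner, loser in matches:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            # Smoothed win count: `prior` virtual wins against every opponent.
            w_i = wins[i] + prior * (len(models) - 1)
            denom = sum(
                (games[frozenset((i, j))] + 2 * prior) / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            updated[i] = w_i / denom
        # Rescale so the geometric mean strength is 1 (ratings centre on `base`).
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}

    return {m: base + scale * math.log10(s) for m, s in strength.items()}


# Toy input: "A" beats "B" three times and loses once.
ratings = bradley_terry_elo([("A", "B")] * 3 + [("B", "A")])
```

Because the fit uses who beat whom rather than the raw win percentage, a model's rank by rating need not match its rank by win rate, which is consistent with the table being ordered by BT Elo rather than by Win Rate.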