Coding
Algorithms, debugging, code review
9 models · Weekly updates
Example Tasks in This Category
Easy
URL Parser
Parse a URL string and extract its components: protocol, domain, path, and query parameters.
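A minimal Python sketch of this task. It leans on the standard library's `urllib.parse` instead of a hand-written parser. The function name `parse_url` and the shape of the returned dictionary are illustrative assumptions, not the benchmark's expected answer format.

```python
from urllib.parse import urlparse, parse_qs

def parse_url(url: str) -> dict:
    """Split a URL into protocol, domain, path, and query parameters (hypothetical helper)."""
    parts = urlparse(url)
    return {
        "protocol": parts.scheme,
        # hostname is lowercased and excludes any port or user:password prefix
        "domain": parts.hostname,
        "path": parts.path,
        # parse_qs maps each key to a list, because a key may repeat in the query string
        "query": parse_qs(parts.query),
    }

print(parse_url("https://example.com/search?q=llm&page=2"))
# {'protocol': 'https', 'domain': 'example.com', 'path': '/search', 'query': {'q': ['llm'], 'page': ['2']}}
```

Returning lists for query values keeps repeated keys such as `?tag=a&tag=b` intact. A grader expecting single strings would need `parse_qsl` or a first-value lookup instead.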
Easy
FizzBuzz
Classic programming exercise: print numbers 1-15, replacing multiples of 3 with 'Fizz', multiples of 5 with 'Buzz', and multiples of both with 'FizzBuzz'.
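A straightforward Python sketch of the exercise. The function name `fizzbuzz` and its `n` parameter are illustrative choices, not part of the task specification.

```python
def fizzbuzz(n: int = 15) -> list:
    """Return the FizzBuzz sequence for 1..n as strings."""
    out = []
    for i in range(1, n + 1):
        if i % 15 == 0:  # multiple of both 3 and 5
            out.append("FizzBuzz")
        elif i % 3 == 0:
            out.append("Fizz")
        elif i % 5 == 0:
            out.append("Buzz")
        else:
            out.append(str(i))
    return out

for line in fizzbuzz():
    print(line)
```

The divisibility-by-15 check comes first. Otherwise 15 would match the `% 3` branch and print "Fizz" instead of "FizzBuzz".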
Hard
Debug Stack Trace
Analyze a stack trace and identify the root cause of an error.
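One common heuristic for this kind of analysis is sketched below, assuming Python exceptions. The sketch follows the exception chain (`__cause__` / `__context__`) back to the original error, then takes the innermost frame of its traceback, where that error was raised. It works on a live exception object using the standard `traceback` module rather than on pasted trace text. The helper `root_cause` and the sample functions `load_config` and `start` are hypothetical. The raising frame is a starting point for diagnosis, not a guaranteed root cause.

```python
import traceback
from typing import Optional, Tuple

def root_cause(exc: BaseException) -> Tuple[BaseException, Optional[traceback.FrameSummary]]:
    """Return the earliest exception in the chain and the innermost frame that raised it."""
    seen = {id(exc)}
    # Walk explicit chaining (raise ... from ...) first, then implicit chaining
    while True:
        nxt = exc.__cause__ if exc.__cause__ is not None else exc.__context__
        if nxt is None or id(nxt) in seen:  # stop at the end of the chain, or on a cycle
            break
        seen.add(id(nxt))
        exc = nxt
    # extract_tb lists frames outermost-first, so the last entry is where the error was raised
    frames = traceback.extract_tb(exc.__traceback__)
    return exc, (frames[-1] if frames else None)

def load_config(text):
    return int(text)  # raises ValueError on non-numeric input

def start():
    try:
        load_config("abc")
    except ValueError as e:
        raise RuntimeError("startup failed") from e

try:
    start()
except RuntimeError as err:
    cause, frame = root_cause(err)
    print(type(cause).__name__, "raised in", frame.name, "at line", frame.lineno)
```

Here the surface error is the `RuntimeError`, but the chain leads back to the `ValueError` raised inside `load_config`. That is the frame a reviewer would inspect first.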
Model Rankings
| Rank | Model | Score | Price per 1M tokens | Tasks |
|---|---|---|---|---|
| 1 | Qwen3 235B | 94.7 | $0.60 | 12 |
| 2 | Qwen3 Max | 94.0 | $1.60 | 12 |
| 3 | DeepSeek R1 | 93.8 | $2.19 | 12 |
| 4 | Claude 3.5 Haiku | 93.5 | $4.00 | 12 |
| 5 | GPT-4o | 93.2 | $10.00 | 12 |
| 6 | Gemini 2.0 Flash | 92.7 | $0.40 | 12 |
| 7 | Claude 3.5 Sonnet | 92.5 | $15.00 | 12 |
| 8 | GPT-4o Mini | 92.3 | $0.60 | 12 |
| 9 | Llama 3.3 70B | 92.0 | $0.40 | 12 |