Benchmarks
18 benchmarks
humaneval (by abderrahmane-br), updated 3 days ago
BIRD-critiq (by xiangyi-li), updated 4 days ago
OS-World (by xiangyi-li), updated 12 days ago
Swebench (by Bench-Flow), updated 12 days ago
rare (by xiangyi-li), updated 14 days ago
automation (by holmansneyderc), updated 14 days ago
rarebench (by BenchFlow), updated 14 days ago
rare (by BenchFlow), updated 14 days ago
rarebench (by xiangyi-li), updated 14 days ago
medqa-cs (by BenchFlow), updated 14 days ago
Swebench (by BenchFlow), updated 16 days ago
MMLU-PRO (by BenchFlow), updated 16 days ago
Bird (by BenchFlow), updated 16 days ago
webcanvas (by BenchFlow), updated 16 days ago
webarena (by BenchFlow), updated 16 days ago
webarena (by xiangyi-li), updated 16 days ago
webarena-original (by Bench-Flow), updated 18 days ago
webarena (by Bench-Flow), updated 18 days ago
Tags: agent, code, embedding, general, long-context, performance, vision