Benchmarks
Tags
Benchmarks
agent
code
embedding
general
long-context
performance
vision
Benchmarks (19)
Owner            Benchmark          Updated
Davide221        test               17 hours ago
abderrahmane-br  humaneval          4 days ago
xiangyi-li       BIRD-critiq        5 days ago
xiangyi-li       OS-World           12 days ago
Bench-Flow       Swebench           13 days ago
xiangyi-li       rare               14 days ago
holmansneyderc   automation         15 days ago
BenchFlow        rarebench          15 days ago
BenchFlow        rare               15 days ago
xiangyi-li       rarebench          15 days ago
BenchFlow        medqa-cs           15 days ago
BenchFlow        Swebench           17 days ago
BenchFlow        MMLU-PRO           17 days ago
BenchFlow        Bird               17 days ago
BenchFlow        webcanvas          17 days ago
BenchFlow        webarena           17 days ago
xiangyi-li       webarena           17 days ago
Bench-Flow       webarena-original  19 days ago
Bench-Flow       webarena           19 days ago