Benchmark Hub
Featured Benchmarks
VibeCode Arena (BenchFlow): code
Pokemon Gym (BenchFlow): reasoning
JFK Arena (BenchFlow): retrieval
PaperBench (OpenAI): agent
WebArena (Carnegie Mellon University): agent
SWE-Bench (Princeton NLP): code
RareBench (chenxz1111): knowledge
Bird-SQL (AlibabaResearch): code
MedQA-CS (Bio-NLP): knowledge
WebCanvas (iMeanAI): agent
MMLU-Pro (TIGER-AI-Lab): knowledge
Categories: agent, code, commonsense, embedding, general, knowledge, language, long-context, multimodal, performance, reasoning, retrieval, safety, tool-calling, vision
All Benchmarks (63)
test (alhridoy) · Updated 19 days ago · 0
simple-qa (BenchFlow) · Updated 20 days ago · 1
TaskBench (BenchFlow): agent, tool-calling, multimodal · Updated a month ago · 0
EQBench (BenchFlow): reasoning, language, commonsense, ... · Updated a month ago · 0
TauBench (BenchFlow): agent, tool-calling, reasoning, ... · Updated a month ago · 0
AIME2024 (BenchFlow): knowledge, performance, reasoning · Updated a month ago · 0
OSWorld (BenchFlow): agent, tool-calling, multimodal · Updated a month ago · 0
BIGBenchHard (BenchFlow): commonsense, reasoning, knowledge · Updated a month ago · 0
MGSM (BenchFlow): reasoning, language, commonsense, ... · Updated a month ago · 0
GSM8K (BenchFlow): knowledge, commonsense, language · Updated a month ago · 0
WMDP (BenchFlow): safety, performance, knowledge, ... · Updated a month ago · 0
SecQA (BenchFlow): knowledge, performance, reasoning, ... · Updated a month ago · 1
Mind2Web (BenchFlow): agent, tool-calling · Updated a month ago · 0
AssistantBench (BenchFlow): agent, tool-calling, reasoning · Updated a month ago · 0
MBPP (BenchFlow): code · Updated a month ago · 0
DS-1000 (BenchFlow): code · Updated a month ago · 0
APPS (BenchFlow): code · Updated a month ago · 0
HELMET (BenchFlow): long-context · Updated a month ago · 0
Loft (BenchFlow): long-context · Updated a month ago · 0
BabiLong (BenchFlow): long-context · Updated a month ago · 0
InfiniteBench (BenchFlow): long-context · Updated a month ago · 0
MMGenBench (BenchFlow): vision, multimodal, reasoning · Updated a month ago · 0
StableToolBench (BenchFlow): tool-calling, agent · Updated a month ago · 0
Router-Bench (BenchFlow): agent · Updated a month ago · 0
Nexus-Bench (BenchFlow): agent, tool-calling · Updated a month ago · 0
Hotpotqa (BenchFlow): reasoning, language · Updated a month ago · 0
MMOCR (BenchFlow): vision · Updated a month ago · 0
Beir (BenchFlow): retrieval · Updated a month ago · 0
CodeXGLUE (BenchFlow): code, performance · Updated a month ago · 0
BigBench (BenchFlow): general · Updated a month ago · 0
Alexarena (BenchFlow): agent, multimodal · Updated a month ago · 0
MEGABench (BenchFlow): multimodal, performance · Updated a month ago · 0
MobileAIBench (BenchFlow): performance, code · Updated a month ago · 0
Spec-Bench (BenchFlow): tool-calling, performance, language · Updated a month ago · 0
TruthfulQA (BenchFlow): safety · Updated a month ago · 0
SuperGLUE (BenchFlow): language, reasoning · Updated a month ago · 0
MMLU (BenchFlow): reasoning, knowledge · Updated a month ago · 0
HumanEval (BenchFlow): code · Updated a month ago · 0
HellaSwag (BenchFlow): reasoning, commonsense · Updated a month ago · 0
HELM (BenchFlow): performance, reasoning, safety · Updated a month ago · 0
LegalBench (BenchFlow): reasoning, knowledge · Updated a month ago · 0
Agentbench (BenchFlow): agent, reasoning · Updated a month ago · 0
SWE-bench-Multimodal (BenchFlow): agent, code, tool-calling · Updated a month ago · 0
MLE-bench (BenchFlow): agent, code, knowledge · Updated a month ago · 0
test (Lilaoba) · Updated a month ago · 0
test (Allen) · Updated a month ago · 0
PokemonGym (BenchFlow): agent, tool-calling, vision, ... · Updated 2 months ago · 0
test (Davide221) · Updated 2 months ago · 0
humaneval (abderrahmane-br) · Updated 2 months ago · 1
BIRD-critiq (xiangyi-li) · Updated 2 months ago · 0
OS-World (xiangyi-li) · Updated 2 months ago · 0
rare (xiangyi-li) · Updated 2 months ago · 0
automation (holmansneyderc) · Updated 2 months ago · 0
rarebench (BenchFlow): knowledge, general · Updated 2 months ago · 0
rare (BenchFlow) · Updated 2 months ago · 0
rarebench (xiangyi-li) · Updated 2 months ago · 0
medqa-cs (BenchFlow): knowledge, general, reasoning · Updated 2 months ago · 0
Swebench (BenchFlow): agent, code · Updated 2 months ago · 0
MMLU-PRO (BenchFlow): general, knowledge, language · Updated 2 months ago · 0
Bird (BenchFlow): tool-calling, code, agent · Updated 2 months ago · 0
webcanvas (BenchFlow): agent, tool-calling, vision · Updated 2 months ago · 0
webarena (BenchFlow): agent, tool-calling, vision · Updated 2 months ago · 0
webarena (xiangyi-li) · Updated 2 months ago · 0