xiangyi-li/webarena
mirrored 10 minutes ago
main
evaluation_harness
__init__.py
181 B
evaluators.py
13.3 kB
helper_functions.py
7.57 kB
add comment
2 years ago
Shuyan Zhou
Update README.md
daee18d
release commit
2 years ago
use fuzzy_match for UA tasks and update ua eval prompt
2 years ago