xiangyi-li/webarena
mirrored 10 minutes ago
evaluation_harness
  • __init__.py (181 B)
  • evaluators.py (13.3 kB)
  • helper_functions.py (7.57 kB)
add comment
2 years ago
Shuyan Zhou: Update README.md (daee18d)
release commit
2 years ago
use fuzzy_match for UA tasks and update ua eval prompt
2 years ago