xiangyi-li/webarena
evaluation_harness/
  • __init__.py (181 B)
  • evaluators.py (13.3 kB)
  • helper_functions.py (7.57 kB)
Shuyan Zhou (22fa275): Merge pull request #227 from web-arena-x/docs/make-ami-public-clarify-region-us-east-2
docs(AMI): make AMI public in us-east-2 and clarify region/visibility so users can find it
  • Fix mypy type-checking errors (a month ago)
      - Remove unused type ignore comments from multiple files
      - Fix TypedDict type mismatch in browser_env/actions.py by ensuring arguments are converted to strings
      - Install missing type stubs (types-requests, types-tqdm)
    All core packages (browser_env, agent, evaluation_harness, llms, tests) now pass mypy checks.
    Co-authored-by: openhands <openhands@all-hands.dev>
  • use fuzzy_match for UA tasks and update ua eval prompt (2 years ago)
  • release commit (2 years ago)