webarena/xiangyi-li · BenchFlow

__init__.py
181 B
release commit
2 years ago
evaluators.py
13.3 kB
Fix mypy type-checking errors
- Remove unused type ignore comments from multiple files
- Fix TypedDict type mismatch in browser_env/actions.py by ensuring arguments are converted to strings
- Install missing type stubs (types-requests, types-tqdm)
All core packages (browser_env, agent, evaluation_harness, llms, tests) now pass mypy checks.
Co-authored-by: openhands <openhands@all-hands.dev>
3 months ago
helper_functions.py
7.57 kB
Use fuzzy_match for UA tasks and update UA eval prompt
2 years ago