xiangyi-li/OS-World: desktop_env/evaluators
  • README.md (7.76 kB)
  • __init__.py (108 B)
  • getters/
  • metrics/
Increase timeout for page load stability in Chrome evaluator
  - Updated the page load state timeout from 10 seconds to 60 seconds for better stability during page processing.
  - Removed redundant retry mechanisms from the active tab checks to streamline the code while maintaining existing functionality.
  - Enhanced logging to give clearer insight into the page loading process.
  These changes aim to improve the reliability of the Chrome evaluator without altering its core logic.
15 days ago
ver Dec22nd: re-organized the evaluator structure to improve extensibility
2 years ago
feat: enhance image comparison functionality in gimp.py
  - Added resizing logic so that images of different sizes are brought to a common size before comparison, ensuring consistent evaluation.
  - Implemented mode conversion so that both images are in the same format for accurate comparison.
  - Enhanced the MSE-based structure check to support converting NumPy arrays to PIL Images, improving compatibility.
  - Kept the existing logic while improving the robustness and accuracy of the image comparison methods.
4 days ago
Latest commit by yuanmengqi (84f407a): feat: enhance run_coact.py with logging and configuration options
  - Added logging configuration that writes runtime logs to both a file and the console, with adjustable log levels.
  - Introduced new command-line arguments for provider name, region, and client password to improve flexibility and security.
  - Updated the process_task function to accept the new parameters while remaining compatible with the existing logic.
  - Modified the prompt templates in coding_agent.py and cua_agent.py to use the client password placeholder for enhanced security.
Clean code; Add todos in desktop_env README
a year ago