xiangyi-li/OS-World
mirrored 9 minutes ago
  • .envrc: 207 B
  • .gitignore: 2.64 kB
  • .mise.toml: 81 B
  • ACCOUNT_GUIDELINE.md: 9.3 kB
  • CONTRIBUTION.md: -
  • LICENSE: 11.3 kB
  • PROXY_GUIDELINE.md: 5.98 kB
  • PUBLIC_EVALUATION_GUIDELINE.md: 14.7 kB
  • README.md: 15.5 kB
  • ROADMAP.md: 2.31 kB
  • assets: -
  • desktop_env: -
  • evaluation_examples: -
  • lib_run_single.py: 8.52 kB
  • main.py: 3.22 kB
  • mm_agents: -
  • monitor: -
  • requirements.txt: 759 B
  • run.py: 11.7 kB
  • run_coact.py: 14.8 kB
  • run_multienv.py: 20.4 kB
  • run_multienv_aguvis.py: 13.5 kB
  • run_multienv_claude.py: 20.7 kB
  • run_multienv_gta1.py: 19.8 kB
  • run_multienv_o3.py: 19.9 kB
  • run_multienv_openaicua.py: 20.6 kB
  • run_multienv_opencua.py: 22.3 kB
  • run_multienv_qwen25vl.py: 20.4 kB
  • run_multienv_uitars.py: 20.5 kB
  • run_multienv_uitars15_v1.py: 22.5 kB
  • run_multienv_uitars15_v2.py: 20.9 kB
  • setup.py: 2.82 kB
  • show_result.py: 3.27 kB
[completely optional] direnv+mise autosetup (#87) Makes life a lot easier in my experience.
9 months ago
[completely optional] direnv+mise autosetup (#87) Makes life a lot easier in my experience.
9 months ago
feat&refactor: add proxy setup functionality and update .gitignore for proxy config file
2 months ago
add GDrive guideline
2 months ago
yuanmengqi: feat: enhance run_coact.py with logging and configuration options - Added logging configuration to capture runtime logs in both file and console with adjustable log levels. - Introduced new command-line arguments for provider name, region, and client password to improve flexibility and security. - Updated process_task function to accommodate new parameters, ensuring compatibility with existing logic. - Modified prompt templates in coding_agent.py and cua_agent.py to use the client password placeholder for enhanced security. (84f407a)
Clean Code; Refactor README
a year ago
PROXY_GUIDELINE.md Updates by Changyu Pang from Tsinghua (#41) * fix proxy readme * Add logs directory with .gitignore * Update PROXY_GUIDELINE.md
a year ago
fix: remove unnecessary sleep and observation retrieval in run_single_example function
6 days ago
Update LICENSE
a year ago
fix: correct IP address return logic in AWSProvider - Reverted the return value in the AWSProvider class to use private IP address instead of public IP address. - Ensured that the logic remains intact while addressing the specific requirement for VNC access.
17 hours ago
feat: refactor run_multienv_qwen25vl.py and qwen25vl_agent.py for improved logging and task management - Introduced signal handling for graceful shutdown of environments and processes. - Enhanced logging configuration to support dynamic log levels and structured output. - Updated argument parsing to include new parameters for model selection and task execution. - Refactored task distribution logic to streamline environment task management. - Improved error handling during task execution and environment cleanup. - Adjusted Qwen25VLAgent initialization to support new model and thought prefix options. - Reduced max tries for LLM calls to optimize performance.
9 days ago
feat: add client password argument to multiple agents and scripts - Introduced `--client_password` argument in `run_multienv_aguvis.py`, `run_multienv_claude.py`, and `run_multienv_gta1.py` for enhanced security and flexibility. - Updated agent classes (`PromptAgent`, `AguvisAgent`, `GTA1Agent`) to accept and utilize `client_password` for improved configuration. - Modified evaluation guidelines to reflect the new client password requirement. - Ensured existing logic remains intact while enhancing functionality for better user experience.
4 days ago
feat: add client password argument to multiple agents and scripts - Introduced `--client_password` argument in `run_multienv_aguvis.py`, `run_multienv_claude.py`, and `run_multienv_gta1.py` for enhanced security and flexibility. - Updated agent classes (`PromptAgent`, `AguvisAgent`, `GTA1Agent`) to accept and utilize `client_password` for improved configuration. - Modified evaluation guidelines to reflect the new client password requirement. - Ensured existing logic remains intact while enhancing functionality for better user experience.
4 days ago
feat: add client password argument to multiple agents and scripts - Introduced `--client_password` argument in `run_multienv_aguvis.py`, `run_multienv_claude.py`, and `run_multienv_gta1.py` for enhanced security and flexibility. - Updated agent classes (`PromptAgent`, `AguvisAgent`, `GTA1Agent`) to accept and utilize `client_password` for improved configuration. - Modified evaluation guidelines to reflect the new client password requirement. - Ensured existing logic remains intact while enhancing functionality for better user experience.
4 days ago
refactor: update command in JSON example to use placeholder for client password - Replaced the hardcoded password in the command with a placeholder `{CLIENT_PASSWORD}` for improved security and flexibility. - Ensured that the overall structure of the JSON remains unchanged while enhancing the example's usability.
17 hours ago
feat: enhance logging and signal handling in run_multienv_claude.py - Refactored logging configuration to support dynamic log levels via command-line arguments, allowing for better control over log verbosity. - Introduced a new signal handler for graceful shutdown of environments and processes, improving robustness during termination. - Added functionality to save command-line arguments to a JSON file for better traceability of execution parameters. - Maintained existing logic while enhancing the overall structure and error handling capabilities of the script.
4 days ago
fix: update Flask port configuration to support environment variable - Modified the Flask application to allow the port to be set via the `FLASK_PORT` environment variable, defaulting to 8080 if not specified. - Ensured existing application logic remains unchanged while enhancing configurability for deployment environments.
4 days ago
VirtualBox (#46) * Initialize aws support * Add README for the VM server * Refactor OSWorld for supporting more cloud services. * Initialize vmware and aws implementation v1, waiting for verification * Initialize files for azure, gcp and virtualbox support * Debug on the VMware provider * Fix on aws interface mapping * Fix instance type * Refactor * Clean * Add Azure provider * hk region; debug * Fix lock * Remove print * Remove key_name requirements when allocating aws vm * Clean README * Fix reset * Fix bugs * Add VirtualBox and Azure providers * Add VirtualBox OVF link * Raise exception on macOS host * Init README for VBox * Update VirtualBox VM download link * Update requirements and setup.py; Improve robustness on Windows * Fix network adapter * Go through on Windows machine * Add default adapter option * Fix minor error --------- Co-authored-by: Timothyxxx <384084775@qq.com> Co-authored-by: XinyuanWangCS <xywang626@gmail.com> Co-authored-by: Tianbao Xie <47296835+Timothyxxx@users.noreply.github.com>
a year ago
Enhance Public Evaluation Guidelines by adding new images for AWS setup and monitoring instructions. Included additional contact information for leaderboard updates and error reporting. Ensured clarity and usability for users while preserving existing content structure.
10 days ago
docs: add acknowledgements section in README.md - Included a new section to acknowledge institutions and students who contributed feedback and participated in fixes. - Enhanced recognition of collaborative efforts in the project while maintaining the existing structure of the README.
18 hours ago
Fix minor problems when aggregating the results (#106)
8 months ago
feat: enhance run_coact.py with logging and configuration options - Added logging configuration to capture runtime logs in both file and console with adjustable log levels. - Introduced new command-line arguments for provider name, region, and client password to improve flexibility and security. - Updated process_task function to accommodate new parameters, ensuring compatibility with existing logic. - Modified prompt templates in coding_agent.py and cua_agent.py to use the client password placeholder for enhanced security.
16 hours ago
feat: enhance run_coact.py with logging and configuration options - Added logging configuration to capture runtime logs in both file and console with adjustable log levels. - Introduced new command-line arguments for provider name, region, and client password to improve flexibility and security. - Updated process_task function to accommodate new parameters, ensuring compatibility with existing logic. - Modified prompt templates in coding_agent.py and cua_agent.py to use the client password placeholder for enhanced security.
16 hours ago
Uitars/dev (#291) * use aws pub ip * os task fix: set the default dim screen time to be 300s * add all the uitars agents: 1. run_multienv_uitars.py: Qwen2VL-based UITARS models 2. run_multienv_uitars15_v1.py: UITARS1.5-7B 3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking --------- Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
21 hours ago
Uitars/dev (#291) * use aws pub ip * os task fix: set the default dim screen time to be 300s * add all the uitars agents: 1. run_multienv_uitars.py: Qwen2VL-based UITARS models 2. run_multienv_uitars15_v1.py: UITARS1.5-7B 3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking --------- Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
21 hours ago
Uitars/dev (#291) * use aws pub ip * os task fix: set the default dim screen time to be 300s * add all the uitars agents: 1. run_multienv_uitars.py: Qwen2VL-based UITARS models 2. run_multienv_uitars15_v1.py: UITARS1.5-7B 3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking --------- Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
21 hours ago
Uitars/dev (#291) * use aws pub ip * os task fix: set the default dim screen time to be 300s * add all the uitars agents: 1. run_multienv_uitars.py: Qwen2VL-based UITARS models 2. run_multienv_uitars15_v1.py: UITARS1.5-7B 3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking --------- Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
21 hours ago
feat: add run_multienv_o3.py script for multi-environment evaluation - Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments. - Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters. - Integrated signal handling for graceful shutdown of environments and processes. - Enhanced logging capabilities for better traceability during execution. - Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.
4 days ago
feat: add run_multienv_o3.py script for multi-environment evaluation - Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments. - Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters. - Integrated signal handling for graceful shutdown of environments and processes. - Enhanced logging capabilities for better traceability during execution. - Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.
4 days ago
feat: add run_multienv_o3.py script for multi-environment evaluation - Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments. - Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters. - Integrated signal handling for graceful shutdown of environments and processes. - Enhanced logging capabilities for better traceability during execution. - Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.
4 days ago
fix some multi_apps tasks (#245) * fix chrome * fix some multi_apps tasks. * fix some multiapps tasks * fix some multiapps tasks --------- Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
21 days ago
Refactoring VMware Integration and Implementing AWS Support (#44) * Initialize aws support * Add README for the VM server * Refactor OSWorld for supporting more cloud services. * Initialize vmware and aws implementation v1, waiting for verification * Initialize files for azure, gcp and virtualbox support * Debug on the VMware provider * Fix on aws interface mapping * Fix instance type * Refactor * Clean * hk region; debug * Fix lock * Remove print * Remove key_name requirements when allocating aws vm * Clean README --------- Co-authored-by: XinyuanWangCS <xywang626@gmail.com>
a year ago
Merge pull request #264 from yuanmengqi/main Improve the parallel logic
15 days ago
Wxy/opencua (#290) * OpenCUA Agent code base * update url * debug, modify url input * debug opencua * show result * debug agent history overlap * modify opencua agent; add comment lines * update parallel; clean code; use sleep 3s * ui-tars-0717 * update detail * add system password to system prompt * add running command
21 hours ago