OS-World/xiangyi-li · BenchFlow

.envrc
207 B
[completely optional] direnv+mise autosetup (#87) Makes life a lot easier in my experience.
a year ago
.gitignore
2.74 kB
support qwen3vl agent (#336) Co-authored-by: root <ludunjie1219@github.com>
a month ago
.mise.toml
81 B
[completely optional] direnv+mise autosetup (#87) Makes life a lot easier in my experience.
a year ago
ACCOUNT_GUIDELINE.md
9.3 kB
add GDrive guideline
5 months ago
CONTRIBUTION.md
-
Clean Code; Refactor README
2 years ago
LICENSE
11.3 kB
Update LICENSE
2 years ago
PROXY_GUIDELINE.md
5.98 kB
PROXY_GUIDELINE.md Updates by Changyu Pang from Tsinghua (#41) * fix proxy readme * Add logs directory with .gitignore * Update PROXY_GUIDELINE.md
a year ago
PUBLIC_EVALUATION_GUIDELINE.md
14.7 kB
feat: add client password argument to multiple agents and scripts - Introduced `--client_password` argument in `run_multienv_aguvis.py`, `run_multienv_claude.py`, and `run_multienv_gta1.py` for enhanced security and flexibility. - Updated agent classes (`PromptAgent`, `AguvisAgent`, `GTA1Agent`) to accept and utilize `client_password` for improved configuration. - Modified evaluation guidelines to reflect the new client password requirement. - Ensured existing logic remains intact while enhancing functionality for better user experience.
3 months ago
README.md
14.8 kB
Add new section in README for OSWorld-MCP project
10 hours ago
ROADMAP.md
2.31 kB
Refactoring VMware Integration and Implementing AWS Support (#44) * Initialize aws support * Add README for the VM server * Refactor OSWorld for supporting more cloud services. * Initialize vmware and aws implementation v1, waiting for verification * Initialize files for azure, gcp and virtualbox support * Debug on the VMware provider * Fix on aws interface mapping * Fix instance type * Refactor * Clean * hk region; debug * Fix lock * Remove print * Remove key_name requirements when allocating aws vm * Clean README --------- Co-authored-by: XinyuanWangCS <xywang626@gmail.com>
a year ago
assets
-
Enhance Public Evaluation Guidelines by adding new images for AWS setup and monitoring instructions. Included additional contact information for leaderboard updates and error reporting. Ensured clarity and usability for users while preserving existing content structure.
3 months ago
desktop_env
-
Update setup.py for version bump and dependency adjustments - Bump version from 1.0.0 to 1.0.1 - Update numpy dependency to allow versions >=1.26 and <3 - Adjust pandas dependency to allow versions >=2.2 and <2.3 - Add new __init__.py file in the docker provider directory
7 days ago
evaluation_examples
-
Add safe browsing feature to Chrome evaluator - Implemented `get_enable_safe_browsing` function to retrieve safe browsing settings based on the operating system. - Updated the `__init__.py` to include the new function. - Modified JSON examples to reflect the change from enabling enhanced safety browsing to enabling safe browsing. - Added necessary commands in the JSON examples for setting up preferences for safe browsing.
25 days ago
lib_run_single.py
18.7 kB
osworld agent wrapper for trained v7 (#360)
13 days ago
lib_run_single_mobileagent_v3.py
2.68 kB
add support for mobile agent v3 (#328) * add support for mobile agent v3 * add mobile_agent * add support for mobile agent v3
2 months ago
logs
-
Update OpenCV dependency to headless version in requirements and setup files - Replaced 'opencv-python' with 'opencv-python-headless' in both requirements.txt and setup.py to reduce unnecessary GUI dependencies. - Added a new .gitkeep file in the logs directory to ensure it is tracked in version control. - Maintained existing code logic while improving dependency management.
2 months ago
main.py
3.22 kB
fix some multi_apps tasks (#245) * fix chrome * fix some multi_apps tasks. * fix some multiapps tasks * fix some multiapps tasks --------- Co-authored-by: yuanmengqi <yuanmengqi@mail.ustc.edu.cn>
4 months ago
mm_agents
-
osworld agent wrapper for trained v7 (#360)
13 days ago
monitor
-
fix: update Flask port configuration to support environment variable - Modified the Flask application to allow the port to be set via the `FLASK_PORT` environment variable, defaulting to 8080 if not specified. - Ensured existing application logic remains unchanged while enhancing configurability for deployment environments.
3 months ago
pyproject.toml
1.71 kB
Add multiple new modules and tools to enhance the functionality and extensibility of the Maestro project (#333) * Added a **pyproject.toml** file to define project metadata and dependencies. * Added **run\_maestro.py** and **osworld\_run\_maestro.py** to provide the main execution logic. * Introduced multiple new modules, including **Evaluator**, **Controller**, **Manager**, and **Sub-Worker**, supporting task planning, state management, and data analysis. * Added a **tools module** containing utility functions and tool configurations to improve code reusability. * Updated the **README** and documentation with usage examples and module descriptions. These changes lay the foundation for expanding the Maestro project’s functionality and improving the user experience. Co-authored-by: Hiroid <guoliangxuan@deepmatrix.com>
2 months ago
quickstart.py
1.86 kB
Update default path_to_vm argument to None in quickstart.py for improved flexibility
2 months ago
requirements.txt
875 B
Add ui agent (#343) * add uipath agent * readme update
a month ago
run.py
11.7 kB
Add consistent scores validation (#368) * Add consistent scores validation * revert osworld_run_maestro.py changes
2 days ago
run_autoglm.py
22.6 kB
Add consistent scores validation (#368) * Add consistent scores validation * revert osworld_run_maestro.py changes
2 days ago
run_autoglm_v.py
23.9 kB
Add consistent scores validation (#368) * Add consistent scores validation * revert osworld_run_maestro.py changes
2 days ago
run_coact.py
15.2 kB
feat: enhance run_coact.py and related agents with improved task handling and configuration - Updated TASK_DESCRIPTION in run_coact.py to clarify task-solving steps and requirements. - Modified configuration parameters for provider name and client password for better security and flexibility. - Enhanced OrchestratorUserProxyAgent to include user instruction in the auto-reply and improved screenshot handling. - Adjusted coding_agent.py to ensure proper verification of results before saving changes. - Improved CUA agent prompts to maintain application state and handle user instructions more effectively. - Ensured existing code logic remains unchanged while enhancing functionality and usability.
3 months ago
run_maestro.py
21.4 kB
Merge branch 'main' of github.com:xlang-ai/OSWorld
a month ago
run_multienv.py
20.4 kB
feat: add run_multienv_o3.py script for multi-environment evaluation - Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments. - Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters. - Integrated signal handling for graceful shutdown of environments and processes. - Enhanced logging capabilities for better traceability during execution. - Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.
3 months ago
run_multienv_agi.py
20.8 kB
osworld agent wrapper for trained v7 (#360)
13 days ago
run_multienv_aguvis.py
13.5 kB
feat: add client password argument to multiple agents and scripts - Introduced `--client_password` argument in `run_multienv_aguvis.py`, `run_multienv_claude.py`, and `run_multienv_gta1.py` for enhanced security and flexibility. - Updated agent classes (`PromptAgent`, `AguvisAgent`, `GTA1Agent`) to accept and utilize `client_password` for improved configuration. - Modified evaluation guidelines to reflect the new client password requirement. - Ensured existing logic remains intact while enhancing functionality for better user experience.
3 months ago
run_multienv_autoglm.py
23.1 kB
fix multienv bug (#327)
2 months ago
run_multienv_autoglm_v.py
10.7 kB
Add autoglm-os-9b-v (#344) * update for autoglm-v * Update run_autoglm.py --------- Co-authored-by: hanyullai <hanyullai@outlook.com>
a month ago
run_multienv_aworldguiagent.py
28.6 kB
update aworldguiAgent code (#342)
a month ago
run_multienv_claude.py
20.7 kB
feat: enhance logging and signal handling in run_multienv_claude.py - Refactored logging configuration to support dynamic log levels via command-line arguments, allowing for better control over log verbosity. - Introduced a new signal handler for graceful shutdown of environments and processes, improving robustness during termination. - Added functionality to save command-line arguments to a JSON file for better traceability of execution parameters. - Maintained existing logic while enhancing the overall structure and error handling capabilities of the script.
3 months ago
run_multienv_gta1.py
23.9 kB
init public release (#350)
24 days ago
run_multienv_mano.py
21.9 kB
support mano agent (#338) Co-authored-by: Fei Hu <molanhand@users.noreply.github.com>
a month ago
run_multienv_mobileagent_v3.py
14 kB
add support for mobile agent v3 (#328) * add support for mobile agent v3 * add mobile_agent * add support for mobile agent v3
2 months ago
run_multienv_o3.py
19.9 kB
feat: add run_multienv_o3.py script for multi-environment evaluation - Introduced a new script `run_multienv_o3.py` to facilitate end-to-end evaluation across multiple environments. - Implemented command-line argument parsing for various configurations including environment settings, logging levels, and AWS parameters. - Integrated signal handling for graceful shutdown of environments and processes. - Enhanced logging capabilities for better traceability during execution. - Maintained existing logic from previous scripts while introducing new functionalities for improved evaluation processes.
3 months ago
run_multienv_openaicua.py
20.6 kB
Merge pull request #264 from yuanmengqi/main Improve the parallel logic
3 months ago
run_multienv_opencua.py
22.8 kB
OpenCUA-72B (#354) * use aws pub ip * os task fix: set the default dim screen time to be 300s * OpenCUA-72B * update password * update * update * update opencua72b agent * change provider ip --------- Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
18 days ago
run_multienv_owl.py
12.9 kB
Add support for GUI-Owl agent (#318) * add run_multienv_owl.py * add owl_agent.py
2 months ago
run_multienv_qwen25vl.py
20.4 kB
feat: refactor run_multienv_qwen25vl.py and qwen25vl_agent.py for improved logging and task management - Introduced signal handling for graceful shutdown of environments and processes. - Enhanced logging configuration to support dynamic log levels and structured output. - Updated argument parsing to include new parameters for model selection and task execution. - Refactored task distribution logic to streamline environment task management. - Improved error handling during task execution and environment cleanup. - Adjusted Qwen25VLAgent initialization to support new model and thought prefix options. - Reduced max tries for LLM calls to optimize performance.
3 months ago
run_multienv_qwen3vl.py
18.7 kB
support aliyun eval of qwen3vl
14 days ago
run_multienv_uipath.py
20.4 kB
Add ui agent (#343) * add uipath agent * readme update
a month ago
run_multienv_uitars.py
20.5 kB
Uitars/dev (#291) * use aws pub ip * os task fix: set the default dim screen time to be 300s * add all the uitars agents: 1. run_multienv_uitars.py: Qwen2VL-based UITARS models 2. run_multienv_uitars15_v1.py: UITARS1.5-7B 3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking --------- Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
3 months ago
run_multienv_uitars15_v1.py
22.5 kB
Uitars/dev (#291) * use aws pub ip * os task fix: set the default dim screen time to be 300s * add all the uitars agents: 1. run_multienv_uitars.py: Qwen2VL-based UITARS models 2. run_multienv_uitars15_v1.py: UITARS1.5-7B 3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking --------- Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
3 months ago
run_multienv_uitars15_v2.py
20.9 kB
Uitars/dev (#291) * use aws pub ip * os task fix: set the default dim screen time to be 300s * add all the uitars agents: 1. run_multienv_uitars.py: Qwen2VL-based UITARS models 2. run_multienv_uitars15_v1.py: UITARS1.5-7B 3. run_multienv_uitars15_v2.py: SeedVL1.5 thinking/non-thinking --------- Co-authored-by: Jiaqi <dengjiaqi@moonshot.cn>
3 months ago
setup.py
3.3 kB
Update setup.py for version bump and dependency adjustments - Bump version from 1.0.0 to 1.0.1 - Update numpy dependency to allow versions >=1.26 and <3 - Adjust pandas dependency to allow versions >=2.2 and <2.3 - Add new __init__.py file in the docker provider directory
7 days ago
show_result.py
3.27 kB
Fix minor problems when aggregating the results (#106)
a year ago
uv.lock
951 kB
Add multiple new modules and tools to enhance the functionality and extensibility of the Maestro project (#333) * Added a **pyproject.toml** file to define project metadata and dependencies. * Added **run\_maestro.py** and **osworld\_run\_maestro.py** to provide the main execution logic. * Introduced multiple new modules, including **Evaluator**, **Controller**, **Manager**, and **Sub-Worker**, supporting task planning, state management, and data analysis. * Added a **tools module** containing utility functions and tool configurations to improve code reusability. * Updated the **README** and documentation with usage examples and module descriptions. These changes lay the foundation for expanding the Maestro project’s functionality and improving the user experience. Co-authored-by: Hiroid <guoliangxuan@deepmatrix.com>
2 months ago