- 1.59 kB -  
- 522 B -  
- 123 kB -  
- 5.46 kB -  
- 1.25 kB -  
- 1.11 kB -  
- 7.03 kB -  
- 1.5 kB -  
- 21.6 kB -  
- 709 B -  
- 3.69 kB -  
- 1.08 kB -  
TimothyxxxAdd new section in README for OSWorld-MCP project
8365edc
Clean code; Refactor environment to pass screenshot content instead of path
2 years ago
Add safe browsing feature to Chrome evaluator
- Implemented `get_enable_safe_browsing` function to retrieve safe browsing settings based on the operating system.
- Updated the `__init__.py` to include the new function.
- Modified JSON examples to reflect the change from enabling enhanced safety browsing to enabling safe browsing.
- Added necessary commands in the JSON examples for setting up preferences for safe browsing.
a month ago
Add safe browsing feature to Chrome evaluator
- Implemented `get_enable_safe_browsing` function to retrieve safe browsing settings based on the operating system.
- Updated the `__init__.py` to include the new function.
- Modified JSON examples to reflect the change from enabling enhanced safety browsing to enabling safe browsing.
- Added necessary commands in the JSON examples for setting up preferences for safe browsing.
a month ago
fix: Enhance error handling and logging across multiple evaluators
- Added logging for file retrieval and error handling in file.py, improving robustness during file operations.
- Implemented checks for file existence and parsing errors in general.py, enhancing reliability in JSON/YAML processing.
- Improved table comparison logic in table.py with detailed error logging for sheet loading and cell value reading.
- Enhanced metrics evaluation in slides.py with additional checks for paragraph and run counts, ensuring thorough comparison.
- Updated utils.py to include file existence checks and detailed error logging during cell value reading.
4 months ago
feat: enhance VM wallpaper retrieval and image similarity checks
- Added logging to the VM wallpaper retrieval function to capture errors and warnings related to content retrieval and file creation.
- Implemented checks for None, empty, and invalid content types to ensure robustness in wallpaper handling.
- Enhanced the SSIM structure check function with size validation and improved error handling for image processing.
- Added logging for image size discrepancies and exceptions during SSIM computation to aid in debugging.
These changes improve error handling and logging, ensuring better maintainability and reliability of the evaluators.
3 months ago
Support Docker VM manager and provider (#75)
* Add docker provider framework
* Update VM download link
* Add stop container
* Update docker manager & provider
* Update
* Update
* Update provider
a year ago
Finish loading the vscode examples v1; Improve on the infra: Add accessibility tree into the observation; Add activate window function, etc
2 years ago
[Feature] Initialize and Implement Aguvis Evaluation on OSWorld (#98)
* Initialize Aguvis eval on OSWorld
* Debug
* Debug
* v1, internal version
* Add experiments script
* Fix minor bugs
* Update new endpoint
* Update ip
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Fix model name
* Fix docker close issues; update prompting
* Fix missed
* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'
* Fix server and chromium ports in setup
* Revert and add missed dependency
* Add VLC port for docker
* Update
* Clean
---------
Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
a year ago
Fix minor errors in vscode and gimp about path and postconfig
2 years ago
add multi-app examples
2 years ago
Check and fix on Chrome tasks
- Added `pytz` dependency to `requirements.txt` for timezone handling.
- Introduced `get_macys_product_url_parse` function to replace the old `get_url_path_parse` for better clarity and maintain backward compatibility.
- Enhanced logging throughout the `get_active_tab_html_parse` and `get_rule_relativeTime` functions for improved debugging and traceability.
- Updated JSON examples to reflect changes in expected keys and added new fields for better evaluation context.
- Removed deprecated execution commands from JSON examples to streamline the evaluation process.
4 months ago
update multi-apps
2 years ago