xiangyi-li/OS-World
mirrored 9 minutes ago
  • __init__.py (1.56 kB)
  • calc.py (522 B)
  • chrome.py (121 kB)
  • file.py (5.46 kB)
  • general.py (1.25 kB)
  • gimp.py (1.11 kB)
  • impress.py (7.03 kB)
  • info.py (1.5 kB)
  • misc.py (21.6 kB)
  • replay.py (709 B)
  • vlc.py (3.69 kB)
  • vscode.py (1.08 kB)
Check and fix on Chrome tasks
- Added `pytz` dependency to `requirements.txt` for timezone handling.
- Introduced `get_macys_product_url_parse` function to replace the old `get_url_path_parse` for better clarity and maintain backward compatibility.
- Enhanced logging throughout the `get_active_tab_html_parse` and `get_rule_relativeTime` functions for improved debugging and traceability.
- Updated JSON examples to reflect changes in expected keys and added new fields for better evaluation context.
- Removed deprecated execution commands from JSON examples to streamline the evaluation process.
a month ago
yuanmengqi (84f407a)
feat: enhance run_coact.py with logging and configuration options
- Added logging configuration to capture runtime logs in both file and console with adjustable log levels.
- Introduced new command-line arguments for provider name, region, and client password to improve flexibility and security.
- Updated process_task function to accommodate new parameters, ensuring compatibility with existing logic.
- Modified prompt templates in coding_agent.py and cua_agent.py to use the client password placeholder for enhanced security.
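The logging and command-line changes described in this commit could look roughly like the sketch below. It uses only the standard library. The flag names (`--provider_name`, `--region`, `--client_password`, `--log_level`, `--log_file`), their defaults, and the logger name are assumptions for illustration; they are not taken from `run_coact.py`.

```python
import argparse
import logging
import sys


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags; the real script's names and defaults may differ.
    parser = argparse.ArgumentParser(description="Illustrative task runner")
    parser.add_argument("--provider_name", default="docker")
    parser.add_argument("--region", default="us-east-1")
    parser.add_argument("--client_password", default="")
    parser.add_argument(
        "--log_level",
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    )
    parser.add_argument("--log_file", default="run.log")
    return parser


def configure_logging(level_name: str, log_file: str) -> logging.Logger:
    # One named logger that writes to both a file and the console.
    logger = logging.getLogger("coact_sketch")
    logger.setLevel(getattr(logging, level_name))
    logger.propagate = False  # avoid duplicate lines via the root logger
    logger.handlers.clear()   # make repeated calls idempotent
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    for handler in (logging.FileHandler(log_file), logging.StreamHandler(sys.stdout)):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

Passing the password as an argument and substituting it into prompt templates at run time, as the commit describes, keeps the literal value out of the source files.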
Increase timeout for page load stability in Chrome evaluator
- Updated the timeout for the page load state from 10 seconds to 60 seconds to ensure better stability during page processing.
- Removed redundant retry mechanisms from the active tab checks to streamline the code while maintaining existing functionality.
- Enhanced logging to provide clearer insights into the page loading process.
These changes aim to improve the reliability of the Chrome evaluator without altering the core logic.
16 days ago
Clean code; Refactor environment to pass screenshot content instead of path
a year ago
fix: Enhance error handling and logging across multiple evaluators
- Added logging for file retrieval and error handling in file.py, improving robustness during file operations.
- Implemented checks for file existence and parsing errors in general.py, enhancing reliability in JSON/YAML processing.
- Improved table comparison logic in table.py with detailed error logging for sheet loading and cell value reading.
- Enhanced metrics evaluation in slides.py with additional checks for paragraph and run counts, ensuring thorough comparison.
- Updated utils.py to include file existence checks and detailed error logging during cell value reading.
20 days ago
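The file-existence and parse-error checks mentioned in this commit follow a common pattern, sketched below for JSON only. The function name `load_json_or_none` and the logger name are hypothetical; the code is not copied from `general.py` or `file.py`.

```python
import json
import logging
import os
from typing import Any, Optional

logger = logging.getLogger("getter_sketch")


def load_json_or_none(path: str) -> Optional[Any]:
    """Return parsed JSON from `path`, or None if it is missing or unparseable."""
    if not os.path.isfile(path):
        logger.error("File not found: %s", path)
        return None
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError) as exc:
        # ValueError covers json.JSONDecodeError and UnicodeDecodeError.
        logger.error("Failed to read or parse %s: %s", path, exc)
        return None
```

Returning `None` instead of raising lets a caller treat a missing or malformed file as a failed check rather than a crash, at the cost of having to test for `None` explicitly.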
Fix minor errors in vscode and gimp about path and postconfig
2 years ago
Support Docker VM manager and provider (#75)
* Add docker provider framework
* Update VM download link
* Add stop container
* Update docker manager & provider
* Update
* Update
* Update provider
10 months ago
add multi-app examples
2 years ago
update multi-apps
a year ago
feat: enhance VM wallpaper retrieval and image similarity checks
- Added logging to the VM wallpaper retrieval function to capture errors and warnings related to content retrieval and file creation.
- Implemented checks for None, empty, and invalid content types to ensure robustness in wallpaper handling.
- Enhanced the SSIM structure check function with size validation and improved error handling for image processing.
- Added logging for image size discrepancies and exceptions during SSIM computation to aid in debugging.
These changes improve error handling and logging, ensuring better maintainability and reliability of the evaluators.
17 days ago
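To illustrate what size validation before an SSIM comparison can look like, here is a dependency-free sketch. It computes SSIM over a single global window, which is a simplification and not the sliding-window variant that image libraries usually provide. The function name `global_ssim`, the list-of-rows image representation, and the choice to return 0.0 on invalid input are all assumptions, not the evaluator's actual code.

```python
import logging
from typing import List

logger = logging.getLogger("ssim_sketch")

Image = List[List[float]]  # grayscale rows, pixel values in 0..255


def global_ssim(a: Image, b: Image) -> float:
    """Single-window SSIM over two whole images; 0.0 on invalid input."""
    if not a or not b or not a[0] or not b[0]:
        logger.warning("Empty image passed to SSIM check")
        return 0.0
    width_a, width_b = len(a[0]), len(b[0])
    if any(len(r) != width_a for r in a) or any(len(r) != width_b for r in b):
        logger.warning("Ragged image rows")
        return 0.0
    if (len(a), width_a) != (len(b), width_b):
        logger.warning("Size mismatch: %sx%s vs %sx%s",
                       len(a), width_a, len(b), width_b)
        return 0.0

    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

    # Standard SSIM stabilising constants for an 8-bit dynamic range.
    c1, c2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )
```

Checking dimensions first means a mismatched pair is reported through the log as a size discrepancy instead of surfacing later as an exception inside the arithmetic.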
[Feature] Initialize and Implement Aguvis Evaluation on OSWorld (#98)
* Initialize Aguvis eval on OSWorld
* Debug
* Debug
* v1, internal version
* Add experiments script
* Fix minor bugs
* Update new endpoint
* Update ip
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Fix model name
* Fix docker close issues; update prompting
* Fix missed
* Fix the default port to avoid crashing on examples like '_update_browse_history_setup'
* Fix server and chromium ports in setup
* Revert and add missed dependency
* Add VLC port for docker
* Update
* Clean
---------
Co-authored-by: Tianbao Xie <tianbaoxie@U-492FC39R-0217.local>
Co-authored-by: FredWuCZ <fredwucz@outlook.com>
9 months ago
Finish loading the vscode examples v1; Improve on the infra: Add accessibility tree into the observation; Add activate window function, etc
2 years ago