# Evaluation examples

This directory contains the data examples used to benchmark how well agents interact with GUIs.
The examples are stored in `./examples`, where each data item is formatted as:

```
{
    "id": "uid", # unique id
    "snapshot": "snapshot_id", # the snapshot id of the environment, with some data already present and apps already opened, or just the desktop
    "instruction": "natural_language_instruction", # the natural language instruction of the task, i.e. what we want the agent to do
    "source": "website_url", # where the example comes from, e.g. a forum, a website, or a paper
    "config": {xxx}, # the scripts that set up the initial state of the task, such as download and open-file actions
    # (coming in next project) "trajectory": "trajectory_directory", # the trajectory directory, which contains the action sequence file, the screenshots, and the recording video
    "related_apps": ["app1", "app2", ...], # the related apps, which are opened during the task
    "evaluator": "evaluation_dir", # the directory of the evaluator, which contains the evaluation script for this example
}
```
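As a rough illustration of how the format above might be consumed, the sketch below loads every example and checks that the documented keys are present. It assumes each data item is stored as its own `.json` file under `./examples`, which this README does not state explicitly; the helper names `missing_keys` and `load_examples` are hypothetical and not part of this repository.

```python
import json
from pathlib import Path

# Keys taken from the format described above. "trajectory" is left out
# because it is marked as coming in the next project.
REQUIRED_KEYS = {
    "id", "snapshot", "instruction", "source",
    "config", "related_apps", "evaluator",
}


def missing_keys(example: dict) -> set:
    """Return the documented keys that are absent from one data item."""
    return REQUIRED_KEYS - example.keys()


def load_examples(examples_dir: str = "./examples") -> list:
    """Load all data items, assuming one JSON file per item (an assumption)."""
    examples = []
    for path in sorted(Path(examples_dir).rglob("*.json")):
        with open(path, encoding="utf-8") as f:
            example = json.load(f)
        missing = missing_keys(example)
        if missing:
            raise ValueError(f"{path}: missing keys {sorted(missing)}")
        examples.append(example)
    return examples
```

Calling `load_examples()` from this directory would then return a list of dictionaries, or raise a `ValueError` naming the first file that lacks a documented key.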

The `./trajectories` directory contains the annotated trajectories for completing the task in each data item in `./examples`.

This is still under construction and has only been tested on Windows 10. Please:
- Modify the paths as needed to run the evaluation;
- Let us know if any parts are overfitted to our environment.