BenchFlow documentation

The runtime that ships SkillsBench, ClawsBench, and verified ACP agents.

Getting started
Install BenchFlow and run your first agent against a verifiable task.

Concepts
Tasks, harnesses, agents, environments, scenes, verifiers: the core BenchFlow vocabulary.

Authoring tasks
How to write a verifiable BenchFlow task end-to-end.

Skill evals
Run skill evals through the BenchFlow runtime: what gets measured and how.

Progressive disclosure
Lifecycle for environments that reveal information across rounds.

Sandbox hardening
How BenchFlow sandboxes prevent oracle leakage and other failure modes.

Use cases
What BenchFlow is used for: evals, post-training, dataset curation.

CLI reference
BenchFlow command-line reference.

Python API
BenchFlow Python SDK reference.

BenchFlow

A frontier environment lab for AI agents. SkillsBench, ClawsBench, and the BenchFlow runtime — open source.

© 2026 BenchFlow · A frontier environment lab

xiangyi@benchflow.ai