BenchFlow

SkillsBench documentation

The first benchmark that tests whether agents can use procedural skills.

  • Getting Started

    How to run SkillsBench to evaluate your coding agent's ability to use domain-specific skills.

  • Contributing

    How to contribute tasks to SkillsBench, the first benchmark that tests whether agent skills improve agent performance.

BenchFlow

A frontier environment lab for AI agents. SkillsBench, ClawsBench, and the BenchFlow runtime are all open source.

Projects

  • SkillsBench
  • ClawsBench
  • BenchFlow runtime
  • HuggingFace org

Ecosystem

  • Agent Skills ’26 workshop
  • ClawsBench paper (arXiv)
  • GitHub org
  • Discord

© 2026 BenchFlow · A frontier environment lab

xiangyi@benchflow.ai