The first benchmark that tests whether agents can use procedural skills.
Getting Started
How to run SkillsBench to evaluate your coding agent's ability to use domain-specific skills.
Contributing
How to contribute tasks to SkillsBench, the first benchmark that tests whether agent skills improve agent performance.