BenchFlow

RL Environments for Coding Agents

Real-world coding tasks from production TypeScript repositories. Train agents on actual engineering problems via PR Mirroring.

1.2M+ Combined Stars

30 Repositories

100% From Real PRs

Validated Task Instances

Human-reviewed training tasks with verified fail-to-pass test coverage

8 tasks, 4 repositories

How It Works

PR Mirroring creates realistic coding tasks from real engineering work

Real GitHub PR → LM Reverses Changes → Tests Fail → Human Review → Training Task
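
The last two gates of this pipeline, the fail-to-pass check and human review, can be sketched roughly as follows. This is a minimal illustration and not BenchFlow's actual implementation: the `Candidate` record, its field names, the helper functions, and the sample PR reference and test names are all hypothetical. It assumes a fail-to-pass test is one that fails once the PR's changes are reversed and passes with the original changes applied, and that a test missing from the reversed state counts as failing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one mirrored PR (all field names are assumptions)."""
    pr_ref: str
    results_on_reversed: dict[str, bool]  # test name -> passed, with the PR's changes reversed
    results_on_original: dict[str, bool]  # test name -> passed, with the PR's changes applied
    human_approved: bool


def fail_to_pass_tests(candidate: Candidate) -> list[str]:
    """Return tests that fail on the reversed code and pass on the original PR."""
    return sorted(
        name
        for name, passed in candidate.results_on_original.items()
        # A test absent from the reversed run is treated as failing (assumption).
        if passed and not candidate.results_on_reversed.get(name, False)
    )


def is_training_task(candidate: Candidate) -> bool:
    """A candidate qualifies only with at least one fail-to-pass test and human approval."""
    return bool(fail_to_pass_tests(candidate)) and candidate.human_approved


# Illustrative data only; the PR reference and test names are made up.
candidate = Candidate(
    pr_ref="example-org/example-repo#0",
    results_on_reversed={"parses config": False, "renders list": True},
    results_on_original={"parses config": True, "renders list": True},
    human_approved=True,
)

print(fail_to_pass_tests(candidate))
print(is_training_task(candidate))
```

Keeping the fail-to-pass check separate from the approval flag mirrors the order of the steps above: the tests must fail on the reversed code before a human reviewer decides whether the candidate becomes a training task.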