Data, Training, Synthetic
Harbor DataGen
Synthetic Data Generation
2025

Harbor DataGen provides synthetic data generation pipelines for training terminal-based AI agents. Powered by the TerminalGym environment, it generates diverse, realistic terminal interaction sequences for use in supervised fine-tuning and reinforcement learning.
The system produces training examples that span common developer workflows, including file manipulation, git operations, debugging sessions, and deployment tasks, so that trained agents can develop robust, transferable skills.
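To make the idea of a "terminal interaction sequence" concrete, here is a minimal sketch of how one such training example might be represented and serialized. The `TerminalStep` and `TerminalEpisode` classes, their field names, and the category labels are hypothetical illustrations; they are not Harbor DataGen's actual schema or API.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TerminalStep:
    """One command issued by the agent and what the terminal returned."""
    command: str
    output: str
    exit_code: int = 0


@dataclass
class TerminalEpisode:
    """Hypothetical training record: a task plus the steps taken to solve it."""
    task: str
    category: str  # assumed labels, e.g. "file", "git", "debug", "deploy"
    steps: list[TerminalStep] = field(default_factory=list)

    def to_jsonl_line(self) -> str:
        # One JSON object per line is a common layout for fine-tuning corpora.
        return json.dumps(asdict(self))


# Illustrative episode for a git workflow (contents are made up for the example).
episode = TerminalEpisode(
    task="Create a branch named feature/login and switch to it",
    category="git",
    steps=[
        TerminalStep("git checkout -b feature/login",
                     "Switched to a new branch 'feature/login'"),
        TerminalStep("git branch --show-current", "feature/login"),
    ],
)

line = episode.to_jsonl_line()
record = json.loads(line)  # round-trip to confirm the record is valid JSON
```

A record like this could serve supervised fine-tuning directly (task as prompt, commands as targets), while the per-step exit codes are the kind of signal a reinforcement learning setup might turn into a reward. Both uses are assumptions about how such data would be consumed.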