codex-long-running-harness

codex-long-running-harness is a Codex-first development harness for tasks that are too large for a single conversational pass. It treats long-horizon coding as an iterative system problem: plan the work, execute a sprint, evaluate the output, checkpoint the result, then continue.

Why it matters

Long-running software work needs explicit state, not just longer prompts.
Evaluation has to be part of the loop; otherwise the system drifts silently.
Progress should be inspectable after each sprint, with artifacts that survive restarts.

Core workflow

goal -> planner -> sprint backlog -> generator -> evaluator -> snapshot -> next sprint
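The workflow above can be read as a control loop. The sketch below is a minimal illustration under stated assumptions, not the project's implementation: `run_harness`, `SprintRecord`, `HarnessState`, and the planner/generator/evaluator callable signatures are all hypothetical names chosen here. It assumes the planner returns a list of task strings, the generator receives the evaluator's latest feedback, and each sprint gets a bounded number of retries before its result is snapshotted.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical callable signatures; the real harness may structure these differently.
Planner = Callable[[str], list[str]]
Generator = Callable[[str, Optional[str]], str]
Evaluator = Callable[[str, str], tuple[bool, str]]


@dataclass
class SprintRecord:
    """One snapshot entry: what was attempted and how the evaluator judged it."""
    task: str
    output: str
    attempts: int
    passed: bool


@dataclass
class HarnessState:
    goal: str
    backlog: list[str]
    history: list[SprintRecord] = field(default_factory=list)


def run_harness(goal: str, planner: Planner, generator: Generator,
                evaluator: Evaluator, max_attempts: int = 3) -> HarnessState:
    # Planner: decompose the goal into a bounded sprint backlog.
    state = HarnessState(goal=goal, backlog=planner(goal))
    while state.backlog:
        task = state.backlog.pop(0)
        feedback: Optional[str] = None
        output, passed, attempts = "", False, 0
        # Generator/evaluator loop: retry the sprint, feeding back the evaluator's verdict.
        while attempts < max_attempts and not passed:
            attempts += 1
            output = generator(task, feedback)
            passed, feedback = evaluator(task, output)
        # Snapshot: record the sprint result before moving on to the next sprint.
        state.history.append(SprintRecord(task, output, attempts, passed))
    return state


if __name__ == "__main__":
    # Toy stand-ins (hypothetical); a real harness would plug in model calls and test runs here.
    result = run_harness(
        "parse config; add CLI; write tests",
        planner=lambda goal: [t.strip() for t in goal.split(";")],
        generator=lambda task, fb: f"revised {task}" if fb else f"draft of {task}",
        evaluator=lambda task, out: (out.startswith("revised"), "needs revision"),
    )
    for rec in result.history:
        print(rec.task, rec.attempts, rec.passed)
```

Bounding retries per sprint keeps a failing task from stalling the whole run: it is recorded as failed in the history and stays visible for later inspection.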

What this project emphasizes

Sprintized execution: large coding goals are decomposed into bounded work cycles.
Evaluator-driven iteration: quality gates are part of the runtime, not an afterthought.
Recoverability: intermediate results and benchmark snapshots make it possible to resume work instead of restarting from scratch.
Research-grade traceability: each iteration can be compared, audited, and improved systematically.
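Recoverability depends on persisting state after every sprint so that an interrupted run can pick up where it stopped. The sketch below illustrates one common way to do this and is not taken from the repo: `save_checkpoint`, `load_checkpoint`, `run_resumable`, the JSON layout, and the `stop_after` parameter (used only to simulate an interruption) are hypothetical. It assumes sprint results are JSON-serializable and writes each checkpoint to a temporary file before swapping it into place, so a crash mid-write does not leave a torn checkpoint.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Optional


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state to a temp file, then swap it in so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
    # os.replace swaps the file in one step (atomic on POSIX within one filesystem).
    os.replace(tmp, path)


def load_checkpoint(path: Path, goal: str, backlog: list[str]) -> dict:
    """Resume from an existing checkpoint, or start fresh if none exists."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"goal": goal, "pending": list(backlog), "completed": []}


def run_resumable(path: Path, goal: str, backlog: list[str],
                  do_sprint: Callable[[str], str],
                  stop_after: Optional[int] = None) -> dict:
    # Hypothetical driver: completed sprints are skipped on restart.
    state = load_checkpoint(path, goal, backlog)
    done_this_run = 0
    while state["pending"]:
        if stop_after is not None and done_this_run >= stop_after:
            break  # simulate an interruption for demonstration purposes
        task = state["pending"][0]
        result = do_sprint(task)
        state["pending"].pop(0)
        state["completed"].append({"task": task, "result": result})
        save_checkpoint(path, state)  # snapshot after every sprint
        done_this_run += 1
    return state
```

Because the task is removed from `pending` only after its sprint finishes and the checkpoint is written, a crash during a sprint causes that sprint to be re-run rather than silently skipped.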

Why it is highlighted here

This repo is a stronger signal than an ordinary agent demo because it targets the real bottleneck: keeping an AI coding system effective over long runs without it becoming opaque or fragile.