Why a harness
This guide explains the problem Flow exists to solve. You will learn:
- Why large language models are unpredictable on framework code
- Why you cannot fix the model, only the context you give it
- What a harness is and how it changes the outcome
Overview
When you ask an AI coding agent to write AdonisJS code, it produces something plausible almost instantly. Whether that code is correct for your version of AdonisJS is a separate question, and a harder one.
The agent draws on patterns from its training data. That data is a blend of many frameworks, many versions of each, and many years of community code. The agent cannot tell you which patterns are current and which are three major versions out of date. It will mix them with full confidence, and you often discover the problem only when the code fails (or, worse, when it quietly does the wrong thing and still passes your checks).
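To make the version problem concrete, here is a minimal sketch. The three snippets show how a route import is commonly written in AdonisJS v4, v5, and v6, and an agent can produce any of them. The `findStaleImports` helper and its patterns are hypothetical, written only for this illustration. They are not part of Flow or AdonisJS, and the check is a naive text match, not a real analyzer.

```typescript
// Hypothetical, naive patterns for import idioms from older AdonisJS majors.
const staleImportPatterns: { label: string; pattern: RegExp }[] = [
  { label: 'v4-style use() call', pattern: /\buse\(\s*['"][^'"]+['"]\s*\)/ },
  { label: 'v5-style @ioc: import', pattern: /from\s+['"]@ioc:[^'"]+['"]/ },
]

// Return the label of every stale idiom found in a piece of generated code.
function findStaleImports(source: string): string[] {
  return staleImportPatterns
    .filter(({ pattern }) => pattern.test(source))
    .map(({ label }) => label)
}

// The same intent, written three ways across three major versions.
const v4Snippet = "const Route = use('Route')"
const v5Snippet = "import Route from '@ioc:Adonis/Core/Route'"
const v6Snippet = "import router from '@adonisjs/core/services/router'"

console.log(findStaleImports(v4Snippet))
console.log(findStaleImports(v5Snippet))
console.log(findStaleImports(v6Snippet))
```

All three snippets look reasonable in isolation. Only one of them matches any given app, and nothing in the generated code tells you which.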
Understanding why this happens, and what actually fixes it, is the reasoning behind everything Flow does.
Large language models are black boxes
A large language model (LLM) is the engine inside every AI coding agent. It is also a black box.
This is not a figure of speech. Even the researchers who build these models cannot fully explain why a specific output appeared. The model is billions of numerical parameters with no human-readable record of what it knows or how it decides. Working out what happens inside is an active scientific frontier. Anthropic's interpretability research, for example, describes the effort as building a "microscope" to map a model's internal concepts and trace its reasoning, precisely because none of that is visible from the outside.
The practical consequence for you is direct. You cannot reach into the model and correct its knowledge of AdonisJS. There is no setting for "use Lucid v22, not the patterns you absorbed from 2021." You cannot debug it, patch it, or pin it to a version. The model is fixed, opaque, and beyond your reach.
You cannot fix the model, only its context
What you can control is the context: the text the model reads at the moment it generates a response. The context is your system instructions, the files in the conversation, the tool results, and the prompt itself. The model pattern-matches on whatever is in that window. Change the context, and you change the output.
Curating that context well is its own discipline, increasingly called context engineering. The principle is simple. If the model is going to pattern-match on whatever sits in front of it, then put the right things in front of it: authoritative documentation instead of guesswork, a concrete plan instead of an open-ended request, the project's real conventions instead of a generic average of the internet.
Context engineering is the only reliable lever you have. The model stays a black box. The context is yours to shape.
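As a rough illustration of what "shaping the context" means, the sketch below assembles the four pieces named above into the single block of text a model would read. The `ContextParts` type, the `buildContext` function, and the `docs/routing.md` path are assumptions made for this example. Real agents assemble context in their own ways.

```typescript
// Hypothetical shape for the pieces of context described above.
interface ContextParts {
  systemInstructions: string
  files: { path: string; content: string }[]
  toolResults: string[]
  prompt: string
}

// Join the non-empty parts into the text the model would read.
function buildContext(parts: ContextParts): string {
  const files = parts.files.map((file) => `--- ${file.path} ---\n${file.content}`)
  return [parts.systemInstructions, ...files, ...parts.toolResults, parts.prompt]
    .filter((section) => section.trim().length > 0)
    .join('\n\n')
}

// A bare request: the model has only its training data to lean on.
const bareContext = buildContext({
  systemInstructions: '',
  files: [],
  toolResults: [],
  prompt: 'Add a posts route.',
})

// The same request with authoritative material placed in front of the model.
const engineeredContext = buildContext({
  systemInstructions: 'Follow the project docs included below.',
  files: [{ path: 'docs/routing.md', content: '(version-matched routing docs go here)' }],
  toolResults: [],
  prompt: 'Add a posts route.',
})

console.log(bareContext)
console.log(engineeredContext)
```

The prompt is identical in both calls. The only difference is what surrounds it, and that difference is the part you control.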
What a harness is
A harness is the structured scaffolding you build around the model so that its context is engineered for you, every time, without you assembling it by hand.
The word is borrowed from testing. A test harness wraps code you do not fully trust so you can exercise it safely and catch failures early. An AI harness does the same for a coding agent. A harness has three parts:
- Reference knowledge the agent reads instead of recalling. Current, version-matched documentation removes the guesswork that stale training data introduces.
- A workflow that forces planning and review before any code is written. A reviewed plan is a far better context for the build step than a one-line request.
- Guardrails that detect when the agent drifts from the approved plan and stop it, rather than letting a small wrong turn compound into a broken feature.
The model is still a black box. The harness does not change that. It makes the agent's output reliable anyway, by controlling everything around the box.
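The three parts can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how Flow is implemented: the `Plan`, `ProposedChange`, and `Agent` types, the `runHarness` function, and the file paths are all hypothetical, and the agent is a stub standing in for a real model call.

```typescript
// Hypothetical types for this sketch only.
interface Plan {
  approved: boolean
  allowedFiles: string[]
}

interface ProposedChange {
  file: string
  content: string
}

// A stand-in for the coding agent: context in, proposed changes out.
type Agent = (context: string) => ProposedChange[]

function runHarness(agent: Agent, referenceDocs: string, plan: Plan, request: string): ProposedChange[] {
  // Workflow: refuse to build until the plan has been reviewed and approved.
  if (!plan.approved) {
    throw new Error('Plan has not been approved')
  }

  // Reference knowledge: the docs go into the context so the agent reads them.
  const planSummary = `Approved files: ${plan.allowedFiles.join(', ')}`
  const context = [referenceDocs, planSummary, request].join('\n\n')
  const changes = agent(context)

  // Guardrail: stop as soon as a change falls outside the approved plan.
  for (const change of changes) {
    if (!plan.allowedFiles.includes(change.file)) {
      throw new Error(`Drift detected: ${change.file} is not in the approved plan`)
    }
  }
  return changes
}

// Stub agent that stays within the plan.
const stubAgent: Agent = () => [{ file: 'start/routes.ts', content: '// route code' }]

const approvedPlan: Plan = { approved: true, allowedFiles: ['start/routes.ts'] }
const result = runHarness(stubAgent, '(reference docs)', approvedPlan, 'Add a posts route.')
console.log(result.length)
```

Nothing in `runHarness` inspects or alters the agent itself. Every check sits before or after the call, which is the sense in which a harness controls everything around the box.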
How Flow applies this
Flow is a harness built specifically for AdonisJS. Each part of the harness maps to something Flow installs:
- The knowledge base is the reference knowledge. It holds curated AdonisJS docs, matched to the package versions your app uses, that the agent reads before it writes.
- The spec-driven workflow is the planning-and-review workflow. It breaks a change into gated steps, each one approved before the next.
- The engineering skills are encoded guardrails for specific design decisions the agent would otherwise make inconsistently.
The result is that you keep using the same coding agent, but its output stops being a gamble.
Further reading
These sources from Anthropic explain the underlying ideas in more depth:
- Mapping the Mind of a Large Language Model introduces the interpretability work behind the "black box" framing.
- Tracing the thoughts of a large language model shows how opaque a model's internal reasoning is, even to its creators.
- Effective context engineering for AI agents covers the practice of curating what a model sees.