Quint Code 5.0 — decisions as living project infrastructure.
March 20, 2026
I was recently on a podcast where we talked about FPF, vibe-coding, decision engineering, and why “steering” AI agents is anything but an intuitive skill. We didn’t get through all the topics, and I promised the listeners a detailed follow-up post. Here it is.
And while we were talking, I realized it was time to resurrect Quint Code. So I did — and fell in love with this tool all over again. But let’s take it from the top.
Code got cheap. Decisions didn’t.
Everyone is vibe-coding right now. And that’s wonderful — even when it’s just a pet project, a weekend tool, or a hackathon prototype. And it’s not just toys: with varying degrees of success, people are already building fairly large software systems with AI. The devil is in the details, sure, but one thing is absolutely certain: anyone still dismissing AI-assisted engineering while refusing to actually use these tools has hopelessly, irreversibly fallen behind.
If you’re also interested not just in the intuitive thrill of vibe-coding but in AI-assisted engineering proper, you’ve already run into this more than once: when we bear responsibility for the code, when actual business depends on it — just “vibing” is not a strategy at all.
Here’s the thing — code has always been a small part of the work in IT. Truly competent engineers were never just engineers; to a significant degree they were also managers: modeling the domain, understanding what they were building and for whom, organizing people. There was simply a gap between those who did this kind of work and those who just got grunt work dumped on them.
AI is now closing that masking gap, and doing it at a ruthlessly fast pace. It’s become more obvious than ever: you need to make the right decisions, not just write or generate code.
“Humans steer. Agents execute.”
Just recently (though by singularity standards this is ancient history), OpenAI dropped the harness engineering post — about how they “purely vibe-coded” a large repo. Zero lines of human-written code. Sounds incredibly cool!!!
But I absolutely agree with Anatoly Levenchuk that this article needs an enormous disclaimer: “these stunts are performed by well-trained engineers — do not attempt without proper training.”
The article is woefully short on information about what specific skills are needed for this effective “steering.” Execution got cheaper and much faster — and it’s precisely steering that became the bottleneck.
After reading the harness engineering article, you might get the impression that this is easy to replicate — just add some integrations, cover it with tests, and off you go…
AI agents, when steered poorly, still frequently make junior-level mistakes. The upside is they make them fast — so we don’t find out about these mistakes a week before release, but almost immediately. This is where the fans of “just cover it with testsss, it can do it all on its own!!! I don’t even review tests anymore! It’s all automatic!” come in.
Yeah, but… I find it hard to believe you’ve never once had to actually review generated code, or verify behavior. Because sometimes you still have to. Sometimes! But when that “sometimes” hits, we immediately land flat on our face, because it turns out we suddenly need to possess certain hands-on competencies. Sure, you can keep going in circles asking AI for help, but do you actually know what to ask? This kind of fumbling can eat up an unforgivable amount of time.
So we’re in a situation where we still need to:
- Not lose focus, and not forget to check in on time on what’s actually happening in our AI factory
- Be competent in actual development, to verify the results
- Be competent in management and systems thinking, to avoid making junior-level mistakes ourselves in decomposition and work distribution across agents
Intuitive approaches don’t work well here. They barely work at all.
Three classes of problems
Anatoly Levenchuk in his recent posts very clearly defined three classes of problems we’re dealing with right now:
Class 1: “Keep spinning until the tests pass.” There’s a clear correctness criterion, and verifying the answer is cheaper than finding it. Math, algorithms, code with tests. AI agents already handle this class autonomously — they generate variants, run them through checks, headbutt their way to the finish! This is rapidly commoditizing: value shifts from the ability to solve to the ability to frame the problem. You don’t still think framing a problem is “just writing a prompt,” right?
Class 2: Verification costs more than the solution. Real engineering systems — distributed services, platforms, products with users. Here, green tests prove… nothing :) A system can pass all checks while still being brittle, unable to survive under load, silently drifting from requirements. “We built it” — but did we build the right thing? Verification requires expert judgment, it’s more expensive and slower than the development itself, and it can’t be fully automated.
Class 3: The problem statement itself is the problem. Neither the solution nor the criteria for verifying it are defined — because the problem itself isn’t defined. What to build, for whom, why — all of this needs to be discovered. The only verifier here is reality: the market, users, consequences. You can only verify results by probing — shipping the product into the real world.
Vibe-coding works exclusively in Class 1. Classes 2 and 3 demand a grammar for thinking, reliable methods of reasoning and sustaining attention throughout the entire work cycle.
FPF in one sentence
FPF (First Principles Framework) is a set of engineering principles — a grammar for thinking, or more precisely, an operating system for thinking! FPF structures your reasoning so that it’s hard to miss something important. It’s not an encyclopedia, not a methodology, and not tied to any tool. Just a set of powerful, first-principles, SoTA rules!
Quint Code 5.0: what it is and why the interface was rewritten from scratch
Quint Code is a decision engineering system for AI assistants. A set of tools that transforms conversation with your favorite AI agent into a structured engineering process.
Version 5.0 is a complete redesign. The previous architecture (ADI cycle with phases, hypotheses, L0/L1/L2 levels) was right in spirit but wrong in form. Too rigid, too ceremonious — context got overloaded instantly. For medium-sized tasks, the old QC brought nothing but pain and suffering, which is why I myself hadn’t used it for the last 2.5 months!
During that time I was using FPF in ChatGPT and Claude Desktop — works well, very well in fact if you just add the specification to the chat or project context (of course it gets indexed under the hood). But sometimes I specifically needed an agent with filesystem and code access that would understand FPF and could work using its principles at least to some degree. So I made several attempts — like slicing the spec into a skillpack — that was a really painful journey. A skillpack with hooks… I buried that attempt very quickly :)
What changed in Quint Code?
The main shift: from “phase machine of hypotheses” to “problem → decision → decision recorded as a contract.”
In v4, there was a rigid cycle: initialization → hypotheses → verification → audit → decision. This worked for one specific flow but was inflexible. Far from all tasks fit into that conveyor.
In v5 — seven tools, each self-contained. You can use one. You can use all of them. In any order — the system itself suggests what to do next. But if you want to get the most out of engineered reasoning — there’s a protocol, and we’ll go through it below. You can just install it and not use anything explicitly — it’ll quietly start recording knowledge on its own and invoke q-reason when it encounters complex tasks. Already a win! Set it and forget it — the project evolves.
Seven tools
1. /q-note — capture micro-decisions on the fly.
The lightest thing there is. “We decided to use X for Y because latency is critical.” Recorded, linked to files, automatically expires after 90 days. If three months later X is no longer relevant — the system will remind you, and we’ll review; if needed — update the decision/specification! Living documentation, damn it!
q-note is handy because the vast majority of decisions in projects are exactly like this. Small, fast, but six months later nobody remembers why. And we needed some simple tool that wouldn’t drag you through long reasoning cycles but also wouldn’t lose important project information — information about decisions.
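To make the shape of these micro-decisions concrete, here is a minimal sketch of what such a note could look like, with a 90-day staleness timer as described above. The field names and functions here are illustrative guesses, not the actual Quint Code format:

```python
from datetime import date, timedelta

# Hypothetical shape of a micro-decision note; field names are my own,
# not the real Quint Code schema.
def make_note(text, files, ttl_days=90):
    today = date.today()
    return {
        "text": text,
        "files": files,  # files the decision is linked to
        "created": today,
        "expires": today + timedelta(days=ttl_days),  # the staleness timer
    }

def is_expired(note, today=None):
    """Past the expiry date, the note should be surfaced for review."""
    return (today or date.today()) >= note["expires"]

note = make_note("Use X for Y because latency is critical", ["src/cache.py"])
print(is_expired(note))  # False on the day the note is created
```

The point isn’t the data structure itself but the expiry field: it turns a static note into something the system can proactively resurface.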
2. /q-frame — define the problem before rushing to solve it.
This is where Quint Code already differs most from “just ask AI.” Instead of “how do we do X?” — “what problem are we solving? what are the constraints? how will we know we’ve solved it?”
This isn’t pointless formality. Formality is never pointless! This is that golden moment where 90% of projects could save weeks or months of rework — because without a clear problem statement, you’re building “something,” and you find out that “something” isn’t what was needed at all only when it’s too late, when rewriting will be very painful. Of course, right now Quint Code only frames problems within a single project, but I can already see how this could expand to team development and integrations with various team/company knowledge bases — but that’s a long roadmap.
3. /q-char — define comparison criteria before you see the options.
Here we try to establish selection criteria before generating and comparing variants. This is protection against retrofitting — when you pick criteria to match a variant you already “liked.”
Each comparison criterion has a role:
- constraint — hard limit, must be satisfied (latency < 100ms)
- target — what we’re optimizing (throughput)
- observation — what we monitor but do NOT optimize (Anti-Goodhart!)
Why observation? Goodhart’s Law: when a metric becomes a goal, it ceases to be a good metric. If you optimize throughput — you might silently kill reliability. Observation says: “keep an eye on this, but don’t chase it.”
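The three roles can be sketched as a tiny data model. This is my own illustration of the idea, not Quint Code’s internals; the criteria and thresholds are invented for the example:

```python
from dataclasses import dataclass
from enum import Enum

class Role(Enum):
    CONSTRAINT = "constraint"    # hard limit, must be satisfied
    TARGET = "target"            # what we optimize
    OBSERVATION = "observation"  # monitored but never optimized (anti-Goodhart)

@dataclass
class Criterion:
    name: str
    role: Role
    check: callable = None  # for constraints: predicate the measurement must pass

# Hypothetical criteria for a caching decision:
criteria = [
    Criterion("p99 latency (ms)", Role.CONSTRAINT, check=lambda v: v < 100),
    Criterion("throughput (rps)", Role.TARGET),
    Criterion("error rate", Role.OBSERVATION),
]

def satisfies_constraints(measurements: dict) -> bool:
    """A variant is admissible only if every hard constraint holds."""
    return all(
        c.check(measurements[c.name])
        for c in criteria
        if c.role is Role.CONSTRAINT
    )

print(satisfies_constraints({"p99 latency (ms)": 80}))  # True: 80 < 100
```

Note that only constraints gate admissibility; targets rank variants later, and observations are deliberately excluded from both steps.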
4. /q-explore — generate genuinely different variants, not three variations of the same thing!
Solution variants should differ in kind, not in degree. “Redis vs Memcached vs in-memory cache” — those are three variations of one approach. “Cache vs CDN optimization vs query redesign” — those are three genuinely different variants.
For each variant, you must identify the weakest link. What limits the quality of this approach? This isn’t about “cons” in the spirit of a standard SWOT — it’s about what will break first! And this is a continuation of the same “story” that begins during /q-frame — with each step of the full cycle we refine, think, converge on a better decision.
5. /q-compare — comparison on the Pareto front.
Now that criteria are defined upfront — apply them to the variants.
Here we look at the Pareto front. We’re not picking “the best option” but rather a set of options where each one can be (and typically will be) better than the rest along one dimension but worse along another.
What remains is your conscious, deliberate choice.
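The Pareto-front filter itself is simple enough to sketch in a few lines. A minimal version, assuming higher scores are better on every target (the variants and scores below are invented for illustration):

```python
def dominates(a, b, targets):
    """a dominates b if it's at least as good on every target
    and strictly better on at least one (higher = better)."""
    return all(a[t] >= b[t] for t in targets) and any(a[t] > b[t] for t in targets)

def pareto_front(variants, targets):
    """Keep every variant that no other variant dominates."""
    return [
        v for v in variants
        if not any(dominates(o, v, targets) for o in variants if o is not v)
    ]

# Hypothetical scores for three genuinely different variants:
variants = [
    {"name": "cache",          "throughput": 9, "simplicity": 4},
    {"name": "cdn",            "throughput": 7, "simplicity": 8},
    {"name": "query redesign", "throughput": 5, "simplicity": 6},
]

front = pareto_front(variants, targets=["throughput", "simplicity"])
print([v["name"] for v in front])  # ['cache', 'cdn']
```

Here “query redesign” is dominated by “cdn” (worse on both axes), while “cache” and “cdn” each win on one axis and lose on the other — so both survive, and the final pick between them is exactly that conscious, deliberate choice.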
6. /q-decide — record the decision as a contract.
What we get is not a note in Notion, not a frozen ADR file that nobody reads.
We get a contract with four components:
- Problem statement — why were we deciding in the first place?
- Decision/Contract — what we chose, invariants (what must ALWAYS hold true?), preconditions, postconditions
- Rationale — why this and not something else? We preserve the comparison table, the weakest link, evidence requirements
- Consequences — rollback plan (under what circumstances do we roll back? How? What’s the blast radius if things do go sideways?), plus formal triggers for review, affected files
This isn’t ceremony for ceremony’s sake — it’s a specification that works. Three months from now, when a new engineer comes along and asks “why are we using X?” — there’s a very good chance they’ll find a comprehensive answer!
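To make the four components tangible, here is a hypothetical decision contract filled in for an imaginary caching decision. The keys mirror the list above, but every field value and the exact schema are my invention, not the real Quint Code format:

```python
# Illustrative only: a made-up decision record with the four contract components.
decision = {
    "problem": "p99 latency on /search exceeds the 100ms budget",
    "contract": {
        "choice": "add a read-through cache in front of the search index",
        "invariants": ["cached results never outlive the index TTL"],
        "preconditions": ["cache hit rate is measurable in prod"],
        "postconditions": ["p99 latency < 100ms under nominal load"],
    },
    "rationale": {
        "compared_against": ["CDN offload", "query redesign"],
        "weakest_link": "cache invalidation on index rebuild",
        "evidence": ["load test 2026-03: p99 62ms with cache enabled"],
    },
    "consequences": {
        "rollback": "feature-flag the cache off; blast radius is /search only",
        "review_triggers": ["index schema change", "90 days elapsed"],
        "affected_files": ["search/cache.py"],
    },
}
```

A record like this answers the new engineer’s “why X?” with the problem, the alternatives, the evidence, and the escape hatch, all in one place.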
I wish there were full magic and Quint Code could track any conditions and their changes, but it’s already trying hard — there’s a staleness timer (until when is this decision considered current? when should we, if not revisit it, at least verify it’s still valid?), and file-tracking is nearly ready.
7. /q-refresh — decisions age, and that’s normal!
Yes, I already started on this in the previous point — any decision has a shelf life. A year-old benchmark, documentation for an outdated version, a decision made under different load, a decision made simply… six months ago. Quint Code strives to track this automatically.
Every decision has a computed trust score — R_eff. It’s a calculated metric: the minimum across all evidence (weakest link principle — the strength of a chain equals the strength of its weakest link), with penalties for context (evidence from a different project is worth less than from the same one).
- R_eff > 0.5 — decision is healthy
- R_eff < 0.5 — decision is stale, needs review
- R_eff < 0.3 — AT RISK, requires immediate attention
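My reading of that description can be sketched as follows. This is not the actual R_eff formula, just an illustration of “minimum over evidence, with a context penalty”; the 0.8 discount factor and the evidence scores are invented:

```python
def r_eff(evidence, context_penalty=0.8):
    """Weakest-link trust score (an illustrative sketch, not the real formula):
    the minimum over all evidence scores, with out-of-project evidence discounted."""
    scores = [
        e["score"] * (1.0 if e["same_project"] else context_penalty)
        for e in evidence
    ]
    return min(scores) if scores else 0.0

def status(r):
    if r < 0.3:
        return "AT RISK"
    if r < 0.5:
        return "stale"
    return "healthy"

evidence = [
    {"score": 0.9, "same_project": True},
    {"score": 0.6, "same_project": False},  # discounted: 0.6 * 0.8 = 0.48
]
r = r_eff(evidence)
print(round(r, 2), status(r))  # 0.48 stale
```

Notice the weakest-link effect: one piece of strong in-project evidence (0.9) can’t rescue the decision, because the weaker, foreign-context evidence sets the score.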
When you run /q-refresh — it shows what’s gone stale and offers options: extend (with justification), replace with a new decision, or deprecate. And then it goes and actually investigates what went stale. We’ve got an AI agent here, not a mere chatbot!
Three modes of operation
Quick — for obvious decisions, but where we still want strong reasoning from AI! Define the problem (frame) → decide → record (decide).
That’s the minimum that gives you all the useful artifacts — sweet!
Full — for architectural decisions, and really any decisions where we’re not sure. I recommend second-guessing your “intuition” more often — far more often than you’d think, your “intuitive” decisions aren’t the best ones! It’s actually fascinating to stress-test your “intuitive” decisions through the full Quint Code cycle:
Frame → Characterize → Explore → Compare → Decide.
Auto — /q-reason. This command is a distillation of the FPF core + the full FPF available as an FTS5 index accessible via CLI — the agent can load any part of FPF on demand! You can also invoke q-reason on any task — the effect will be exclusively positive. In fact it’s not a slash-command but a skill, just like /q-note, so your agent will try to use both at the right time. If you simply give the agent a task and ask it to “think,” this skill will almost always get invoked. This is what the old Quint Code was missing — intellectual horsepower available quickly and nearly for free.
Where FPF isn’t needed, and where it’s indispensable
Don’t bring a sledgehammer to hang a picture frame, friends!
Slow, engineered thinking is NOT needed for:
- “Painting a UI button red”
- Fixing a bug in an endpoint where you need to check field X instead of Y
- Writing a CRUD endpoint that’s the same as an existing one but “slightly different”
- Any other task where the solution is obvious and the cost of error ≈ 0
For any trivial task, Quint Code will tell you right at the /q-frame stage:
– “Dude, this is trivial — let’s just do it and record the decision.”
But slow, engineered thinking IS needed when the task:
- Is split across people, teams, or AI agents
- Has delayed, noisy, or expensive feedback from the real world (see the problem classes earlier in this post)
- Involves trade-offs between speed, quality, and risk that we need to make explicit — the cost of a wrong call is high
- Makes you “feel” that existing, familiar approaches will break down (or you’ve already watched them fail on similar tasks)
- Has any nonzero cost of error at all (one you won’t discover for six months)
Based on all of this, I concluded that FPF is needed in any real, living production environment, which means I need to lower the cost of using the specification and make as many benefits as possible accessible to the broadest number of developers. Which means — time to resurrect Quint Code!
Expectation vs reality
People think code will get better on its own if we just keep hammering agents in circles or throw buzzwordy applied technologies at it. But no. Only decisions get better. This is about upstream quality. Code is always downstream. If the decision is right, the code (even AI-generated) may vary in “applied quality,” but it will most likely stay within the bounds of the right decision — you see what I mean?
If the decision is wrong — no linter and no RLHF loop will save you.
What’s next: Quint Code Roadmap
Codebase Awareness — in progress right now
The biggest problem with the current version: Quint knows about decisions but is blind to the connection between code and specifications. But spoiler: this got nearly solved while I was writing this post!
We’ve addressed this problem at three levels:
Level A — File Drift Detection: when the code under a decision changes — the decision is automatically flagged as potentially stale. QC deterministically identifies WHAT changed (file hashes, diff), and your agent checks and evaluates — is this change IMPORTANT or cosmetic.
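The deterministic half of Level A, hash the files a decision touches and later check whether any hash changed, can be sketched in a few lines. The function names are mine, not Quint Code’s API:

```python
import hashlib
from pathlib import Path

def snapshot(files):
    """Record a content hash for each file a decision touches."""
    return {f: hashlib.sha256(Path(f).read_bytes()).hexdigest() for f in files}

def drifted(decision_snapshot):
    """Return the files whose current hash no longer matches the recorded one
    (or which no longer exist) — candidates for 'potentially stale'."""
    return [
        f for f, h in decision_snapshot.items()
        if not Path(f).exists()
        or hashlib.sha256(Path(f).read_bytes()).hexdigest() != h
    ]
```

This part is cheap and fully deterministic; judging whether a detected change is important or cosmetic is the part that gets delegated to the agent.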
Level B — Module Coverage Map: Quint Code tries to understand your project’s structure — what modules exist, which ones are covered by decisions, and which are blind spots. “Module payments/ — 12 files, zero decisions. This is your riskiest blind spot.” Honestly, I’m thrilled about this feature.
Level C — Dependency Impact Analysis: And finally, when module A changes, Quint Code will at some point see this in the dependency graph and flag decisions for modules B and C that depend on A. This is serious cross-module architectural analysis. No tool does this right now — IDEs show code, ADR tools show decisions (and even those might be stale). Nobody connects the two. Quint Code has just started trying to do this; the feature is raw, but even in this form it’s simply magnificent. I’ll do my best to make it as stable as possible.
What else is on the horizon — though no guarantees we’ll get there
A few ideas I’m exploring:
Cross-Project Decision Memory — ~/.quint/global/, a personal knowledge base that grows across projects. You decided “PostgreSQL over MySQL” on project A with evidence for why. On project B, an analogous problem comes up — Quint Code says: “You’ve already made this decision, here are the arguments. But let’s double-check anyway, because this is a different project — different context, trust is reduced, but the idea might still be solid!” In short — every project makes your agent smarter.
Adversarial Verification — before recording a decision, a second pass: counterarguments, checking for self-referential evidence (when the agent cites its own conclusions as proof), attacking the declared weakest link. If the decision survives — it gets verified status. If not — it goes back for rework.
Adversarial Verification is already in the dev branch; the minimal implementation turned out to be trivial — just tweaking the prompt of the /q-decide command.
Team Decision Sync — multiple engineers collaborating on a single .quint/ database. When Alice and her agent are working on a caching strategy — Bob and his agent will see it. When Bob’s evidence disproves a decision — it automatically goes stale. This is probably the biggest “feature” — obviously it would require building an entire platform around Quint Code.
Decision-as-Infrastructure — decisions stop being documentation in code and move to the infrastructure level. You could build CI/CD gates based on R_eff. PR checks that show which decisions are affected by changes! Essentially, this is a move toward automatically turning test results into evidence that updates the project’s decision base. But this too is a big “feature” — platform territory again.
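A CI gate on R_eff could be as simple as the sketch below: fail the build when a PR touches files covered by a decision whose trust score has dropped below the review threshold. Everything here is hypothetical — the record shape, the threshold, and the function are illustrations, not a planned API:

```python
def ci_gate(changed_files, decisions, threshold=0.5):
    """Hypothetical PR check: block when any low-trust decision
    overlaps the files changed in this PR.

    decisions: list of {"id", "r_eff", "affected_files"} records."""
    failing = [
        d["id"] for d in decisions
        if d["r_eff"] < threshold
        and set(d["affected_files"]) & set(changed_files)
    ]
    return (len(failing) == 0, failing)

ok, failing = ci_gate(
    changed_files=["payments/charge.py"],
    decisions=[
        {"id": "DEC-12", "r_eff": 0.4, "affected_files": ["payments/charge.py"]},
        {"id": "DEC-07", "r_eff": 0.9, "affected_files": ["search/cache.py"]},
    ],
)
print(ok, failing)  # False ['DEC-12']
```

Run in CI, a check like this turns stale decisions from silent documentation rot into a visible, blocking signal on exactly the PRs that touch them.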
Conclusion
I’m not sure how to wrap this up — the ambitions are grand, and I really want to make sure this tool actually works, and not just in my own hands.
Just install Quint Code and give it a try — the full cycle or the simplest mode. I’d love to hear your feedback and suggestions!