Crucible Code — Systems Thinking for Programmers
December 13, 2025
I often see the same problem: engineers spend weeks searching for a solution not because the task is hard, but because they have no structure for thinking about it. Or because routine work leaves them no room to focus on it.
Context gets scattered, hypotheses are generated chaotically and tested by gut feeling (“seems to work”, “this should just work”), and the results are either never written down or lost in chats, messy drafts and endless “Save for later” items in Slack.
The result is either a jump to the first solution that “sounds reasonable” (hello, bias) or analysis paralysis.
This isn’t necessarily a skills problem. It’s a problem of missing infrastructure: there is no system for thinking.
Inspired by the First Principles Framework (FPF), I built crucible-code to try to bring this infrastructure to a wider range of developers.
Last Sunday I attended a seminar at the Engineers-Managers Workshop, where Anatoly Levenchuk talked about FPF.
What is it? It’s a powerful systems thinking framework — the essence of best practices in systems thinking and first principles. It’s complex, extensive, and incredibly cool.
Anatoly mentioned that even LLMs, which can’t fit the entire FPF in context (it’s 3,348,537 characters, roughly 837,000 tokens), work surprisingly well if you just attach the FPF specification file to a ChatGPT chat.
Anatoly emphasized that he deliberately doesn’t try to “simplify” FPF into code or utilities, because the target audience is much wider than software engineers.
But I’m a programmer :) So I thought — why not?
Meet Crucible Code, a tiny FPF-based tool for engineers’ daily work (and not just programmers’!), available on GitHub.
What Is It (in simple terms)
Crucible-code is a set of commands for the Claude Code CLI agent that turn it into a methodical systems architect.
It guides you through the ADI cycle:
- Abduction (hypothesis generation): propose options, from boring to radical.
- Deduction (logical check): find contradictions without writing a single line of code.
- Induction (empirical check): tests, benchmarks, research.
The output isn’t a piece of code but a Design Rationale Record (DRR), a document that says: “We chose solution X because facts Y and Z indicate it fits. We rejected solution W because tests showed risk N. We need to revisit this decision if load grows 10x.”
It’s just written with some metadata and a bit more formality.
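To make “some metadata and a bit more formality” concrete, here is a minimal Python sketch of what a DRR could hold. The class and its field names (`chosen`, `evidence`, `rejected`, `revisit_if`) and the Markdown-with-front-matter output are assumptions for illustration, not Crucible Code’s actual file format.

```python
from dataclasses import dataclass, field


@dataclass
class DesignRationaleRecord:
    # Hypothetical fields; the real Crucible Code DRR format may differ.
    title: str
    chosen: str
    evidence: list[str]
    rejected: dict[str, str] = field(default_factory=dict)  # option -> reason
    revisit_if: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        # Front matter carries machine-readable metadata; the body carries the rationale.
        lines = ["---", f"title: {self.title}", f"decision: {self.chosen}", "---", ""]
        lines.append(f"We chose {self.chosen} because:")
        lines += [f"- {fact}" for fact in self.evidence]
        for option, reason in self.rejected.items():
            lines.append(f"We rejected {option} because {reason}.")
        if self.revisit_if:
            lines.append("Revisit this decision if:")
            lines += [f"- {cond}" for cond in self.revisit_if]
        return "\n".join(lines)


drr = DesignRationaleRecord(
    title="Queue backend",
    chosen="solution X",
    evidence=["fact Y", "fact Z"],
    rejected={"solution W": "tests showed risk N"},
    revisit_if=["load grows 10x"],
)
print(drr.to_markdown())
```

The point of the front matter is that other commands (or Claude re-reading the files later) can find the decision without parsing prose.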
We’ll return to this cycle and look at it in more detail later.
One of the Key Principles: External Transformer
The idea is this: a system cannot transform itself. It needs an external agent.
In our case — that’s YOU.
Claude Code generates hypotheses, finds evidence and writes plans, but we make the decision at each step. The human is that external transformer. So Crucible Code does everything it can to keep the user from relaxing and skipping uncomfortable questions.
Systems Thinking in Disguise
The first version was just a set of steps, mostly ADI, without additional FPF formalities.
But in version 2.1.0 I went further and hid the complex FPF ontology “under the hood”. The beauty is that you don’t need to know FPF to use it, at least to some degree: the command cycle in Crucible Code itself makes you follow the principles.
The cycle looks something like this:
1. Agentic Init & Context Awareness
When you run /fpf-0-init, Claude Code creates a directory structure for FPF work and scans your repository to understand the context. Yes, like a regular init in agent CLIs, but you’ll likely be taken through an interview to ground the project context even further:
— “I see Python and Django. What’s our budget? Expected scale (100 rps or 100k rps)? What are the hard constraints?”
The result is written to .fpf/context.md. After that, any hypothesis is checked not in a vacuum, but with this context in mind.
Claude Code will say: “This idea is really cool, but training such a neural network would need a GPU, and the context says ‘low budget’.”
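One rough way to picture what grounding in context buys you: each hypothesis declares what it needs, and the check compares those needs against the constraints recorded at init. This is a hypothetical Python sketch; the dictionary keys (`budget`, `gpu`, `expected_rps`) are invented, and in the real tool the comparison is done by the LLM reading .fpf/context.md, not by code like this.

```python
# Hypothetical constraints, as they might be distilled from .fpf/context.md.
CONTEXT = {"budget": "low", "gpu": False, "expected_rps": 100}


def conflicts(requirements: dict, context: dict) -> list[str]:
    """Return a human-readable list of requirements the context cannot satisfy."""
    problems = []
    for key, needed in requirements.items():
        available = context.get(key)
        # Strict equality is a simplification; ranges (e.g. rps) need richer checks.
        if available != needed:
            problems.append(f"needs {key}={needed!r}, context says {available!r}")
    return problems


# A hypothesis that requires training a neural network on a GPU.
train_model = {"gpu": True}
print(conflicts(train_model, CONTEXT))
# ['needs gpu=True, context says False']
```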
2. Strict Separation: Method vs Work
FPF has a Strict Distinction principle. In the tool it’s implemented like this: any hypothesis is now split into two parts:
- The Method (Design-Time) — Plan. Recipe. What do we actually want to do?
- The Validation (Run-Time) — Evidence. Tests. Logs. What actually happened.
Working with FPF makes it much harder to confuse “I figured out how to do it” with “I proved it works”.
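The Method/Validation split can be illustrated with a small data model in which a plan, however detailed, never counts as proof. A minimal Python sketch; the classes and the `proven` rule are assumptions for illustration, not how Crucible Code stores hypotheses.

```python
from dataclasses import dataclass, field


@dataclass
class Method:
    """Design-time: the plan, what we intend to do."""
    plan: str


@dataclass
class Validation:
    """Run-time: what actually happened when we tried it."""
    evidence: list[str] = field(default_factory=list)


@dataclass
class Hypothesis:
    name: str
    method: Method
    validation: Validation = field(default_factory=Validation)

    @property
    def proven(self) -> bool:
        # A well-written plan alone is never proof; only recorded evidence is.
        return bool(self.validation.evidence)


h = Hypothesis("Add a cache", Method("Cache product lookups"))
print(h.proven)  # False: we only know how we would do it
h.validation.evidence.append("benchmark run recorded")
print(h.proven)  # True: now something actually happened
```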
3. NQD (Novelty, Quality, Diversity)
When you run /fpf-1-hypothesize, Crucible Code doesn’t just toss out a few hypotheses for research; it must classify them:
- Conservative: Proven, reliable solution.
- Novel: “Modern”, enthusiastic approach.
- Radical: Risky but potentially powerful solution.
This protects against getting stuck on what you already know.
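A diversity check along these lines is easy to sketch in Python. The rule “every class should be represented at least once” is my assumption for illustration; the post only says that hypotheses must be classified.

```python
from enum import Enum


class Novelty(Enum):
    CONSERVATIVE = "conservative"
    NOVEL = "novel"
    RADICAL = "radical"


def missing_classes(hypotheses: dict[str, Novelty]) -> set[Novelty]:
    """Return the novelty classes not represented among the hypotheses."""
    return set(Novelty) - set(hypotheses.values())


hypotheses = {
    "Fix tests and add them to CI": Novelty.CONSERVATIVE,
    "Implement stronger static analysis": Novelty.NOVEL,
}
print(missing_classes(hypotheses))  # {<Novelty.RADICAL: 'radical'>}
```

An empty result means the option space covers everything from boring to radical; anything else is a hint to generate one more hypothesis before moving on.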
The ADI Cycle Up Close
Phase 1: Abduction — “What are the options? What do we do with all this?”
Command /fpf-1-hypothesize.
Instead of jumping straight to coding the first solution that comes to mind, we generate a space of options. Claude Code will suggest 3-5 hypotheses of varying novelty and complexity.
Phase 2: Deduction — “Where’s the logical hole?”
Command /fpf-2-check.
We’re not writing implementation yet, nooo, that’s still far off :) At this stage we test hypotheses for logical soundness (together with the LLM).
“If we start building microservices (hypothesis) with a team of 2 juniors, 1 mid-level and no infrastructure engineers (context), we’ll die at the DevOps stage.”
The hypothesis is rejected before you spend two weeks setting up Kubernetes, followed by a year and a half of suffering from rapidly growing architectural and infrastructure complexity.
Phase 3: Induction — “Talk is cheap. Show me the numbers.”
Commands /fpf-3-test (local empirical tests) and /fpf-3-research (external search on the internet, e.g. with the WebSearch and WebFetch tools).
We battle-test the hypotheses that survived deduction: write a prototype, run a benchmark, find documentation. If we feel there isn’t enough evidence to judge the hypotheses, we can run /fpf-3-research to gather external information, then return to /fpf-3-test for more thorough checks.
Phase 4: Audit — “The weakest link doesn’t mean useless”
Command /fpf-4-audit. This is the only optional command: you can skip it and go straight to step five. But I strongly recommend never skipping it, especially when you’re working on a genuinely complex decision and/or one that could significantly affect your system’s quality.
This phase applies the WLNK (Weakest Link) principle: confidence in the entire solution equals confidence in the weakest piece of evidence. If the benchmark is great but the documentation clearly says “This is a raw API, don’t use it in production” — the solution is probably unreliable.
Audit is the bias-check and WLNK analysis phase. Here Claude checks:
- Are there cognitive biases in the choice?
- Does confidence in the solution match the weakest link in the evidence?
- Congruence — how applicable is external data to your context?
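To see why the weakest link matters, compare taking the minimum with averaging. A minimal Python sketch, assuming each piece of evidence gets a 0-1 `support` score and a 0-1 `congruence` score and that congruence simply scales support; these numbers and the multiplication are my simplification, not FPF’s actual assurance calculus.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    claim: str
    support: float     # 0.0-1.0: how strongly it supports the solution
    congruence: float  # 0.0-1.0: how well it applies to OUR context


def solution_confidence(evidence: list[Evidence]) -> float:
    """Weakest link: the solution is only as trustworthy as its worst piece of evidence."""
    if not evidence:
        return 0.0  # no evidence, no confidence
    # Discount each piece by its congruence, then take the minimum, never the average.
    return min(e.support * e.congruence for e in evidence)


evidence = [
    Evidence("our benchmark looks great", support=0.9, congruence=1.0),
    Evidence("official docs: raw API, not for production", support=0.2, congruence=1.0),
    Evidence("article about a different stack", support=0.8, congruence=0.5),
]
print(solution_confidence(evidence))  # 0.2
```

Averaging the same three scores would give 0.5 and look respectable; the minimum surfaces the production warning that should block the decision.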
Phase 5: Decision — “Lock it in”
Command /fpf-5-decide
Claude gathers everything together: context, hypotheses, evidence, risks. But you — the “external transformer” — must choose the winner.
This is where that Design Rationale Record I mentioned gets created.
That’s basically it! Besides the DRR, the .fpf project directory keeps the collected evidence, which will be useful later: for new FPF cycles, for remembering past decisions, or for writing good project documentation.
Real Case: Legacy Monolith, Operational and Technical Debt
The task at hand: a legacy monolith plus debt in technical and operational decisions. Nothing criminal, a typical startup. The load on the team and on me is also “typical” for a startup.
After a month and a half of work, I broadly understand the main problem areas, but it’s not entirely clear what to tackle first. And the choice has to maximize the effect: the improvement should be multiplicative.
Although a decision on what to tackle had more or less been made, I chose to test Crucible Code on this exact problem. The task had a fairly fuzzy context and clearly went beyond a single monolith.
Ran /fpf-1-hypothesize. Claude suggested:
- H1: Fix tests and add them to CI (Conservative)
- H2: Full refactoring to DDD (Radical)
- H3: Implement stronger static analysis (Novel)
/fpf-2-check immediately threw out H2. We can’t slow down development to train the current team in design patterns, and DDD requires deep expertise that, according to the context, the team doesn’t have yet.
/fpf-3-test: Claude ran the existing unit tests. It turned out they run fast (2 s) but fail. A quick setup and run of a static analyzer (H3) showed 350+ errors.
In the end, at /fpf-5-decide we chose a hybrid of H1 and H3: fix the current tests, write new ones and add them to CI, plus a static analyzer baseline whose violation count we won’t allow to grow with future codebase changes. H2 (refactoring) was postponed.
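A “baseline that must not grow” is usually enforced as a ratchet in CI. Here is a hedged Python sketch: it assumes the analyzer’s violation count has already been extracted as an integer and that the baseline lives in a hypothetical `lint-baseline.json` file shaped like `{"violations": 350}`. The post doesn’t name the analyzer, so wiring to a specific tool is left out.

```python
import json
from pathlib import Path


def check_ratchet(current: int, baseline: int) -> tuple[bool, int]:
    """Return (ok, new_baseline): fail on growth, tighten the baseline on improvement."""
    if current > baseline:
        return False, baseline  # regression: keep the old baseline, fail the build
    return True, current        # same or fewer violations: lower the bar for next time


def main(baseline_path: str, current: int) -> int:
    """Compare against the stored baseline; return a process exit code for CI."""
    path = Path(baseline_path)  # hypothetical file, e.g. lint-baseline.json
    baseline = json.loads(path.read_text())["violations"]
    ok, new_baseline = check_ratchet(current, baseline)
    if not ok:
        print(f"violations grew: {current} > baseline {baseline}")
        return 1
    path.write_text(json.dumps({"violations": new_baseline}))
    print(f"ok: {current} violations (baseline was {baseline})")
    return 0
```

In CI you would call `main("lint-baseline.json", count)` and exit with its return code. Lowering the baseline automatically on improvement is a design choice: it locks in every fix, but the updated file has to be committed back.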
The whole thing took ~25 minutes.
You might say, “Well, it was probably obvious what to do anyway, right?” As I said earlier, yes: after looking at the code and processes I was leaning toward solving the bottleneck with tests and linters. But I was only leaning. After the FPF session the choice was rock solid, backed by evidence and documented.
Using the artifacts from Crucible Code, I created a Story in Jira, and it will soon go into work (possibly handed to Claude Code itself).
Without this system and artifacts we’d still be scratching our heads.
Why Not SuperClaude, or Some AgentOS?
There are many all-in-one kits like SuperClaude or AgentOS that stuff the context with bloated prompts or skills, spin up 15 “specialized” subagents, and try to “think for you” and automate everything. That’s more about vibe coding: “Dear Claude Code, make it pretty.”
It has its place, but that place is very far from engineering.
Crucible Code is an exoskeleton, not an autopilot.
1. Minimal Overhead.
Your regular work with Claude Code stays regular; FPF activates only on command. Yes, I share my CLAUDE.md in the Crucible Code repo, but mostly as an example. I’m pretty sure Crucible Code works well without any additional instructions.
2. No Magic.
You see every step, you think together with Claude Code.
3. Structured Memory.
The session.md file stores the state of your shared thinking during the session. In addition, all hypotheses, their statuses, DRRs and evidence are stored, all connected through metadata in the files. Even if Claude (or you) “forgets” the dialog context, it will re-read the documents in .fpf and remember: “Ah, we rejected MongoDB because we don’t have an admin.”
By the way, Crucible Code has additional commands not related to the ADI cycle:
/fpf-status — where are we now?
Shows the current cycle phase, the list of hypotheses with their statuses (L0/L1/L2/invalid), and suggests the logical next step. Useful when you return to a task after a break, lose the session, or “lose the thread”.
When you’re working on a complex task, it’s quite likely you’ll spend several hours at the /fpf-3-* stage and forget what that stage in Crucible Code was even about.
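“Suggests the logical next step” follows naturally from the ADI order. A hypothetical Python sketch, assuming L0 means a freshly proposed hypothesis, L1 one that passed the logical check and L2 one backed by empirical evidence; that reading of the levels and the suggestion rule are my assumptions, not the tool’s documented behavior.

```python
# Assumed mapping from a hypothesis level to the command that advances it.
NEXT_STEP = {
    "L0": "/fpf-2-check",  # proposed, not yet checked for logical holes
    "L1": "/fpf-3-test",   # logically sound, needs empirical evidence
    "L2": "/fpf-4-audit",  # evidence gathered, audit before deciding
}


def suggest_next(statuses: dict[str, str]) -> str:
    """Suggest the next command based on the least-advanced live hypothesis."""
    live = [s for s in statuses.values() if s != "invalid"]
    if not live:
        return "/fpf-1-hypothesize"  # everything was rejected: generate new options
    least_advanced = min(live)  # "L0" < "L1" < "L2" compares correctly as strings
    return NEXT_STEP[least_advanced]


print(suggest_next({"H1": "L2", "H2": "invalid", "H3": "L1"}))  # /fpf-3-test
```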
Or you need to continue on Monday… Crucible doesn’t just drop breadcrumbs, it leaves whole loaves of bread behind.
By the way, I’ve had cases where /fpf-3-test produced great scripts that, after /fpf-5-decide, moved transparently into the main code. While working through the Crucible Code cycle, Claude Code still follows your CLAUDE.md instructions on coding style and requirements.
/fpf-query — search through the accumulated knowledge base.
You can search by hypotheses, evidence, past decisions. Filter by confidence level (L0, L1, L2, invalid). For example: “show all verified evidence about caching, we researched something about this last month” or “what hypotheses did we throw out last week on task XXX-123 and why”
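Conceptually, a request like “show all verified evidence about caching” boils down to filtering records by kind, level, keyword and date. This Python sketch assumes a flat list of records with made-up fields and contents; the real command has Claude read the .fpf files and interpret a natural-language request, so no such function necessarily exists in the tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    kind: str      # "hypothesis", "evidence" or "decision"
    level: str     # "L0", "L1", "L2" or "invalid"
    text: str
    created: date


def query(records, *, kind=None, level=None, keyword=None, since=None):
    """Filter the knowledge base; any criterion left as None is ignored."""
    return [
        r for r in records
        if (kind is None or r.kind == kind)
        and (level is None or r.level == level)
        and (keyword is None or keyword.lower() in r.text.lower())
        and (since is None or r.created >= since)
    ]


# Illustrative records only.
kb = [
    Record("evidence", "L2", "Caching cut database load in our test", date(2025, 11, 10)),
    Record("evidence", "L0", "Blog post claims caching is free", date(2025, 11, 12)),
    Record("hypothesis", "invalid", "Move to MongoDB", date(2025, 12, 1)),
]
for r in query(kb, kind="evidence", level="L2", keyword="caching"):
    print(r.text)  # Caching cut database load in our test
```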
/fpf-decay — check evidence freshness.
Evidence ages: a year-old benchmark, documentation for an outdated version, links to an article about a deprecated API. This command shows which evidence needs to be re-checked or marked as invalid. After all, a decision made on stale data is a decision made blind.
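One simple way to model evidence decay is to give each piece of evidence a validity window and flag whatever has outlived it. A minimal Python sketch; the field names and the windows are assumptions, since the post doesn’t say how /fpf-decay decides what is stale.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Evidence:
    claim: str
    checked_on: date
    valid_for: timedelta  # how long we trust this kind of evidence


def stale(evidence: list[Evidence], today: date) -> list[str]:
    """Return the claims whose validity window has expired and need re-checking."""
    return [e.claim for e in evidence if e.checked_on + e.valid_for < today]


evidence = [
    Evidence("Benchmark: queue handles our peak load", date(2024, 11, 1), timedelta(days=180)),
    Evidence("Docs: API is stable", date(2025, 10, 1), timedelta(days=365)),
]
print(stale(evidence, today=date(2025, 12, 13)))
# ['Benchmark: queue handles our peak load']
```

Benchmarks get a shorter window than documentation here because load and code change faster than an API contract; in practice the window would depend on the kind of evidence and the project.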
When to Use Crucible Code
I’ll say it: these commands can be an invaluable helper in cases like these:
- Architectural decisions (database, queue, state management).
- Complex bugs where the cause isn’t obvious.
- Team disputes (need an argued choice) — copy disputes from Slack to a file and set Crucible Code on them.
- When the cost of error is high.
Crucible Code and FPF are completely unnecessary for:
- “Paint the button red”.
- “Write a sorting function”.
- “Fix the bug in the webhooks endpoint, need to check field X not Y”.
- And other obvious tasks. Please don’t use a cannon to kill a sparrow.
Quick Start
- Grab the repository:
git clone https://github.com/m0n0x41d/crucible-code.git
cd crucible-code
./install.sh /path/to/your/project
or globally (commands will be copied to ~/.claude/):
./install.sh -g
- Initialize (Claude will scan the project and ask a couple of questions):
claude
/fpf-0-init
- Start thinking about your task systematically:
/fpf-1-hypothesize "It seems we shouldn't use redis for queues because..."
Thinking can be an engineering process. You can debug it, optimize it, and version it. Give it a try!
P.S. Oh, you’ve already guessed that Crucible Code can be used beyond software engineering? And that nothing stops you from using MCPs and other tricks in the ADI cycle? You’re welcome ;)