By Kunal Tangri, Co-founder and COO of Farsight · Published June 25, 2026
AI labs and investors have arrived at the same conclusion from opposite ends of the industry. Anthropic's When AI Builds Itself reports that models can now execute almost anything that can be specified, and names the remaining human advantage plainly: judgment. Choosing which problems matter, which results to trust, when an approach is a dead end. Sarah Guo's The Untrainable maps the same boundary from the investor's side. When anyone can check the answer, every provider gets it right, and the only thing left to compete on is price; that is why every task that can be measured is on its way to commodity. The value that survives sits where nobody can check the answer from outside. She calls it private correctness: the sense of what counts as good that differs from firm to firm and person to person, invisible to any public benchmark. Her advice to builders is to get inside a domain and start "writing down what good means" there, because someone is going to.
The diagnosis is right. The prescription most of the industry has adopted is wrong.
That prescription is "capture institutional knowledge," and in practice it always means the same thing: index the firm's documents and retrieve over them. Every knowledge-management wave for thirty years has made this move, and the AI wave is making it again at enormous scale. It fails for a reason that is obvious once stated: a document is the output of judgment with the judgment removed. A final deck is clean. So is an executed agreement. Every decision that produced them was erased by the act of finishing. Retrieval over finished work can tell you what a firm concluded. It cannot tell you how the firm decides, which is the only thing that was ever scarce.
Firms feel this gap even when they can't name it, which is why the older method keeps getting revived: interview the experts, write the playbook, distill the partner's wisdom into a wiki. This fails more politely but just as completely, because expertise is inherently inarticulate. Senior people cannot fully explain their own taste; they demonstrate it. Ask a rainmaker why one narrative wins a pitch and another loses and you'll get a plausible story. Watch what she actually changes in the deck at 11pm before the meeting and you'll get the truth.
That's the principle the whole approach is missing: judgment is revealed, not stated. What people do is truer than what they say. Interviews lie; edits don't.
Where judgment actually lives
Consider what happens to a deliverable in a professional firm. An analyst, or increasingly an AI system, produces a first draft. Then the draft meets seniority. A vice president restructures the returns page. A managing director kills a section, reorders the story, strikes one framing of the company and replaces it with another, circles a number and writes "source?" The draft goes around again. Then the final version ships to the client, the mandate is won or lost, and everyone moves on.
Look at what just happened in information terms. The distance between the first draft and the final one is a dense record of decisions: what was kept, what was killed, who made each call, and what it cost to be wrong. Every profession already has a name for the artifact that carries this record. Engineers call it the diff. Lawyers call it the redline. Bankers call it the markup. Three names for the same thing, with one telling difference: engineers kept theirs. Version control preserves every diff ever written, author and rationale attached, and the code review lives forever next to the code. The experiment has already run. The one profession that kept its diffs is the first profession AI learned to do well, and decades of preserved diffs are part of what it learned from. Law and finance never built the equivalent. Getting your draft back covered in comments is how judgment has been transmitted in those fields for a century, and the artifact is discarded on every turn. The judgment is in the diff.
Notice what feeds those edits. The MD who strikes one framing and replaces it with another may be drawing on a decade of conversations with that exact corp dev team, on what the client lingered over in the last pitch, on how a room once reacted to a number that looked fine on paper. None of that knowledge exists in any document. Neither does the voiceover that will carry the deck, or the Q&A the team prepares for, which shapes the page as much as anything on it. The deck was never the whole job, and any banker will tell you so. But the markup is where the invisible work becomes visible: relationship knowledge, room sense, and years of pattern recognition, expressed as concrete decisions about what ships.
Rank the signals a machine could learn judgment from by what was at stake when each was produced. At the bottom sit web text and thumbs-ratings: costless to make, with nothing riding on them. A rung up is the commissioned markup, where an AI lab pays experts to mark up practice documents: real expertise, but nothing ships and nothing is at stake. At the top sits the signal no one has collected: the edit made to a document about to go to a client under the firm's name, with a relationship and a fee on the line. Each rung up, every byte carries more judgment, because the cost of being wrong rises with it. Call the top decision-grade signal. The AI industry trains on data with little decision-grade signal in it; in professional work, the signal richest in judgment is the one no one has kept.

Never collected, because today the markup is exhaust. The diff between draft one and the version that ships exists for a few days, scattered across email threads and tracked changes, and then it evaporates the moment the final version goes out. Every deck shipped is a judgment event whose signal disappears at send. Multiply that across every deliverable, every team, every year, and you get the strange accounting of the modern professional firm: its most valuable data asset is produced daily at the highest levels of the organization and retained at a rate of approximately zero. A firm's judgment walks out the door every day with the deliverables that ship, and for the last time when the people who hold it retire.
The next system
Enterprise software has a canon, and it is short. Systems of record told you what is true: the ledger, the CRM. Systems of engagement put that truth in front of people. The last decade's "systems of intelligence" promised to learn from the record, and mostly delivered search, because the record was all they had to learn from.
The next system in the canon keeps something none of the others can see. Systems of record hold what the firm knows. The system of judgment holds what the firm decides is good.
A system of judgment does three things. It keeps revealed judgment at the decision layer: the markup on real deliverables survives instead of evaporating, so the diff between first draft and final draft becomes a durable asset instead of exhaust. Nothing new is recorded; what was always produced is finally retained. It stays current, because judgment is not a constant you write down once. Taste has a half-life. Market regimes turn, leadership changes, and what a partner accepted two years ago gets struck today. Any approach that freezes judgment, whether a one-time fine-tune or an expert interview, is training on stale taste from the day it ships; a system of judgment keeps pace with the standard as it moves, drift included. And it compiles what it learns. Walk any bank's archive and you'll find the same slide rebuilt hundreds of times a year, the buyer profile or the returns page, where only the content changes deal to deal. When judgment repeats that often, it stops needing a frontier model to re-derive it every time. Recurring judgment hardens first into preferences a system can apply, then into evaluations a system can be scored against, and finally into plain software that runs deterministically. At that point a unit of the firm's judgment executes at near-zero marginal cost, forever.
The learning mechanism is not exotic. It is the oldest one in professional life. For centuries, judgment has transferred exactly one way: an apprentice does the work, a master marks it up, and the apprentice internalizes the delta and iterates. A system of judgment adds a new apprentice to that loop: a machine sits in the apprentice's chair, alongside the juniors, not instead of them. This matters for a question every firm is now asking: if AI produces the first drafts, how do juniors ever become seniors? The honest answer is that juniors never learned much from producing the first draft. They learned from watching it come back marked up. A firm whose corrections are kept and explained has more apprenticeship signal than any firm in history, for its people and its machines at once. The apprenticeship loop doesn't break under AI. Undocumented judgment breaks it. Documented judgment compounds it.
One caveat belongs in the open. A system of judgment is a mirror. It will keep a firm's blind spots as faithfully as its taste: the risk aversion, the deference to whoever signs off last. That is an argument for the system, not against it. A bias written down is a bias you can finally see, and judgment made legible is judgment you can finally argue with. The status quo is the same bias operating invisibly, with no record to audit.
Writing down what good means
Here is why this is the strategic high ground rather than good plumbing. The engine of Guo's argument is that whatever can be verified gets commoditized: the compiler made coding trainable, the test suite made it grindable, and the work fell toward whoever has the most compute. What resists is work whose correctness can't be checked from outside, the territory she calls the untrainable, and it is most of what professional firms sell. A system of judgment is a private verifier built for exactly that territory. It converts "what makes a deliverable good here," a question only a few senior people could answer, into a standard a system can be measured against: what share of generated work survives to the final version untouched (acceptance rate), how fast the distance between first draft and final draft shrinks as the system learns (edit-decay), which content and which sources actually live to ship (survival rate), and how much of the firm's recurring work has hardened into deterministic code (codification rate).
Those numbers do not exist anywhere today. Ask any bank or any law firm for its acceptance rate. The number cannot be produced, because no firm has ever kept its redlines. The first systems able to compute these metrics will hold something beyond a product advantage: the standing to define how this entire class of work gets measured. In every field, what counts as good is about to be written down and made machine-readable, because the economics demand it. The only open question is the one that always decides a standard: who holds the pen.
The judgment is already there, in every firm, exercised every day at the highest levels of the organization and thrown away every night. The redline was always the most valuable document in the building. We just never kept it.
We're building this for financial services first, where the distance between a first draft and a final draft decides who wins the mandate. If you want to know what your firm's acceptance rate is, we should talk.



.png)