Clear AI thinking, grounded in business reality

Making sense of AI for business. Industry analysis, clear explanations, and insights that cut through the noise.

Nov 22 • 10 min read

Summa Intelligentiae: Now, Never, Not Yet


Quaestio Sexta


Now, Never, Not Yet

Where Architecture Intersects with Outcomes

Every CFO knows the numbers must balance. Assets equal liabilities plus equity—always, precisely, without interpretation. Introduce probability into that equation and you haven't saved time; you've created risk that the time savings don't justify.

Yet, that same CFO's team spends hours drafting board memos, synthesizing market research, and preparing executive summaries. Work that demands clarity and coherence, but tolerates revision. Work where "draft" and "final" are separated by human judgment.

One task demands determinism. The other rewards speed.

Some work has right answers. Other work has better drafts. Some processes cannot tolerate error. Others improve through iteration. AI belongs in the second category—and introduces unacceptable risk in the first.

So, which work fits which category? Where does AI deliver value today? Where does its fundamental design create permanent mismatch regardless of advancement? And where should you monitor the frontier without betting operations on it?

Insights to Expect

  • Why domains like content creation are AI's strongest business case today
  • How deterministic processes expose AI's fundamental mismatch
  • When error tolerance determines deployment viability
  • Which emerging capability verticals matter
  • Why understanding your work's "error budget" clarifies AI deployment decisions

Subscribe for a pragmatic brief each week.

Succinct, cited, and immediately actionable.

The Framework: Now, Never, Not Yet

Understanding AI's boundaries requires three distinct categories:

Now represents work where AI's probabilistic nature matches task structure. High-volume content creation, research synthesis, pattern recognition—places where speed to draft creates value and human review catches errors. These use cases are deployable at scale.

Never represents work where AI's architecture fundamentally conflicts with requirements. Not because models aren't sophisticated enough, but because introducing probability into deterministic systems, or removing humans from judgment-dependent decisions, creates structural mismatch. These are use cases where the tool doesn't fit the task.

Not Yet represents work where capability exists but reliability doesn't. Research frontiers where continued training, emerging architectures, and enabling technologies will extend what's practical, but production readiness remains unclear. These are use cases that require monitoring, selective piloting, and future consideration, but not present-day full deployment.

These categories aren't about AI's power; they're about matching tools to work. Understanding which territory your use case occupies accelerates deployment by eliminating wasted effort.


Now: Where AI Delivers Today

The "Now" category works because the structure of the work matches the structure of the model.

Why Copywriting Works

Consider marketing copy—product announcements, email campaigns, internal communications. This work has always operated through iteration: writers produce drafts, editors refine them, stakeholders review them. The process assumes revision. Professional writers never expected first drafts to be final.

Recall from A Clear Definition: AI systems learn patterns from data rather than following explicitly programmed rules for each scenario. From The Mechanics of Intelligence, we learned that models are fundamentally probabilistic, not deterministic—they predict likely continuations based on learned patterns. Every output is a statistical best guess, not a calculated certainty.

This architecture fits copywriting perfectly. The model generates a statistically plausible draft; humans verify it matches intent and accuracy. When a marketing team reviews AI-generated copy, they're doing what they always did—checking claims, adjusting tone, ensuring brand alignment. The model hasn't eliminated verification; it's eliminated the blank page.

Why training structure enables this: As explored in Data, Patterns, and Feedback, models trained on massive text corpora learned the statistical structure of language—what words follow what contexts, what tone fits what scenarios, what structures appear in marketing versus technical writing. When you ask a model to draft a product announcement, it synthesizes patterns from millions of similar examples in its training data, combining learned structures into novel compositions.

The fit works because the work already had an "error budget" built in. Copywriting tolerates imperfection in first drafts because revision is expected. AI accelerates draft creation; humans ensure accuracy.

The Structural Pattern

This same pattern explains why classification tasks (spam filtering, fraud flagging), code assistance (autocomplete, boilerplate generation), and research synthesis work in production today.

"Now" territory shares three characteristics:

  1. Probabilistic outputs meet revision workflows: The work already assumes drafts need review, or accepts false positives as a cost of doing business. AI's statistical nature doesn't change that; it accelerates the first step and subsequent iterations.
  2. Training data coverage: The work type appears extensively in training corpora. Models learned relevant patterns because similar tasks fill their training sets.
  3. Error tolerance in feedback loops: Mistakes get caught before harm because humans review outputs or because false positives cost less than false negatives.

If a use case exhibits these traits—revision-tolerant work, extensive training data coverage, human verification built into workflow—it aligns current work processes with AI architecture. The structure of the work matches the structure of the model.


Never: The Fundamental Mismatch

Some tasks resist AI not because models lack sophistication, but because their requirements directly conflict with how models operate.

How Financial Reporting Exposes the Architecture Problem

Consider a corporate balance sheet. Assets must equal liabilities plus equity. Tax calculations follow explicit formulas defined by law. Every figure must reconcile to source documents through traceable arithmetic. Even with explanatory footnotes, the balance sheet must still mathematically reconcile—the 'why' may vary, but the numbers must balance precisely. This is a deterministic process where there is exactly one right answer.

But recall from Token by Token: models generate outputs through next-token prediction. Each number, each decimal, each figure is predicted based on what statistically follows in similar contexts. Models trained on financial reports learn that balance sheets contain numbers in certain formats, that columns sum to totals, and that certain ratios appear frequently. Yet they learn the appearance of correctness, not the rules of calculation.

When a model generates "Total Assets: $86,753,091," that figure emerged from pattern prediction, not arithmetic. The model predicted a plausible-looking number based on other numbers in the document. Whether it actually equals the sum of individual line items? That's not what the architecture optimizes for.
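To make the contrast concrete, here is a minimal sketch (with hypothetical figures) of what a rule engine does that next-token prediction does not: check the arithmetic exactly, every time.

```python
# Illustrative sketch with hypothetical figures: reconciliation is a rule
# check with exactly one right answer, not a prediction.

line_items = {
    "cash": 12_400_000,
    "receivables": 31_150_000,
    "property_plant_equipment": 43_203_991,
}

def reconciles(items, reported_total):
    """Deterministic check: the reported total must equal the sum exactly."""
    return sum(items.values()) == reported_total

print(reconciles(line_items, 86_753_991))  # True: the arithmetic holds
print(reconciles(line_items, 86_753_091))  # False: plausible-looking, but wrong
```

The check is binary and traceable—there is no "statistically likely" total, only the one the line items produce.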

From The Mechanics of Intelligence, we learned models adjust weights to minimize prediction error on training data—to make outputs look like the patterns they've seen. They weren't trained to perform deterministic calculation; they were trained to predict plausible continuations.

The structural conflict: Introducing probability where precision is required doesn't make the task harder—it makes the tool wrong for the job. You can't eliminate the probabilistic risk with more training data or larger models. The optimization target itself conflicts with the task requirement. The model optimizes for coherence (does this look like a balance sheet?), not accuracy (do these numbers actually reconcile?). It may even force reconciliation, but a plug in an AI model is even more dangerous than Kevin Malone's "Keleven": it might get you home by seven, but that doesn't make it accurate.

As we explored in The Closed-Book Test, we know models express confidence through output probabilities. A model can generate "$86,753,091" with 99% confidence while being completely wrong, because confidence measures pattern fit, not factual accuracy. High confidence on a hallucinated figure looks identical to high confidence on a correct calculation.
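A toy illustration of that point, with made-up numbers: "confidence" is just the probability the model assigns to its top continuation, so a wrong figure can carry the same 99% confidence as a right one.

```python
# Toy next-token distribution (hypothetical numbers). Confidence measures
# pattern fit, not factual accuracy.

next_token_probs = {
    "$86,753,091": 0.99,   # most plausible continuation, per the model
    "$86,753,991": 0.006,  # the figure the line items actually sum to
    "$86,750,000": 0.004,
}

top_token = max(next_token_probs, key=next_token_probs.get)
confidence = next_token_probs[top_token]
actual_total = "$86,753,991"

print(top_token, confidence)      # $86,753,091 0.99
print(top_token == actual_total)  # False: high confidence, wrong figure
```

Nothing in the output signals the error—which is exactly why hallucinated and correct figures look identical.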

How verification destroys value: Financial reporting's determinism isn't just about accuracy—it's about efficiency. Rule-based calculation produces verifiable outputs automatically. With AI-generated figures, you must manually verify every number, trace every calculation, and validate every relationship, and you must execute that validation every time. In pursuit of speed and outcome optimization, AI implementation falls short. You haven't automated; you've added probabilistic prediction to a process that demands deterministic proof.

Some may argue: "Accept 95% accuracy, verify the rest." But financial statements don't work that way. One incorrect figure cascades through dependent calculations—depreciation affects asset values, which affect equity, which affect ratios, which affect covenant compliance. You can't spot-check. You must be right every time. So while AI may be a tool, it's not the tool for these use cases. There are better, more efficient, and more accurate tools for the job.

The Pattern Across "Never" Land

This same structural mismatch explains why certain other domains resist reliable AI deployment:

  • Judicial rulings: Pattern matching produces fabricated citations with the same fluency as genuine precedent, and cannot guarantee proper application of law.
  • Mission-critical infrastructure: Distributed parameter activation provides no explainable decision chain when systems fail.

The "Never" category exists because task requirements demand capabilities orthogonal to what pattern-matching architectures provide:

  • The task demands deterministic output, but AI provides probability
  • The task demands zero hallucination tolerance, but the architecture optimizes coherence over truth
  • The task demands causal reasoning, but training teaches correlation
  • The task demands explainable audit trails, but decisions distribute across millions of parameters

These aren't training problems—they're architecture problems. More data won't convert statistical prediction into guaranteed accuracy. Larger models won't stop optimizing for coherence. Better training won't create causal reasoning from correlation learning.

If your use case requires deterministic precision, zero-error tolerance, causal inference, or traceable decisions, deploying AI without verification architecture that maintains human accountability creates more risk than value. The tool's structure fundamentally conflicts with the task's requirements.


Not Yet: The Active Frontier

Research advances target the architectural limitations just explored, but advancement in labs doesn't equal reliability in production.

Where Capability Exists But Reliability Doesn't: Complex Data Analysis

Consider strategic business intelligence—analyzing sales patterns across regions, identifying emerging customer segments, correlating marketing spend with revenue impact. These analyses require:

  • Pattern recognition across multiple variables
  • Insight generation from incomplete or noisy data
  • Hypothesis formation about causal relationships
  • Communication of findings in business language

Current models excel at parts of this work. They identify statistical correlations, generate natural language explanations, and synthesize information across documents. The capability to perform exploratory data analysis exists and research labs demonstrate it regularly.

But deployment reveals the reliability gap. Models confidently report correlations that don't exist in the data. They generate insights that sound sophisticated but rest on statistical artifacts. They hallucinate data points to complete patterns, making fabricated specifics indistinguishable from actual findings.

Why this is "Not Yet", not "Never": Unlike financial reporting, where deterministic requirements fundamentally conflict with probabilistic architecture, complex data analysis could work within AI's capabilities—if models reliably distinguished between genuine patterns and statistical noise, if they flagged uncertainty appropriately, if they grounded claims in verifiable data.

The capability exists. Companies pilot AI-assisted analysis tools. Analysts use them for initial exploration. But trusting outputs without comprehensive validation remains risky. When an AI system reports "Q3 sales in the Southeast region declined 12% primarily due to competitor pricing," verifying that claim requires checking:

  • Whether the 12% figure is accurate
  • Whether the decline was region-specific or broader
  • Whether competitor pricing correlation is real or coincidental
  • Whether other factors were ignored or misweighted

This verification often takes as long as performing the analysis manually, eliminating the efficiency gain.

Why this differs from "Never": The gap isn't architectural impossibility—it's engineering reliability. Emerging techniques target these exact problems:

  • Retrieval-Augmented Generation (RAG) grounds analysis in actual data by retrieving specific records before generating insights (Lewis et al., 2020). Instead of predicting plausible figures, the model cites verifiable data points.
  • Chain-of-thought reasoning makes inference steps explicit, allowing validation of logical progression (Wei et al., 2022). Rather than jumping to conclusions, the model shows intermediate reasoning.
  • Tool use and structured outputs let models invoke calculation functions and format findings in verifiable structures rather than generating numbers through prediction.
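A minimal sketch of the tool-use idea (a hypothetical interface, not any vendor's real API): instead of predicting a total token by token, the model emits a structured request and a deterministic function computes the number.

```python
import json

def sum_tool(values):
    """Deterministic calculation the model invokes instead of predicting digits."""
    return sum(values)

TOOLS = {"sum": sum_tool}

# Pretend the model emitted this structured call instead of a free-text figure:
model_output = json.dumps(
    {"tool": "sum", "args": {"values": [12_400_000, 31_150_000, 43_203_991]}}
)

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["args"])
print(result)  # 86753991, computed rather than predicted
```

The model's job shrinks to choosing the tool and its inputs—both of which a human or validator can inspect—while the arithmetic stays deterministic.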

These aren't guaranteed solutions—they're active research frontiers with domain-specific success rates. For complex data analysis, some organizations report reliable results with heavy human oversight. Others find verification burden still outweighs value.

The critical distinction: This is "Not Yet" because the path forward exists and the demand for accuracy is not 100%. Here, the Pareto principle applies: acceptable risk tolerance makes partial automation valuable despite imperfect reliability. RAG, reasoning architectures, and tool integration mitigate specific failure modes. Success depends on the applied domain, data quality, and acceptable error rates. Unlike the "Never" category, where architectural conflict is fundamental, the challenge is engineering maturity and acceptable risk-return tradeoffs.

The "Not Yet" Strategy

For complex analysis and similar frontier applications:

  • Pilot in low-stakes scenarios with comprehensive validation
  • Use AI for idea generation and initial exploration, not final conclusions
  • Measure verification time against manual analysis time
  • Scale only when the specific domain demonstrates reliable performance
  • Monitor research developments but don't bet operations on uncertain timelines

The gap between research capability and production reliability remains wide. Deploy cautiously.


Diagnostic Questions to Determine Fit

Strategic deployment starts with structural questions, not capability assessments.

"Now" opportunities:

  1. Does the work have an error budget?
    • If human review is standard practice, AI accelerates the workflow
    • If errors are unacceptable, AI introduces verification burden
  2. Was this work in training data?
    • Models excel at tasks resembling training corpus
    • In novel domains or proprietary processes, expect poor performance

"Never" use cases:

  1. Does architecture match requirement?
    • Deterministic tasks need rule engines, not probability
    • Causal reasoning needs different structures than correlation learning
    • Legal accuracy needs verification, not prediction
  2. Can you detect and correct failures cheaply?
    • If errors are obvious and correction is fast, deployment may work (i.e. recategorize as "Not Yet" or "Now")
    • If errors hide in plausible outputs and correction requires extensive validation, avoid deployment

"Not Yet" possibilities:

  1. Do emerging techniques target your specific limitation?
    • Is RAG, reasoning, or tool use addressing your failure mode?
    • Do research papers demonstrate success in your domain?
  2. Can you pilot with acceptable risk?
    • Low-stakes scenarios with comprehensive validation
    • Clear success metrics and timeline checkpoints
    • Budget for monitoring research developments
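The questions above can be sketched as a rough triage function. This is a simplified illustration—the function name and boolean inputs are mine, and real assessments are judgment calls, not flags:

```python
# Rough triage of a use case into Now / Never / Not Yet, following the
# diagnostic questions. Inputs are deliberately oversimplified booleans.

def triage(error_budget, in_training_distribution,
           deterministic_requirement, cheap_error_correction,
           techniques_target_gap):
    if deterministic_requirement and not cheap_error_correction:
        return "Never"       # architecture conflicts with the requirement
    if error_budget and in_training_distribution:
        return "Now"         # revision-tolerant, well-covered work
    if techniques_target_gap:
        return "Not Yet"     # pilot cautiously, monitor research
    return "Never"

# Marketing copy: revision-tolerant, well covered in training data
print(triage(True, True, False, True, False))   # Now
# Financial reporting: deterministic, errors cascade
print(triage(False, True, True, False, False))  # Never
```

The point is not the code but the ordering: the "Never" test comes first, because a structural mismatch rules out deployment regardless of how capable the model looks.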

Understanding these structural fits and misfits accelerates value by eliminating wasted effort on unsuitable use cases. Organizations succeeding with AI aren't chasing capabilities; they're matching tools to tasks based on structural compatibility.

Implementation Framework:

Now: Deploy with human-in-the-loop review, monitor usage patterns, measure time-to-draft versus time-to-final as ROI.

Never: Redirect resources to rule-based systems or human expertise. Document why AI was rejected to prevent recurring evaluation cycles.

Not Yet: Allocate small pilot budgets (5-10% of potential deployment cost). Set explicit success criteria. Reassess quarterly based on both internal results and published research.


Final Thoughts

  • Work structure determines AI fit, not AI capability: Revision-tolerant work with human review naturally accommodates probabilistic outputs. Deterministic requirements fundamentally conflict with pattern-prediction architecture.
  • Training data informs boundaries: Models excel where training data extensively covered the domain. They struggle where work is novel, proprietary, or requires reasoning beyond learned correlations.
  • Hallucination is architectural: Token-by-token prediction optimizes coherence over truth. More training data creates more fluent hallucinations, not more accurate outputs. Grounding techniques mitigate but don't eliminate this structural behavior.
  • "Not Yet" requires engineering maturity, not just capability: RAG and reasoning research target architectural limitations. Success is domain-specific and uncertain. Monitor developments but pilot cautiously.

Over the last six articles, we've explored the "how" behind AI, unlocking the strategic clarity that accelerates value discovery. Understanding how AI actually works reveals why certain use cases succeed immediately, why others will never fit without human oversight, and where future development may extend boundaries. Organizations matching tool structure to work structure capture value while competitors chase unsuitable applications.

With the foundation clear, we'll shift our focus next from understanding mechanics to deploying and optimizing within them—the techniques, architectures, governance, and decision frameworks that translate knowledge into value.


Did you enjoy this article?

Share it with a friend and don't forget to subscribe.

Questions, suggestions, or future topics you'd like to see covered? Let me know.

