Abstract

Large Language Models excel at pattern recognition and natural language understanding, but they fail catastrophically at executing business logic. This paper explores why, and introduces Isomorphism of Intent as the only viable solution for systems where failure is not an option.


1. The Fundamental Problem: Probabilistic vs. Deterministic

What LLMs Do

LLMs are probabilistic systems. They generate the next token based on probability distributions:

Input: "Process a refund for $100"
Output: [token1 with 0.8 probability, token2 with 0.15 probability, ...]

This is perfect for pattern recognition and natural language understanding: summarization, classification, conversation, any task where an approximately right answer is good enough.

What Business Logic Requires

Business logic is deterministic. It requires:

Input: "Process a refund for $100"
Output: Merchant balance decreased by $100, Customer balance increased by $100, Email sent

No probability. No variation. Same input → Same output, always.

The Gap

LLM Output: "The refund was processed"
Actual State: Merchant balance unchanged, Customer not credited, Email failed
Result: Silent failure
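The only way to detect this gap is to check the actual system state rather than trusting the LLM's report. A minimal sketch (the state model and function names are hypothetical, for illustration only):

```python
# Sketch: never trust the LLM's claim; verify the actual system state.
def verify_refund(merchant_before: int, merchant_after: int,
                  customer_before: int, customer_after: int,
                  email_sent: bool, amount: int) -> bool:
    """Return True only if the real state matches a completed refund."""
    return (merchant_after == merchant_before - amount
            and customer_after == customer_before + amount
            and email_sent)

# The LLM says "The refund was processed", but the state says otherwise:
ok = verify_refund(merchant_before=500, merchant_after=500,   # unchanged
                   customer_before=0, customer_after=0,       # not credited
                   email_sent=False, amount=100)
print(ok)  # False: a silent failure, caught by checking state
```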

2. Where LLMs Fail: Five Critical Failure Modes

Failure Mode 1: Hallucination

LLMs generate plausible-sounding but false information:

Prompt: "Check if the payment was processed"
LLM Response: "Yes, the payment was processed successfully"
Actual State: Payment failed, no transaction created

Cost: Regulatory violations, customer disputes, fraud

Failure Mode 2: Context Collapse

LLMs lose context in long workflows:

Step 1: "Create a transaction for $100"
Step 2: "Verify the transaction"
Step 3: "Process the refund"
...
Step 50: "What was the original amount?"
LLM Response: "I don't remember"

Cost: Inconsistent state, broken workflows

Failure Mode 3: Constraint Violation

LLMs don’t enforce constraints:

Constraint: "Merchant balance must never go negative"
Prompt: "Deduct $500 from a merchant with $100 balance"
LLM Response: "Deducted $500 successfully"
Actual State: Merchant balance is now -$400

Cost: Financial violations, audit failures
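A deterministic runtime makes this violation impossible by construction: the invariant is a guard that raises before state can change. A minimal sketch (the class and exception names are illustrative, not part of any real system):

```python
class ConstraintViolation(Exception):
    """Raised when an operation would break an invariant."""

class MerchantAccount:
    def __init__(self, balance: int):
        self.balance = balance

    def deduct(self, amount: int) -> None:
        # Invariant: merchant balance must never go negative.
        if amount > self.balance:
            raise ConstraintViolation(
                f"deducting {amount} would leave {self.balance - amount}")
        self.balance -= amount

account = MerchantAccount(balance=100)
try:
    account.deduct(500)       # the scenario above
except ConstraintViolation:
    pass                      # state untouched
print(account.balance)  # 100: the invariant held
```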

Failure Mode 4: Non-Determinism

Same input produces different outputs:

Prompt: "Process a refund"
Run 1: "Refund processed, email sent"
Run 2: "Refund processed, email failed"
Run 3: "Refund failed, no action taken"

Cost: Impossible to debug, impossible to audit
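Sampling is the root cause: at nonzero temperature, the same prompt yields different token sequences. A toy sketch contrasting a sampled stand-in (not a real LLM API) with a deterministic runtime:

```python
import random

OUTCOMES = ["Refund processed, email sent",
            "Refund processed, email failed",
            "Refund failed, no action taken"]

def llm_process_refund() -> str:
    """Stand-in for a sampled LLM call: output drifts run to run."""
    return random.choice(OUTCOMES)

def deterministic_process_refund() -> str:
    """A deterministic runtime: same input, same output, every time."""
    return "Refund processed, email sent"

print({llm_process_refund() for _ in range(100)})          # usually several outcomes
print({deterministic_process_refund() for _ in range(100)})  # always exactly one
```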

Failure Mode 5: No Auditability

LLMs don’t produce verifiable execution traces:

Prompt: "Process a refund"
LLM Response: "Done"
Question: "Prove it was done correctly"
LLM Response: "I can't show you the proof"

Cost: Regulatory non-compliance, no liability trail


3. Why This Matters: The Cost of Failure

QA Automation

Problem: LLM-based test automation produces flaky tests

Test: "Verify login works"
Run 1: PASS
Run 2: FAIL (same code, same environment)
Run 3: PASS

Cost: False negatives ship bugs, false positives waste time

Financial Transactions

Problem: LLM-based payment processing has no guarantees

Transaction: "Process $1000 payment"
LLM Response: "Payment processed"
Actual State: Payment failed, customer charged twice

Cost: Regulatory violations, customer disputes, fraud liability

Drone Orchestration

Problem: LLM-based drone control is unsafe

Command: "Fly to coordinates and land"
LLM Response: "Flying to coordinates"
Actual State: Drone crashes into building

Cost: Physical damage, safety violations, liability

Security Workflows

Problem: LLM-based incident response is unreliable

Alert: "Suspicious login from unknown IP"
LLM Response: "Threat detected, account locked"
Actual State: Account not locked, attacker gains access

Cost: Breach, data loss, compliance violation


4. The Attempted Solutions (And Why They Fail)

Solution 1: “Better Prompts”

Prompt: "Process a refund. Make sure to:
1. Verify the transaction
2. Deduct from merchant
3. Credit customer
4. Send email
5. Log the audit trail"

Why it fails: Prompts are still interpreted, not executed. No guarantees.

Solution 2: “Chain of Thought”

Prompt: "Let's think step by step:
1. Is the transaction valid?
2. Can we deduct from the merchant?
3. Can we credit the customer?
..."

Why it fails: Thinking doesn’t guarantee execution. The LLM can think correctly but execute incorrectly.

Solution 3: “Retrieval-Augmented Generation (RAG)”

Prompt: "Here's the business logic: [rules]
Now process this refund: [transaction]"

Why it fails: RAG helps with context, but doesn’t solve the fundamental problem: LLMs are probabilistic, not deterministic.

Solution 4: “Fine-tuning”

Fine-tune the LLM on thousands of refund examples

Why it fails: Fine-tuning improves average performance, but doesn’t eliminate failure modes. You still get hallucinations, constraint violations, and non-determinism.


5. The Only Real Solution: Deterministic Execution

The Principle

Use LLMs only for understanding intent, not for executing it.

LLM: "Understand the intent"
    ↓
Parser: "Convert to canonical form"
    ↓
Runtime: "Execute deterministically"
    ↓
Verifier: "Prove fidelity"

The Implementation

Step 1: Specification

Scenario: Process refund
  Given a completed transaction
  When the customer requests a refund
  Then deduct from merchant account
  And credit customer payment method
  And send confirmation email

Step 2: Semantic Extraction (LLM)

Extract entities: transaction, merchant, customer
Extract operations: deduct, credit, send_email
Extract constraints: transaction must be completed
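The extraction step should emit a typed, machine-checkable structure rather than free text, so downstream stages never consume raw LLM output. A sketch of the target shape (field names are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RefundIntent:
    """Canonical result of semantic extraction for the refund scenario."""
    entities: tuple     # e.g. ("transaction", "merchant", "customer")
    operations: tuple   # ordered: ("deduct", "credit", "send_email")
    constraints: tuple  # e.g. ("transaction.status == 'completed'",)

intent = RefundIntent(
    entities=("transaction", "merchant", "customer"),
    operations=("deduct", "credit", "send_email"),
    constraints=("transaction.status == 'completed'",),
)
# Downstream stages consume this structure, never the raw LLM text.
print(intent.operations)
```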

Step 3: Canonical Form (DAG)

verify_transaction
    ↓
deduct_from_merchant
    ↓
credit_customer
    ↓
send_email
    ↓
verify_all_constraints
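The canonical form can be held as an explicit dependency graph, so execution order is derived mechanically rather than inferred by the model. A minimal sketch using Python's standard-library topological sort (the edges are assumed from the chain above):

```python
from graphlib import TopologicalSorter

# Each node maps to its prerequisites (key depends on values).
DAG = {
    "verify_transaction": set(),
    "deduct_from_merchant": {"verify_transaction"},
    "credit_customer": {"deduct_from_merchant"},
    "send_email": {"credit_customer"},
    "verify_all_constraints": {"send_email"},
}

# For this chain there is exactly one valid order; cycles raise an error.
order = list(TopologicalSorter(DAG).static_order())
print(order)
```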

Step 4: Deterministic Execution

def execute_refund(transaction):
    verify_transaction(transaction)  # Fail if not completed
    deduct_from_merchant(transaction)  # Deterministic
    credit_customer(transaction)  # Deterministic
    send_email(transaction)  # Deterministic
    verify_all_constraints()  # Prove fidelity
    return audit_trail()  # Proof of execution

Step 5: Verification

✓ Transaction was verified
✓ Merchant was debited
✓ Customer was credited
✓ Email was sent
✓ All constraints maintained
Fidelity: 100%
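Verification then reduces to comparing the recorded audit trail against the canonical plan, step by step. A sketch (the trail format is a hypothetical list of step name and success flag):

```python
PLAN = ["verify_transaction", "deduct_from_merchant",
        "credit_customer", "send_email", "verify_all_constraints"]

def fidelity(audit_trail) -> float:
    """Fraction of planned steps that executed successfully, in order."""
    done = [step for step, ok in audit_trail if ok]
    matched = sum(1 for planned, actual in zip(PLAN, done) if planned == actual)
    return matched / len(PLAN)

trail = [(step, True) for step in PLAN]    # a fully successful run
print(f"Fidelity: {fidelity(trail):.0%}")  # Fidelity: 100%
```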

6. Why This Works

Determinism

Same input → Same output, always. No variation.

Auditability

Every step is logged and verifiable.

Constraint Enforcement

Invariants are checked at each step.

Composability

Behaviors can be combined without interference.

Scalability

The system scales to complex workflows without degradation.


7. The Paradigm Shift

Old Paradigm

"How do we make LLMs better at business logic?"

Answer: You don’t. LLMs are probabilistic. Business logic is deterministic. They’re fundamentally incompatible.

New Paradigm

"How do we use LLMs to understand intent, then execute deterministically?"

Answer: Agentic Workflows with Isomorphism of Intent.


8. The Path Forward

This is not about replacing LLMs. It’s about using them correctly:

  1. LLMs for understanding: Parse intent, extract semantics
  2. Deterministic systems for execution: Run the parsed intent
  3. Verification for proof: Prove execution matches specification

This is the only way to build reliable systems in critical domains.


Key Takeaways

  1. LLMs are probabilistic, business logic is deterministic—they’re incompatible
  2. Hallucination, context collapse, and constraint violation are fundamental failure modes
  3. Better prompts don’t solve the problem—they just hide it
  4. Deterministic execution is the only real solution
  5. Agentic Workflows implement this principle at scale

Next in the Series

“Orthogonal Orchestration: Why Gherkin and Figma are the Only Inputs You Need” — How to structure agentic workflows for maximum composability and reliability.