(Working Guest Paper)—GOVERNING NONDETERMINISTIC LLM INFERENCE: Liability, Testing, and Regulatory Standards in U.S. Law.

Emma Liu, University of Texas at Austin | May 2026

This paper was developed as part of Professor Frazier's AI and Law Policy Workshop course at the University of Texas School of Law. If you or your organization is interested in learning more about participating in future workshops, please email Kevin Frazier at kevin.frazier@law.utexas.edu.

Executive Summary

Artificial intelligence systems are being integrated into workflows at scale, from hiring decisions to surgical navigation, without a coherent legal framework for addressing the harms that result. This memo examines the root cause of that gap: many modern AI systems, including large language models and other probabilistic inference systems, are nondeterministic in practice. Using healthcare as a case study, the memo analyzes how LLM nondeterminism maps onto existing U.S. liability law and identifies the testing and documentation requirements needed to govern these systems.

The Problem

Unlike conventional software, large language models and many other AI inference systems do not necessarily produce the same output when given the same input. Their outputs are probabilistic, variable, and in some configurations impossible to reconstruct after the fact. This property, known as nondeterminism, is not a flaw to be fixed. It is a fundamental design feature of probabilistic AI systems. It is also a feature that U.S. liability law was not built to handle.
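
To make the mechanism concrete, the following Python sketch shows one common source of nondeterminism: sampling each next token from a probability distribution rather than always choosing the single most likely token. The three candidate tokens and their scores are invented for illustration and do not come from any real model. The sketch also shows that, in this toy setting, recording the random seed allows a sampled run to be replayed; real serving infrastructure can introduce further variation (for example, from floating-point and batching effects) that a seed alone does not capture.

```python
import math
import random

# Hypothetical next-token scores (logits) after a prompt such as
# "Recommended next step:"; the values are invented for illustration.
LOGITS = {"observe": 2.0, "image": 1.6, "operate": 1.1}


def softmax(logits, temperature):
    """Convert scores to probabilities; a lower temperature sharpens the distribution."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the maximum for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_token(logits, temperature, rng):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


def greedy_token(logits):
    """Greedy decoding: always return the highest-scoring token."""
    return max(logits, key=logits.get)


# Same input, unseeded sampling: repeated runs can return different tokens.
unseeded = [sample_token(LOGITS, 1.0, random.Random()) for _ in range(10)]
print("unseeded:", unseeded)

# Same input, recorded seed: the sequence of draws can be replayed exactly.
rng_a, rng_b = random.Random(2026), random.Random(2026)
run_a = [sample_token(LOGITS, 1.0, rng_a) for _ in range(10)]
run_b = [sample_token(LOGITS, 1.0, rng_b) for _ in range(10)]
print("seeded runs identical:", run_a == run_b)  # True

# Greedy decoding returns the same token every time in this toy model.
print("greedy:", greedy_token(LOGITS))  # observe
```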

Problems arising from nondeterministic LLM inference remain largely unaddressed in U.S. liability doctrine. Although courts have addressed LLM hallucinations in the sanctions context, reported cases treating LLM nondeterminism as the causal basis for tort liability remain limited. Other AI inference systems, however, such as machine-learning algorithms, have already produced analogous problems as the market adopts AI. Surgeons using TruDi, an FDA-authorized medical device, reported that the machine-learning-powered surgical navigation system gave them wrong coordinates mid-procedure. Two patients suffered serious injuries, including a stroke. After the AI feature was integrated, the FDA received more than 100 unconfirmed malfunction and adverse-event reports involving the device, compared with seven before integration. Two lawsuits are now pending in the TruDi matter.

A separate example illustrates the evidentiary problem in clinical documentation. An AI transcription tool used by more than 40 health systems was found to regularly hallucinate clinical content, inventing medications and patient details that were never spoken, while deleting the original audio recordings that would have allowed verification. In contrast to the TruDi matter, no plaintiff has brought a claim over the transcription hallucinations, not because no harm occurred, but because the evidence becomes difficult to produce once the original recording has been deleted.

The Gaps

This memo surveys existing U.S. liability and regulatory frameworks, including medical malpractice, product liability, HIPAA, the 21st Century Cures Act, FDA oversight, and state AI statutes, and identifies several structural gaps created by nondeterminism. First, there is no general requirement that clinical AI outputs be logged, which makes incident reconstruction systematically difficult, if not impossible. Second, existing testing standards do not adequately address probabilistic systems; most assume deterministic and reproducible outputs. Third, liability is not clearly allocated across the AI development and deployment chain, including the base model developer, fine-tuner, deployer, institution, and clinician. Fourth, nondeterministic outputs complicate foreseeability analysis because the same input may produce different results across runs or deployment contexts.
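
The second gap can be made concrete with a simple variance assessment: instead of running each test case once and comparing the result to a single expected answer, the same input is submitted many times and the spread of outputs is measured. The Python sketch below is a minimal illustration under stated assumptions. `toy_model` is a hypothetical stand-in for a deployed system, and both the agreement-rate metric and the 0.95 threshold are illustrative choices rather than any existing standard.

```python
import random
from collections import Counter
from typing import Callable


def toy_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a deployed classifier. The prompt is accepted but
    unused in this toy; the label is drawn at random, so identical inputs can differ."""
    return rng.choices(
        ["benign", "malignant", "indeterminate"], weights=[0.90, 0.06, 0.04], k=1
    )[0]


def variance_assessment(
    model: Callable[[str, random.Random], str], prompt: str, runs: int, seed: int
) -> dict:
    """Submit the same prompt `runs` times and summarize how much the outputs vary."""
    rng = random.Random(seed)  # seeded so the assessment itself can be reproduced
    counts = Counter(model(prompt, rng) for _ in range(runs))
    modal_output, modal_count = counts.most_common(1)[0]
    return {
        "runs": runs,
        "distinct_outputs": len(counts),
        "modal_output": modal_output,
        "agreement_rate": modal_count / runs,  # share of runs matching the most common output
        "distribution": dict(counts),
    }


report = variance_assessment(toy_model, "Classify lesion in image 17", runs=1000, seed=7)
print(report)

# Illustrative pass/fail rule evaluated across many runs rather than on one output.
ILLUSTRATIVE_THRESHOLD = 0.95
print("meets illustrative threshold:", report["agreement_rate"] >= ILLUSTRATIVE_THRESHOLD)
```

A single-run test of this toy system that expects "benign" would pass about nine times in ten and fail otherwise, which is why a one-shot test cannot characterize how the system actually behaves.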

The Proposal

This memo proposes a three-tier regulatory framework. The first tier is process-based, requiring documented variance assessments, change control logs, incident response protocols, and output logging specifications. Compliant entities receive a rebuttable presumption of reasonable care. The second tier is performance-based, establishing domain-specific minimum thresholds for accuracy, non-discrimination, and adversarial robustness, with enhanced liability protection for compliant entities. The third tier is disclosure-based, requiring clinician and patient notification of AI involvement and probabilistic output characteristics, with versioning logs that make each inference reconstructible.
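
One way to picture the logging and versioning requirements in the first and third tiers is a per-inference record that captures everything needed to replay an output. The Python sketch below assumes a deployment where sampling randomness comes only from a recorded seed and the model version is pinned; the field names, the `toy_generate` function, and the hashing scheme are hypothetical illustrations, not a proposed specification.

```python
import hashlib
import json
import random
from dataclasses import asdict, dataclass

# Hypothetical vocabulary for a toy report-summarization model.
VOCAB = ["no", "acute", "findings", "mild", "change", "follow-up", "recommended"]


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def toy_generate(prompt: str, model_version: str, temperature: float, seed: int) -> str:
    """Hypothetical model call whose output depends only on these four arguments."""
    if temperature == 0:
        return "no acute findings"  # greedy decoding: one fixed answer in this toy
    # Seed a private generator from every logged input so the run can be replayed.
    rng = random.Random(f"{model_version}|{prompt}|{temperature}|{seed}")
    return " ".join(rng.choice(VOCAB) for _ in range(4))


@dataclass(frozen=True)
class InferenceLogRecord:
    """Hypothetical per-inference log entry; field names are illustrative."""
    timestamp_utc: str
    model_version: str
    prompt: str
    temperature: float
    seed: int
    output: str

    def digest(self) -> str:
        """Fingerprint of the whole record; storing it separately lets later edits be detected."""
        return sha256_hex(json.dumps(asdict(self), sort_keys=True))


def log_inference(prompt, model_version, temperature, seed, timestamp_utc):
    """Run the model and capture the inputs, configuration, and output together."""
    output = toy_generate(prompt, model_version, temperature, seed)
    return InferenceLogRecord(timestamp_utc, model_version, prompt, temperature, seed, output)


def replay_matches(record: InferenceLogRecord) -> bool:
    """Re-run the logged configuration and check that it reproduces the logged output."""
    replayed = toy_generate(record.prompt, record.model_version, record.temperature, record.seed)
    return replayed == record.output


record = log_inference(
    prompt="Summarize the attached chest X-ray report.",
    model_version="toy-model-2026.05",
    temperature=0.7,
    seed=1234,
    timestamp_utc="2026-05-01T14:03:00Z",
)
print(record.output)
print("digest:", record.digest())
print("replay matches:", replay_matches(record))  # True
```

Storing the output alongside the seed matters: if replay later fails because the serving environment has changed, the logged output still documents what the clinician actually saw.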

The framework also includes two cross-cutting provisions. Liability-shifting provisions allocate responsibility according to which actor in the chain (developer, deployer, or infrastructure provider) controlled the configuration choice or technical condition that produced the harmful output. A qualified audit privilege protects compliance documentation from broad civil discovery while preserving access for regulators and plaintiffs.

The Ask

Legislative action is needed to address the issues presented by nondeterministic AI inference. The most urgent near-term actions are a federal logging mandate for clinical AI outputs under the ONC certification authority, congressional direction to NIST to develop probabilistic testing standards for AI systems in high-stakes domains, and a safe harbor provision in pending federal AI legislation that provides compliance-based liability protection calibrated to the tiered framework described here. These actions do not require new agencies or sweeping legislation. They require targeted amendments to existing authorities and a willingness to treat nondeterminism as the structural legal problem it is, rather than a technical detail to be addressed later.