
CSV for AI systems

Computer System Validation is the discipline pharma already runs. Here it is applied to AI and LLM tools that do not always answer the same way.

Computer System Validation (CSV) for AI systems is documented proof that an AI or LLM tool does what its intended use says, governed by risk assessment, qualification stages (IQ, OQ, PQ), and lifecycle controls. It scopes the heaviest testing and monitoring onto the paths where a wrong output could reach a patient or a GxP record.

Most of CSV transfers to AI without much argument. You write intended use, run a risk assessment, qualify the system through IQ, OQ, and PQ, and keep change control and periodic review across its life. A drafting aid a reviewer reads line by line sits low on the risk scale. An agent that writes to a validated system on its own sits high, and that is where your evidence has to be strongest.
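As a rough illustration of how intended use can drive the risk tier, here is a minimal sketch. The field names, the three tiers, and the rules in `classify` are assumptions made for this example; they are not a GAMP 5 scheme, and a real risk assessment would weigh more factors.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class IntendedUse:
    """Hypothetical summary of an AI tool's intended use."""
    name: str
    touches_gxp_record: bool    # output can end up in a GxP record
    human_reviews_output: bool  # a person reads every output before it is used
    acts_autonomously: bool     # tool writes to a validated system on its own


def classify(use: IntendedUse) -> RiskTier:
    """Illustrative rules only: unreviewed or autonomous GxP writes rank highest."""
    if use.touches_gxp_record and (use.acts_autonomously or not use.human_reviews_output):
        return RiskTier.HIGH
    if use.human_reviews_output:
        return RiskTier.LOW
    return RiskTier.MEDIUM


drafting_aid = IntendedUse("drafting aid", True, True, False)
autonomous_agent = IntendedUse("record-writing agent", True, False, True)

print(classify(drafting_aid).value)
print(classify(autonomous_agent).value)
```

The tier then decides how much evidence each path needs: the higher the tier, the stronger the testing and monitoring.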

The point that breaks the old playbook is non-determinism. Classic CSV assumes the same input gives the same output, so a single passing PQ run is taken as evidence that holds until the system changes. An LLM does not promise that: the same prompt can produce different outputs from one run to the next. So validation shifts to the controls around the model. You set acceptance criteria on the high-risk paths, you put a human in the loop where the output is consequential, and you monitor in production instead of trusting a one-time qualification. Pair it with an AI audit trail so each output stays traceable, keep it aligned with 21 CFR Part 11 for electronic records and signatures, and read the risk-based foundation in GAMP 5 for AI.
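One way to picture an acceptance criterion for a non-deterministic tool is a pass rate over repeated runs instead of a single pass or fail. The sketch below assumes a hypothetical `fake_model` standing in for the LLM call and a hypothetical `dose_is_correct` check; the 0.99 threshold and the run count are placeholders that a real risk assessment would set.

```python
import random
from typing import Callable


def pass_rate(generate: Callable[[str], str],
              check: Callable[[str], bool],
              prompt: str,
              runs: int) -> float:
    """Call `generate` on the same prompt `runs` times; return the fraction that pass `check`."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passes = sum(1 for _ in range(runs) if check(generate(prompt)))
    return passes / runs


def meets_criterion(rate: float, threshold: float) -> bool:
    """An acceptance criterion expressed as a minimum pass rate."""
    return rate >= threshold


# Hypothetical stand-in for an LLM: usually right, occasionally wrong.
_rng = random.Random(0)


def fake_model(prompt: str) -> str:
    return "dose: 10 mg" if _rng.random() < 0.95 else "dose: 100 mg"


def dose_is_correct(output: str) -> bool:
    return output == "dose: 10 mg"


if __name__ == "__main__":
    rate = pass_rate(fake_model, dose_is_correct, "State the dose.", runs=200)
    print(f"pass rate: {rate:.3f}, meets 0.99 criterion: {meets_criterion(rate, 0.99)}")
```

The design choice is that a failed criterion on a high-risk path is a finding to act on, for example by adding a human review step, not a reason to rerun until the number looks good.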

Common questions

What do IQ, OQ, and PQ mean for an AI system?

Installation Qualification confirms the system and its tenancy are set up as specified. Operational Qualification tests that each function meets its defined acceptance criteria. Performance Qualification shows the system works for its intended use in the real process. For an LLM, OQ and PQ carry more weight because output can vary, so they lean on acceptance criteria and human review rather than a single passing run.

Can you validate an LLM that does not give the same answer every time?

Yes, but you validate the controls around it, not a fixed output. Risk assessment finds where variance matters, acceptance criteria define what good looks like on those paths, a human signs off on high-risk steps, and ongoing monitoring replaces the assumption that one qualification holds forever.
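The last two controls, human sign-off and ongoing monitoring, can be sketched in a few lines. Everything here is an assumption for illustration: the `Output` record, the `sign` and `can_release` helpers, and the `RollingMonitor` window and threshold are hypothetical names and values, not part of any standard or product.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Output:
    """Hypothetical record of one AI output awaiting release."""
    text: str
    high_risk: bool
    signed_by: Optional[str] = None
    signed_at: Optional[datetime] = None


def sign(output: Output, reviewer: str) -> None:
    """Record who approved the output and when (UTC)."""
    if not reviewer.strip():
        raise ValueError("reviewer must be named")
    output.signed_by = reviewer
    output.signed_at = datetime.now(timezone.utc)


def can_release(output: Output) -> bool:
    """High-risk outputs need a human signature; low-risk ones do not."""
    return (not output.high_risk) or output.signed_by is not None


class RollingMonitor:
    """Tracks the pass rate over the last `window` production checks."""

    def __init__(self, window: int, min_pass_rate: float) -> None:
        self.results = deque(maxlen=window)
        self.min_pass_rate = min_pass_rate

    def record(self, passed: bool) -> None:
        self.results.append(passed)

    def breached(self) -> bool:
        # Only judge once the window is full, to avoid alerts on tiny samples.
        if len(self.results) < self.results.maxlen:
            return False
        return sum(self.results) / len(self.results) < self.min_pass_rate
```

In use, `can_release` would gate the write to the validated system, and a `breached()` monitor would trigger review or change control instead of waiting for the next periodic review.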
