
GAMP 5 for AI agents: a validation framework

GAMP 5 already tells you to scale validation to risk. Here is how to apply that to an AI agent, where the output is non-deterministic and the model itself can never be the thing you validate.

GAMP 5 categorises computerised systems by complexity and configurability, then asks you to spend validation effort in proportion to risk. A language model breaks the assumption underneath that approach: that a system gives the same output for the same input. So the unit you validate shifts. You no longer validate a frozen configuration that produces a predictable result. You validate the controls that wrap a non-deterministic component: what it can reach, what it records, and who signs for what it produced.

This framework keeps the GAMP 5 instinct and updates the object. You categorise the agent by its GxP impact and match the validation effort to that tier, then show that the controls around the agent satisfy GAMP 5 for AI, 21 CFR Part 11, and ALCOA+. Everything below assumes the agent runs inside the client's own Claude Enterprise tenancy, so the data, the logs, and the access policy stay under the client's control.

Step 1. Define the agent's GxP footprint before anything else

Write down, in one paragraph, what decision or record the agent touches. An agent that drafts an internal meeting summary has no GxP footprint. An agent that proposes a deviation classification, drafts a CAPA, or assembles a section of a regulatory submission does. The footprint, not the cleverness of the model, sets the tier. If the output can change a GxP record or influence a quality decision, treat it as in scope and move on.

Step 2. Assign a risk tier

Use three tiers, each adapted from the GAMP 5 idea that effort follows impact.

  • Tier A, supporting. The agent drafts non-GxP text or surfaces information a person was going to read anyway. No record is altered without human authorship. Light validation: confirm scope, confirm logging is on, document intended use.
  • Tier B, GxP-adjacent. The agent produces content that feeds a GxP record but a qualified person rewrites and owns the final version. Validation covers intended use, data access scope, the audit trail, and a documented review step before anything is signed.
  • Tier C, GxP-determining. The agent's output, if accepted, directly becomes or shapes a GxP record or decision. Full validation: a written specification of intended use, access scoped to the minimum data the task needs, complete action logging, versioned records of every draft, defined human review and sign-off, and a tested failure path for when the agent gets it wrong.
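The tiering above can be sketched as a small decision function. This is a minimal illustration, not a prescribed method: the Footprint fields and the assign_tier name are hypothetical, and a real assessment would rest on a documented risk review rather than two booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    A = "supporting"
    B = "GxP-adjacent"
    C = "GxP-determining"


@dataclass(frozen=True)
class Footprint:
    # Hypothetical screening answers; the field names are illustrative only.
    feeds_gxp_record: bool       # can the output reach a GxP record or quality decision?
    human_rewrites_output: bool  # does a qualified person rewrite and own the final version?


def assign_tier(fp: Footprint) -> Tier:
    """Map a footprint to a tier, following the three definitions above."""
    if not fp.feeds_gxp_record:
        return Tier.A
    if fp.human_rewrites_output:
        return Tier.B
    # Output is accepted more or less as drafted: it becomes or shapes the record.
    return Tier.C


meeting_summary = Footprint(feeds_gxp_record=False, human_rewrites_output=False)
deviation_classifier = Footprint(feeds_gxp_record=True, human_rewrites_output=False)
```

Here assign_tier(meeting_summary) gives Tier.A and assign_tier(deviation_classifier) gives Tier.C. Writing the questions down this way mainly forces the team to answer them explicitly for each agent.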

Step 3. Specify intended use and the boundary

For Tier B and Tier C, write what the agent is for and, just as important, what it must not do. Name the systems it can read, the data classes it can see, and the actions it can take. This is the GAMP 5 specification step, narrowed to an agent. The boundary is what you actually validate, because the boundary is deterministic even when the model is not.
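One way to make the boundary testable is to express it as a deny-by-default allowlist. The sketch below assumes hypothetical system names, data classes, and actions ("qms", "deviation_report", "read", "draft"); none of them come from a specific product, and the enforcement point in a real deployment would sit in the access policy, not in application code alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentBoundary:
    # Everything not listed here is refused.
    readable_systems: frozenset
    data_classes: frozenset
    allowed_actions: frozenset

    def permits(self, system: str, data_class: str, action: str) -> bool:
        """Return True only when all three parts of the request are allowlisted."""
        return (
            system in self.readable_systems
            and data_class in self.data_classes
            and action in self.allowed_actions
        )


# Hypothetical boundary for an agent that drafts deviation classifications.
DEVIATION_DRAFTER = AgentBoundary(
    readable_systems=frozenset({"qms"}),
    data_classes=frozenset({"deviation_report"}),
    allowed_actions=frozenset({"read", "draft"}),
)
```

With this boundary, DEVIATION_DRAFTER.permits("qms", "deviation_report", "draft") is True, while a request to "sign" or to read from any other system is refused. Because the check is deterministic, it can be tested exhaustively even though the model's drafts cannot.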

Step 4. Wire the three controls that carry the regulatory weight

  • An audit trail. Every action the agent takes, every tool it calls, every record it reads is logged with time and identity. This is the contemporaneous, attributable spine of ALCOA+ and the core of a Part 11 audit trail.
  • Versioned records. Each draft the agent produces is kept as a distinct version, so an inspector can see the original, the human edits, and the final. That is the original and accurate part of ALCOA+, made legible.
  • Human accountability. A named, qualified person reviews the output and applies the signature. The agent does not sign GxP records. That signature, bound to the versioned record and the trail, is the attributable and accountable evidence Part 11 asks for.
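The three controls can be shown working together in a short in-memory sketch. The class and field names (Actor, ControlledRecord, "sign_refused") are assumptions made for illustration; a real Part 11 audit trail also needs secure, tamper-evident storage and a compliant electronic signature mechanism, which this example does not attempt.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class Actor:
    actor_id: str
    kind: str  # "agent" or "human" (hypothetical labels)


class ControlledRecord:
    def __init__(self, record_id: str):
        self.record_id = record_id
        self.audit_trail = []  # append-only list of who did what, and when
        self.versions = []     # every draft is kept, never overwritten
        self.signature = None

    def _log(self, actor: Actor, action: str, detail: str = "") -> None:
        self.audit_trail.append(
            {"at": _now(), "actor": actor.actor_id, "kind": actor.kind,
             "action": action, "detail": detail}
        )

    def add_version(self, actor: Actor, text: str) -> int:
        if self.signature is not None:
            raise PermissionError("record is signed and can no longer be changed")
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        number = len(self.versions) + 1
        self.versions.append(
            {"number": number, "author": actor.actor_id, "text": text, "sha256": digest}
        )
        self._log(actor, "add_version", f"v{number} {digest}")
        return number

    def sign(self, actor: Actor) -> None:
        if actor.kind != "human":
            # The refusal itself is logged, so the attempt stays visible.
            self._log(actor, "sign_refused", "only a named human may sign")
            raise PermissionError("the agent does not sign GxP records")
        if not self.versions:
            raise ValueError("nothing to sign")
        latest = self.versions[-1]
        # Bind the signature to one specific version via its hash.
        self.signature = {"signer": actor.actor_id, "version": latest["number"],
                          "sha256": latest["sha256"], "at": _now()}
        self._log(actor, "sign", f"v{latest['number']}")
```

In use, the agent calls add_version with its draft, the reviewer calls add_version with the edited text, and only the reviewer's sign call succeeds. An inspector reading audit_trail then sees both versions, any refused signing attempt, and the final signature tied to a version hash.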

Step 5. Test the failure path, not just the happy path

A demo shows the agent doing well. Validation cares about the day it does badly. For Tier C, run cases where the source data is wrong, ambiguous, or missing, and confirm the human review step catches the bad draft before it reaches a record. Document what the agent did and how the control held. An agent you can only show succeeding is an agent you have not validated.
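A failure-path run can be organised as a small harness that feeds deliberately bad inputs through the draft and review steps and keeps the evidence. Everything here is a stand-in: stub_agent mimics an agent that drafts confidently from bad data, and reviewer_accepts is a placeholder for the documented human review (in a real run the decision is entered by the qualified reviewer, not computed). The field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCase:
    name: str
    source: dict  # deliberately ambiguous or incomplete input


def stub_agent(source: dict) -> str:
    # Stand-in for the real agent: it produces a draft even when data is missing.
    return f"Classification: {source.get('severity') or 'minor'}"


def reviewer_accepts(source: dict, draft: str) -> bool:
    # Placeholder for the human review step: reject when required source
    # fields are absent or empty. A real reviewer applies judgement as well.
    return all(source.get(key) for key in ("severity", "batch_id"))


def run_failure_path(cases):
    """Return (drafts that reached the record, evidence rows for the validation file)."""
    committed, evidence = [], []
    for case in cases:
        draft = stub_agent(case.source)
        accepted = reviewer_accepts(case.source, draft)
        if accepted:
            committed.append(draft)
        evidence.append({"case": case.name, "draft": draft, "accepted": accepted})
    return committed, evidence


CASES = [
    FailureCase("missing severity", {"batch_id": "B-1"}),
    FailureCase("ambiguous severity", {"severity": "", "batch_id": "B-1"}),
    FailureCase("missing batch id", {"severity": "major"}),
]
```

The pass criterion for this run is that run_failure_path(CASES) commits nothing, and that the evidence list documents each bad draft alongside its rejection. Cases where the source data is plausible but wrong cannot be caught by a field check like this one and still depend on the reviewer.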

Step 6. Decide CSA over CSV where it fits

The FDA's Computer Software Assurance (CSA) approach lets you put critical thinking and risk-based testing ahead of exhaustive documentation. For Tier A and much of Tier B, that is the proportionate path. Tier C still earns heavier evidence. If the distinction between CSA and Computerized System Validation (CSV) is new to your team, the difference, and when each applies, are set out in CSA vs CSV.

How the controls map to Part 11 and ALCOA+

Part 11 wants attributable electronic records, secure audit trails, and valid electronic signatures. The audit trail gives you the first two. The human sign-off on a versioned record gives you the third. ALCOA+ wants data that is attributable, legible, contemporaneous, original, and accurate, plus complete, consistent, enduring, and available. Logging at the moment of action covers contemporaneous and attributable. Versioning covers original and accurate. Keeping the records and logs inside the client tenancy covers enduring and available. None of this depends on the model being right every time. It depends on the controls around the model being honest about what actually happened.
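The mapping in the paragraph above can be kept as data and checked for gaps. The sketch below encodes only the assignments that paragraph states; the control labels and the unmapped_attributes helper are illustrative names, not part of any standard.

```python
ALCOA_PLUS = (
    "attributable", "legible", "contemporaneous", "original", "accurate",
    "complete", "consistent", "enduring", "available",
)

# Control -> ALCOA+ attributes, exactly as assigned in the paragraph above.
CONTROL_COVERAGE = {
    "action logging": {"contemporaneous", "attributable"},
    "versioning": {"original", "accurate"},
    "client tenancy": {"enduring", "available"},
}


def unmapped_attributes(coverage=CONTROL_COVERAGE):
    """List ALCOA+ attributes that no control has been assigned to yet."""
    covered = set().union(*coverage.values())
    return [attr for attr in ALCOA_PLUS if attr not in covered]


print(unmapped_attributes())  # ['legible', 'complete', 'consistent']
```

The three attributes the check reports are the ones that paragraph does not assign to a control by name. A team's own validation plan would record which control carries each of them, so that the table an inspector sees has no empty rows.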

Why this is the only defensible shape

Most AI sold to regulated teams sells the model and leaves the validation as the buyer's problem. In a pharma, CRO, or CDMO setting the model was never the hard part. The hard part arrives when an inspector asks who approved this output and on what evidence. An agent built to this framework can answer that. That is the whole point of audit-ready AI agents, and it is why the validation and the regulatory affairs view come first, with access and governance drawn before the build, not bolted on after.

A downloadable GAMP 5 for AI agents validation checklist, with the tiering questions and the Part 11 and ALCOA+ mapping, is available on request.

Common questions

Does GAMP 5 apply to AI agents at all?

GAMP 5 was written for computerised systems, not specifically for LLMs. Its risk-based logic still holds: scale the validation effort to GxP impact. What changes is that a language model is non-deterministic, so you validate the controls around the agent, not just a fixed configuration.

How do you validate something that gives different answers each time?

You stop trying to prove the output is fixed and start proving the boundary is fixed. You define what the agent is allowed to touch, log every action, version every record, and require a named human to sign the output. The non-determinism lives inside a controlled, traceable box.

Where does the human fit in a Part 11 sense?

Part 11 needs an attributable, accountable signature on the record. The agent drafts; a qualified person reviews and signs. That signature, tied to the versioned record and the audit trail, is what an inspector follows. The agent removes the blank page, not the accountability.
