Personal Data Definitions in AI Contracts: What to Fix

Most data protection addenda were drafted for a much simpler world. A vendor receives personal data, processes it on the customer's behalf, returns or deletes it. AI products do not behave that way, and the standard DPA template we have all been recycling for years quietly stops working once a model starts training on inputs, deriving new data from outputs, and routing information through a stack of cloud providers, model hosts, and monitoring tools nobody fully tracks.

This How to Contract webinar was hosted by Laura Frederick and featured Shannon Yavorsky, Partner at Orrick and Head of the firm's global Cyber, Privacy & Data Innovation group, alongside Arohi Kashyap, Partner at Kashyap Partners with a transactional and technology practice across California and India. Shannon spoke from the customer side. Arohi spoke from the vendor side. The split made the trade-offs concrete, because the same clause that looked acceptable to one side often broke for the other.

The conversation worked through three AI-generated clauses that looked polished on the page and fell apart on a careful read: the personal data definition, the data subject categories clause, and the subprocessor flowdown. Along the way the discussion covered the DPA-MSA divide for non-personal data, the controller-processor blur when vendors want to train on customer inputs, why "reasonably available" qualifiers need pinning down, and why the standard subprocessor list and objection rights no longer match how AI vendors actually operate.

Here are our top ten takeaways from the speakers' comments during the webinar:

Route personal data through the DPA and everything else through the MSA. The old approach of treating the DPA as a catch-all for data terms creates overlap and confusion. We are better off making the DPA apply only to personal information and pushing confidential business data, proprietary inputs, and other data buckets to the MSA. That separation forces both sides to think clearly about what kind of data they are actually talking about. It also makes the liability allocation cleaner, because the buckets stop competing for the same caps and carve-outs.
Watch for vendors slipping from processor to controller through training rights. When an AI vendor wants the right to use inputs, outputs, or interaction data to improve its model, the controller-processor analysis has to be redone. Shannon noted that most standard DPAs were not drafted with this distinction in mind. We should be explicit about whether the vendor's use of data for model improvement counts as processing on our behalf or as a separate use that puts the vendor in a different regulatory bucket. The downstream consequences are real, because a controller has independent obligations that the customer cannot offload through contract.
Cover derived data and outputs, not just inputs. Most personal data definitions only capture what the customer submits. That misses everything the AI produces from those inputs, and derived data that relates to an identifiable person can still be personal data under the GDPR and the CCPA. Arohi made the point that this gap exposes the vendor to liability under the underlying regulation rather than under the guardrails of the DPA. We should make sure the definition reaches inputs, outputs, and derived data.
Pin down qualifiers like "reasonably available." When a definition turns on whether information is "reasonably available to the provider," that qualifier has to be defined. Is it public data? Data accessible through the model? Data the customer happens to have submitted? Leaving it open expands the scope of the definition in ways neither side can predict and creates fights later about what the contract actually covers.
Treat de-identification standards skeptically when the provider sets them. Providers with access to large pools of personal data have an incentive to call data de-identified and reuse it. Shannon's instinct as a customer is to strike unilateral de-identification carve-outs entirely. If we do allow one, we need to understand the standard, evaluate it against a recognizable legal framework, and not just defer to whatever exhibit the vendor attached. Provider-defined standards rarely match the GDPR or the CCPA definitions, and that gap is where the data ends up getting reused.
Stop treating data subject categories as a closed list. AI prompts and uploaded documents sweep in job candidates, vendors, third-party contractors, and people who just happen to be mentioned. A static list of categories does not reflect what the AI actually processes, and the GDPR and the CCPA expect the description of processing to reflect reality. Narrow lists invite regulatory risk rather than reducing it. The cleaner approach is to acknowledge the open-ended nature of AI inputs in the contract and shift some of the responsibility to the customer for the data its users actually put in.
Move from effort-based to result-based obligations where the technology allows. "Reasonable technical measures" sounds vendor-friendly, but regulators will judge the vendor against the harm, not against whatever measures the vendor chose to deploy. Arohi recommended that vendors define their measures concretely and tie the obligation to that defined standard. That gives the customer something specific to diligence and gives the vendor a defensible standard if something goes wrong. The contract should match the technology rather than borrow the soft language we used to use in SaaS deals.
Do real diligence on technical capability before signing. Shannon noted that many new AI vendors claim full GDPR compliance, and the claim does not always hold up. Under the GDPR, the regulator looks to the customer, typically the controller, before the vendor when something goes wrong. The contract sits on top of the diligence. It does not replace it. We should be running diligence on the vendor's actual technical capability to deliver what it is promising before we negotiate the contract terms.
Demand subprocessor lists that match how AI vendors actually operate. A list available on written request does not work when the vendor's stack runs through cloud providers, model hosts, monitoring tools, and safety vendors. We should be asking for a publicly available, continuously updated subprocessor list and tightening the objection process so it can actually be exercised. If we have real concerns about a subprocessor, the contract should give us a termination lever, not just a vague duty to cooperate in good faith. Point-in-time disclosure is functionally meaningless when the stack changes month to month.
Be honest about who controls what. The vendor cannot control what users type into the prompt. The customer cannot fully police its own users. Both sides need to acknowledge that some inputs and outputs sit outside either party's complete control and draft accordingly. Arohi made the point that vendors who say yes to commitments they cannot deliver create exposure for both sides. The cleaner contract starts with a clear-eyed map of where control sits.

