AI Contract Data Retention and Deletion Provisions

Data retention and deletion provisions are doing different work in AI contracts than they did in the SaaS era, and the language most contracts still use has not caught up. The clauses lawyers were comfortable with five years ago assumed data sat in a database that could be queried and emptied. AI products do not work that way, and the gap between the old language and the new reality is where most of the risk has quietly relocated.

This webinar, hosted by Laura Frederick, brought in Shannon Yavorsky, Partner at Orrick and Head of the firm's global Cyber, Privacy and Data Innovation Group, and Matthew Hitchcock, Associate General Counsel at Vault Insurance. Shannon brings outside-counsel depth on data and AI work for vendors and customers across the market. Matthew brings inside-the-deal experience as a buyer in a regulated industry. The pairing made the conversation useful because the same provision drew very different reactions depending on which side of the table you were on, and the speakers were direct about where they would hold the line.

The webinar walked through three AI-generated data retention and deletion provisions, the kind that look polished on the surface but fall apart on a closer read. The discussion covered the interplay between the MSA, the DPA, and the NDA, the trap of section-only definitions of customer data, how to handle de-identification, why "commercially reasonable efforts" and "legitimate business purposes" are vendor gifts customers should refuse, and what really needs to survive termination.

Here are our top ten takeaways from the speakers' comments during the webinar:

Stop treating AI contracts like SaaS contracts. SaaS-era retention and deletion language assumes data sits in a database that can be queried and emptied. AI products do not work that way. Data that goes into model training is hard to extract, sometimes impossible. We need contract language that reflects how the technology actually handles data, not what we drafted in 2019.
Read the MSA, the DPA, and the NDA as one document. The retention, deletion, and de-identification terms appear in all three, with different definitions and different carve-outs. The DPA came onto the scene late, after vendors started training on personal information, and it does not always sit cleanly alongside the MSA's data provisions. Redlining only the MSA misses where most of the risk actually lives. Block time at the start of the deal to map how the three documents interact.
Customer data needs a real definition, not an inline one. AI drafting tools love putting "for purposes of this section, customer data means..." inside a single clause. That is a trap. Customer data shows up in indemnification, breach, confidentiality, and elsewhere, and a section-only definition leaves the term undefined in all of those places. Push for a dedicated definitions section that covers metadata, prompts, logs, and derived data explicitly.
Derived data is where vendors take what they want. Vendors typically claim full ownership and IP rights in derived data, and a vague customer-data definition lets them do it without surfacing the issue. Matthew flagged this as a common move worth pushing on. We should call it out directly in the definitions section and decide whether the customer is comfortable with the vendor monetizing patterns and insights drawn from its data. That decision belongs at the front of the negotiation, not at the back.
De-identification is not a free pass anymore. With enough additional data points, an individual can often be re-identified from a "de-identified" dataset, sometimes from as little as one or two extra fields. Matthew saw this play out during his time near a data brokerage business. The contract needs a defined de-identification standard, warranties that the vendor will not re-identify, and a decision about whether the vendor can use de-identified data at all. The old vendor pitch that the data has been de-identified, so the customer should feel completely at ease, is no longer good enough.
"Commercially reasonable efforts" lets vendors stop when it gets expensive. Matthew explained that a vendor can decline to delete data on the grounds that it would cost too much. "Best efforts" forces them through whether it costs money or not. The two standards land in very different places legally, and the difference matters here. We should never let "commercially reasonable" stand as the operative standard for deletion without external guardrails like a SOC 2 reference or a scheduled data management policy.
"Standard data management practices" means whatever the vendor wants. Internal standards can change tomorrow, and the vendor's standard could be no standard at all. Even where the vendor has shared a disaster recovery plan or a written destruction policy in diligence, the contract usually does not bind them to it. We should tie any reference to "standard practices" to a specific external certification or to a written policy attached as a schedule. Anything else is the vendor committing to nothing.
Backup-system carve-outs need rails. Residual copies, archival copies, and backup retention are reasonable asks from vendors. They turn into indefinite retention unless the clause limits duration, requires logical separation, bars active processing, and mandates automatic purging on rotation cycles. Shannon framed it bluntly: without those rails, the backup carve-out becomes a backdoor for keeping data forever. We should not approve a backup carve-out without each of those guardrails.
Audit rights and confidentiality obligations have to survive. The post-term retention clause usually says retained data is "subject to the confidentiality obligations of this agreement," but once the agreement is terminated, those obligations may not actually survive. Matthew flagged this trap directly. Check the survival section, and do not rely on a vague catch-all stating that provisions that should survive will survive. Audit rights especially need to extend for some period after termination, because that is when the customer needs to verify that deletion happened.
Bring IT into the conversation when the vendor pleads technical infeasibility. When the vendor says data cannot be extracted from a trained model, we should not accept the assertion at face value. Matthew recommended putting our IT team in a room with theirs to stress-test the claim. A real technical constraint usually holds up under that scrutiny. A pretextual one falls apart, and we walk out with leverage we did not have before. Even tech-savvy lawyers cannot have this conversation alone.
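
To make the re-identification point from the de-identification takeaway concrete, here is a minimal Python sketch of how adding one or two fields can make records in a "de-identified" dataset unique. The records, field names, and helper functions are hypothetical illustrations, not anything shown in the webinar. The sketch only counts how many records become unique as fields are combined. It does not implement any particular legal de-identification standard.

```python
from collections import Counter

# Hypothetical "de-identified" records: names are removed, but
# quasi-identifiers (zip, birth year, gender) are still present.
RECORDS = [
    {"zip": "94105", "birth_year": 1980, "gender": "F"},
    {"zip": "94105", "birth_year": 1980, "gender": "M"},
    {"zip": "94105", "birth_year": 1975, "gender": "F"},
    {"zip": "10001", "birth_year": 1980, "gender": "F"},
    {"zip": "10001", "birth_year": 1975, "gender": "M"},
    {"zip": "10001", "birth_year": 1975, "gender": "M"},
]


def unique_fraction(records, fields):
    """Share of records whose combination of `fields` appears exactly once."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


def smallest_group(records, fields):
    """Size of the smallest group of records sharing the same `fields` values."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return min(counts.values())


if __name__ == "__main__":
    for fields in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "gender"]):
        share = unique_fraction(RECORDS, fields)
        k = smallest_group(RECORDS, fields)
        print(f"{fields}: {share:.0%} of records unique, smallest group size {k}")
```

In this toy data, no record is unique on zip code alone, but some become unique as soon as birth year is added. A smallest group size of 1 means at least one person could be singled out by anyone who holds those same fields from another source. A test like this is the kind of evidence our IT team could ask a vendor to produce when the vendor claims its data is de-identified.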


Data Retention and Deletion in AI Contracts: What to Draft, What to Push Back On
