AI Data Use Clauses: Drafting Permitted Use the Right Way

Every AI vendor agreement turns on the same question: what can the vendor actually do with your data once the user hits enter? The traditional SaaS answer was a clause that let the vendor use customer data to provide the service and nothing else. That worked when the product was static. AI broke that assumption, and the standard language has not caught up. Words like "training" and "anonymized" travel through these contracts undefined, and a protection the customer thinks it is getting can give away far more than intended.

This was the third session in the How to Contract series on AI and data contracting provisions. Host Laura Frederick was joined by Kate Aishton, founder of Aishton Law and a longtime product attorney who spent years as product counsel at Instagram, and Matthew Kohel, a partner at Saul Ewing in Baltimore who leads the firm's AI team. Kate took the vendor side and Matt took the customer side. Hearing both made the conversation useful, because the same clause looks different from each chair, and the gap between the two views is where most of the negotiation happens.

The session started with generic data use language drafted by an AI tool, then asked both speakers to find what was wrong with it. The three provisions in focus were the basic scope of permitted use for service delivery, the AI model training restriction, and the secondary use provision for analytics and benchmarking. The conversation covered why the SaaS approach no longer fits, why definitions are now the real battleground, how subprocessors and third-party models change the analysis, and where small wording shifts quietly lower the bar.

Here are our top ten takeaways from the speakers' comments during the webinar:

Start with the data flow, not the clause. Before you draft or redline a data use provision, find out what actually happens after the user hits enter. Ask where the input goes, what other data gets pulled in, who sees it, and what the vendor wants to do with it afterward. The permitted use clause only protects you if you understand the system it is describing. Most of the real risk lives in the part nobody thought to ask about, and that is the part worth your time.
Make the contract match what your business team actually expects. It is common to get a clean briefing from your internal client about why they need a product, then get the vendor's contract back with a described use that does not match. Run that comparison on purpose rather than assuming the two line up. The permitted use clause is exactly where the disconnect either gets caught or gets signed. Catching it is cheap. Discovering it later, after the data is already flowing, is not.
Do not over-lock the data just because you can. Locking down every possible use feels safe, but it can leave you with a worse product, because AI tools improve by working off real data and real scenarios. Treat permitted use as a balancing exercise, not a pure restriction. Push for a properly calibrated and defined level of de-identification rather than a blanket ban on improvement. The goal is a product that gets better over time without your data leaking value to people who should not have it.
Enumerate the vendor's other legitimate purposes instead of accepting a vague catchall. A vendor genuinely needs to use data for fraud prevention, security, and legal compliance, but those are three distinct purposes. Insist that each one is named in the clause. When the purposes are not enumerated, the provision becomes a contest of negotiating power over whose vague language wins. You usually lose that contest without noticing, because broad language reads as normal until something goes wrong.
Treat "training" as undefined until you define it. Training, retraining, and fine-tuning are different activities, and the word may carry a technical meaning your contract does not capture. Get the vendor to describe how the product actually works, including whether multiple models are involved and whether a third-party model sits underneath. A restriction that covers only the vendor's own model should be widened to cover any model, including third-party models. If the vendor cannot explain its own architecture clearly enough to define the term, that hesitation tells you something worth knowing.
Pick a real de-identification standard and name it. "Anonymized" has no consistent legal meaning, and the old assumption that re-identification would take years no longer holds when an AI tool can do it in minutes. Reference a published standard, such as NIST's de-identification guidance or, for health data, the HIPAA de-identification standard, and calibrate the bar to the sensitivity of the data. Sensitive consumer data needs a far higher bar than employee email addresses. Naming the standard is what makes the protection enforceable later, because a vague term gives you nothing to point at in an audit.
Separate the vendor's technology layer from third-party models in the clause. You probably want the vendor's own product to improve once there is a solid de-identification standard, but you do not want your data enhancing a third-party model. Draft those as two different things with two different sets of restrictions. Collapsing them into one line gives away control you would not give away on purpose. The distinction looks technical, but it is really about where your data ends up and who benefits from it.
Watch "derived from" and similar language. Information derived from your confidential material can itself carry trade secret value, so an exception that lets the vendor use "derived" data is not as small as it looks. Swapping in "generated from" does not fix it, because the problem is the concept, not the exact word. The real fix is to understand what the vendor collects, including patterns in the data, before you agree to any exception. An AI note-taker that surfaces insights from an R&D meeting is a good reminder of how much value can ride on that one phrase.
Break secondary use categories apart and treat them differently. "Analytics, insights, benchmarks, and reports" is a wide range of things with very different risk profiles, and bundling them invites the broadest possible use. Define each category, and define usage data and metadata rather than assuming they carry a standard meaning. In a narrow market, aggregation does not actually hide you, so do not rely on it as if it does. The question to ask the vendor is why every one of those categories needs to be there, not whether the list looks reasonable.
Protect everyone downstream, and watch for standards that quietly drop. A "may share" clause should protect your users and your customers from identification, not just your own company. Compare standards across provisions too, because a contract that says "anonymized and aggregated" in one place and only "aggregated" in another has lowered the bar on you. Small wording shifts like that are where the real exposure hides. They rarely look like concessions, which is exactly why they slip through.

This recap is one of the things our weekly How to Contract newsletter delivers, along with a heads-up on upcoming webinars in this AI and data contracting series. Subscribe now so the next deep dive and recap land in your inbox whether or not you can join live.