AI Contract IP Rights: Inputs, Outputs, Grant Backs

The word "own" does a lot of comforting work in an AI contract, and most of it is false comfort. A customer can own its inputs outright and still hand the provider everything that matters through a license clause three lines down. The gap between the ownership label and the control the license actually grants is where the real money and the real risk sit.

This was the focus of a How to Contract webinar in the Contracts in the Real World series, hosted by Laura Frederick. It featured Shannon Yavorsky, a partner at Orrick, Herrington & Sutcliffe who heads the firm's global Cyber, Privacy & Data Innovation group and co-leads its AI practice, and Mike Dockery, a former corporate partner at Morrison & Foerster who is now Chief Legal Officer at Marveri, a software company building AI products for lawyers. Shannon took the customer side and Mike took the provider side, which made the conversation feel less like a lecture and more like watching the two sides of a real negotiation talk through where they would actually push.

They worked through three AI-generated sample clauses, one on inputs, one on outputs, and one on grant back licenses, and pressure tested each from both perspectives. Along the way they covered why ownership of AI output may be worth less than it sounds, how training pipelines change the meaning of a license, why definitions of "input" and "output" carry most of the weight, and what the biggest mistakes look like in practice.

Here are our top ten takeaways from the speakers' comments during the webinar:

Treat ownership and control as two separate questions. A clause can say the customer owns its inputs and still let the provider do almost anything with them through the license that follows. Shannon's point landed hard here: control comes from license scope, use rights, sublicensing, and survival, not from the ownership label. When you read one of these provisions, find the license before you take comfort in the word "own."
Make the provider's license only as broad as the service requires. Some agreements limit the provider to what it needs to deliver the service. Others quietly grant rights to improve, train, and develop models, which is where ownership stops meaning much. If the service does not technically require training on your data, that is a fair thing to question rather than accept. Watch for language that reaches past the actual function of the product.
Pin down what "input" and "output" actually mean. Both Mike and Shannon kept returning to definitions because they carry the real weight. What a user types is not what goes into the model, since the platform and any foundation model provider transform it first. Output is messier still, an amorphous mix of customer data, provider data, and subprocessor data. A clause that looks fine reads very differently once you check how those two words are defined.
Remember that training is usually a one-way door. Once a model has trained on customer data or IP, you generally cannot untrain it, so those licenses are effectively perpetual, irrevocable, royalty free, and worldwide whether the contract says so or not. That reality should shape how hard you negotiate the input license up front. The nightmare scenario is a court later ordering infringing material out of a model that cannot be cleanly retrained. Knowing the door only swings one way changes what you are willing to grant.
Question whether AI output can even be owned. In the US and many other jurisdictions, copyright protection requires a human author, and works generated autonomously by AI are not copyrightable. So a clause saying you "own" the output may hand you something your competitors can freely copy. That does not make ownership language pointless, but it should temper how much you rely on it. For risk averse clients, a provider indemnity against output infringement can matter more than a paper ownership grant.
Watch for ownership that gets clawed back by retained rights. A provision can give the customer the output with one hand and subject it to undefined "retained provider IP" or "preexisting provider IP" with the other. That makes the ownership illusory, because you cannot fully use what you supposedly own. Press on what those retained categories include, since terms like templates, models, and business logic can swallow the grant. A clean ownership grant paired with a clear license to any embedded provider IP beats a messy grant every time.
Be careful what you represent about third party content. A customer representation that inputs do not infringe third party IP sounds standard, but it can be dangerous where inputs include user generated content, scraped data, open source, or mixed data sets. You may not have full visibility into the provenance, and the clause can make you the sole risk bearer. A blanket representation without a knowledge qualifier is worth a hard look. Providers resist qualifiers when they have no visibility into what you are uploading, so expect that to be a real negotiation.
Treat the grant back license as a value question, not a formality. A grant back lets the provider use the customer's IP to operate the system, and in the AI context its scope can quietly expand. Phrases like "internal business purposes" or "to improve the services" can stretch into cross customer reuse, benchmarking, and productization. Set clear limits so improvements derived from your data do not end up helping your competitors. The auto industry's surgical approach to data rights is starting to spread to other sectors for exactly this reason.
Do not confuse data with IP when you talk about ownership. There is no general concept of data ownership in statute or common law, just a patchwork of rights to use data for particular purposes. With pure data, the better question is who can do what, not who owns it. Reserve the ownership framing for actual IP like copyright, patents, and trade secrets. Keeping the two straight prevents a lot of confused drafting.
Meet aggressive or AI assisted counterparties with curiosity. Mike's biggest red flag was lawyers who do not understand the technology and respond by drawing every line as far to their own side as possible, sometimes asking for things that are technically impossible. That tends to expose gaps in their knowledge rather than strengthen their position. The better move is to get specific about what the technology actually requires and build the deal from there. Good fences make good neighbors, as Mike put it, so make the lines explicit in the document.
