
One of the first questions many of us ask when negotiating AI product contracts is “does this vendor train on our data?” We’ve had it drilled into our psyche over the past two years that we need to focus on that to prevent the AI product vendor and underlying model from taking all our proprietary information.
If we get a “no” answer, we check the box and move on. We’ve done our duty to protect against the risk. Or have we?
Our AI vendor compliance and governance webinar this week with Olga Mack and Linsey Krolik helped me realize that I’ve been approaching the “does it train on our data” question the wrong way.
The problem with leading with the “does it train on our data” question is that we are asking it too early in the process. When we focus on the training data language before we understand what the AI actually does, we may be drafting around a problem that does not exist. When we do that, we risk missing the transaction’s risks that truly do matter.
Olga Mack, a former general counsel and current CEO of TermScout, made the point directly when an audience member asked about protecting company data from vendor training.
I thought it was a simple question and expected a simple “yes, you should do that” kind of answer. But that’s not what Olga said. Instead, she said our first step should not be to focus on that language. She emphasized that we need to step back and start by understanding how the product processes the different types of data.
There’s a huge variation in how AI products interact with customer data, how the products generate their own data, and how the vendor, customer, and product work with the data.
Many AI products these days are not training on any meaningful customer data. In those cases, spending time at the start of our negotiations getting vendor commitments not to train on customer data does not make us safer. As Olga explained, it consumes negotiating capital on a phantom risk. The actual data flows (what the vendor's AI ingests, processes, retains, or shares) may go unexamined because we spent our attention in the wrong place.
The fix requires a step most of us skip. Olga called it an AI usage map. Before opening the contract, inventory all of the vendor's AI uses. Classify them by type. Ask specifically whether training is involved and, if so, with which data. She highlighted how this is similar to how privacy lawyers create data maps, except that for AI contracting we need to map AI data use rather than data categories.
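For readers who like to see structure, here is one way the usage map might be captured as a simple inventory. This is only a rough sketch of the idea, not something Olga presented: the field names, the use-type categories, and the two sample entries are all hypothetical, and a spreadsheet would work just as well.

```python
from dataclasses import dataclass, field
from enum import Enum


class UseType(Enum):
    # Hypothetical categories; use whatever types fit the product in front of you.
    INFERENCE_ONLY = "inference only"
    RETRIEVAL = "retrieval over customer documents"
    FINE_TUNING = "fine-tuning"


@dataclass
class AIUse:
    feature: str                  # what the vendor's AI does
    use_type: UseType             # classification of the use
    data_ingested: list           # data the feature touches
    trains_on_data: bool          # is training involved?
    training_data: list = field(default_factory=list)  # if so, with which data


def uses_needing_training_language(usage_map):
    """Return only the uses where training is actually involved."""
    return [use for use in usage_map if use.trains_on_data]


# Hypothetical entries for illustration only.
usage_map = [
    AIUse("contract summarizer", UseType.INFERENCE_ONLY, ["uploaded contracts"], False),
    AIUse("clause suggestions", UseType.FINE_TUNING, ["redline history"], True,
          ["redline history"]),
]

for use in uses_needing_training_language(usage_map):
    print(f"{use.feature}: trains on {', '.join(use.training_data)}")
```

In this made-up example only the clause suggestion feature would call for training data language, and the map already tells us which data that language needs to name.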
This map provides the information you need to then look at the data training language. As Olga said, doing the mapping first means "[y]ou will know exactly what words to use once you know what the problem is."
To be clear, Olga is not saying to skip the training data question or to leave the appropriate restrictions out of your contract. Her point was that negotiating training data representations without understanding the underlying architecture may result in language that is technically accurate but practically useless. A representation built on a map of actual AI use empowers us to create narrower and more specific restrictions tailored to this specific transaction, customer, and use case. That customization also makes the language much harder for a vendor to wiggle out of when something goes wrong.
The lesson I learned, and the contract idea from Olga that’s worth repeating, is to create the AI usage map before the draft. Drafting will be better because it follows the map.
Thanks so much to Olga and her co-speaker Linsey Krolik for helping all of us improve our AI contract drafting skills.





