Following several high-profile investigations, major US tech companies have suspended the training of generative AI models in the 27-nation EU this year.
Regulators are concerned that the training could violate the EU's General Data Protection Regulation because generative AI models ingest vast amounts of personal data without users' consent or another valid legal basis.
The training of AI models relies on vast amounts of data from the public Internet, including personal, non-personal and mixed sets, most of them scraped from publicly available sources, such as social media sites.
AI model training has raised a host of complex questions on how data protection rules should be applied. How can AI companies train generative models while also complying with GDPR principles such as lawfulness, transparency and fairness, purpose limitation, data minimization, storage limitation, accuracy, security and accountability? There's also the burning question for authors and creators on whether AI models engage in copyright infringement.
Once a model is trained and produces outputs via a chatbot, how can users exercise their right to correct false information? How can companies realistically fulfill their obligations under the EU's data protection rules and seek consent before processing special categories of personal data, such as political or religious beliefs or sexual orientation?
Until European data protection authorities reach a consensus on these issues — and particularly the vexed question of using "legitimate interest" as a GDPR legal basis for data processing — tech companies will avoid AI model training in Europe.
— Irish investigations —
After a flurry of probes into OpenAI's ChatGPT last year — including by authorities in Italy, France, Spain and Germany — the latest investigations into AI model training have been prompted by complaints from privacy advocates.
Austrian advocacy group Noyb filed 11 coordinated complaints last June against data scraping for Meta's AI models. "We can’t just sacrifice data protection just to have this technology out in the open. Generative AI is very useful, but there are ways to make it compatible" with data protection rules, Kleanthi Sardeli, a lawyer from Noyb, told MLex.
Scraping means personal data is being used for commercial purposes without users even knowing, which represents a loss of control, transparency and, in turn, privacy, Guido Scorza, a board member of the Italian data watchdog, warned in July.
— Legitimate Interest —
Training of AI models has sparked a debate in Europe over how strict privacy regulators should be in applying the GDPR. Some authorities, such as in France, have taken a pragmatic approach, arguing that AI companies should be allowed to invoke "legitimate interest" as a legal basis for the models' processing of personal data.
Other authorities, such as in Italy and the Netherlands, are deeply skeptical about the use of legitimate interest. The Dutch authority said in guidelines for AI companies that "scraping may not be used with the sole interest of making money."
OpenAI, Google and Meta, responsible for the largest AI models, have opted for legitimate interest.
Companies that use this legal basis don't have to ask for permission to process the data in question. But they must give users the ability to object and must perform risk assessments balancing their activities against people’s rights.
Italy's Scorza has said it is “unsustainable” that businesses are using citizens’ personal data for commercial gain and that there’s no way for regulators to stop the practice. He said European lawmakers need to find a “high-level political solution” to address the “unacceptable” problem of AI systems using citizens’ personal data to train their models.
Mark Zuckerberg, Meta's chief executive, is frustrated with the divergent approaches taken by the EU's data protection authorities on AI models. Meta suspended the training of Meta AI on June 14. In July, the operator of Instagram and Facebook ditched plans to roll out its multi-modal Llama service because of Europe's "unpredictable" privacy regulations.
Under the GDPR's "one-stop shop" mechanism, companies are regulated for their EU operations by the data protection authority of the country in which they have their EU headquarters. For most US tech companies, that authority is the Irish Data Protection Commission.
In an interview with tech news outlet The Verge in September, Zuckerberg said that the one-stop shop was at risk because authorities other than the regulator in Dublin have been investigating Meta's AI model. He accused the authorities of "backsliding" on the one-stop shop rules.
"There’s no doubt that when you have dozens of different regulators that can ask you the same questions about different things, it makes it a much more difficult environment to build things," Zuckerberg said. "I don’t think that’s just us. I think that’s all the companies."
— Threat to innovation? —
Leading European companies are also openly questioning the tough stance taken by the bloc's data privacy watchdogs on the GDPR and AI models.
A group of companies, led by Meta, wrote an open letter to EU leaders, warning that Europe will fall further behind in the global race to attract investment for AI companies because leading social-media companies are avoiding AI model training in the EU.
"Regulatory decision making has become fragmented and unpredictable, while interventions by the European data protection authorities have created huge uncertainty about what kinds of data can be used to train AI models," the Sept. 19 letter said.
Noyb is concerned that generative AI models' output could reveal personal information or produce false personal data about individuals. It argues that AI companies should be barred from data scraping because they couldn't pass a three-part test outlined by the EU's highest court to invoke the use of legitimate interests as a legal basis to scrape websites.
First, the company must identify a legitimate interest, such as defining the purpose of the AI technology once it is deployed; second, it must show that it needs to process personal data for the purposes of the legitimate interest; third is a balancing exercise in which the company's purpose for using personal data is weighed against users' interests, rights and freedoms.
— Regulatory divergence —
Although the EU and the UK have nearly identical data protection laws, Meta said earlier in September that it was resuming its plans for data collection in the UK after receiving “clarity” from the UK's privacy watchdog, the Information Commissioner's Office, on its opt-out process for training AI.
In the EU, the project is still on hold while the Irish Data Protection Commission investigates and seeks an opinion on AI model training from the European Data Protection Board, the umbrella group for the bloc’s data authorities.
Anu Talus, the board's chair, has decided to seek an extension for issuing the opinion because of "the complexity of the subject matter," a spokesperson for the board said in an e-mail in September.
The uncertainty over the EU's direction on AI regulation has attracted the attention of the US's largest technology companies.
Kent Walker, Google's chief legal counsel, said in a blog post on Oct. 1 that Europe risks slipping further behind the US and China in competitiveness terms without a "regulatory and enabling environment to spur the adoption of AI solutions at scale."
"With AI promising to power competitiveness and bring economic, social, and sustainability benefits on a scale we’ve not seen before, a targeted approach to regulation is needed — one that focuses not on starting from scratch, but on filling specific gaps in existing laws to give companies of all sizes the legal certainty they need to invest in new products and services," Walker said.
When asked what clarity looks like on AI training, Zuckerberg said in the interview, "It starts with having some framework on what the process looks like for working through that."
The EDPB's opinion on AI training, expected by the end of the year, may start the process of bringing more legal certainty.