Comment: Questions about little-known US web scraper could mean increased legal scrutiny for AI training data

By Mike Swift (February 10, 2024, 00:53 GMT | Comment) -- A big chunk of the data used to train the AI systems of the likes of OpenAI, Google and Meta Platforms was scraped from the Internet by Common Crawl, a little-known nonprofit with no paid employees that occupies a nondescript address in Beverly Hills, California. A report by the Mozilla Foundation, which called Common Crawl "likely the most influential nonprofit you've never heard of," took issue with large AI companies using Common Crawl for training data because its data isn't filtered for hate speech, copyrighted material or personal data. A review of Common Crawl’s US tax returns by MLex shows that its contributions have increased 500 percent over the past two years, as its 9.5 petabytes of data have become prominent training data for big AI companies.

A large chunk of the data used to train AI systems such as ChatGPT comes from a little-known nonprofit with no paid employees whose address comes back to a nondescript building beside a parking deck in Beverly Hills, California...
