Comment: Questions about little-known US web scraper could mean increased legal scrutiny for AI training data

By Mike Swift
February 10, 2024, 12:53 AM GMT

A big chunk of the data used to train the AI systems of the likes of OpenAI, Google and Meta Platforms was scraped from the Internet by Common Crawl, a little-known nonprofit...
