Microsoft's AI chief, Mustafa Suleyman, recently sparked controversy with his views on using content from the open web. During an interview with CNBC's Andrew Ross Sorkin, Suleyman suggested that any content published on the open web becomes "freeware," allowing anyone to copy and use it without restriction.
Suleyman’s perspective on open web content
When Sorkin asked whether AI companies have been effectively stealing intellectual property (IP) worldwide, Suleyman responded confidently. He stated that since the 1990s, there has been a social contract regarding content on the open web, treating it as fair use. According to him, anyone can copy, recreate, or reproduce such content freely.
His stance comes amid several lawsuits accusing Microsoft and OpenAI of using copyrighted online stories to train their generative AI models. While it’s not surprising to hear a Microsoft executive defend their practices, Suleyman’s public stance has raised eyebrows due to its boldness and potential legal inaccuracies.
Understanding copyright and fair use
It's important to clarify that in the US, a work is automatically protected by copyright the moment it is created. There is no need to apply for it, and simply publishing it on the open web does not void those rights. Waiving them is difficult enough that special licenses, such as Creative Commons, were created to help authors manage it.
Fair use, on the other hand, is not a social contract but a legal defense, decided by courts rather than by industry custom. Judges weigh factors such as the purpose of the use, how much of the work is copied, and whether the use harms the market for the original. Despite this, many AI companies, including Microsoft, argue that training AI models on copyrighted content qualifies as fair use, although few have been as forthright as Suleyman in their claims.
The debate over robots.txt
Suleyman also touched upon the concept of robots.txt, a text file websites use to instruct bots on which parts of the site they are allowed to crawl. He suggested that if a website explicitly states it should not be scraped for any purpose other than indexing, this constitutes a grey area that needs legal clarification.
While robots.txt is not a legal document, it has been a social contract since the ‘90s, guiding bots on proper web scraping etiquette. However, some AI companies, including Microsoft’s partner OpenAI, reportedly disregard these instructions, further complicating the debate.
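For readers unfamiliar with how robots.txt works in practice, the sketch below uses Python's standard urllib.robotparser module to show how a well-behaved crawler checks a site's rules before fetching a page. The robots.txt contents, user-agent names, and URL here are illustrative only, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: the site allows general crawling but asks
# one specific crawler (here, OpenAI's published GPTBot user agent)
# not to fetch anything. The file itself carries no legal force.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks permission before requesting a page.
page = "https://example.com/articles/some-story"
print(parser.can_fetch("GPTBot", page))         # False: asked to stay out
print(parser.can_fetch("SearchIndexer", page))  # True: covered by the '*' rule
```

Nothing in the protocol enforces that check; compliance is entirely voluntary, which is precisely the decades-old social contract now under strain.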
Suleyman’s comments and the ongoing lawsuits highlight the tension between technological advancement and intellectual property rights. As the courts continue to address these issues, the legal landscape surrounding AI and copyright will likely evolve, impacting how content is used and protected in the digital age.