Jacob Ridley, a senior hardware editor, discusses the ongoing debate over the use of copyrighted works to train artificial intelligence (AI) systems. The debate centers on how AI companies collect data, an issue that has grown more contentious as AI has expanded rapidly. Many argue that AI firms have been freely taking data from the web without permission, while AI leaders such as Mustafa Suleyman, the CEO of Microsoft AI, argue that the “social contract” for content on the open web since the 1990s has been that it is fair use. Ridley disagrees, noting that copyright law covers original literary, dramatic, musical, or artistic works, and that copyright applies to these works automatically.
Ridley argues that the creations of generative AI are original works and therefore qualify for automatic copyright. Who owns the copyright to AI-generated art remains unsettled, with US courts currently ruling against granting copyright in these instances. On the use of copyrighted works for training, Ridley states that the copyright owner gets to say who can use its images and how. Without permission, he explains, a user could be sued for damages or face an injunction barring them from publishing the work or repeating the offense.
Ridley is frustrated by moves from Google and Microsoft to use AI to summarize articles into little regurgitated bites, arguing that this threatens the business of the internet. He also discusses the EU’s Artificial Intelligence Act, which includes a transparency requirement for “publishing summaries of copyrighted data used in training” and rules on compliance with EU copyright law. However, the EU also provides get-outs that allow data mining of copyrighted works in some instances. The UK has similar exceptions under its 1988 Act, but these are generally not considered a viable defense for large AI firms with public, commercial generative AI systems.
Ridley concludes that, by acting as though these rules don’t apply to them, AI firms have largely gotten away with it to date. He asks: if copyright owners don’t manage to fend off AI, what will become of the internet, or “open web,” as we know it? Will an artist want to publish anything online? Will social media platforms arise with the promise to be ‘AI-proof’? Will the internet become more siloed as a result, split off into smaller communities off the beaten track and away from the prying eyes of Google, Microsoft, and crawlers sent out by dataset companies? These questions all remain unanswered as the debate around AI and copyright continues.