Training data is one of the biggest controversies surrounding OpenAI. Publishers face a choice: distance themselves from the company or make deals with it.

OpenAI has been secretive about the training data for models like GPT-4o, keeping ChatGPT’s recipe a mystery. Comparable LLMs, however, are known to be trained on a variety of online sources such as social media, blogs, books, reviews, and Wikipedia pages. By some claims, LLMs have consumed a significant portion of the internet in order to imitate human intelligence in their automated responses.

AI training data also includes articles from online news and media sites. Publications have observed that ChatGPT’s knowledge is derived from stories published on their sites, leading to allegations of copyright infringement by OpenAI.

While OpenAI argues that using publicly available internet materials for training AI models falls under fair use, some experts warn that this could amount to copyright infringement camouflaged as fair use. The debate on what OpenAI can legally feed into its models is ongoing, with some publications opting to block access to their content, while others choose to enter partnerships.

A way to avoid obsolescence, or a ‘devil’s bargain’?

Some media companies believe that partnering with OpenAI is essential to adapt to the evolving landscape and maintain control over how their journalism is presented in AI-generated responses. However, critics argue that such deals compromise intellectual property rights and credibility.

While OpenAI benefits from exclusive access to real-time news and positive PR through these partnerships, the terms of the licensing agreements remain largely undisclosed, raising questions about the exchange of data and insights between the parties.

Several media companies have formed partnerships with OpenAI, while others have resorted to legal action for alleged copyright infringement. The ongoing debate highlights the complexities of AI technology in the journalism industry.

Media companies that have licensing deals with OpenAI

Associated Press

On July 13, 2023, the Associated Press announced a partnership with OpenAI, granting access to part of its news archive for training the models behind ChatGPT.

Axel Springer

Axel Springer, the German media company, has partnered with OpenAI to explore the potential of AI in journalism, receiving a substantial sum for the deal.

FT Group

The Financial Times announced a partnership with OpenAI in April 2024 to gain insights into how AI surfaces content.

Dotdash Meredith

Dotdash Meredith, the media company behind various lifestyle magazines, has entered into an agreement with OpenAI to enhance content creation.

News Corp

News Corp, the parent company of several prominent publications, has established a partnership with OpenAI to support journalism.

Vox Media

Vox Media, a collection of diverse publications, has signed a deal with OpenAI, sparking concerns among journalists and workers about the ethical implications of the partnership.

The Atlantic

The Atlantic partnered with OpenAI to explore AI navigation of the web, despite concerns about generative AI’s impact on the news industry.

Media companies that have filed lawsuits against OpenAI

The New York Times, The Intercept, and a group of daily newspapers have filed lawsuits against OpenAI and its major investor Microsoft for copyright infringement, highlighting the legal challenges surrounding AI technology in journalism.


