Tumblr and WordPress posts will reportedly be used for OpenAI and Midjourney training

Tumblr and WordPress Set to Sell User Data to AI Companies: What You Need to Know

Overview

Tumblr and WordPress are reportedly on the verge of striking deals to sell user data to the artificial intelligence companies OpenAI and Midjourney. According to 404 Media, the platforms’ parent company, Automattic, is close to finalizing agreements that would provide data to help train the AI companies’ models. While the specifics of the data are still unclear, there are concerns that private or partner-related information may have been included in the deal.

Controversy Over Data Sharing

Reports suggest that Automattic may have overstepped in its data-sharing practices, with an alleged internal post from Tumblr product manager Cyle Gage indicating that sensitive information such as private posts, deleted or suspended blogs, and explicit content may have been included. The company is said to be working on identifying and excluding content that should not have been shared with the AI companies. It remains unclear whether this data has already been transmitted.

Automattic responded to inquiries about the report with a published statement, asserting that it will share only public content hosted on WordPress.com and Tumblr, and only from sites that have not opted out. The company also noted that current law does not require AI companies’ web crawlers to respect users’ opt-out preferences.

Opt-Out Tool and Data Removal

Automattic reportedly plans to introduce a new opt-out tool that allows users to block third parties, including AI companies, from training on their data. The proposed tool is designed to prevent web crawlers from accessing content on opted-out sites and to notify partners when users newly opt out. Additionally, Automattic intends to push for opted-out content to be removed from past sources and excluded from future training runs.

According to an internal document attributed to AI head Andrew Spittle, the company will regularly notify partners of opt-out requests and ask them to delete the affected content from their training datasets. Spittle expressed confidence that partners would comply with these requests, emphasizing the potential benefits of respecting user preferences.
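
To make the crawler-blocking idea concrete, here is a minimal sketch of how a cooperative crawler could check a site's opt-out before fetching a page. It assumes the opt-out is published as a robots.txt rule, which the reports do not confirm as Automattic's mechanism. The robots.txt contents, the example.com URL, and the may_crawl helper are hypothetical; GPTBot is the user-agent name OpenAI publishes for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an opted-out site (illustrative only;
# not taken from Automattic's actual tooling).
OPTED_OUT_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/post/123"  # placeholder URL
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, may_crawl(OPTED_OUT_ROBOTS_TXT, agent, page))
```

A check like this only works when the crawler chooses to run it. That is the gap Automattic's statement points to: nothing currently obliges a crawler to honor the preference.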

Implications of AI Data Training Deals

Selling data for AI training has become a lucrative option for websites struggling in the competitive online publishing landscape. As platforms seek new revenue streams, partnerships with AI companies offer opportunities for monetization. While these deals can supply valuable training data for AI model development, concerns about privacy and data security remain paramount.

In conclusion, the reported agreements between Tumblr, WordPress, and AI companies underscore the complex dynamics of data sharing in the digital age. As companies navigate the ethical and legal implications of AI training partnerships, transparency and user consent will be key considerations moving forward.

Update

Update, February 27, 2024, 3:56 PM ET: This story has been updated to include a published statement from Automattic regarding the data-sharing agreements.