Reddit Sues Anthropic for Training Claude AI with User Content Without Permission

 

Reddit, a social media site, filed a lawsuit against Anthropic on Wednesday, claiming that the artificial intelligence firm is unlawfully "scraping" millions of Reddit users' comments in order to train its chatbot Claude. 

Reddit alleges that Anthropic "intentionally trained on the personal data of Reddit users without ever requesting their consent" and used automated bots to access Reddit's content despite being asked not to.

In response, Anthropic said it would defend itself "vigorously" against Reddit's allegations. Reddit filed the complaint Wednesday in California Superior Court in San Francisco, where both firms are headquartered.

“AI companies should not be allowed to scrape information and content from people without clear limitations on how they can use that data,” noted Ben Lee, Reddit’s chief legal officer, in a statement Wednesday.

Reddit has previously entered into licensing deals with Google, OpenAI, and other companies that pay to train their AI systems on the public comments of Reddit's more than 100 million daily users.

The contracts "enable us to enforce meaningful protections for our users, including the right to delete your content, user privacy protections, and preventing users from being spammed using this content," according to Lee. 

The licensing agreements also helped the 20-year-old internet platform raise money ahead of its Wall Street debut as a publicly traded company last year.

Former OpenAI executives founded Anthropic in 2021, and its primary chatbot, Claude, remains a prominent competitor to OpenAI's ChatGPT. While OpenAI has close ties to Microsoft, Anthropic's principal commercial partner is Amazon, which is utilising Claude to develop its popular Alexa voice assistant.

Anthropic, like other AI businesses, has relied extensively on websites like Wikipedia and Reddit, which contain vast troves of written material that can help an AI assistant learn the patterns of human language.

In a 2021 paper co-authored by Anthropic CEO Dario Amodei, which was cited in the lawsuit, the company's researchers identified the subreddits, or subject-matter forums, that contained the highest quality AI training data, such as those focused on gardening, history, relationship advice, or shower thoughts. 

In 2023, Anthropic stated in a letter to the United States Copyright Office that the "way Claude was trained qualifies as a quintessentially lawful use of materials," because it involves making copies of information in order to perform a statistical analysis of a large dataset. Anthropic is already facing a lawsuit from major music companies, which claim that Claude regurgitates the lyrics of copyrighted songs.

However, Reddit's lawsuit differs from others filed against AI companies in that it does not claim copyright violation. Instead, it focusses on the alleged breach of Reddit's terms of service, which it claims resulted in unfair competition.

Social Media Content Fueling AI: How Platforms Are Using Your Data for Training

 

OpenAI has admitted that developing ChatGPT would not have been feasible without the use of copyrighted content to train its algorithms. Artificial intelligence (AI) systems also rely heavily on social media content for their development, and AI has in turn become an essential tool for many social media platforms.

For instance, LinkedIn is now using its users’ resumes to fine-tune its AI models, while Snapchat has indicated that if users engage with certain AI features, their content might appear in advertisements. Despite this, many users remain unaware that their social media posts and photos are being used to train AI systems.

Social Media: A Prime Resource for AI Training

AI companies aim to make their models as natural and conversational as possible, with social media serving as an ideal training ground. The content generated by users on these platforms offers an extensive and varied source of human interaction. Social media posts reflect everyday speech and provide up-to-date information on global events, which is vital for producing reliable AI systems.

However, it's important to recognize that AI companies are utilizing user-generated content for free. Your vacation pictures, birthday selfies, and personal posts are being exploited for profit. While users can opt out of certain services, the process varies across platforms, and there is no assurance that your content will be fully protected, as third parties may still have access to it.

How Social Platforms Are Using Your Data

Recently, the United States Federal Trade Commission (FTC) revealed that social media platforms are not effectively regulating how they use user data. Major platforms have been found to use personal data for AI training purposes without proper oversight.

For example, LinkedIn has stated that user content can be used by the platform or its partners, though it aims to redact or remove personal details from AI training data sets. Users can opt out by navigating to the "Data Privacy" section under "Settings and Privacy." However, opting out won’t affect data already collected.

Similarly, the platform formerly known as Twitter, now X, has been using user posts to train its chatbot, Grok. Elon Musk’s social media company has confirmed that its AI startup, xAI, leverages content from X users and their interactions with Grok to enhance the chatbot’s ability to deliver “accurate, relevant, and engaging” responses. The goal is to give the bot a more human-like sense of humor and wit.

To opt out of this, users need to visit the "Data Sharing and Personalization" tab in the "Privacy and Safety" settings. Under the “Grok” section, they can uncheck the box that permits the platform to use their data for AI purposes.

Regardless of the platform, users need to stay vigilant about how their online content may be repurposed by AI companies for training. Always review your privacy settings to ensure you’re informed and protected from unintended data usage by AI technologies.