
Neon App Rebounds After Data Exposure Scare, Secures $25 Million and Revamps Security

 

Neon, an app that pays users for personal data they would otherwise share for free, quickly gained traction after its September debut, rising to the second spot among the most downloaded free apps on Apple’s App Store within just eight days.

The platform’s model revolves around users voluntarily recording phone calls and selling that data to artificial intelligence firms for training purposes. However, privacy concerns surfaced almost immediately. A probe by TechCrunch revealed that Neon’s servers were vulnerable, allowing unauthorized access to more data than users may have intended to share. Exposed data reportedly included metadata such as phone numbers, along with call transcripts and audio recordings. Some reviewed transcripts even suggested that in-person conversations had been recorded without clear consent.

Despite the early controversy, Neon has staged a comeback. Six months post-launch, the company has raised $25 million and relaunched its platform with a stronger focus on security and transparency. Founder and CEO Alex Kiam addressed the incident candidly, acknowledging the company’s initial shortcomings.

“We had not done [penetration] testing, and TechCrunch was able to get into the database, and so we immediately shut it off. We basically went back to the drawing board,” Kiam says.

Following the breach, Neon collaborated with external cybersecurity specialists, including Unit 42, a research division owned by Palo Alto Networks, and brought on Ian Reid, former chief technology officer at Stamped, who now serves as Neon’s CTO. The team undertook a comprehensive code audit before relaunching the app in early November.

According to Kiam, the updated version of Neon quickly regained popularity, climbing to the third position on the App Store charts. He credits user trust and transparency for the app’s renewed success.

“I think the reason people came back is because they had a great experience with the app. Because we had been transparent with them during, I think they were able to give us a second chance. And we’re really grateful for that,” he says.

Even with its viral growth and financial backing, industry observers remain cautious about the broader implications of monetizing personal data, especially in a time when privacy concerns are becoming increasingly critical.

Snap Faces Lawsuit From Creators Over Alleged AI Data Misuse


 

The legal conflict between online creators and artificial intelligence companies has entered a sharper, more personal stage. In recent weeks, well-known YouTubers have filed suit in federal court against Snap, alleging that the company built its artificial intelligence capabilities on their copyrighted material.

The complaint raises a familiar but unresolved question for the digital economy: can the vast archives of creator-made video that power the internet be repurposed to train commercial artificial intelligence systems without the creators' knowledge or consent?

The proposed class action, filed on Friday in the U.S. District Court for the Central District of California, was brought by internet personalities whose combined YouTube audience exceeds 6.2 million subscribers.

According to the plaintiffs, videos they uploaded to YouTube were scraped into datasets used to train the AI models behind Snapchat features, in violation of both platform rules and federal copyright law.

The same plaintiffs have previously brought similar claims against Nvidia, Meta, and ByteDance, arguing that a growing segment of the artificial intelligence industry relies on creator content without authorization. Specifically, the YouTubers contend that Snap used large-scale video-language datasets, including HD-VILA-100M, that were developed for academic and research purposes rather than commercial applications.

The newly filed complaint specifically challenges Snap's reported use of these datasets. The YouTubers assert that any commercial use would have been subject to YouTube's technological safeguards, terms of service, and licensing restrictions. They argue that these limitations were bypassed so that Snap's AI systems could incorporate the material.

In addition to statutory damages, the lawsuit seeks a permanent injunction prohibiting further alleged infringement. The plaintiffs include the creators of the YouTube channel h3h3, which has 5.52 million subscribers, as well as the golf-focused channels MrShortGame Golf and Golfholics.

The case is one of the latest in a series of copyright disputes between rights holders and artificial intelligence developers. Recently, publishers, authors, newspapers, artists, and user-generated content platforms have brought similar claims. According to the nonprofit Copyright Alliance, more than 70 copyright infringement lawsuits have been filed against artificial intelligence companies to date, with varying outcomes.

In one case involving Meta and a group of authors, a federal judge ruled in favor of the technology company. In another, involving Anthropic and authors, the company reached a settlement. Several other cases are still pending, leaving courts to define how technological innovation intersects with intellectual property rights.

The proposed class extends beyond the named plaintiffs to all U.S.-based individuals who have uploaded original video content to YouTube and whose works have allegedly been incorporated into the large-scale video datasets referenced in the complaint.

According to the filing, these datasets formed the foundation of Snap's artificial intelligence training pipeline, enabling the company to ingest and process creator content in significant quantities. The same plaintiffs have targeted ByteDance, Meta, and Nvidia with comparable class complaints, part of a coordinated legal strategy intended to challenge industry-wide data acquisition practices.

The plaintiffs seek monetary relief along with a declaratory judgment that Snap willfully circumvented YouTube’s copyright protection mechanisms. The complaint requests statutory damages, costs, and interest, as well as an injunction to stop the continued use of the disputed video materials.

The complaint's central claim is that Snap developed and refined its generative AI video systems by accessing and copying YouTube content en masse, even though the platform's architecture permits controlled streaming but does not provide access to source files for download.

The complaint attributes Snap’s model development to specific datasets, including HD-VILA-100M and Panda-70M. According to the filing, HD-VILA-100M contains metadata that references YouTube videos rather than hosting the audiovisual files themselves. The plaintiffs therefore maintain that Snap had to retrieve and duplicate the referenced videos directly from YouTube’s servers in order to use such datasets for model training.

They contend that this process necessarily bypassed technological protection measures and access controls designed to prevent large-scale extraction and downloading. The lawsuit alleges that automated tools and structured workflows were used to facilitate this retrieval. The complaint further claims that the datasets segmented individual YouTube uploads into multiple discrete clips, which required repeated access to the same source video.
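To make the clip-segmentation point concrete, here is a minimal Python sketch of what a metadata-only dataset could look like: each record points at a source video by ID and gives a time span, with no audio or video attached. The field names (`video_id`, `start_s`, `end_s`) and the sample values are hypothetical and are not taken from HD-VILA-100M, Panda-70M, or the complaint. The sketch only groups records by source video to show how one upload can appear as several separate clips.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative only,
# not the actual schema of HD-VILA-100M or Panda-70M.
clip_records = [
    {"video_id": "vidA", "start_s": 0.0, "end_s": 8.5},
    {"video_id": "vidA", "start_s": 8.5, "end_s": 20.0},
    {"video_id": "vidB", "start_s": 3.0, "end_s": 12.0},
    {"video_id": "vidA", "start_s": 42.0, "end_s": 50.0},
]


def clips_per_video(records):
    """Group clip time spans by the source video they reference."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["video_id"]].append((rec["start_s"], rec["end_s"]))
    return dict(grouped)


grouped = clips_per_video(clip_records)
for video_id, spans in grouped.items():
    # Each span is a separate clip drawn from the same source upload.
    print(video_id, len(spans), "clips")
```

In this toy data, one source video is referenced by three separate clip records, which is the pattern the plaintiffs say multiplies the number of alleged copies.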

According to the plaintiffs, this method resulted in millions of separate, essentially identical acts of copying. Those copies were allegedly used to train and enhance the text-to-video and image-to-video models behind Snapchat’s AI-powered features.

The filing asserts that these activities were conducted for commercial deployment rather than academic or research purposes, despite the license restrictions attached to certain datasets. Finally, the plaintiffs assert that Snap's conduct violated YouTube's terms of service and constituted unlawful circumvention of technological safeguards, regardless of whether particular videos had been formally registered with the U.S. Copyright Office.

The complaint thus frames the dispute not merely as a disagreement over platform rules but as a broader question about the legal and technical limits on large-scale data ingestion for commercial AI development.

Depending on its outcome, the litigation may have implications that extend far beyond the parties involved. At stake are not only questions of liability in a single dispute but also the broader compliance landscape that underpins commercial AI development.

The court will be asked to examine how training data is sourced, whether technical safeguards constitute enforceable protection measures, and how thoroughly dataset provenance and licensing constraints must be audited before a model is deployed.

The case reminds technology companies that they need defensible data governance frameworks, transparent training pipelines, and rigorous review of third-party datasets. Creators and platforms alike should take note: the development signals that regulation of artificial intelligence will be shaped less by abstract policy debates and more by detailed judicial scrutiny of the technical processes that turn publicly accessible content into machine-learning systems.