
User Data Goldmine: Google's Ambitious Mission to Scrape Everything for AI Advancement

 


Google updated its privacy policy over the weekend. The change explicitly states that the company reserves the right to scrape everything you post online to build its artificial intelligence tools. If Google can read what you post, you can assume your words may end up nestled somewhere within the bowels of a chatbot.

Google's privacy policy was quietly updated over the weekend, and you likely didn't notice. The change in wording is slight, but its implications are significant.

According to a recent Gizmodo report, most of the revised policy is relatively unremarkable, but one section stands out from the rest: the one dealing with research and development.

For those who love history, Google maintains a public record of the changes made to its terms of service over the years. The new language amends the existing policy, spelling out new ways in which your online musings might be used in the company's AI tools.

Previously, Google said the data would be used "for language models." The policy now says "AI models" instead. And where the older policy mentioned only Google Translate, the new one also names Bard and Cloud AI.

This is an unusual clause for a privacy policy. Such policies typically describe how a company uses the information you post on its own services, such as its website or social media platform. Here, Google appears to reserve the right to harvest and harness data posted to any part of the public web, as if the entire internet were the firm's playground for artificial intelligence experiments. Google did not immediately respond to requests for comment.

The practice raises new and interesting privacy questions. Most people understand that public posts are public. But what it means to write something online has changed over the years.

The question is no longer only who can see the information, but how it can be used. Your long-forgotten blog posts or restaurant reviews from 15 years ago may well have been ingested by Bard and ChatGPT. As you read this, the chatbots could be regurgitating some homunculus-like version of your words in ways that are difficult to predict and comprehend.

As Gizmodo pointed out, it is an odd clause for a company to add to its policy. The wording gives the impression that the tech giant reserves the right to harvest and use any data available on any part of the public internet at any time. Typically, a company's data usage policy addresses only how that company plans to use the personal information it has itself collected.

Most people probably realize that whatever they post online is visible to the world at large, but this development adds a new dimension. Privacy is no longer only a matter of who sees your online posts, but also of everything that is done with those posts.

Before the update, the passage referred to "language models" rather than "AI models." The update also added Bard and Cloud AI alongside Google Translate, which was the only service the older policy named.

In the outlet's opinion, this is an unusual clause for a business to enshrine in its policies. The wording is odd because it implies that Google has the right to collect and use data from any part of the internet that is open to the public. Normally, the purpose of such a policy is to tell customers how the company's services will use the data they post.

It is well known that anything you post online can be seen by almost anyone, but the new developments add an unexpected twist: how it can be used. What matters is not just who can read what you write online, but also how those who can read it will use it.

Bard, ChatGPT, Bing Chat, and other AI models also draw on data scraped from the internet, in some cases in real time. That material often comes from other people's intellectual property. The AI tools involved have already been accused of theft, and more lawsuits are likely.

Where data-hungry chatbots acquire their information is one of the lesser-known complications of the post-ChatGPT world. Companies such as Google and OpenAI scrape the internet to fuel their robot habits.

It is not clear whether this is legal. The courts will undoubtedly have to deal with copyright questions that seemed like science fiction only a few years ago. In the meantime, the phenomenon has already had some surprising effects on consumers.

The overlords at Twitter and Reddit feel particularly aggrieved about the AI issue, and both have made controversial changes to lock down their platforms. Both companies ended free access to their APIs, which had allowed anyone to download large quantities of posts. Ostensibly, the move is meant to protect the social media sites from other companies harvesting their intellectual property. However, its consequences have reached much further.

The API changes broke third-party tools that many people used to access Twitter and Reddit. At one point, Twitter even appeared to be considering charging public entities such as weather forecasters, transit lines, and emergency services a monthly fee to keep tweeting, but it backed down after a hailstorm of criticism.

Lately, web scraping has become Elon Musk's favorite boogeyman. Musk has blamed a number of recent Twitter disasters on the company's need to guard against others taking data from the site, even when the issues seem unrelated. Over the weekend, Twitter limited the number of tweets a user was permitted to view per day, making the service almost unusable for many users.

Musk said rate-limiting was a necessary response to "data scraping" and "system manipulation." However, most IT experts agreed that it was more likely a crisis response to problems born of mismanagement or incompetence than a deliberate defense against scrapers. Twitter did not respond to Gizmodo's repeated requests for information on the matter.