One of the less discussed consequences of ChatGPT's rise is the privacy risk it poses. Google only yesterday launched Bard, its own conversational AI, and others will undoubtedly follow. Technology firms working on AI have well and truly entered a race.
The problem lies in the underlying technology, which is fuelled by users' personal data.
ChatGPT is underpinned by a large language model, which requires an enormous amount of data to function and improve. The more data the model is trained on, the better it becomes at detecting patterns, anticipating what will come next, and generating plausible text.
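To make that dependence on data concrete, here is a deliberately tiny sketch in Python. It is nothing like how ChatGPT actually works, but it illustrates the same principle: a word-pair counter whose "predictions" are only as good as the text it has absorbed.

```python
# Toy illustration (not ChatGPT's actual method): a bigram "language model"
# that counts which word tends to follow which. The more text it sees,
# the better its guesses become.
from collections import Counter, defaultdict

def train(corpus: str) -> dict:
    """Count next-word frequencies for every word in the corpus."""
    words = corpus.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(model: dict, word: str):
    """Return the most frequently observed next word, if any."""
    counts = model.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None

# A few sentences of "scraped" text are enough to shape the model's guesses.
model = train("the cat sat on the mat . the cat chased the mouse .")
print(predict_next(model, "the"))  # -> 'cat' (observed most often after 'the')
```

Scale that idea up by many orders of magnitude, and the appetite for training text becomes clear.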
OpenAI, the developer of ChatGPT, fed the model some 300 billion words systematically scraped from the internet: books, articles, websites and posts, which inevitably includes online users' personal information, gathered without their consent.
If you have ever written a blog post or product review, or commented on an article online, there is a good chance that information was consumed by ChatGPT.
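OpenAI has not published its collection pipeline, but harvesting public web text generally looks something like the sketch below (the URL is hypothetical; requests and BeautifulSoup are standard, real scraping libraries). Note that consent never enters the loop: whatever is publicly reachable, comments and personal details included, gets collected.

```python
# A minimal sketch of how public web text is typically harvested for
# training corpora. The URL below is hypothetical. Nothing here asks the
# page's author for permission -- if it is publicly reachable, it is taken.
import requests
from bs4 import BeautifulSoup

def scrape_visible_text(url: str) -> str:
    """Fetch a page and return its visible text, comment sections and all."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop script/style tags; keep everything a human reader would see.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

# Hypothetical example: one blog post becomes one more training document.
corpus_document = scrape_visible_text("https://example.com/some-blog-post")
```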
The data used to train ChatGPT is problematic for several reasons.
First, the data was collected without consent: none of us were asked whether OpenAI could use our personal information. This is a clear violation of privacy, especially when the data is sensitive and can be used to identify us, our family members, or our location.
Even when data is publicly available, its use can breach what we call contextual integrity, a cornerstone idea in legal discussions of privacy. It requires that individuals' information not be revealed outside of the context in which it was originally produced.
Moreover, OpenAI offers no procedure for individuals to check whether the company stores their personal information, or to request that it be deleted. Whether ChatGPT meets the requirements of the European General Data Protection Regulation (GDPR), which guarantees this right, is still being debated.
This “right to be forgotten” is particularly important in cases involving information that is inaccurate or misleading, which seems to be a regular occurrence with ChatGPT.
Furthermore, the scraped data ChatGPT was trained on may be proprietary or protected by copyright. When prompted, for instance, the tool reproduced the opening passages of Joseph Heller's copyrighted book Catch-22.
Finally, OpenAI did not pay for the data it scraped from the internet. The individuals, website owners and companies that produced it were not compensated. This is particularly notable given OpenAI was recently valued at US$29 billion, more than double its value in 2021.
OpenAI has also just announced ChatGPT Plus, a paid subscription plan that gives customers ongoing access to the tool, faster response times and priority access to new features. This plan is expected to help generate US$1 billion in revenue by 2024.
None of this would have been possible without ‘our’ data, acquired and used without our consent.
According to some experts and analysts, ChatGPT is a “tipping point for AI”: the realisation of technology that can revolutionise the way we work, learn, write and even think.
Despite its potential benefits, we must remember that OpenAI is a private, for-profit company whose interests and commercial imperatives do not necessarily align with the needs of wider society.
The privacy risks attached to ChatGPT should sound a warning. As users of a growing number of AI technologies, we must be extremely careful about what information we share with such tools.
Microsoft, meanwhile, has begun a phased rollout of its “EU data boundary”, which will apply to all of its core cloud services: Azure, Microsoft 365, Dynamics 365 and the Power BI platform.
Since the EU introduced the General Data Protection Regulation (GDPR) in 2018 to protect user privacy, large businesses have grown increasingly anxious about the international flow of customer data.
The European Commission, the EU's executive arm, is developing proposals to safeguard the privacy of European customers whose data is transferred to the United States.
"As we dived deeper into this project, we learned that we needed to be taken more phased approach," says Microsoft’s Chief Privacy Officer Julie Brill. “The first phase will be customer data. And then as we move into the next phases, we will be moving logging data, service data and other kind of data into the boundary.”
The second phase is expected to be completed by the end of 2023, and the third in 2024, she added.
Microsoft runs more than a dozen datacenters across Europe, in countries including France, Germany, Spain and Switzerland.
For large corporations, data storage has become so vast, and so widely distributed across countries, that it is now a challenge to understand where their data is stored and whether it complies with regulations such as the GDPR.
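To illustrate the kind of check that challenge entails, a compliance team's first pass might resemble the toy audit below. The inventory format and helper function are hypothetical, and the region names merely echo Azure's EU regions; the point is the logic of flagging resources that sit outside an approved boundary.

```python
# A minimal sketch of a data-residency audit, under assumed inputs.
# Real audits would pull the inventory from a cloud provider's
# management APIs rather than a hard-coded list.
EU_REGIONS = {"westeurope", "northeurope", "francecentral",
              "germanywestcentral", "swedencentral", "switzerlandnorth"}

# Hypothetical resource inventory: name and the region where it is stored.
inventory = [
    {"resource": "customer-db-01", "region": "westeurope"},
    {"resource": "analytics-store", "region": "eastus"},
    {"resource": "mail-archive", "region": "germanywestcentral"},
]

def flag_out_of_boundary(resources, allowed):
    """Return the names of resources stored outside the allowed regions."""
    return [r["resource"] for r in resources if r["region"] not in allowed]

print(flag_out_of_boundary(inventory, EU_REGIONS))
# -> ['analytics-store']: data a GDPR review would need to explain or move.
```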
"We are creating this solution to make our customers feel more confident and to be able to have clear conversations with their regulators on where their data is being processed as well as stored," says Brill.
Microsoft has also previously said it would challenge government requests for customer data, and that it would financially compensate any customer whose data it shared in breach of the GDPR.