AI chatbots like ChatGPT have captured widespread attention for their remarkable conversational abilities, allowing users to engage on diverse topics with ease. However, while these tools offer convenience and creativity, they also pose significant privacy risks. The very technology that powers lifelike interactions can also store, analyze, and potentially resurface user data, raising critical concerns about data security and ethical use.
Chatbots like ChatGPT rely on Large Language Models (LLMs) trained on vast datasets to generate human-like responses. This training often includes learning from user interactions. Much like how John Connor taught the Terminator quirky catchphrases in Terminator 2: Judgment Day, these systems refine their capabilities through real-world inputs. However, this improvement process comes at a cost: personal data shared during conversations may be stored and analyzed, often without users fully understanding the implications.
For instance, OpenAI’s terms and conditions explicitly state that data shared with ChatGPT may be used to improve its models. Unless users actively opt out through privacy settings, all shared information, from casual remarks to sensitive details like financial data, can be logged and analyzed. Although OpenAI claims to anonymize and aggregate user data for further study, the risk of unintended exposure remains.
Despite assurances of data security, breaches have occurred. In May 2023, hackers exploited a vulnerability in ChatGPT’s Redis library, compromising the personal data of around 101,000 users. This breach underscored the risks associated with storing chat histories, even when companies emphasize their commitment to privacy. Similarly, companies like Samsung faced internal crises when employees inadvertently uploaded confidential information to chatbots, prompting some organizations to ban generative AI tools altogether.
Governments and industries are starting to address these risks. For instance, in October 2023, President Joe Biden signed an executive order focusing on privacy and data protection in AI systems. While this marks a step in the right direction, legal frameworks remain unclear, particularly around the use of user data for training AI models without explicit consent. Current practices are often classified as “fair use,” leaving consumers exposed to potential misuse.
Until stricter regulations are implemented, users must take proactive steps to safeguard their privacy when interacting with AI chatbots, from withholding sensitive details to opting out of data collection through the privacy settings these services provide.
Training courses in GenAI cover a wide range of topics. Introductory courses, which can be completed in just a few hours, address the fundamentals, ethics, and social implications of GenAI. For those seeking deeper knowledge, advanced modules are available that focus on development using GenAI and large language models (LLMs), requiring over 100 hours to complete.
These courses are designed to cater to various job roles and functions within organisations. For example, KPMG India aims to have its entire workforce trained in GenAI by the end of the fiscal year, with 50% already trained. Their programs are tailored to different levels of employees, from teaching leaders about return on investment and business envisioning to training coders in prompt engineering and LLM operations.
EY India has implemented a structured approach, offering distinct sets of courses for non-technologists, software professionals, project managers, and executives. Presently, 80% of their employees are trained in GenAI. Similarly, PwC India focuses on providing industry-specific masterclasses for leaders to enhance their client interactions, alongside offering brief nano courses for those interested in the basics of GenAI.
Wipro organises its courses into three levels based on employee seniority, with plans to develop industry-specific courses for domain experts. Cognizant has created shorter courses for leaders, sales, and HR teams to ensure a broad understanding of GenAI. Infosys also has a program for its senior leaders, with 400 of them currently enrolled.
Ray Wang, principal analyst and founder at Constellation Research, highlighted the extensive range of programs developed by tech firms, including training on Python and chatbot interactions. Cognizant has partnerships with Udemy, Microsoft, Google Cloud, and AWS, while TCS collaborates with NVIDIA, IBM, and GitHub.
Cognizant boasts 160,000 GenAI-trained employees, and TCS offers a free GenAI course on Oracle Cloud Infrastructure until the end of July to encourage participation. According to TCS's annual report, over half of its workforce, amounting to 300,000 employees, have been trained in generative AI, with a goal of training all staff by 2025.
The investment in GenAI training by IT and consulting firms underscores the importance of staying ahead in a rapidly evolving technological landscape. By equipping their employees with essential AI skills, these companies aim to enhance their capabilities, drive innovation, and maintain a competitive edge in the market. As demand for AI expertise grows, these training programs will play a crucial role in shaping the future of the industry.
Beyond providing a space for experimentation, there are growing signs that open-source LLMs will attract the same attention that closed-source LLMs are receiving now. The open-source nature allows organizations to understand, modify, and tailor the models to their specific requirements. The collaborative environment nurtured by open source fosters innovation, enabling faster development cycles. Additionally, the avoidance of vendor lock-in and adherence to industry standards contribute to seamless integration. The security benefits derived from community scrutiny, together with ethical considerations, further bolster the appeal of open-source LLMs, making them a strategic choice for enterprises navigating the evolving landscape of artificial intelligence.
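To make the customisation point concrete, the sketch below shows how an organisation might run an open model entirely on its own infrastructure using the Hugging Face transformers library. The model name, prompt, and generation settings are illustrative assumptions rather than details from this article.

```python
# A minimal sketch of running an open-source LLM locally with Hugging Face
# transformers; the checkpoint name and settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # any open checkpoint the licence permits

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

# Prompts and outputs never leave the organisation's own infrastructure,
# which is one of the data-control benefits described above.
prompt = "Summarise our internal coding standards for new hires."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the weights are downloaded and run locally, the same organisation can also fine-tune or otherwise modify the model, which is not possible with a closed, API-only offering.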
In recent years, the emergence of Large Language Models (LLMs), commonly referred to as Smart Computers, has ushered in a technological revolution with profound implications for various industries. As these models promise to redefine human-computer interactions, it's crucial to explore both their remarkable impacts and the challenges that come with them.
Smart Computers, or LLMs, have become instrumental in expediting software development processes. Their standout capability lies in the swift and efficient generation of source code, enabling developers to bring their ideas to fruition with unprecedented speed and accuracy. Furthermore, these models play a pivotal role in advancing artificial intelligence applications, fostering the development of more intelligent and user-friendly AI-driven systems. Their ability to understand and process natural language has democratized AI, making it accessible to individuals and organizations without extensive technical expertise. With their integration into daily operations, Smart Computers generate vast amounts of data from nuanced user interactions, paving the way for data-driven insights and decision-making across various domains.
Managing Risks and Ensuring Responsible Usage
However, the benefits of Smart Computers are accompanied by inherent risks that necessitate careful management. Privacy concerns loom large, especially regarding the accidental exposure of sensitive information. For instance, models like ChatGPT learn from user interactions, raising the possibility of unintentional disclosure of confidential details. Organisations that rely on external model providers, Samsung among them, have responded to these concerns by implementing usage limitations to protect sensitive business information. Privacy and data exposure concerns are further accentuated by default practices, such as ChatGPT saving chat history for model training, which means organisations need to inquire thoroughly about how data is used, stored, and fed into training in order to guard against leaks.
Addressing Security Challenges
Security concerns encompass malicious usage, where cybercriminals exploit Smart Computers for harmful purposes, potentially evading security measures. The compromise or contamination of training data introduces the risk of biased or manipulated model outputs, posing significant threats to the integrity of AI-generated content. Additionally, the resource-intensive nature of Smart Computers makes them prime targets for Distributed Denial of Service (DDoS) attacks. Organisations must implement proper input validation strategies, selectively restricting characters and words to mitigate potential attacks. API rate controls are essential to prevent overload and potential denial of service, promoting responsible usage by limiting the number of API calls for free memberships.
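As one illustration of those controls, the sketch below combines a basic length and character check with a simple per-key rate limit in front of an LLM endpoint. The thresholds, allowed-character pattern, and function names are hypothetical and would need tuning for any real deployment.

```python
# A simplified sketch of the input-validation and rate-control ideas described
# above; limits, patterns, and names are hypothetical examples only.
import re
import time
from collections import defaultdict

MAX_PROMPT_CHARS = 2000
ALLOWED_PATTERN = re.compile(r"^[\w\s.,;:!?()'-]+$")  # selectively restrict characters
RATE_LIMIT = 20        # max calls per window (e.g. a free membership tier)
WINDOW_SECONDS = 60

_call_log = defaultdict(list)  # api_key -> timestamps of recent calls


def validate_prompt(prompt: str) -> bool:
    """Reject prompts that are too long or contain disallowed characters."""
    return len(prompt) <= MAX_PROMPT_CHARS and bool(ALLOWED_PATTERN.match(prompt))


def within_rate_limit(api_key: str) -> bool:
    """Allow at most RATE_LIMIT calls per WINDOW_SECONDS for each API key."""
    now = time.time()
    recent = [t for t in _call_log[api_key] if now - t < WINDOW_SECONDS]
    _call_log[api_key] = recent
    if len(recent) >= RATE_LIMIT:
        return False
    _call_log[api_key].append(now)
    return True


def handle_request(api_key: str, prompt: str) -> str:
    if not within_rate_limit(api_key):
        return "Rate limit exceeded; please retry later."
    if not validate_prompt(prompt):
        return "Prompt rejected by input validation."
    # ... forward the validated prompt to the LLM backend here ...
    return "OK"
```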
A Balanced Approach for a Secure Future
To navigate these challenges and anticipate future risks, organisations must adopt a multifaceted approach. Implementing advanced threat detection systems and conducting regular vulnerability assessments of the entire technology stack are essential. Furthermore, active community engagement in industry forums facilitates staying informed about emerging threats and sharing valuable insights with peers, fostering a collaborative approach to security.
All in all, while Smart Computers bring unprecedented opportunities, the careful consideration of risks and the adoption of robust security measures are essential for ensuring a responsible and secure future in the era of these groundbreaking technologies.
The intel came from a leaked audio file of an internal presentation on an early version of Microsoft’s Security Copilot, a ChatGPT-like artificial intelligence platform that Microsoft created to assist cybersecurity professionals.
Apparently, the audio consists of a Microsoft researcher addressing the results of "threat hunter" testing, in which the AI examined a Windows security log for any indications of potentially malicious behaviour.
"We had to cherry-pick a little bit to get an example that looked good because it would stray and because it's a stochastic model, it would give us different answers when we asked it the same questions," said Lloyd Greenwald, a Microsoft Security Partner giving the presentation, as quoted by BI.
"It wasn't that easy to get good answers," he added.
Security Copilot, like any chatbot, lets users type a query into a chat window and receive a response, much as they would from a customer-service agent. Security Copilot is largely built on OpenAI's GPT-4 large language model (LLM), which also runs Microsoft's other generative AI forays like the Bing Search assistant. Greenwald claims that these demonstrations were "initial explorations" of the possibilities of GPT-4 and that Microsoft was given early access to the technology.
Similar to Bing AI in its early days, which responded so ludicrously that it had to be "lobotomized," the researchers claimed that Security Copilot often "hallucinated" wrong answers in its early versions, an issue that appeared to be inherent to the technology. "Hallucination is a big problem with LLMs and there's a lot we do at Microsoft to try to eliminate hallucinations and part of that is grounding it with real data," Greenwald said in the audio, "but this is just taking the model without grounding it with any data."
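In practice, "grounding" usually means placing retrieved real data directly in the prompt so the model answers from supplied evidence rather than from memory alone. The sketch below is a generic illustration of that pattern applied to log analysis; it is not Microsoft's Security Copilot implementation, and the function names and naive keyword retrieval are assumptions for the example.

```python
# A rough, generic sketch of grounding a model with real data: retrieved log
# entries are placed in the prompt so the model answers from supplied records.
# This is NOT Microsoft's implementation; names and retrieval logic are illustrative.

def retrieve_relevant_events(question: str, security_log: list[str], top_k: int = 5) -> list[str]:
    """Naive keyword overlap over log lines; a real system would use a search index."""
    terms = set(question.lower().split())
    scored = sorted(security_log, key=lambda line: -len(terms & set(line.lower().split())))
    return scored[:top_k]


def build_grounded_prompt(question: str, security_log: list[str]) -> str:
    """Build a prompt that constrains the model to the retrieved evidence."""
    evidence = "\n".join(retrieve_relevant_events(question, security_log))
    return (
        "Answer using ONLY the log entries below. "
        "If the logs do not contain the answer, say so.\n\n"
        f"Log entries:\n{evidence}\n\nQuestion: {question}"
    )
```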
GPT-4, the LLM Microsoft used to build Security Copilot, was not trained on cybersecurity-specific data. Rather, it was used directly out of the box, relying only on its massive general-purpose dataset, as is standard.
Discussing other security-related queries, Greenwald revealed that "this is just what we demoed to the government."
However, it is unclear whether Microsoft used these “cherry-picked” examples in its presentations to the government and other potential customers, or whether its researchers were upfront about how the examples were selected.
A spokeswoman for Microsoft told BI that "the technology discussed at the meeting was exploratory work that predated Security Copilot and was tested on simulations created from public data sets for the model evaluations," stating that "no customer data was used."
Google has been working on the development of the Gemini large language model (LLM) for the past eight months and just recently provided access to its early versions to a small group of companies. The LLM is expected to compete head-to-head with rivals such as Meta’s Llama 2 and OpenAI’s GPT-4.
The AI model is designed to operate across formats, be it text, image, or video, making it one of the most significant algorithms in Google’s history.
In a blog post, Google CEO Sundar Pichai wrote, “This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company.”
The new LLM is what is known as a multimodal model, capable of accepting various types of input, such as audio, video, and images. Traditionally, multimodal model creation has involved training discrete parts for separate modalities and then piecing them together.
“These models can sometimes be good at performing certain tasks, like describing images, but struggle with more conceptual and complex reasoning,” Pichai said. “We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness.”
Google also unveiled the Cloud TPU v5p, its most potent ASIC chip, in tandem with the launch. This chip was created expressly to meet the enormous processing demands of artificial intelligence. According to the company, the new processor can train LLMs 2.8 times faster than Google's prior TPU v4.
LLMs are the algorithmic platforms that underpin generative AI chatbots such as ChatGPT and Bard.
The Cloud TPU v5e, which touted 2.3 times the price-performance of the previous-generation TPU v4, was made generally available by Google earlier in the year. The TPU v5p is significantly faster than the v4, but it also costs three and a half times as much.
Google’s new Gemini LLM is now available in some of Google’s core products. For example, Google’s Bard chatbot is using a version of Gemini Pro for advanced reasoning, planning, and understanding.
Developers and enterprise customers can use the Gemini API in Vertex AI or Google AI Studio, the company's free web-based development tool, to access Gemini Pro as of December 13. Google announced that Gemini Ultra, which is still undergoing further refinement including thorough security and trust assessments, will be made available to a limited number of users in early 2024, ahead of a wider release to developers and business clients.
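For developers taking the Google AI Studio route, access to Gemini Pro looks roughly like the sketch below, using the google-generativeai Python SDK. The API key and prompt are placeholders, and Vertex AI access goes through its own client library instead.

```python
# A minimal sketch of calling Gemini Pro via the google-generativeai SDK
# (the Google AI Studio route mentioned above); key and prompt are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")       # key issued via Google AI Studio
model = genai.GenerativeModel("gemini-pro")   # text-oriented Gemini Pro model

response = model.generate_content("Summarise the main design goals of a multimodal LLM.")
print(response.text)
```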