
From Text to Multisensory AI: ChatGPT's Latest Evolution

OpenAI has updated its generative AI chatbot, ChatGPT, with a significant new set of capabilities. Artificial intelligence (AI) is a fast-moving, dynamic field that is constantly evolving.

On Monday, OpenAI, the Microsoft-backed AI startup, rolled out new features for its generative AI-powered chatbot, ChatGPT. Users can now hear ChatGPT's answers in five different voices and can upload images so that ChatGPT can answer questions about them.

OpenAI announced that ChatGPT can now "see, hear, and speak" in a post on X (formerly Twitter), sharing a video that shows how the new features work.

According to the note attached to the video, voice conversations with ChatGPT (iOS and Android) and images in conversations (all platforms) will roll out to users over the next two weeks.

A major part of the update is an image analysis and response function. Upload a picture of your bike, for example, and ChatGPT can walk you through lowering the seat; upload a picture of your refrigerator's contents and it can suggest recipes.

The second feature lets users talk to ChatGPT in a synthetic voice, much as they would with Siri or Google Now, with spoken questions answered by the same underlying AI models.

This feature is part of an industry-wide trend toward so-called multimodal artificial intelligence systems, which can handle whatever mix of text, pictures, video, or other input a user chooses to throw at them.

Researchers see the ultimate goal as an artificial intelligence capable of processing information in the same way a human does. Beyond answering questions aloud in a choice of voices, ChatGPT will also respond in a variety of languages, based on users' personal preferences.

To create each voice, OpenAI enlisted professional voice actors, and it uses Whisper, its proprietary speech recognition system, to transcribe spoken words into text. ChatGPT's new voice capabilities are driven by a new OpenAI text-to-speech model that can generate human-like audio from text and just a few seconds of sample speech, opening the door to many "creative and accessible applications".
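As a rough sketch of how a text-to-speech request of this kind might be assembled, the snippet below builds a request body in the style of OpenAI's audio API. The model identifier and voice names are illustrative assumptions, not necessarily the five voices OpenAI shipped in ChatGPT, and nothing is sent over the network.

```python
# Sketch: assemble a text-to-speech request body (no network call).
# The model name and voice names below are illustrative placeholders.
PRESET_VOICES = {"alloy", "echo", "fable", "onyx", "nova"}  # assumed names

def build_speech_request(text: str, voice: str = "alloy") -> dict:
    """Return a request body pairing input text with one preset voice."""
    if voice not in PRESET_VOICES:
        raise ValueError(f"unknown voice: {voice}")
    return {"model": "tts-1", "voice": voice, "input": text}

request = build_speech_request("Hello from ChatGPT.", voice="nova")
print(request["voice"])
```

A real client would POST this body to the speech endpoint and save the returned audio; validating the voice name up front gives a clearer error than a failed API call.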

OpenAI is also collaborating with Spotify on a project to translate podcasts into several languages, as naturally as possible, in the podcaster's own voice. For image understanding, ChatGPT relies on multimodal versions of GPT-3.5 and GPT-4.

Users can now upload an image to ChatGPT and ask questions about it, such as exploring the contents of their fridge to plan a meal or analyzing a complex graph for work. Over the next two weeks, Plus and Enterprise users will gradually gain access to the new voice and image features, which can be enabled through their settings.
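As a minimal illustration, a question about an image can be expressed as a single chat message that mixes text and an image reference. The field names below follow OpenAI's chat completions format, but the model name and URL are placeholders, and the payload is only constructed, not sent.

```python
# Sketch: build a multimodal chat request payload (not sent anywhere).
# Field names follow OpenAI's chat format; model name and URL are placeholders.
def build_image_question(question: str, image_url: str) -> dict:
    """Assemble a chat request that pairs a text question with an image."""
    return {
        "model": "gpt-4-vision-preview",  # illustrative model name
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

payload = build_image_question(
    "What meals could I make with the ingredients shown here?",
    "https://example.com/fridge.jpg",
)
print(payload["messages"][0]["content"][0]["text"])
```

The key idea is that the message `content` is a list, so text and images travel together in one user turn rather than as separate requests.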

The voice feature will be available on iOS and Android, with the option to enable it via the settings menu, while image input will be available on all platforms. Users can also apply the model to specific topics, such as research in specialized fields.

OpenAI is very transparent about the model's limitations and discourages high-risk use cases that have not been properly verified. The model does a great job of transcribing English text, but it is less accurate for other languages, especially those written in non-Roman scripts.
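One simple way an application could respect this limitation is to flag transcription requests for languages outside the well-supported set before calling the API. The sketch below is a hypothetical guard, not part of any OpenAI SDK; the supported-language set and warning text are assumptions for illustration.

```python
# Sketch: guard a transcription request by language, reflecting the note
# that Whisper-based transcription works best for English.
WELL_SUPPORTED = {"en"}  # assumed set; real coverage varies by language

def transcription_request(filename: str, language: str) -> dict:
    """Build a transcription request, attaching a warning for weakly
    supported languages instead of silently accepting them."""
    request = {"model": "whisper-1", "file": filename, "language": language}
    if language not in WELL_SUPPORTED:
        request["warning"] = "transcription quality may be lower for this language"
    return request

print(transcription_request("meeting.mp3", "en"))
```

Surfacing the caveat at request-build time lets a client warn the user up front rather than after a poor transcription comes back.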

OpenAI therefore advises non-English-speaking users against relying on ChatGPT for such transcription. Recognizing the potential risks of advanced capabilities such as voice, OpenAI has limited the technology to voice chat and developed it in collaboration with voice actors to ensure authenticity and safety. The same technology powers Spotify's Voice Translation feature, which lets podcasters translate their content into a range of languages in their own voice, expanding their reach.

For image input, OpenAI takes measures to protect individuals' privacy by limiting ChatGPT's ability to directly identify and describe people. Real-world usage and user feedback will be crucial for strengthening these safeguards while keeping the tool as useful as possible.