Search This Blog

Powered by Blogger.

Blog Archive

Labels

Showing posts with label Google Gemini. Show all posts

From Text to Action: Chatbots in Their Stone Age

From Text to Action: Chatbots in Their Stone Age

The stone age of AI

Despite all the talk of generative AI disrupting the world, the technology has failed to significantly transform white-collar jobs. Workers are experimenting with chatbots for activities like email drafting, and businesses are doing numerous experiments, but office work has yet to experience a big AI overhaul.

Chatbots and their limitations

That could be because we haven't given chatbots like Google's Gemini and OpenAI's ChatGPT the proper capabilities yet; they're typically limited to taking in and spitting out text via a chat interface.

Things may become more fascinating in commercial settings when AI businesses begin to deploy so-called "AI agents," which may perform actions by running other software on a computer or over the internet.

Tool use for AI

Anthropic, a rival of OpenAI, unveiled a big new product today that seeks to establish the notion that tool use is required for AI's next jump in usefulness. The business is allowing developers to instruct its chatbot Claude to use external services and software to complete more valuable tasks. 

Claude can, for example, use a calculator to solve math problems that vex big language models; be asked to visit a database storing customer information; or be forced to use other programs on a user's computer when it would be beneficial.

Anthropic has been assisting various companies in developing Claude-based aides for their employees. For example, the online tutoring business Study Fetch has created a means for Claude to leverage various platform tools to customize the user interface and syllabus content displayed to students.

Other businesses are also joining the AI Stone Age. At its I/O developer conference earlier this month, Google showed off a few prototype AI agents, among other new AI features. One of the agents was created to handle online shopping returns by searching for the receipt in the customer's Gmail account, completing the return form, and scheduling a package pickup.

Challenges and caution

  • While tool use is exciting, it comes with challenges. Language models, including large ones, don’t always understand context perfectly.
  • Ensuring that AI agents behave correctly and interpret user requests accurately remains a hurdle.
  • Companies are cautiously exploring these capabilities, aware of the potential pitfalls.

The Next Leap

The Stone Age of chatbots represents a significant leap forward. Here’s what we can expect:

Action-oriented chatbots

  • Chatbots that can interact with external services will be more useful. Imagine a chatbot that books flights, schedules meetings, or orders groceries—all through seamless interactions.
  • These chatbots won’t be limited to answering questions; they’ll take action based on user requests.

Enhanced Productivity

  • As chatbots gain tool-using abilities, productivity will soar. Imagine a virtual assistant that not only schedules your day but also handles routine tasks.
  • Businesses can benefit from AI agents that automate repetitive processes, freeing up human resources for more strategic work.

Gemini: Google Launches its Most Powerful AI Software Model


Google has recently launched Gemini, its most powerful generative AI software model to date. And since the model is designed in three different sizes, Gemini may be utilized in a variety of settings, including mobile devices and data centres.

Google has been working on the development of the Gemini large language model (LLM) for the past eight months and just recently provided access to its early versions to a small group of companies. This LLM is believed to be giving head-to-head competition to other LLMs like Meta’s Llama 2 and OpenAI’s GPT-4. 

The AI model is designed to operate on various formats, be it text, image or video, making the feature one of the most significant algorithms in Google’s history.

In a blog post, Google CEO Sundar Pichai wrote, “This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company.”

The new LLM, also known as a multimodal model, is capable of various methods of input, like audio, video, and images. Traditionally, multimodal model creation involves training discrete parts for several modalities and then piecing them together.

“These models can sometimes be good at performing certain tasks, like describing images, but struggle with more conceptual and complex reasoning,” Pichai said. “We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness.”

Google also unveiled the Cloud TPU v5p, its most potent ASIC chip, in tandem with the launch. This chip was created expressly to meet the enormous processing demands of artificial intelligence. According to the company, the new processor can train LLMs 2.8 times faster than Google's prior TPU v4.

For ChatGPT and Bard, two examples of generative AI chatbots, LLMs are the algorithmic platforms.

The Cloud TPU v5e, which touted 2.3 times the price performance over the previous generation TPU v4, was made generally available by Google earlier last year. The TPU v5p is significantly faster than the v4, but it costs three and a half times as much./ Google’s new Gemini LLM is now available in some of Google’s core products. For example, Google’s Bard chatbot is using a version of Gemini Pro for advanced reasoning, planning, and understanding. 

Developers and enterprise customers can use the Gemini API in Vertex AI or Google AI Studio, the company's free web-based development tool, to access Gemini Pro as of December 13. Further improvements to Gemini Ultra, including thorough security and trust assessments, led Google to announce that it will be made available to a limited number of users in early 2024, ahead of developers and business clients.