Google has launched Gemini 2, a sophisticated AI model that enhances users’ ability to interact with technology. This announcement, made during a press event, highlights Gemini 2’s capabilities in task execution, conversational abilities, and multimodal comprehension, representing a significant advancement in AI technology.

Google launches Gemini 2: A new era for AI assistants

Demis Hassabis, CEO of Google DeepMind, noted that Gemini 2 functions as a virtual assistant capable of “planning and executing tasks on a user’s computers and the web.” The model aims to help users manage various activities seamlessly, potentially paving the way toward artificial general intelligence (AGI) by mimicking human-like cognitive functions. Google’s CEO, Sundar Pichai, emphasized the company’s commitment to developing “agentic models” that can understand and act in the world more effectively, indicating significant investments made over the past year.

Gemini 2 includes improved “multimodal” functions, which allow the AI to parse audio and video more effectively while engaging in sophisticated conversations. With these capabilities, Gemini 2 could redefine how personal computing operates, potentially saving time by automating tasks such as booking flights and managing documents. However, challenges remain: the technology cannot yet process open-ended commands without error, and such errors could prove costly.

Specialized AI agents for coding and data science are also part of Gemini 2’s toolbox, allowing users to tackle complex programming tasks that are beyond the capabilities of earlier models. Unlike prevailing AI tools that focus on basic code completion, these agents can carry out end-to-end tasks such as checking code into repositories and assisting with data analysis.

Google’s Gemini 2.0 is here: Multimodal and mighty

Project Mariner: New approach to web navigation

To showcase Gemini 2’s capabilities, Google introduced Project Mariner, an experimental Chrome extension that assists users in navigating the web. In a recent demonstration, the AI agent was tasked with planning a meal. It autonomously navigated to a supermarket’s website, logged in, and added items to a shopping basket, even suggesting replacements when certain items were unavailable. Hassabis described Mariner as a research prototype that reimagines user interactions with AI, targeting everyday tasks.

Gemini was initially launched in December 2023 as part of Google’s strategy to compete with OpenAI’s ChatGPT, which gained acclaim for its utility in AI-assisted experiences. With the introduction of Gemini 2, Google now positions its model as being as capable as OpenAI’s offerings, aiming to enhance the search experience through AI-driven functionalities.

Google has also revealed the latest version of Project Astra, an experimental initiative that allows Gemini 2 to interpret a user’s surroundings via a smartphone camera. During testing, Gemini 2 displayed its skill in recognizing wine bottles, providing geographical information, pricing, and taste characteristics sourced from the web. Hassabis expressed a desire for Astra to evolve into an ultimate recommendation system, capable of linking interests across different domains to enhance user experiences.

The focus on memory within Gemini 2 allows the AI to retain insights about user preferences, and Google assures users that they can manage their data, including deleting it. During tests with Astra, the AI exhibited impressive adaptability by maintaining conversational context while responding to interruptions.

Safety and reliability of AI agents

As Gemini 2’s functionality expands, Google emphasizes the importance of ensuring safety and reliability. While the agents show promise, potential risks stem from how users may interact with the systems and the data they provide. Project Mariner incorporates prompts that require user confirmation before executing sensitive actions, thereby safeguarding against unauthorized transactions.
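
The confirmation step described above can be illustrated with a minimal sketch. It is not Google's implementation, and nothing here reflects Project Mariner's actual code: the `Action` type, the `SENSITIVE_KINDS` set, and the `run_with_confirmation` function are all hypothetical names chosen for illustration. The idea is simply that an agent checks each planned action against a list of sensitive categories and asks the user before proceeding.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical categories an agent might treat as sensitive.
# These are assumptions for illustration, not taken from Google.
SENSITIVE_KINDS = {"login", "purchase", "submit_payment"}


@dataclass
class Action:
    kind: str          # e.g. "navigate", "purchase"
    description: str   # human-readable summary shown to the user


def run_with_confirmation(
    actions: List[Action],
    confirm: Callable[[Action], bool],
) -> List[str]:
    """Run actions in order, asking `confirm` before each sensitive one.

    Returns the descriptions of the actions that were carried out.
    Sensitive actions the user declines are skipped.
    """
    executed = []
    for action in actions:
        if action.kind in SENSITIVE_KINDS and not confirm(action):
            continue  # user declined, so the action is not performed
        executed.append(action.description)
    return executed


plan = [
    Action("navigate", "open supermarket site"),
    Action("add_to_basket", "add pasta to basket"),
    Action("purchase", "check out basket"),
]

# A user who declines every sensitive action still gets the harmless steps.
print(run_with_confirmation(plan, confirm=lambda a: False))
```

In a real agent the `confirm` callback would surface a prompt in the browser. Passing it in as a function keeps the gating logic separate from the user interface.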

Google has been proactive in addressing safety concerns by collaborating with internal and external experts to evaluate risks associated with AI usage. This includes exploring measures to prevent misuse of the platform through malicious prompts or instructions, thereby protecting users from potential threats such as fraud or phishing attacks.

Google’s release of Gemini 2 marks a pivotal moment in the advancement of AI, as the company continues to track user experiences and feedback. The journey toward AGI unfolds with each development phase, and ongoing research may influence future iterations of the technology. As investigations into user interactions and responses continue, the next steps for Gemini 2 and its associated projects remain under close watch.


Image credits: Google