Google barrels ahead with a push into AI systems that take action for you.
The Gemini 2.0 logo from Google
Credit: Google
On Wednesday, Google unveiled Gemini 2.0, the next generation of its AI model family, starting with an experimental release called Gemini 2.0 Flash. The model family can generate text, images, and speech while processing multiple types of input, including text, images, audio, and video. It's similar to multimodal AI models like GPT-4o, which powers OpenAI's ChatGPT.
"Gemini 2.0 Flash builds on the success of 1.5 Flash, our most popular model yet for developers, with enhanced performance at similarly fast response times," said Google in a statement. "Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed."
Gemini 2.0 Flash—which is the smallest model of the 2.0 family in terms of parameter count—launches today through Google's developer platforms like Gemini API, AI Studio, and Vertex AI. However, its image generation and text-to-speech features remain limited to early access partners until January 2025. Google plans to integrate the tech into products like Android Studio, Chrome DevTools, and Firebase.
The company addressed potential misuse of generated content by implementing SynthID watermarking technology on all audio and images created by Gemini 2.0 Flash. This watermark appears in supported Google products to identify AI-generated content.
Google's newest announcements lean heavily into the concept of agentic AI systems that can take action for you. "Over the last year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision," said Google CEO Sundar Pichai in a statement. "Today we’re excited to launch our next era of models built for this new agentic era."
Applications of Gemini 2.0
Beyond Gemini 2.0 Flash, Google revealed several research projects that showcase the model family's capabilities in context. In particular, Project Astra (which we covered briefly upon its debut in May) is an impressive-looking visual AI assistant prototype for Android phones. It has now received updates to handle multiple languages, use Google Search and Maps, and remember conversations for up to 10 minutes.
Google says it is collaborating with game developer Supercell to create "AI agents" that can understand gameplay and offer real-time suggestions, as showcased in a YouTube demo the company published.
Gemini 2.0 for games demo | Playing Squad Busters, Clash of Clans, and Hay Day.
Project Mariner, a new Chrome extension prototype, can complete web-based tasks on a user's behalf by understanding screen content and browser elements, similar to Microsoft's Copilot Vision.
For developers, Google introduced Jules, an experimental AI coding agent that works within GitHub workflows to help developers plan and execute programming tasks. And Google's new Multimodal Live API enables the creation of applications with real-time audio- and video-streaming capabilities. The API supports integration with external tools and handles natural conversation patterns like interruptions.
Overall, Google emphasizes that Gemini 2.0 is still a work in progress, likely to trickle out with updates, larger models, and enhancements over time: "We’re still in the early stages of development, but we’re excited to see how trusted testers use these new capabilities and what lessons we can learn, so we can make them more widely available in products in the future."