
Google Gemini 2.0 Flash comes out with real-time conversation, image analysis

Google on Wednesday released Gemini 2.0 Flash, the latest addition to its AI model lineup, in the hope that developers will create agentic applications in AI Studio and Vertex AI.

AI agents are all the rage at the moment among makers of machine learning models because there's a presumed market for software-derived labor that's capable, compliant, and cheap – qualities not yet consistently evident in current AI models.

AI agents consist of AI models that can accomplish multi-step tasks, as directed by complex prompts, generally in conjunction with external data sources and tools. The pitch to shareholders of the AI-focused firms goes something like this: customers will be able to ask our AI agent to plan a vacation and – insert monetization strategy here – the agent will actually execute the various steps required, including making reservations and paying for transportation.
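To make "tools" concrete: in the Gemini API, the model doesn't book anything itself; it emits a structured function call that your code (or the SDK) executes and feeds back. Here's a minimal sketch using Google's google-genai Python SDK as documented at the 2.0 launch – the get_flight_quote function and its canned price are hypothetical stand-ins for a real booking backend, not anything shown in Google's briefing.

```python
from google import genai
from google.genai import types

def get_flight_quote(origin: str, destination: str) -> dict:
    """Hypothetical stand-in for a real booking backend."""
    return {"origin": origin, "destination": destination, "price_usd": 420.0}

client = genai.Client(api_key="YOUR_API_KEY")

# Passing a plain Python function as a tool lets the SDK run the
# call-the-function-then-feed-back-the-result loop automatically.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Find me a flight from SFO to JFK and tell me the price.",
    config=types.GenerateContentConfig(tools=[get_flight_quote]),
)
print(response.text)
```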

We're not there yet because most people aren't ready to delegate purchasing authority or full application access to unreliable AI models. But the hope is that such concerns can be addressed to the point that people are willing to try it. And given the public's demonstrated risk tolerance for cryptocurrency investment and hands-free automated driving, that point isn't far off.

"The practical application of AI agents is a research area full of exciting possibilities," said Demis Hassabis, CEO of Google DeepMind and Koray Kavukcuoglu, CTO of Google DeepMind, in a blog post provided to The Register. "We’re exploring this new frontier with a series of prototypes that can help people accomplish tasks and get things done. These include an update to Project Astra, our research prototype exploring future capabilities of a universal AI assistant; the new Project Mariner, which explores the future of human-agent interaction, starting with your browser; and Jules, an AI-powered code agent that can help developers."

To realize its dreams of Jarvis – a reference to the personal assistant depicted in Marvel's Iron Man films – Google is aiming at software developers. Through its AI Studio and Vertex AI platforms, the biz is offering AI models that can be grounded – linked to specific sources of data to make model responses more accurate – and given access to specific functions and tools.
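Grounding against Google Search, for example, is exposed as a tool attached to an ordinary request. A rough sketch with the google-genai Python SDK, per Google's launch documentation (the model name and key handling here are assumptions, not code from the briefing):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Attaching the Google Search tool grounds the answer in live search
# results rather than relying on training data alone.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="What did Google ship alongside Gemini 2.0 Flash?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]
    ),
)
print(response.text)
```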

"AI Studio is really intended to be sort of the interface for developers to get access to Google's latest models," explained Logan Kilpatrick, product manager for AI Studio and Gemini API, during a media briefing. "You have all those sort of experimental models that we released there. You have all the production models. The intent is sort of get you interested in the capabilities that Gemini has to offer and then ultimately get you building with the Gemini API and actually like putting Gemini into your apps and projects."

Gemini 2.0 Flash arrives a year after Google's first Gemini model debuted. It joins a lineup that includes the other Gemini family models: Gemini 1.0 Ultra, Gemini 1.5 Pro, Gemini 1.0 Pro, Gemini 1.5 Flash, and Gemini 1.0 Nano.

Generally, the 1.5 versions are more capable than the 1.0 versions, and the larger models tend to perform better than the smaller ones (Ultra, Pro, Flash, and Nano, in order of size); the Chocolate Factory has published benchmarks that provide more details. Gemini 2.0 Flash is said to be twice as fast as 1.5 Pro, with better performance.

Gemini 2.0 Flash brings some new capabilities to the table. The model is multilingual and also multimodal – it can accept text, imagery, and audio as input and can respond in any of those modes. And it sports a multimodal live API – so it can engage in real-time conversation and image analysis.
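The real-time piece runs over a persistent bidirectional session rather than one-shot HTTP calls. A text-only sketch of how that session looks in the google-genai SDK, based on our reading of the launch-era Live API docs (the method names and config shape are assumptions):

```python
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

async def main() -> None:
    # The Live API keeps a streaming session open; audio and video
    # frames travel over the same connection as the text turn below.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(input="Hello, Gemini.", end_of_turn=True)
        async for chunk in session.receive():
            if chunk.text:
                print(chunk.text, end="")

asyncio.run(main())
```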

Plus, the new model supports tool use in the form of code execution and search, which gives it access to recent information, calculation capabilities, and the ability to interact with data sources without extra setup.

"The model is now able to natively output both audio and images, which will start off in an early access program but roll out more broadly over the next few months," said Kilpatrick.

In conjunction with the debut of Gemini 2.0 Flash, Google is starting to roll out Jules, adding some "agentic data science capabilities" to Google Colab, and making its new model available within Gemini Code Assist, the company's AI coding extension for VS Code, IntelliJ, PyCharm, and other IDEs.

"Starting today [for trusted testers], you can offload Python and JavaScript coding tasks to Jules, an experimental AI-powered code agent that will use Gemini 2.0," said Shrestha Basu Mallick, group product manager for Gemini API, and Kathy Korevec, director of product for Google Labs, in a blog post provided to The Register. "Working asynchronously and integrated with your GitHub workflow, Jules handles bug fixes and other time-consuming tasks while you focus on what you actually want to build."

Those not in the trusted tester program can sign up to try Jules in 2025.

As a demonstration of Gemini 2.0 Flash, Basu Mallick played a game of 20 questions with the model by speaking to it and listening to its responses. She also asked it to count the number of fingers she was holding up in a video stream and to say what color her nails were painted. The model answered adequately in both cases, though we'd argue that "red" would have been a more accurate answer to the nail color query than "pink." That could just reflect differences in monitor color rendering, however.

Basu Mallick also demonstrated how Gemini 2.0 Flash can handle a multistep prompt asking the model to identify the five longest movies by Denis Villeneuve, calculate their respective running times, then plot the data on a graph. The task involved having the model generate Python code and execute it in a sandbox to calculate the results.

"This is the kind of complex prompt where you first have to solve the first part of the prompt and then the second part of the prompt," she explained. "Then I'm asking it to write some code to work out which has the longest and shortest runtimes and then do a plot."

Screenshot of Gemini 2.0 Flash model generating recipe and image

Another demonstration showed off Gemini 2.0 Flash's multimodal capabilities for generating recipes. The model was able to create visuals showing how ingredients might look in a frying pan to augment generated text instructions.

The only thing missing was pricing information. "We aren't really saying pricing at this time," said Kilpatrick. "Developers will be able to use the multimodal live API and the 2.0 models for free through AI Studio, and when we do a wider release early next year, we'll follow up with pricing." ®
