
Gemini Pro 2.5 is one of only two AIs to crush all my coding tests - and it's free

This free Google AI just passed all my coding tests

[Image: Elyse Betters Picaro / ZDNET]

As part of my AI coding evaluations, I run a standardized series of four programming tests against each AI. These tests are designed to determine how well a given AI can help you program. That's useful to know, especially if you're counting on the AI to help you produce code. The last thing you want is for an AI helper to introduce more bugs into your work, right?

Also: The best AI for coding (and what not to use)

Some time ago, a reader reached out to me and asked why I keep using the same tests. He reasoned that the AIs might succeed if they were given different challenges.

This is a fair question, but my answer is also fair. These are super-simple tests. I'm using PHP and JavaScript, which are not exactly challenging languages, and I'm running some scripting queries through the AIs. By using exactly the same tests, we're able to compare performance directly.

One is a request to write a simple WordPress plugin, one is to rewrite a string function, one asks for help finding a bug I originally had difficulty finding on my own, and the final one uses a few programming tools to get data back from Chrome.

But it's also like teaching someone to drive. If they can't get out of the driveway, you're not going to set them loose in a fast car on a crowded highway.

To date, only ChatGPT's GPT-4 (and above) LLM has passed them all. Yes, Perplexity Pro also passed all the tests, but that's because Perplexity Pro runs the GPT-4 series LLM. Oddly enough, Microsoft Copilot, which also runs ChatGPT's LLM, failed all the tests.

Also: How I test an AI chatbot's coding ability - and you can, too

Google's Gemini didn't do much better. When I tested Bard (the early name for Gemini), it failed most of the tests (twice). Last year, when I ran the $20-per-month Gemini Advanced through my tests, it failed three of the four tests.

But now, Google is back with Gemini Pro 2.5. What caught our eye here at ZDNET was that Gemini Pro 2.5 is available for free, to everyone. No $20 per month surcharge. While Google was clear that the free access was subject to rate limits, I don't think any of us realized it would throttle us after two prompts, which is what happened to me during testing.

It's possible that Gemini Pro 2.5 is not counting prompt requests for rate limiting but basing its throttling on the scope of the work being requested. My first two prompts asked Gemini Pro 2.5 to write a full WordPress plugin and fix some code, so I may have used up the limits faster than you would if you used it to ask a simple question.

Even so, it took me a few days to run these tests. To my considerable surprise, it was very much worth the wait.

Test 1: Write a simple WordPress plugin

Wow. Well, this is certainly a far cry from how Bard failed twice and Gemini Advanced failed back in February 2024. Quite simply, Gemini Pro 2.5 aced this test right out of the gate.

Also: I asked ChatGPT to write a WordPress plugin I needed. It did it in less than 5 minutes

The challenge is to write a simple WordPress plugin that provides a basic user interface. It randomizes the input lines and separates duplicates (rather than removing them) so identical lines aren't next to each other.
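
To make the task concrete, here's a minimal sketch of the core shuffle-and-separate logic such a plugin needs, in plain PHP. The function name and approach are my own illustration of the requirement, not Gemini's output.

```php
<?php
/**
 * Shuffle an array of text lines, then nudge duplicates apart so no two
 * identical lines sit next to each other (where the input allows it).
 * Illustrative sketch only -- not the code Gemini generated.
 */
function randomize_and_separate( array $lines ): array {
	shuffle( $lines );
	$count = count( $lines );
	for ( $i = 1; $i < $count; $i++ ) {
		if ( $lines[ $i ] === $lines[ $i - 1 ] ) {
			// This line repeats its predecessor: look ahead for the next
			// different line and swap it into this slot.
			for ( $j = $i + 1; $j < $count; $j++ ) {
				if ( $lines[ $j ] !== $lines[ $i - 1 ] ) {
					[ $lines[ $i ], $lines[ $j ] ] = [ $lines[ $j ], $lines[ $i ] ];
					break;
				}
			}
		}
	}
	return $lines;
}
```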

Last time, Gemini Advanced did not write a back-end dashboard interface but instead required a shortcode that needed to be placed in the body text of a public-facing page.

Gemini Advanced did create a basic user interface, but that time clicking the button resulted in no action whatsoever. I gave it a few alternative prompts, and it still failed.

But this time, Gemini Pro 2.5 gave me a solid UI, and the code actually ran and did what it was supposed to.

[Screenshot: the generated randomizer plugin's UI. Screenshot by David Gewirtz/ZDNET]

What caught my eye, in addition to the nicely presented interface, was the icon choice for the plugin. Most AIs ignore the icon choice, letting the interface default to what WordPress assigns.

But Gemini Pro 2.5 had clearly picked out an icon from the WordPress Dashicons set. Not only that, but the icon is perfectly appropriate for a plugin that randomizes lines.
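
For anyone curious how a plugin declares that icon: the admin menu registration accepts a Dashicons class name as its icon argument. Here's a hedged sketch; the menu labels and slug are hypothetical, and I'm using the "randomize" Dashicon to illustrate the kind of choice Gemini made, not quoting its code.

```php
<?php
// Hypothetical sketch: registering a dashboard page with an explicit Dashicon.
add_action( 'admin_menu', function () {
	add_menu_page(
		'Line Randomizer',        // page title (hypothetical)
		'Line Randomizer',        // menu title (hypothetical)
		'manage_options',         // capability required to see the page
		'line-randomizer',        // menu slug (hypothetical)
		'line_randomizer_page',   // callback that renders the UI
		'dashicons-randomize'     // the icon choice most AIs never bother with
	);
} );

function line_randomizer_page() {
	echo '<div class="wrap"><h1>Line Randomizer</h1></div>'; // placeholder UI
}
```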

[Screenshot: the plugin's Dashicon in the WordPress admin menu. Screenshot by David Gewirtz/ZDNET]

Not only did Gemini Pro 2.5 succeed in this test, it actually earned a "wow" for its icon choice. I didn't prompt it to do that, and it was just right. The code was all inline (the JavaScript and HTML were embedded in the PHP) and was well documented. In addition, Gemini Pro 2.5 documented each major segment of the code with a separate explainer text.

Test 2: Rewrite a string function

In the second test, I asked Gemini Pro 2.5 to rewrite some string processing code that processed dollars and cents. My initial test code only allowed integers (so, dollars only), but the goal was to allow dollars and cents. This is a test that ChatGPT got right. Bard initially failed, but eventually succeeded.

Then, back in February 2024, Gemini Advanced failed the string-processing test in a way that was both subtle and dangerous. The code it generated did not allow for non-decimal inputs. In other words, 1.00 was allowed, but 1 was not. Neither was 20. Worse, it limited the numbers to two digits before the decimal point instead of after it, showing it did not understand the concept of dollars and cents. It rejected 100.50, but allowed 99.50.

Also: How to use ChatGPT to write code - and my favorite trick to debug what it generates

This is a really easy problem, the sort of thing you give to first-year programming students. Worse, the Gemini Advanced failure was the sort of failure that might not be easy for a human programmer to find, so if you trusted Gemini Advanced to give you its code and assumed it worked, you might have a raft of bug reports later.

When I reran the test using Gemini Pro 2.5, the results were different. The code correctly checks input types, trims whitespace, fixes the regular expression to allow leading zeros and decimal-only input, and rejects negative inputs. It also comprehensively comments the regular expression and offers a full set of well-labeled test examples, both valid and invalid (and enumerated as such).

If anything, the code Gemini Pro 2.5 generated was a little overly strict. It did not allow grouping commas (as in $1,245.22) and also did not allow for leading currency symbols. But since my prompt did not call for that, and use of either commas or currency symbols returns a controlled error and not a crash, I'm counting that as acceptable.
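
To give a feel for what that validation looks like, here's a sketch of a dollars-and-cents check along the lines described above. It's my own illustration of those requirements, not Gemini's actual code.

```php
<?php
/**
 * Accept a dollars-and-cents string: one or more digits, optionally followed
 * by a decimal point and one or two cent digits. No sign, no grouping commas,
 * no currency symbol. Illustrative sketch only.
 */
function is_valid_dollar_amount( $input ): bool {
	if ( ! is_string( $input ) && ! is_numeric( $input ) ) {
		return false; // reject arrays, objects, booleans, null, etc.
	}
	$value = trim( (string) $input );
	// "1", "20", "0.99", and "100.50" pass; "-5", "1,245.22", and "$3" fail.
	return (bool) preg_match( '/^\d+(\.\d{1,2})?$/', $value );
}
```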

So far, Gemini Pro 2.5 is two for two.

Test 3: Find a bug

At some point during my coding journey, I was struggling with a bug. My code should have worked, but it did not. The issue was far from immediately obvious, but when I asked ChatGPT, it pointed out that I was looking in the wrong place.

I was looking at the number of parameters being passed, which seemed like the right answer to the error I was getting. Instead, I needed to change the code in something called a hook.
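
I haven't published the buggy code itself, so treat the following as a purely hypothetical illustration of that class of mistake: the error message points you at the parameter count, but the real fix lives in the hook registration.

```php
<?php
// Hypothetical WordPress example, not my actual plugin code.
// The callback expects two arguments, but the add_filter() registration only
// declares one accepted argument, so WordPress calls the callback with a
// single argument and PHP complains about "too few arguments" -- an error
// that points at the parameters, not at the hook where the fix belongs.
add_filter( 'the_title', 'my_prefix_tag_title', 10, 1 ); // fix: change 1 to 2

function my_prefix_tag_title( $title, $post_id ) {
	return $title . ' #' . $post_id;
}
```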

Also: How to turn ChatGPT into your AI coding power tool - and double your output

Both Bard and Meta AI went down the same erroneous and futile path I had back then, missing the details of how the system really worked. As I said, ChatGPT got it. Back in February 2024, Gemini Advanced did not even bother to get it wrong. All it provided was the recommendation to look "likely somewhere else in the plugin or WordPress" to find the error.

Needless to say, Gemini Advanced, at that time, proved useless. But what about now, with Gemini Pro 2.5? Well, I honestly don't know, and I won't until tomorrow. Apparently, I used up my quota of free Gemini Pro 2.5 with my first two questions.

[Screenshot: Gemini's rate-limit notice. Screenshot by David Gewirtz/ZDNET]

So, I'll be back tomorrow.

OK, I'm back. It's the next day, the dog has had a nice walk, the sun is actually out (it's Oregon, so that's rare), and Gemini Pro 2.5 is once again letting me feed it prompts. I fed it the prompt for my third test.

Not only did it pass the test and find the somewhat hard-to-find bug, it pointed out exactly where in the code the fix belonged. Literally. It drew me a map, with an arrow and everything.

[Screenshot: Gemini's annotated map pointing to where the fix belongs. Screenshot by David Gewirtz/ZDNET]

As compared to my February 2024 test of Gemini Advanced, this was night and day. Where Gemini Advanced was as unhelpful as it was possible to be (seriously, "likely somewhere else in the plugin or WordPress" is your answer?), Gemini Pro 2.5 was on target, correct, and helpful.

Also: I put GitHub Copilot's AI to the test - its mixed success at coding baffled me

With three out of four tests correct, Gemini Pro 2.5 moves out of the "Chatbots to avoid for programming help" category and into the top half of our leaderboard.

But there's one more test. Let's see how Gemini Pro 2.5 handles that.

Test 4: Writing a script

This last test isn't all that difficult in terms of programming skill. What it tests is the AI's ability to jump between three different environments, some of which are fairly obscure.

This test requires understanding Chrome's internal object model, how to write AppleScript (itself far more obscure than, say, Python), and then how to write code for Keyboard Maestro, a macro-building tool written by one guy in Australia.

The routine is designed to run through the open Chrome tabs and make the one specified by a parameter the active tab. It's a fairly narrow coding requirement, but it's just the sort of thing that could take hours to puzzle out by hand, since it relies on knowing the right parameters to pass in each environment.

Also: I tested DeepSeek's R1 and V3 coding skills - and we're not all doomed (yet)

Most of the AIs do well with the link between AppleScript and Chrome, but more than half of them miss the details about how to pass parameters to and from Keyboard Maestro, a necessary component of the solution.
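
For context, the hand-off that most models fumble looks roughly like this in AppleScript. The variable names are my own assumptions, and this is a sketch of the technique, not the script Gemini wrote.

```applescript
-- Read the target URL from a Keyboard Maestro variable (name assumed),
-- then activate the first Chrome tab whose URL matches it.
tell application "Keyboard Maestro Engine"
	set targetURL to getvariable "TargetURL"
end tell

set foundTab to false
tell application "Google Chrome"
	repeat with w in windows
		set tabIndex to 0
		repeat with t in tabs of w
			set tabIndex to tabIndex + 1
			if URL of t contains targetURL then
				set active tab index of w to tabIndex
				set index of w to 1 -- bring that window to the front
				set foundTab to true
				exit repeat
			end if
		end repeat
		if foundTab then exit repeat
	end repeat
end tell

-- Report back to Keyboard Maestro, and warn the user if nothing matched.
tell application "Keyboard Maestro Engine"
	setvariable "TabFound" to (foundTab as text)
end tell
if not foundTab then display notification "No matching Chrome tab found"
```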

And, well, wow again. Gemini Pro 2.5 did, indeed, understand Keyboard Maestro. It wrote the code necessary to pass variables back and forth, as it should. It even added value with an error check and user notification (not requested in the prompt) if the variable could not be set.

Then, later in the explanation section, it even provided the steps necessary to set up Keyboard Maestro to work in this context.

[Screenshot: Gemini's Keyboard Maestro setup instructions. Screenshot by David Gewirtz/ZDNET]

And that, ladies and gentlemen, moves Gemini Pro 2.5 into the rarefied air of the winner's circle.

We knew this was gonna happen

It was really just a matter of when. Google is filled with many very, very smart people. In fact, it was Google that kicked off the generative AI boom in 2017 with its "Attention Is All You Need" research paper.

So, while Bard, Gemini, and even Gemini Advanced failed miserably at my basic AI programming tests in the past, it was only a matter of time before Google's flagship AI tool caught up with OpenAI's offerings.

That time is now, at least for my programming tests. Gemini Pro 2.5 is slower than ChatGPT Plus. ChatGPT Plus responds with an answer nearly instantaneously. Gemini Pro 2.5 seems to take somewhere between 15 seconds and a minute.

Also: X's Grok did surprisingly well in my AI coding tests

Even so, waiting a bit longer for an accurate and helpful result is far more valuable than getting wrong answers right away.

In February, I wrote about Google opening up Gemini Code Assist and making it free with very generous limits. I said that this would be good, but only if Google could generate quality code. With Gemini Pro 2.5, it can now do that.

The only gotcha, and I expect this to be resolved within a few months, is that Gemini Pro 2.5 is marked as "experimental." It's not clear how much it will cost, or even whether you'll be able to upgrade to a paid version with fewer rate limits.

But I'm not concerned. Come back in a few months, and I'm sure this will all be resolved. Now that we know that Gemini (at least using Pro 2.5) can provide really good coding assistance, it's pretty clear Google is about to give ChatGPT a run for its money.

Stay tuned. You know I'll be writing more about this.

Have you tried Gemini Pro 2.5 yet?

If you've given it a try, how did it perform on your own coding tasks? Do you think it has finally caught up to, or even surpassed, ChatGPT when it comes to programming help? And how important is speed versus accuracy when you're relying on an AI assistant for development work?

Also: Everyone can now try Gemini 2.5 Pro - for free

And if you've run your own tests, did Gemini Pro 2.5 surprise you the way it did here? Let us know in the comments below.

Get the morning's top stories in your inbox each day with our Tech Today newsletter.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.

