Mint Explainer: Can Google get its AI mojo back with Gemini? | Mint


Big Tech companies are leaving no stone unturned to outdo their competitors in GenAI


  • Google says its new AI model's strength is that it has been built from scratch and not pieced together after training separate components for different media. But according to Gary Marcus, AI scientist and author, Gemini does not blow away its OpenAI rival, GPT-4.

The generative AI race is heating up with the release of Google's new AI model Gemini, which has been in the works for some time. It provides credible competition to OpenAI's GPT-4. And when OpenAI releases GPT-5, Google is likely to launch ‘Gemini Ultra’, the most powerful version of its AI model.

GenAI models are being used to generate content including text, images, audio, video, code and simulations with the help of natural-language prompts. One-third of respondents to a McKinsey Global Survey in August said their organizations were using GenAI in at least one business function. Moreover, 40% of respondents said their organizations would increase investments in AI overall because of advances in generative AI. This explains why Big Tech companies are leaving no stone unturned to outdo their competitors in GenAI.

Consider these developments. Meta, according to a report in the Wall Street Journal, is working on a new AI system that is expected to be more powerful than GPT-4 (Gemini had not been released when this report was published).

The system, which could be ready next year according to WSJ, aims to help companies develop sophisticated text analysis and provide other services. The new AI model is expected to be several times more powerful than Meta's own Llama 2. 

The company is in the process of acquiring Nvidia’s H100 chips for AI training and is building data centres to help train the model. (According to Nvidia, the H100 is up to nine times faster for AI training and 30 times faster for inference than the A100.)

Microsoft unveiled two custom-designed chips and systems at its Ignite 2023 event in Seattle on 16 November, a move that will help the company reduce its dependence on chips from companies like Nvidia, leverage its investment in OpenAI, and stave off competition from the likes of Intel and AMD.  

While Microsoft’s Azure Maia AI Accelerator has been optimized for AI- and generative AI-specific tasks, its Azure Cobalt CPU is an Arm-based processor that will cater to general-purpose tasks on Microsoft Cloud. Maia has already been tested by OpenAI. Microsoft expects to roll out the new chips to its data centers early next year and will start by using them to power services such as Microsoft Copilot and Azure OpenAI Service.

Google, too, is looking to regain its AI glory with Gemini. After all, the transformer architecture developed by Google researchers is the basis of most foundation models and large language models (LLMs), including GPT-4. Gemini was being touted as Google’s “next-generation foundation model" even when it was still in training. Now that it has been fine-tuned and tested for safety, Gemini is available in three sizes: Ultra, Pro and Nano. Gemini Ultra is its "largest and most capable model for highly complex tasks", Pro is "for scaling across a wide range of tasks", and Nano is for "on-device tasks".

Nano, for example, powers 'Summarize' in the Recorder app on Google's Pixel 8 Pro, which allows users to get a summary of their recorded conversations, interviews, presentations and more – even if they are offline. Bard uses a fine-tuned version of Gemini Pro. Google says Gemini will be available in more of its products and services such as Search, Ads, Chrome and Duet AI, "in the coming months". 

The company has already started to experiment with Gemini in search, "where it's making our search generative experience (SGE) faster for users, with a 40% reduction in latency in English in the US, alongside improvements in quality".

Bindu Reddy, CEO of Abacus.AI, wrote on X, “The race towards AGI has officially started! We will need a couple more significant breakthroughs from where we are today! Don’t underestimate open-source, we still have a very good at shot coming first! More and more companies will join the OSS (open-source software) revolution!"

Betting on native multimodal AI

Google argues that Gemini 1.0's strength is that it has been built from scratch and not pieced together after training separate components for different media.

"We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness," reads Google’s blog. It adds that Gemini 1.0 was trained to simultaneously recognize and understand text, images, audio and more, so it better understands nuanced information and can answer questions about complex subjects. This makes it especially good at reasoning about complex subjects like math and physics, where models like GPT are found wanting.

Semiconductor research and consulting firm SemiAnalysis said in a 28 August paper by Dylan Patel and Daniel Nishball that Google Gemini would smash GPT-4 by 5x by the end of 2023 and by 100x by the end of 2024. According to the authors, “Google had all the keys to the kingdom, but they fumbled the bag." They were referring to Google’s Meena model, which was unveiled in January 2020, before the pandemic.

“The Meena model has 2.6 billion parameters and is trained on 341 GB of text, filtered from public domain social media conversations. Compared to an existing state-of-the-art generative model, OpenAI GPT-2, Meena has 1.7x greater model capacity and was trained on 8.5x more data," reads Google's 28 January 2020 blog post. But Google could not capitalize on Meena’s potential, and on 30 November 2022, OpenAI’s ChatGPT stole the show. Google, according to SemiAnalysis, will get its mojo back with Gemini.

Predictably, OpenAI’s Sam Altman countered with a tweet on 29 August: “Incredible Google got that Semianalysis guy to publish their internal marketing/recruiting chart lol". Elon Musk responded: “Are the numbers wrong?" to which Patel replied: “They are correct". The debate had just begun.

Gemini, Google said, is also the "first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular methods to test the knowledge and problem solving abilities of AI models". It was trained on Google’s in-house Tensor Processing Units (TPUs), the custom-designed AI accelerators that have been powering Google's AI-powered products including Search, YouTube, Gmail, Google Maps, Google Play and Android. This suggests that Google wants to reduce its dependence on GPUs from Nvidia, with whom it also partners for Google Cloud.

The market opportunity for AI chips is huge. According to a 22 August note by research firm Gartner, chips designed to execute AI workloads will represent a $53.4 billion revenue opportunity for the industry in 2023, an increase of 20.9% from 2022. 

Nvidia does not manufacture its own chips. Intel, on the other hand, has its own foundries, but the company has lost ground to Taiwan Semiconductor Manufacturing Company (TSMC), which has helped Intel's rivals AMD and Nvidia eat into its market share.

Intel, too, is betting on AI PCs powered by its Intel Core Ultra processors. Acer is already working on powering its laptops with these processors, according to Jerry Kao, Acer's chief operating officer.

Gemini is yet to impress

According to Gary Marcus, AI scientist and author, "Gemini seems to have, by many measures, matched or slightly exceeded GPT-4, but not to have blown it away." He underscored, though, that Gemini is a commercial competitor to GPT-4, which will pose "a huge problem for OpenAI, especially post-drama, when many customers are now seeking a backup plan", a reference to the November boardroom turmoil in which Sam Altman was briefly ousted as OpenAI's CEO. However, he asks: “From a technical standpoint, the key question is: are LLMs close to a plateau?"

Ethan Mollick, an associate professor at the University of Pennsylvania’s Wharton School, posted on X that “Bard now follows the script for more complex prompts, like our prompt that helps teachers find analogies to explain concepts. However, Bard confidently produces exactly the wrong explanation – getting things reversed. I asked GPT-4 to check it and it gently corrected the errors."

That said, these are still early days, and it makes sense to wait and see how users and companies adopt these new models. Meanwhile, more competition is always welcome.
