OpenAI, Google, Meta or Anthropic? A Guide to the Best AI for Your Business

Summary
Different flavors work best for different business needs

We’re all being deluged with news about how the latest generation of AI is transforming people’s lives, helping businesses be more productive, and even leading to layoffs. But that flood of information doesn’t help anyone answer the most basic question about these AIs: Which is best?
So I canvassed executives, engineers and researchers who are knee-deep in the process of applying the world’s most powerful AIs to real-world problems, to find out what they have learned.
Their answers surprised me. There was plenty of practical advice about the relative strengths and weaknesses of AIs from Google, OpenAI, Anthropic and Meta. But the overall message was that the best AI for any task depends on both the user and the task. Their insights also offer a glimpse of where the entire field of AI is going.
In a way that wasn’t true even six months ago, companies can now either embrace the potential cost savings and productivity boost of generative AI—which some researchers believe is on the path to a “general” or humanlike AI—or risk losing out to competitors who will.
Treat Your AIs Like the Employees They Are
Today’s most powerful AIs aren’t something you can buy and run on your own computers. They’re only accessible through the cloud. This makes it easy to test them by feeding them documents, images and text, but also means that businesses have limited ability to alter their behavior.
Testing these AIs is more like hiring an employee than just buying a piece of software off the shelf, says Mark Daley, chief AI officer of Western University in Ontario.
“People expect the chatbot to work right out of the box, but you have to spend time trying them and see which of these will deliver, just like you do with an employee,” he adds.
Daley has found that all of the major large language models—including those from OpenAI, Anthropic, Google and Cohere, a startup that only offers its models to businesses—have their strengths and weaknesses. Which one to use depends on a person’s preferences and the task at hand, and it pays to experiment with them.
No One Ever Got Fired for Buying ChatGPT
Other companies appear to be catching up with the capabilities of OpenAI, but OpenAI’s models remain, for now, the standard by which all others are judged. Earlier this week, Anthropic rolled out Claude 3, a new large language model that the company claims beats the gold-standard GPT-4 on every benchmark.
“We are using OpenAI like crazy,” says Brad Schneider, chief executive of Nomad Data, a company that helps large companies use AI. Nomad Data uses OpenAI to digest, summarize and search within huge libraries of documents, such as legal briefs, court cases and insurance claims. The company’s clients also include private-equity firms that might have only a week to digest thousands of documents about a company they’re about to acquire.
After trying all of the most capable large language models, Schneider’s company found that none are as good as OpenAI for these kinds of document-processing tasks. Previous versions of Anthropic’s Claude and current versions of Google’s Gemini both hallucinated too often, he found. (In AI, “hallucination” is a term for when a chatbot makes up false information.)
Google senior vice president Prabhakar Raghavan recently wrote that hallucination is a challenge common to all large language models, but that “this is something that we’re constantly working on improving.” Anthropic President Daniela Amodei has said that it is “very, very hard” to get the hallucination rate in such models to zero. The company has said that its latest model is twice as likely as its previous one to answer questions accurately, and that eliminating all hallucinations can make models hesitant to answer questions they would otherwise get right.
Figure Out What Matters for Your AI
In addition to accuracy, the other two big considerations are speed and cost, says Eric Olson, chief executive of Consensus, a scientific search engine.
On a search engine, users expect a response within seconds. Because Consensus pairs its search results with summaries of scientific papers made by GPT-4, the company needs those summaries to be generated nearly instantaneously.
For Olson’s purposes, this means the only truly suitable model is OpenAI’s GPT-4 “turbo,” which can get a user a response within 1.5 seconds, versus twice as long with regular GPT-4. Google’s Gemini and Anthropic’s Claude are also slower than OpenAI’s models, he adds.
That said, this kind of performance comes at a cost. OpenAI and its competitors charge business users of their systems by the token—in essence, by the word—to process their requests.
“We have cases where one question someone asks can cost $50,” says Schneider. That could happen if, for example, someone asks a specific question about a collection of 5,000 legal documents, because the number of calls to OpenAI’s systems could be in the tens of thousands.
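The arithmetic behind a $50 question can be sketched in a few lines. The per-token prices and call counts below are purely illustrative assumptions, not actual provider rates—real prices vary by model and change often—but they show how a single query fanned out across thousands of documents adds up:

```python
def estimate_cost(num_calls, input_tokens_per_call, output_tokens_per_call,
                  input_price_per_1k=0.01, output_price_per_1k=0.03):
    """Estimate the dollar cost of a batch of model calls.

    Prices are hypothetical per-1,000-token rates for illustration only.
    """
    input_cost = num_calls * input_tokens_per_call / 1000 * input_price_per_1k
    output_cost = num_calls * output_tokens_per_call / 1000 * output_price_per_1k
    return input_cost + output_cost

# One question applied to 5,000 documents, at two calls per document:
cost = estimate_cost(num_calls=10_000,
                     input_tokens_per_call=400,
                     output_tokens_per_call=50)
print(f"${cost:.2f}")  # $55.00 at these assumed rates
```

At these made-up rates, 10,000 calls of a few hundred tokens each already lands in the ballpark Schneider describes, which is why per-token billing dominates the economics of document-heavy workloads.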
Google’s Strength: Scale
While OpenAI and Anthropic duke it out for the title of most capable large language model, Google has been a laggard by many benchmarks.
But one advantage for Google and its customers is that its models have the ability to ingest huge volumes of data in each query. That’s something OpenAI currently doesn’t offer, and Anthropic does for only a small group of customers.
“Gemini 1.5 allows a million tokens of context, and that is an absolute game-changer,” says Daley. “You can feed in 10 textbooks of material, and it can synthesize across that, not perfectly, but better than a human could do given 35 seconds.”
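A quick back-of-the-envelope check makes Daley’s “10 textbooks” claim concrete. The 1.3 tokens-per-word ratio below is a rough heuristic for English text—actual counts depend on the model’s tokenizer—and the textbook length is an assumption:

```python
TOKENS_PER_WORD = 1.3  # rough heuristic for English; tokenizer-dependent

def fits_in_context(total_words, context_window_tokens=1_000_000):
    """Estimate token count for a corpus and whether it fits in the window."""
    estimated_tokens = int(total_words * TOKENS_PER_WORD)
    return estimated_tokens, estimated_tokens <= context_window_tokens

# Ten textbooks at an assumed ~70,000 words each:
tokens, fits = fits_in_context(10 * 70_000)
print(tokens, fits)  # 910000 True
```

Under these assumptions, ten ordinary-length textbooks come to roughly 900,000 tokens—just under a million-token window, and far beyond what earlier models, with context windows in the tens of thousands of tokens, could ingest in one query.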
But What About Microsoft?
Microsoft has a couple of challenges in its rollout of AI. The first is that, despite its tie-up with OpenAI, the company is in some ways a reseller of OpenAI’s services—which businesses can also buy directly from OpenAI.
To be clear, Microsoft is providing a platform for a lot of different AI models, which it offers through its Azure cloud service. Microsoft also has a partnership with Mistral, for example, and offers Meta’s open source Llama model.
“With Azure AI we are bringing the most comprehensive selection of high-performing open and frontier models to customers on the world’s most trusted cloud platform,” says Eric Boyd, vice president of AI platform at Microsoft.
Amazon’s cloud services have a similar strategy, and the company has partnered with Anthropic.
When OpenAI adds a new feature, there is a meaningful delay before it becomes available in Microsoft’s version of those models, says Schneider. Microsoft’s version of GPT-4 also seems to be capacity-constrained in a way that OpenAI’s is not, leading to stricter limits on how many tokens per minute businesses can buy from it, he adds.
Many Companies Will Build Their Own AIs
For many specialized applications of generative AI, companies may want to build and train their own AI—or pay someone else to do it for them, says Petr Baudis, chief AI architect at Prague-based Rossum. Rossum automates the processing of invoices for companies, using a variety of AIs which its research team built themselves.
Training your own large-ish language model might sound like an impossible task, but with the rapid development of open source models like Meta’s Llama, it’s the sort of thing that even a small team can accomplish.
Everyone I spoke to for this piece said that open source large language models, which are rapidly becoming more capable, can be operated at a fraction of the cost of accessing OpenAI’s and Google’s models. There are a couple of reasons for this. The primary one is that these models are much smaller, and therefore require less power to run. The second is that since they can run on a company’s own servers, they cut out the middleman of the big AI companies and their margins.
Custom, open source AIs can also outperform the biggest large language models if they’re trained on the right data, and asked to do a sufficiently narrow task—such as the invoice processing service that Rossum provides.
What’s True Today Won’t Be True Tomorrow
Generative AI is a technology evolving at a rate not seen since the go-go days of the early internet itself. Anthropic’s release of a model that seems every bit as capable as OpenAI’s, despite having a smaller team and having been founded much more recently, suggests that large language models may become commodities. At that point, the only thing that will matter will be which company can offer the fastest response at the lowest price.
The beneficiary of that fierce competition will be companies large and small, which could see significant increases in the productivity of their employees. Those gains will come at a fraction of the cost of paying humans to do the same knowledge work. The implications for the future of white collar jobs are obvious—and worrisome.
Write to Christopher Mims at christopher.mims@wsj.com