How years of Reddit posts have made the company an AI darling
Summary
- The social-media veteran is profiting from feeding artificial-intelligence companies’ insatiable appetite for conversational text.
Artificial-intelligence companies were one of Reddit’s biggest frustrations last year. Now they are a key source of growth for the social-media platform.
These companies have an insatiable appetite for online data to train their models and display content in an easy-to-digest format. In mid-2023, Reddit, a social-media veteran and IPO newbie, turned off the spigot and began charging some businesses for access to its data.
It turns out that Reddit’s ever-growing 19-year warehouse of user commentary makes it an attractive resource for AI companies. The platform recently reported its first quarterly profit as a publicly traded company, thanks partly to data-licensing deals it made in the past year with OpenAI and Google.
Reddit Chief Executive and co-founder Steve Huffman has said the company had to stop giving away its valuable data to the world’s largest companies for free.
“It is an arms race,” he said at The Wall Street Journal’s Tech Live conference in October. “But we’re in talks with just about everybody, so we’ll see where these things land.”
Reddit’s huge amount of data works well for AI companies because it is organized by topics and uses a voting system instead of an algorithm to sort content quality, and because people’s posts tend to be candid.
For the first nine months of 2024, the Reddit revenue category that includes licensing grew more than sixfold, to $81.6 million from $12.3 million a year earlier.
While data-licensing revenue remains dwarfed by Reddit’s core advertising sales, the new category’s rapid growth reveals a potentially lucrative business line with relatively high margins.
Diversifying away from a reliance on advertising, while tapping into an AI-adjacent market, has also made Reddit attractive to investors who are searching for new exposure to the latest technology boom. Reddit’s stock has more than doubled in the past three months.
The source of Reddit’s newfound wealth is the burgeoning market for AI-useful data. Reddit’s willingness to sell its data to AI outfits makes it stand out, because there is only a finite amount of data available for AI companies to gobble up for free or purchase. Some executives and researchers say the industry’s need for high-quality text could outstrip supply within two years, potentially slowing AI’s development.
AI companies need data so their apps can respond to users’ prompts and search queries with accurate results and in the kind of conversational tone for which they are known. Enter Reddit, whose text-heavy platform and growing corpus of online human interactions fit the bill.
“It’s like manna from heaven,” said Ari Morcos, chief executive of DatologyAI, a startup that curates data for AI training. “All they have to do is package the data set and hire salespeople.”
However, it is unclear exactly how much future financial value Reddit’s data-licensing deals with AI companies hold. Reddit hasn’t disclosed the terms of the arrangements or how long they are set to last.
Selling data to AI companies doesn’t make sense for all social-media platforms. Some have their own AI models, including Facebook parent Meta and Elon Musk’s X, while others mostly contain private conversations or discussions on specific topics. Reddit isn’t building the kind of AI models its data-licensing customers have created.
On Monday, Reddit began testing an AI-powered search tool for its own content using AI models from OpenAI and Google, a spokesman said.
Reddit also functions differently from most of its peers. For example, its users can upvote or downvote each other’s posts and comments, and they can earn so-called karma points for posting popular content, which other users can see. Votes and karma can be used as signals to AI models of what is likely to be high-quality versus low-quality content, said Jaime Sevilla, director of Epoch AI, an AI-focused research institute. By contrast, most other social platforms only indicate if posts and comments are popular by counting “likes,” or they measure users’ popularity by showing how many followers they have.
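As a rough illustration of how votes and karma could serve as quality signals, a data curator might filter posts along the following lines. This is a minimal sketch, not a description of how Reddit or any AI company actually processes the data: the `Post` fields, the function names and every threshold are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record layout; not Reddit's actual data schema.
    text: str
    upvotes: int
    downvotes: int
    author_karma: int


def quality_score(post: Post) -> float:
    """Share of all votes that are upvotes; 0.0 when a post has no votes."""
    total = post.upvotes + post.downvotes
    return post.upvotes / total if total else 0.0


def select_for_training(posts, min_ratio=0.8, min_votes=10, min_karma=100):
    """Keep posts with enough votes, a high upvote share and an established author.

    All three thresholds are illustrative assumptions.
    """
    return [
        p for p in posts
        if p.upvotes + p.downvotes >= min_votes
        and quality_score(p) >= min_ratio
        and p.author_karma >= min_karma
    ]


posts = [
    Post("Detailed answer with sources", upvotes=95, downvotes=5, author_karma=12000),
    Post("Low-effort reply", upvotes=3, downvotes=9, author_karma=40),
    Post("Popular but divisive take", upvotes=60, downvotes=55, author_karma=800),
]

kept = select_for_training(posts)
print([p.text for p in kept])
```

A bare count of “likes” would rank the divisive third post close to the first. The split between upvotes and downvotes is what lets a filter like this one tell them apart.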
Another distinctive aspect of Reddit is that its users are mostly pseudonymous. People tend to be more honest and forthcoming online when they don’t have to worry about being embarrassed or offending folks they know by what they post, said Richard Lachman, an associate professor of digital media at Toronto Metropolitan University. The more authentic the content, the more useful it is for AI training, he said.
The same logic applies to Reddit’s diverse corpus, added Lachman. The platform is divided into more than 100,000 “subreddits” dedicated to all sorts of topics, from sports and religion to politics and animals. Many other social platforms, meanwhile, cater to narrow groups of people, as Discord does for videogame enthusiasts and Strava does for fitness fans.
“Reddit is like a 24-hour buffet,” said Lachman.
As of October, Reddit.com was the fifth-most-visited website in the U.S., according to analytics company Similarweb. In the first half of this year, people posted more than 5.3 billion pieces of content to Reddit, a 20.5% increase from the second half of 2023, according to the company’s latest transparency report. That amount, however, includes content that Reddit doesn’t make available to its data-licensing partners, such as private messages and chats.
The volume of data Reddit sells to customers is likely far smaller than that produced by other, larger social platforms, said Morcos, who worked at Meta Platforms and Google’s DeepMind unit before founding DatologyAI last year. Reddit had 97 million daily users at the end of September, while Snapchat, for example, had 443 million daily users.
Meanwhile, some news publishers, including the New York Times, have opted to battle OpenAI and its backer Microsoft in court, alleging their content was used without permission to train artificial-intelligence tools and populate answers for users. OpenAI has said the lawsuit is without merit.
Wall Street Journal owner News Corp has a content-licensing partnership with OpenAI.
Write to Sarah E. Needleman at Sarah.Needleman@wsj.com