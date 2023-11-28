All these models are now scrambling for data, the second force shaping the generative-AI industry. The biggest, such as OpenAI’s and Google’s, are gluttonous: they are trained on more than 1trn words, the equivalent of over 250 English-language Wikipedias. As they grow bigger they will get hungrier. But the internet is close to being exhausted. Many model-makers are therefore signing deals with news and photography agencies. Others are racing to create “synthetic” training data using algorithms; still others are trying to work with new forms of data, such as video. The prize is a model that beats the rivals.