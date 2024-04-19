The neural networks in today’s LLMs are also inefficiently structured. Since 2017 most AI models have used a type of neural-network architecture known as a transformer (the “T” in GPT), which allows them to establish relationships between bits of data that are far apart within a data set. Previous approaches struggled to make such long-range connections. If a transformer-based model were asked to write the lyrics to a song, for example, it could, in its coda, riff on lines from many verses earlier, whereas a more primitive model would have forgotten all about the start by the time it had got to the end of the song. Transformers can also be run on many processors at once, significantly reducing the time it takes to train them.
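
For readers who want to see the mechanism, the long-range link described above can be sketched in a few lines of Python. This is a toy illustration of scaled dot-product self-attention, the operation at the heart of a transformer, and not how any production model is written: the function names (`softmax`, `attention_weights`, `attend`) and the five-line "song" of two-number vectors are invented for the example, and the sketch leaves out the learned projections, multiple attention heads and position information that real models rely on.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(vectors):
    """For each position, score every position by scaled dot-product similarity."""
    dim = len(vectors[0])
    weights = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights.append(softmax(scores))
    return weights


def attend(vectors):
    """Replace each vector with a weighted blend of all the vectors."""
    dim = len(vectors[0])
    return [
        [sum(w * v[d] for w, v in zip(row, vectors)) for d in range(dim)]
        for row in attention_weights(vectors)
    ]


# Hypothetical toy "song": the opening line and the coda share a theme,
# while the verses in between are about something else.
song = [
    [4.0, 0.0],  # opening line
    [0.0, 1.0],  # verse
    [0.0, 1.0],  # verse
    [0.0, 1.0],  # verse
    [4.0, 0.0],  # coda
]

coda_weights = attention_weights(song)[-1]
blended = attend(song)
```

In this toy case the coda puts almost all of its weight on itself and on the opening line, however many verses sit between them, because the score depends on similarity and not on distance. Each row of weights is also computed independently of the others, which is a rough picture of why the work can be spread across many processors.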