In August 2020, I wrote about the stunning storytelling prowess of another LLM, GPT-3 (bit.ly/3RbHfbB). The Generative Pre-trained Transformer Version 3, I wrote, was being heralded as the first step towards the holy grail of AGI (Artificial General Intelligence), where a machine has the capacity to understand or learn any intellectual task that a human being can. GPT-3 was trained on a massive body of text, which was mined for statistical regularities; these are stored as parameters, the weighted connections between the nodes of its neural network. The scale is gargantuan, with 175 billion parameters; all of Wikipedia makes up just 0.6% of its training data! GPT-3 was developed by OpenAI too, and with DALL-E, the company took this to another level. OpenAI took a 12-billion-parameter version of GPT-3 and trained it to interpret natural language inputs and generate images corresponding to them, thus literally ‘swapping texts for pixels’. So, if the text prompt was “an astronaut riding a yellow horse near Saturn”, the program would break up this sentence into segments of information, find an image closest to each, and then synthesise all of it to show an astronaut sitting on a horse against a starry sky with Saturn hovering in the background. A sister model called CLIP (Contrastive Language-Image Pre-training) would then rank the generated outputs by how well they matched the prompt and curate the best ones to show you. The model was trained on a large number of photos either scraped off the internet or acquired from licensed sources.