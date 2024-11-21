The fact remains that LLMs and LLM-powered chatbots are trained on humongous volumes of data. ChatGPT-3.5, for instance, was reportedly trained on 570GB of text from the internet, amounting to hundreds of billions of words harvested from books, articles, and websites, including social media. Drawing on that training, these chatbots generate new content such as text, images, videos, and code in response to natural-language prompts. That said, as The New York Times explains in its 69-page suit, the content at issue is produced by journalists who not only spend considerable time and effort reporting pieces, but also review their articles for accuracy, independence, and fairness.