Date: 2023-12-18

How can GenAI work with regional languages?

India is home to more than 400 languages, making it one of the most linguistically diverse countries in the world. Yet most foundation models and LLMs are trained primarily on internet data, which is predominantly English. According to Statista, English was the most common language for web content as of January 2023, accounting for nearly 59% of websites. Russian ranked second with 5.3%, followed by Spanish with 4.3%.