Rush to Use Generative AI Pushes Companies to Get Data in Order

This illustration picture shows the AI (Artificial Intelligence) smartphone app ChatGPT surrounded by other AI apps in Vaasa on June 6, 2023. (Photo by OLIVIER MORIN / AFP) (AFP)


  • Data management is under the spotlight again as companies seek to out-innovate competitors with large language models

Interest in large language models, such as those developed by ChatGPT maker OpenAI, has put renewed focus on data management—placing more pressure on corporate technology chiefs to ensure their companies’ data is stored, filtered, and protected for use with AI.

“Any company, no matter what industry they’re in, really needs to have good structure and governance around how they manage data,” said Rob Zelinka, chief information officer of financial technology firm Jack Henry. “Introducing large language models now, it’s even more imperative.”

Adding to the urgency, companies that have already established a robust data infrastructure can more quickly put large language models to custom business uses such as managing contracts, providing customer service, and writing code. Racing to out-innovate their competitors, business technology leaders face growing demands to deliver data frameworks that can make generative AI applications a reality.

For help, some CIOs have turned to in-house data experts and to outside vendors that specialize in setting up data infrastructure and managing costs. Data, which can include a company’s transaction records, analytics, code, and other proprietary information, is considered the backbone of any AI model because it is used to train those algorithms to find patterns and make predictions.

Larry Pickett, chief information and digital officer of Syneos Health, is responsible for helping set a corporate data management strategy whose focus is “managing, cleaning, and organizing all the data across the entire business.” To start, the biopharmaceutical company integrated data from its operational systems, such as enterprise resource planning and clinical trial information, into a data lake, a central digital repository, Pickett said.

Syneos Health then spent about 18 months preparing its data repository for training and building AI models, Pickett said, assigning a team of data scientists and business domain experts to build “feature stores,” centralized repositories of reusable machine-learning building blocks.

The Morrisville, N.C.-based company also deletes the data it no longer uses, keeping only what it needs for AI, dashboards, and other applications. “The cloud costs can certainly explode, and data storage costs, if you don’t stay on it,” Pickett said.

Training large language models requires ready access to vast amounts of data, which can be costly to store, process, and protect. Vendors like Granica, a Mountain View, Calif.-based startup founded in 2019 that just emerged from stealth mode, are part of a growing crop of startups aimed at helping companies take advantage of generative AI with ready-made services, lower costs, or cybersecurity assurances.

Granica has built a method of compressing data stored in and Google’s cloud platforms that it says can reduce the size and cost of cloud object storage, which holds large amounts of unstructured data that don’t fit into traditional columns and rows. The startup is announcing Thursday that it has raised a total of $45 million from venture-capital firms New Enterprise Associates and Bain Capital Ventures.

To secure its AI training data, Nylas, a provider of email, calendar, and contacts APIs, is testing Granica’s Screen service, which can remove sensitive company data and personally identifiable information as it compresses the data.

That is useful for a generative AI tool that could be trained to write emails like a specific user, said John Jung, Nylas’s vice president of engineering. “You’d want it scrubbed of [personally identifiable information] so that you don’t potentially have the models hallucinate, and tell information that is sensitive,” he said, referring to hallucination, when generative AI programs produce false results.

Analysts also expect more startups to focus specifically on helping companies sift through and control access to their data for generative AI.

For some CIOs, data quality (ensuring that data is properly formatted, organized, and relevant for training AI models) is just as important as controlling cost. “The most important thing is not just collect the data, but cleanse, categorize the data, and make sure it’s in a usable format,” Zelinka said. “Otherwise you’re just paying to store meaningless data.”

Jack Henry is focused on data governance at the moment, Zelinka said. He is working with the company’s chief risk officer to define who has access to its data and how it’s being used, and collaborating with the firm’s chief technology officer, who is figuring out how to embed generative AI into its products and platforms.

Most companies are concerned with the “quality, context and privacy” of their data for use with large language models, said Erick Brethenoux, a distinguished vice president analyst at IT research and consulting firm Gartner. Those problems have long existed, he said, but interest in generative AI has made them more pressing.

Syneos Health is preparing to release what it calls its “Protocol Genius” tool, a chatbot built on OpenAI’s large language model and ChatGPT that can search across 400,000 clinical protocols, Pickett said. Business interest has pushed the pace of innovation, he said, “because we’re certain that others are going to do that as well.”

Write to Belle Lin at
