The confounding problem of garbage-in, garbage-out in ML

One of the top 10 trends in data and analytics this year as leaders navigate the covid-19 world, according to Gartner, is “augmented data management": the growing use of ML/AI tools to clean and prepare robust data for AI-based analytics. Companies are striving to go digital and derive insights from their data, but the roadblock is bad data, which leads to faulty decisions. In other words: garbage in, garbage out.

“I was talking to a university dean the other day. The university had 20,000 students in its database, but only 9,000 of them had actually passed out of the university," says Deleep Murali, co-founder and CEO of Bengaluru-based Zscore. This kind of faulty data has a cascading effect because all kinds of decisions, including financial allocations, are based on it.

Zscore started out with the idea of providing AI-based business intelligence to global enterprises. But the startup soon ran into a bigger problem: the domino effect of unreliable data feeding AI engines. “We realized we were barking up the wrong tree," says Murali. “Then we pivoted to focus on automating data checks."

For example, an insurance company allocates a budget to cover 5,000 hospitals in its database but it turns out that one-third of them are duplicates with a slight alteration in name. “So far in pilots we’ve run for insurance companies, we showed $35 million in savings, with just partial data. So it’s a huge problem," says Murali.
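Catching near-duplicates like these is a string-similarity problem. A minimal sketch of the idea, using only Python's standard library (the hospital names here are invented for illustration, and the 0.9 threshold is an arbitrary choice, not Zscore's):

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase and strip punctuation/extra spaces so trivial variants match."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

def find_duplicates(names, threshold=0.9):
    """Return pairs of names whose normalized similarity exceeds the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            ratio = SequenceMatcher(None, normalize(names[i]),
                                    normalize(names[j])).ratio()
            if ratio >= threshold:
                pairs.append((names[i], names[j]))
    return pairs

hospitals = ["St. Mary's Hospital", "St Marys Hospital", "City General Hospital"]
print(find_duplicates(hospitals))
```

A production system would add blocking (comparing only plausible candidates) to avoid the quadratic pairwise scan, but the core intuition — normalize, then score similarity — is the same.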

Problems like these prompted IBM chief Arvind Krishna to reveal that the top reason its clients halt or cancel AI projects is their data. He pointed out that 80% of an AI project involves collecting and cleansing data, but companies were reluctant to put in the effort and expense for it.

That was in the pre-covid era. “What’s happening now is that a lot of companies are keen to accelerate their digital transformation. So customer traction is picking up from banks and insurance companies as well as the manufacturing sector," says Murali.

Data analytics tends to be on the fringes of a company’s operations, rather than its core. Zscore’s product aims to change that by automating data flow and improving its quality. Use cases differ from industry to industry. For example, a huge drain on insurance companies is false claims, which can vary from absurdities like male pregnancies and braces for six-month-old toddlers to subtler cases like the same hospital receiving allocations under different names.

“We work with a leading insurance company in Australia and claims leakage is its biggest source of loss. The moment you save anything in claims, it has a direct impact on revenue," says Murali. “Male pregnancies and braces for six-month-olds seem like simple leaks but companies tend to ignore it. Legacy systems and rules haven’t accounted for all the possibilities. But now a claim comes to our system and multiple algorithms spot anything suspicious. It’s a parallel system to the existing claims processing system."
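Checks like the ones Murali describes can start as simple validation rules run alongside the existing claims pipeline. A hedged sketch (the field names and thresholds are hypothetical, not Zscore's schema):

```python
def flag_claim(claim: dict) -> list:
    """Return the reasons a claim looks suspicious; an empty list means it passes."""
    reasons = []
    # Impossible combination: maternity procedure for a male patient.
    if claim.get("procedure") == "pregnancy care" and claim.get("sex") == "M":
        reasons.append("maternity claim for a male patient")
    # Implausible combination: orthodontic braces for an infant or toddler.
    if claim.get("procedure") == "orthodontic braces" and claim.get("age_years", 99) < 3:
        reasons.append("braces claimed for an infant")
    return reasons

print(flag_claim({"procedure": "pregnancy care", "sex": "M", "age_years": 30}))
```

Each flagged claim would then be routed for review rather than paid automatically, matching the "parallel system" design Murali describes.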

For manufacturing companies, buggy inventory data means placing orders for things they don’t need. For example, there can be 15 different serial numbers for spanners. So you might order a spanner that’s well-stocked, whereas the ones really required don’t show up. “Companies lose 12-15% of their revenue because of data issues such as duplicate or excessive inventory," says Murali.
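The spanner problem is the same item hiding under several serial numbers. A minimal sketch of how merging on a normalized description exposes the true stock level (the inventory records here are invented for illustration):

```python
from collections import defaultdict

def merged_stock(items):
    """Group items by normalized description; one group spanning several
    serial numbers is the same physical SKU recorded multiple times."""
    groups = defaultdict(list)
    for item in items:
        key = "".join(item["desc"].lower().split())  # "Spanner 12 mm" -> "spanner12mm"
        groups[key].append(item)
    return groups

inventory = [
    {"serial": "SPN-001", "desc": "Spanner 12mm", "stock": 40},
    {"serial": "SPN-014", "desc": "spanner 12 mm", "stock": 3},
    {"serial": "HMR-002", "desc": "Hammer", "stock": 5},
]

for key, group in merged_stock(inventory).items():
    if len(group) > 1:
        total = sum(item["stock"] for item in group)
        print(f"{key}: {len(group)} serials, {total} units in total")
```

Here the 12mm spanner looks scarce under one serial number (3 units) while 40 units sit under another — exactly the situation that triggers an unnecessary order.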

These problems have been exacerbated in the age of AI, where algorithms drive decision-making. Companies typically lack the expertise to prepare data in a way that is suitable for machine-learning models. How data is labelled and annotated plays a huge role. Hence the need for supervised machine learning from tech companies like Zscore that can identify bad data and quarantine it.
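The article doesn't describe Zscore's model, but the supervised-quarantine idea can be illustrated with a toy example: featurize each record (here, fraction of missing fields and fraction of junk characters — both invented features), train a tiny perceptron on hand-labelled good/bad records, then quarantine anything the model scores as bad:

```python
def featurize(record: dict) -> list:
    """Turn a record into [missing-field rate, junk-character rate, bias]."""
    fields = list(record.values())
    missing = sum(1 for v in fields if v in (None, "", "N/A")) / len(fields)
    text = " ".join(str(v) for v in fields)
    junk = sum(1 for ch in text if not (ch.isalnum() or ch.isspace())) / max(len(text), 1)
    return [missing, junk, 1.0]  # trailing 1.0 is the bias term

def train(samples, labels, epochs=20, lr=0.5):
    """Classic perceptron update rule; label 1 means 'bad data'."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            for i in range(len(w)):
                w[i] += lr * (y - pred) * x[i]
    return w

def quarantine(records, w):
    """Set aside records the trained model scores as bad."""
    return [r for r in records
            if sum(wi * xi for wi, xi in zip(w, featurize(r))) > 0]

good = {"name": "Acme Hospital", "city": "Pune"}
bad = {"name": "???", "city": ""}
w = train([featurize(good), featurize(bad)], [0, 1])
print(quarantine([good, bad], w))
```

A real system would use richer features and a proper library, but the shape is the same: labelled examples teach the model what "bad" looks like, and suspect records are quarantined instead of flowing into analytics.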

Analysing semantics and context, and studying manual processes, help develop industry- or organization-specific solutions. “So far 80-90% of data work has been manual. What we do is automate identification of data ingredients, data workflows and root cause analysis to understand what’s wrong with the data," says Murali.
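The raw material for that kind of root-cause analysis is usually a column-by-column profile of the data. A minimal sketch of automated profiling (the sample rows and metrics are illustrative, not Zscore's product):

```python
def profile(rows):
    """Compute per-column completeness and distinctness for a list of records."""
    report = {}
    for col in rows[0].keys():
        values = [r.get(col) for r in rows]
        filled = [v for v in values if v not in (None, "")]
        report[col] = {
            "null_rate": 1 - len(filled) / len(values),  # share of empty cells
            "distinct": len(set(filled)),                # unique non-empty values
        }
    return report

rows = [
    {"hospital": "Acme Hospital", "amount": 100},
    {"hospital": "", "amount": 200},
]
print(profile(rows))
```

A column with a high null rate, or far fewer distinct values than expected, is where an analyst (or an automated rule) starts looking for the root cause.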

A couple of years ago, Zscore got into cloud data management multinational NetApp’s accelerator programme in Bengaluru. This gave it a foothold abroad with a NetApp client in Australia. It also opened the door to working with large financial institutions.

“The Royal Commission of Australia, which is the equivalent of RBI, had come down hard on the top four banks and financial institutions for passing on faulty information. Its report said decisions had to be based on the right data and gave financial institutions 18 months to show progress. This became motivation for us because these were essentially data-oriented problems," says Murali.

Malavika Velayanikal is a consulting editor with Mint. She tweets @vmalu.
