These companies grapple with a wide variety of fraudulent activity, with some users manipulating addresses to claim discounts or commit fraud. This has been a major pain point for e-commerce companies, which bank on last-mile logistics and ultimately bleed revenue.
But fraud is just one aspect of the complicated address challenge. In many developed countries, for instance, latitude and longitude information is available for each address. In many developing countries such as India, such precise information is either unavailable or only partial.
Flipkart, one of the biggest e-commerce companies in India, is turning to Machine Learning and Artificial Intelligence to solve the complicated address puzzle.
We spoke to senior Flipkart data scientist Ravindra Babu about how the company uses new-age technologies to build ‘address intelligence’.
The Address Conundrum
One of the most basic problems Flipkart faces is the variation in spellings of the same place.
“There are many problems that are native to India. Address is one of them. In a given household, say there are six members in a family. Each of them will write their address in a different way. There is no uniformity, although the house members are literate. Say, Marathahalli in Bangalore can be written in two ways: Maratha Halli or Marathahalli. So there are these kinds of challenges,” he said.
“So, if you want to involve a model, where you expect a machine to understand these complications, that’s where the challenge lies. For any AI problem we come across, the first thing we do is go through the data and get a feel for the data rather than thinking of a model. We also consider how much pre-processing is required and how much variability the data entails.”
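The kind of pre-processing Babu describes can be illustrated with a small sketch. It is not Flipkart's method: the locality list, the `normalize` and `match_locality` helpers and the 0.8 similarity cutoff are all assumptions made for illustration. It collapses spacing and punctuation, then uses fuzzy string matching from Python's standard library to map a spelling variant to a known locality name.

```python
import difflib
import re

# Hypothetical gazetteer of canonical locality names (illustrative only).
KNOWN_LOCALITIES = ["marathahalli", "whitefield", "koramangala"]


def normalize(text: str) -> str:
    """Lowercase the text and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def match_locality(raw: str, cutoff: float = 0.8):
    """Return the closest known locality, or None if nothing is similar enough."""
    candidates = difflib.get_close_matches(
        normalize(raw), KNOWN_LOCALITIES, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None


if __name__ == "__main__":
    for spelling in ["Maratha Halli", "Marathahalli", "Marathalli"]:
        print(spelling, "->", match_locality(spelling))
```

A fixed cutoff is a blunt instrument. A production system would more likely learn from data which variants belong together, as the article suggests.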
“On average, 9-10 words are sufficient for a shipment to reach a particular place (example: person’s name, TCS, near ITPL Bangalore). However, some people include details like cubicle number, extension number and directions to reach the place from the nearest bus stop. We don’t restrict users while specifying addresses because we don’t want to give a bad experience. This can stretch an address to almost 200 words. These kinds of addresses will be difficult for a machine to understand.”
Flipkart is also working to get the addresses marked on the map. It has partnered with maps service providers like MapMyIndia for geo-tagging locations. But the challenge lies in building a solution that reconciles address data with geo-location information while also handling wrong spellings.
How the system works
Ravindra Babu observed that without a model in place, delivery hub staff would have to go through each shipment manually and distribute them to field executives, making the process more tedious and time-consuming.
Flipkart first tries to understand how the addresses are given. The idea is to make sense of these addresses without asking customers to type them differently. The company uses a learning model to identify whether an address is "monkey typed", that is, made up of randomly typed alphanumeric characters.
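The article does not describe the model's features, so the following is only a rough stand-in for the idea: a hand-written heuristic rather than a learned model. It flags text with very few vowels or with long runs of consonants, two signals that often separate keyboard mashing from real place names. The function names and both thresholds are assumptions.

```python
import re

VOWELS = set("aeiou")


def longest_consonant_run(word: str) -> int:
    """Length of the longest run of consecutive consonants in a word."""
    run = best = 0
    for ch in word:
        if ch.isalpha() and ch not in VOWELS:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def looks_monkey_typed(address: str,
                       min_vowel_ratio: float = 0.2,
                       max_consonant_run: int = 5) -> bool:
    """Heuristic check; thresholds are illustrative guesses, not tuned values."""
    text = address.lower()
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return True  # no letters at all, so nothing readable as a place name
    vowel_ratio = sum(ch in VOWELS for ch in letters) / len(letters)
    words = re.findall(r"[a-z]+", text)
    worst_run = max((longest_consonant_run(w) for w in words), default=0)
    return vowel_ratio < min_vowel_ratio or worst_run > max_consonant_run


if __name__ == "__main__":
    print(looks_monkey_typed("asdfgh jklqwrt 12345"))
    print(looks_monkey_typed("12 Main Road, Marathahalli, Bangalore"))
```

A trained classifier would replace these fixed thresholds with weights learned from labelled addresses, which is closer to what the company describes.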
The process also involves leveraging the expertise of field executives who have a better understanding of local addresses. Before deploying a model, the data is rectified, validated and monitored. The mechanism ensures that the data is coherent as different executives may have labelled these data differently.
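The article does not say how conflicting labels are reconciled. One common approach, shown here purely as an assumption, is a majority vote that keeps a label only when enough executives agree and sends the rest for review. The `reconcile_labels` helper and the 60% agreement threshold are hypothetical.

```python
from collections import Counter


def reconcile_labels(labels, min_agreement: float = 0.6):
    """Return the majority label if enough labellers agree, else None.

    A None result means the address should go back for manual review.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


if __name__ == "__main__":
    # Three hypothetical executives label the same address with a sub-area.
    print(reconcile_labels(["Marathahalli", "Marathahalli", "Whitefield"]))
    print(reconcile_labels(["Marathahalli", "Whitefield"]))
```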
With the model in place, the system automatically assigns a sub-area name to each address. This makes the entire process faster and simpler.
Babu said the team has also improved geo-tagging. But there are other challenges as well. Some people write the wrong pin code, and in some cases several areas share the same pin code. The scientists use a different model to solve this problem. The idea is to have the machine suggest the right pin code to the user for their area.
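As a simplified picture of pin code suggestion, the sketch below swaps the model for a plain lookup table. The table, the `suggest_pin` helper and the sample pin codes are illustrative assumptions and should not be treated as verified postal data. Because the mapping runs from sub-area to pin code, several areas can share one pin code.

```python
# Hypothetical sub-area to pin code table; sample values for illustration only.
PIN_BY_SUBAREA = {
    "marathahalli": "560037",
    "whitefield": "560066",
}


def suggest_pin(subarea: str, entered_pin: str):
    """Return a suggested pin code when the entered one disagrees with the table.

    Returns None when the sub-area is unknown or the entered pin already matches.
    """
    expected = PIN_BY_SUBAREA.get(subarea.strip().lower())
    if expected is None or expected == entered_pin.strip():
        return None
    return expected


if __name__ == "__main__":
    print(suggest_pin("Marathahalli", "560038"))  # mismatch, so a suggestion
    print(suggest_pin("Marathahalli", "560037"))  # already consistent
```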
Powerful Systems Under the Hood
Flipkart uses both conventional machine learning models and deep learning solutions for various purposes, including fixing the address problem.
“We have enough infrastructure, which includes cloud-based computing and Flipkart-owned solutions, coupled with powerful GPUs and CPUs to solve these problems. We have a number of models deployed to look into various aspects. For instance, our machines can understand if a user has entered a monkey-typed address or a non-genuine address,” he explained.
Open-source, human intervention and more
Babu said they have thought about open-sourcing the model.
“We also have thoughts that when we have enough address intelligence, we can also make it open-source and deploy it across India. We are still maturing, but in a broad sense we might do that in the future,” he said.
On human intervention, he said it may still be required in some cases, especially when the model has yet to be corrected and a problem needs resolving right away. In those cases, the problem has to be handled manually.
“If you’re able to make sense of data and you’re able to build a model, there are two broad ways of approaching a problem: supervised classification and unsupervised classification. For supervised classification, you need labels. Someone needs to feed the data. In the case of addresses, field executives are the teachers who say this is the right label and this is a wrong label. Someone has to be telling the machine that. The moment you’re able to make sense of the entire problem, you have data in a complete sense and you’re able to model. Then you try to see how precise your prediction is going to be: how many fraudsters you can catch versus how well you can predict that a fraudster is a fraudster,” he explained.
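One way to read Babu's closing trade-off is as recall (how many of the actual fraudsters are caught) against precision (how many of those flagged really are fraudsters). That mapping is an interpretation, not something he states. The short sketch below computes both from made-up labels, where 1 marks a fraudulent address and 0 a genuine one.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = fraud, 0 = genuine)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # genuine, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud, missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example: four fraudulent and two genuine addresses.
    actual = [1, 1, 1, 1, 0, 0]
    flagged = [1, 0, 0, 0, 0, 1]
    precision, recall = precision_recall(actual, flagged)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Flagging more addresses usually raises recall and lowers precision, which is why the two are weighed against each other.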