Mipra and Prime Match will soon make life more difficult for tax evaders in India. Neither is a new law or scheme that India’s finance minister has dreamt up to penalize people who do not pay tax; India has just around 30 million taxpayers. Mipra and Prime Match are search-engine tools developed by Hyderabad-based software firm Posidex Technologies Pvt. Ltd. They are part of a quiet initiative by the tax department to create what it calls a complete taxpayer’s profile (TPP) using software that will “dig out all information about one person on a single screen from as little information as a name,” according to a senior official at the Income-Tax department.
He, and the other officials in the department Mint spoke to, did not wish to be identified because of the sensitivity of the initiative, which is just beginning to be rolled out nationally after a test in Delhi.
The TPP will have details of the individual that the department collects from forms taxpayers fill in and from banks, fund houses and other establishments, and other information that the software fills in by simply (although the actual process is far from simple) matching names and addresses.
IDENTIFYING TAX DEFAULTERS (Graphic)
Among the details, which will be presented on a single screen for officials at the tax department to see, will be: tax transactions, assets, phone and vehicle number, information on bank accounts and investments, even the family tree of the individual.
The software-enabled offensive will help the tax department and its overseeing ministry, finance, increase India’s tax base, apart from helping them do something they have done well over the past few years: increase the efficiency of the tax collecting process by ensuring that those who pay tax pay as much as they are supposed to.
“Tax laws in India have become stricter and technology has been used to (help) collect taxes,” said finance minister P. Chidambaram earlier this week at a function in New Delhi. He added that this had boosted tax collections to their highest level ever, Rs2,29,505 crore in 2006-07 (the number includes both income tax and corporate tax).
Around 50 million Indians have a permanent account number (PAN), a unique number assigned to them by the income-tax department. Of this number, taxpayers constitute 30 million, or 3% of the country’s total population. The corresponding number in most parts of the world is in the range of 30-50%, said an official of the income-tax department who is involved in implementing the software. The reason for the difference? “Individuals there (in other parts of the world) get a very clear message that they are being watched and they are ‘known’ to the government,” he said. This, he added, was what the department “wanted to replicate in India.”
The TPP is part of a larger plan by the tax department to create a national data centre or data warehouse. Although regional tax offices do store information online, some of them are not linked to each other, and there is no central tax database; the warehouse will make that possible. More than the centralization, though, tax officials are looking forward to using the software because it has the ability to handle non-PAN information. Around 90% of taxpayer information in countries such as the US, Canada, the UK and Australia is defined by a single number (much like the PAN). In India, only 10% is; the remaining 90% either isn’t defined by a PAN or is defined by a wrong one.
Tighter tax laws have meant that people have to quote their PAN while performing high-value transactions. For instance, a PAN is a must for those who spend more than Rs25,000 in restaurants or on hotel stays or invest in excess of a certain amount in bank deposits or mutual funds. All these organizations (banks, fund houses, hotels and other merchant establishments) file this information with the tax department periodically. But some people get around this by spelling their name differently, using a different address, misquoting the PAN, or simply stating the truth: that they do not have a PAN.
“Earlier, we were groping in the dark on how to handle non-PAN entries where the maximum tax evasion takes place,” said another official in the tax department. The department doesn’t have a way to deal with non-PAN data today. That will change once the new software is adopted, added the official. “This software will be able to handle all PAN data and about 80% of non-PAN data including names and addresses.”
Any software that seeks to build a TPP won’t just have to scour multiple databases for information, but also be able to reconcile differences related to names and addresses. This is achieved through data mining, the process of extracting valid, previously unknown, and comprehensible information from large databases through the use of statistics and pattern recognition.
The software engine
The software that will do all this doesn’t have a name yet but was designed by five senior officers of the tax department with a penchant for technology. Posidex provided the tools and the technology, and worked on the actual coding (or writing the software).
The core of the software is an extract, transform and load (ETL) tool that allows the department to collect data in multiple file formats from several databases. Thus, the ETL tool can pull out information from tax returns, annual information reports, a database of banking transactions, a list of art buyers and other such sources. This information can be a file in any form: text, spreadsheet, etc. The ETL tool saves this information in a database.
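To make the ETL step concrete, here is a minimal sketch in Python. It is not the department's or Posidex's tool, whose internals the article does not describe. The sample records, the column layout, the `transactions` table and every function name are hypothetical, chosen only to show records in two different file formats being pulled into one database.

```python
import csv
import io
import sqlite3

# Hypothetical sample inputs standing in for two differently formatted sources.
BANK_CSV = "name,pan,amount\nR. Sharma,ABCDE1234F,60000\nRavi Sharma,,45000\n"
HOTEL_TXT = "Ravee Sharma|30000\n"


def extract_csv(text, source):
    """Yield (source, name, pan, amount) rows from comma-separated text."""
    for row in csv.DictReader(io.StringIO(text)):
        pan = row["pan"].strip() or None  # an empty PAN becomes NULL
        yield (source, row["name"].strip(), pan, float(row["amount"]))


def extract_pipe(text, source):
    """Yield rows from pipe-delimited text that has no PAN column at all."""
    for line in text.splitlines():
        if not line.strip():
            continue
        name, amount = line.split("|")
        yield (source, name.strip(), None, float(amount))


def load(rows, conn):
    """Append the normalized rows to a single table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(source TEXT, name TEXT, pan TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Extract from both sample sources and load into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(extract_csv(BANK_CSV, "bank"), conn)
    load(extract_pipe(HOTEL_TXT, "hotel"), conn)
    return conn


conn = run_etl()
total = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
non_pan = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE pan IS NULL"
).fetchone()[0]
print(total, non_pan)
```

In this toy data, two of the three records carry no PAN. Those are the entries that would then have to be matched by name and address alone.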
Two tools, Mipra, short for multi-interactive phonetic pattern recognition algorithm, and Prime Match then get to work on the data. They use phonetic matching and pattern recognition to detect variations of names. “The biggest challenge is to track and group names that are spelt differently,” said Bhavanishanker Chitoor, CEO, Posidex Technologies. “This software does that even if they disclose their name differently in different data sources,” he said. ICICI Bank uses Prime Match to match customer names in its credit cards business. Posidex said it had to adapt the tools to suit the needs of taxmen.
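The article does not explain how Mipra or Prime Match work internally. As an illustration of the general idea of phonetic matching only, the sketch below uses Soundex, a classic and much simpler algorithm that is swapped in here as an assumption. The helper names `soundex`, `name_key` and `group_names` and the sample names are hypothetical.

```python
from collections import defaultdict

# Soundex digit for each consonant group; vowels, H, W and Y carry no digit.
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the four-character Soundex code of a word, or "" if it has no letters."""
    letters = [c for c in word.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of identical codes
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return ("".join(out) + "000")[:4]


def name_key(full_name):
    """Build a phonetic key for a full name, one Soundex code per word."""
    codes = (soundex(token) for token in full_name.split())
    return " ".join(code for code in codes if code)


def group_names(names):
    """Group names that share the same phonetic key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


groups = group_names(["Ravi Sharma", "Ravee Sharmaa", "Ravi Verma"])
for key, members in groups.items():
    print(key, members)
```

Here "Ravi Sharma" and "Ravee Sharmaa" fall into one group while "Ravi Verma" stays separate. A key this crude also misses cases such as initials ("R. Sharma") and merges unrelated names that happen to sound alike, which is one reason a production tool would need considerably more than this.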
The software has been tested successfully, according to the tax department official involved in its implementation, in the tax office in New Delhi, and will be implemented in 14 other tax offices across the country in the next six to eight months.
“We have received an in-principle approval from the Central Board of Direct Taxes (CBDT) for a national rollout and the process has begun,” he added. He said that the department’s ability to link names to PANs and transactions in cases where there were discrepancies went up from 10% (which means the department was able to match these in one out of every 10 instances) to 80%. Delhi has four million PAN card holders.
Experts in the software industry said software such as the one being adopted by the tax department works best when it is complemented by human effort at appropriate stages. “A bit of manual intervention will shorten the chase,” said Umesh Gupta, CEO, Open Software Technology, a Gurgaon-based business intelligence firm.
The price of information
The tax official in charge of implementing the software said the Delhi experiment had cost the department Rs15-16 lakh “for the software” and “Rs5-6 lakh for the hardware.” He added that the bigger (tax collection) centres such as Delhi and Mumbai would require an entry-level server that costs Rs4-5 lakh. The smaller centres could do with a high-end PC with 1GB RAM, he said. “The beauty of the software is that it is so efficient—it requires less hardware.”
The official did not put a number to the cost of rolling out the project across India but Open Software’s Gupta estimated that it could be around Rs15-20 crore. Implementation and maintenance of the software and hardware would be key (to the venture’s success), he added.
The official at the tax department said the idea for the software had actually come from a suggestion from finance minister Chidambaram that the department try and use information technology for non-intrusive investigation techniques.
“This software fills that need perfectly,” he added.