Bengaluru: Last year, for the first time ever, an international beauty contest was judged by machines. Thousands of people from across the world submitted their photos to Beauty.AI, hoping that their faces would be selected by an advanced algorithm free of human biases, in the process accurately defining what constitutes human beauty.
In preparation, the algorithm had studied hundreds of images of past beauty contests, training itself to recognize human beauty based on the winners. But what was supposed to be a breakthrough moment that would showcase the potential of modern self-learning, artificially intelligent algorithms rapidly turned into an embarrassment for the creators of Beauty.AI, as the algorithm picked the winners solely on the basis of skin colour.
“The algorithm made a fairly non-trivial correlation between skin colour and beauty. A classic example of bias creeping into an algorithm," says Nisheeth K. Vishnoi, an associate professor at the School of Computer and Communication Sciences at Switzerland-based École Polytechnique Fédérale de Lausanne (EPFL). He specializes in issues related to algorithmic bias.
A widely cited piece titled “Machine bias" from US-based investigative journalism organization ProPublica in 2016 highlighted another disturbing case.
It cited an incident involving a black teenager named Brisha Borden who was arrested for riding an unlocked bicycle she found on the road. The police estimated the value of the item was about $80.
In a separate incident, a 41-year-old Caucasian man named Vernon Prater was arrested for shoplifting goods worth roughly the same amount. Unlike Borden, Prater had a prior criminal record and had already served prison time.
Yet, when Borden and Prater were brought for sentencing, a self-learning program determined Borden was more likely to commit future crimes than Prater—exhibiting the sort of racial bias computers were not supposed to have. Two years later, it was proved wrong when Prater was charged with another crime, while Borden’s record remained clean.
And who can forget Tay, the infamous “racist chatbot" that Microsoft Corp. developed last year?
Even as artificial intelligence and machine learning continue to break new ground, there is enough evidence to indicate how easy it is for bias to creep into even the most advanced algorithms. Given the extent to which these algorithms are capable of building deeply personal profiles about us from relatively trivial information, the impact that this can have on personal privacy is significant.
This issue caught the attention of the US government, which in October 2016 published a comprehensive report titled “Preparing for the future of artificial intelligence", turning the spotlight on the issue of algorithmic bias. It raised concerns about how machine-learning algorithms can discriminate against people or sets of people based on the personal profiles they develop of all of us.
“If a machine learning model is used to screen job applicants, and if the data used to train the model reflects past decisions that are biased, the result could be to perpetuate past bias. For example, looking for candidates who resemble past hires may bias a system toward hiring more people like those already on a team, rather than considering the best candidates across the full diversity of potential applicants," the report says.
“The difficulty of understanding machine learning results is at odds with the common misconception that complex algorithms always do what their designers choose to have them do, and therefore that bias will creep into an algorithm if and only if its developers themselves suffer from conscious or unconscious bias. It is certainly true that a technology developer who wants to produce a biased algorithm can do so, and that unconscious bias may cause practitioners to apply insufficient effort to preventing bias," it says.
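The report's hiring example can be made concrete with a toy sketch. Everything in it is hypothetical and not taken from the report: the hiring history, the applicant pool and the "resemblance" score are invented for illustration. It assumes a screener that simply favours applicants whose group is common among past hires, and compares that with a ranking by skill alone.

```python
from collections import Counter

# Hypothetical hiring history as (group, skill) pairs; group "A" dominates.
PAST_HIRES = [("A", 7), ("A", 6), ("A", 8), ("A", 5), ("B", 9)]

# Hypothetical applicant pool; the two strongest applicants are in group "B".
APPLICANTS = [("A", 6), ("B", 9), ("A", 5), ("B", 8)]


def resemblance_score(candidate, history):
    """Score a candidate by how common their group is among past hires."""
    group, _skill = candidate
    counts = Counter(g for g, _ in history)
    return counts[group] / len(history)


def shortlist_by_resemblance(applicants, history, k):
    """Pick the k applicants who look most like the people already hired."""
    ranked = sorted(
        applicants,
        key=lambda c: resemblance_score(c, history),
        reverse=True,
    )
    return ranked[:k]


def shortlist_by_skill(applicants, k):
    """Pick the k applicants with the highest skill, ignoring group."""
    return sorted(applicants, key=lambda c: c[1], reverse=True)[:k]


biased_pick = shortlist_by_resemblance(APPLICANTS, PAST_HIRES, 2)
skill_pick = shortlist_by_skill(APPLICANTS, 2)
print(biased_pick)  # both picks come from group "A"
print(skill_pick)   # both picks come from group "B"
```

Nobody wrote a rule excluding group "B" here; the skew comes entirely from the history the screener was asked to imitate.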
Over the years, social media platforms have been using similar self-learning algorithms to personalize their services, offering content better suited to the preferences of their users—based solely on their past behaviour on the site in terms of what they “liked" or the links they clicked on.
“What you are seeing on platforms such as Google or Facebook is extreme personalization—which is basically when the algorithm realizes that you prefer one option over another. Maybe you have a slight bias towards (US President Donald) Trump versus Hillary (Clinton) or (Prime Minister Narendra) Modi versus other opponents—that’s when you get to see more and more articles which are confirming your bias. The trouble is that as you see more and more such articles, it actually influences your views," says EPFL’s Vishnoi.
“The opinions of human beings are malleable. The US election is a great example of how algorithmic bots were used to influence some of these very important historical events of mankind," he adds, referring to the impact of “fake news" on recent global events.
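The feedback loop Vishnoi describes can be illustrated with a small deterministic simulation. This is a toy model under stated assumptions, not a description of how any real platform works: the `amplify` and `drift` parameters, and the idea that a reader's preference moves a fixed fraction towards whatever the feed shows, are all hypothetical.

```python
def simulate_feed(pref, rounds, amplify=1.5, drift=0.2):
    """Track a reader's preference for one side (0..1) over repeated sessions.

    Assumptions (hypothetical): the feed exaggerates the reader's current
    lean by raising the odds to the power `amplify`, and the reader's view
    then moves a fraction `drift` of the way towards what was shown.
    """
    history = [pref]
    for _ in range(rounds):
        odds = (pref / (1 - pref)) ** amplify
        shown = odds / (1 + odds)            # share of the feed backing this side
        pref = pref + drift * (shown - pref)  # view drifts towards the feed
        history.append(pref)
    return history


leaning = simulate_feed(0.55, 30)   # starts with a slight lean
neutral = simulate_feed(0.50, 30)   # starts with no lean
print(round(leaning[0], 2), round(leaning[-1], 2))
print(neutral[-1])
```

In this sketch a reader with no initial lean stays put, while a slight lean grows with every round, which is the pattern the quote warns about.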
Experts, however, believe that such bias is rarely the product of malice. “It’s just a product of careless algorithm design," says Elisa Celis, a senior researcher along with Vishnoi at EPFL.
How does one detect bias in an algorithm? “It bears mentioning that machine-learning algorithms and neural networks are designed to function without human involvement. Even the most skilled data scientist has no way to predict how his algorithms will process the data provided to them," says Mint columnist and lawyer Rahul Matthan in a recent research paper on the issue of data privacy published by the Takshashila Institute, titled “Beyond consent: A new paradigm for data protection".
One solution is “black-box testing", which determines whether an algorithm is working as effectively as it should without peering into its internal structure. “In a black-box audit, the actual algorithms of the data controllers are not reviewed. Instead, the audit compares the input algorithm to the resulting output to verify that the algorithm is in fact performing in a privacy-preserving manner. This mechanism is designed to strike a balance between the auditability of the algorithm on the one hand and the need to preserve proprietary advantage of the data controller on the other. Data controllers should be mandated to make themselves and their algorithms accessible for a black box audit," says Matthan, who is also a fellow with Takshashila’s technology and policy research programme.
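A minimal sketch of what a black-box audit for bias might look like follows. It is an illustration only, not the procedure proposed in the paper: the two models, the records and the `flip_rate` measure are hypothetical. The auditor never inspects the model's code; it only feeds in records that differ in one protected attribute and checks whether the decision changes.

```python
def flip_rate(model, records, protected_key, values):
    """Share of records whose decision changes when only the protected
    attribute is swapped. The model is treated as an opaque callable."""
    flipped = 0
    for record in records:
        outcomes = {model({**record, protected_key: value}) for value in values}
        if len(outcomes) > 1:
            flipped += 1
    return flipped / len(records)


def biased_model(applicant):
    """Hypothetical model that quietly sets a higher bar for group "B"."""
    threshold = 60 if applicant["group"] == "A" else 70
    return applicant["score"] >= threshold


def neutral_model(applicant):
    """Hypothetical model that looks only at the score."""
    return applicant["score"] >= 65


RECORDS = [
    {"group": "A", "score": 55},
    {"group": "B", "score": 65},
    {"group": "A", "score": 75},
    {"group": "B", "score": 62},
]

print(flip_rate(biased_model, RECORDS, "group", ["A", "B"]))
print(flip_rate(neutral_model, RECORDS, "group", ["A", "B"]))
```

Because the audit needs only inputs and outputs, the data controller's proprietary logic stays private, which is the balance the quoted passage aims for.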
He suggests the creation of a class of technically skilled personnel or “learned intermediaries" whose sole job will be to protect data rights. “Learned intermediaries will be technical personnel trained to evaluate the output of machine-learning algorithms and detect bias on the margins and legitimate auditors who must conduct periodic reviews of the data algorithms with the objective of making them stronger and more privacy protective. They should be capable of indicating appropriate remedial measures if they detect bias in an algorithm. For instance, a learned intermediary can introduce an appropriate amount of noise into the processing so that any bias caused over time due to a set pattern is fuzzed out," Matthan explains.
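The noise idea can be sketched as follows. This is one possible reading of the suggestion, not a method given in the paper: the scores, the Gaussian noise and its size (`noise_sd`) are assumptions chosen for illustration. Two options differ by a tiny, consistent margin; without noise the same one wins every time, while with noise the outcome is spread between them.

```python
import random

# Hypothetical scores: "A" always edges out "B" by a small, fixed margin.
SCORES = {"A": 7.1, "B": 7.0}


def top_choice(scores, rng, noise_sd):
    """Return the name with the highest score after adding Gaussian noise."""
    noisy = {name: value + rng.gauss(0.0, noise_sd) for name, value in scores.items()}
    return max(noisy, key=noisy.get)


def share_won_by(name, scores, trials, noise_sd, seed=0):
    """Fraction of trials in which `name` comes out on top."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = sum(top_choice(scores, rng, noise_sd) == name for _ in range(trials))
    return wins / trials


print(share_won_by("B", SCORES, 1000, noise_sd=0.0))  # "B" never wins without noise
print(share_won_by("B", SCORES, 1000, noise_sd=1.0))  # "B" wins a sizeable share with noise
```

The trade-off is that noise also overrides genuine differences, so how much to add is a judgment call that depends on how far the underlying scores can be trusted.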
That said, significant challenges remain in removing bias once it has been discovered.
“If you are talking about removing biases from algorithms and developing appropriate solutions, this is an area that is still largely in the hands of academia—and removed from the broader industry. It will take time for the industry to adopt these solutions on a larger scale," says Animesh Mukherjee, an associate professor at the Indian Institute of Technology, Kharagpur, who specializes in areas such as natural language processing and complex algorithms.
This is the first in a four-part series. The next part will focus on consent as the basis of privacy protection.
A nine-judge Constitution bench of the Supreme Court is currently deliberating whether or not Indian citizens have the right to privacy. At the same time, the government has appointed a committee under the chairmanship of retired Supreme Court judge B.N. Srikrishna to formulate a data protection law for the country. Against this backdrop, a new discussion paper from the Takshashila Institute has proposed a model of privacy particularly suited for a data-intense world. Over the course of this week we will take a deeper look at that model and why we need a new paradigm for privacy. In that context, we examine the increasing reliance on software to make decisions for us, assuming that dispassionate algorithms will ensure a level of fairness that we are denied because of human frailties. But algorithms have their own shortcomings—and those can pose a serious threat to our personal privacy.