Online database queries, such as a search for the best-priced flight or hotel, can reveal a surprising amount of information about us, and so can apps. Travel sites, for instance, are known to jack up the prices on flights whose routes are drawing an unusually high volume of queries. Consider also the case of cybersecurity expert Ran Dubin, a doctoral student in the department of communication systems engineering at Ben-Gurion University of the Negev (BGU), who wrote a machine-learning algorithm that could determine whether someone had watched a specific video from a set of suspicious terror-related videos.
Intelligence agencies, Dubin pointed out in a 15 March release by BGU research, could access this technology to track terrorists or other suspicious individuals. Internet marketing companies could track the number and make-up of viewers watching an ad. While this information could be helpful, Dubin warned average YouTube users to be aware that their viewing history on YouTube and other Internet video platforms could be tracked.
Researchers from the Massachusetts Institute of Technology’s (MIT’s) Computer Science and Artificial Intelligence Laboratory and Stanford University believe they can address such privacy issues to some extent with a new encryption system, which they are presenting at the ongoing three-day USENIX Symposium on Networked Systems Design and Implementation that ends today.
Called Splinter, the system disguises users’ database queries so that they reveal no private information. It is so named because it splits a query and distributes it across copies of the same database on multiple servers. The servers return results that make sense only when recombined according to a procedure that the user alone knows. As long as at least one of the servers can be trusted, it’s impossible for anyone other than the user to determine what query the servers executed. In a 23 March release, the researchers acknowledge that if the site hosting the database is itself collecting users’ data without their consent, the requirement of at least one trusted server is difficult to enforce.
Splinter, according to the researchers, uses a technique called function secret sharing, which was first described in a 2015 paper by a trio of Israeli computer scientists. Systems for disguising database queries have been proposed in the past, but function secret sharing could make them as much as 10 times faster. In experiments, the MIT and Stanford researchers found that Splinter could return a result from a database with millions of entries—including a duplicate of the Yelp database for selected cities—in about a second.
Function secret sharing converts a database query into a set of complementary mathematical functions, each of which is sent to a different database server. On each server, the function must be applied to every record in the database; otherwise, a hacker could determine what data the user is interested in. Every time the function is applied to a new record, it updates a value stored in memory. After it has been applied to the last record, the final value is returned to the user. But that value is meaningless until it’s combined with the values reported by the other servers.
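The split-and-recombine idea can be sketched in a few lines of Python. This is only a naive illustration under stated assumptions, not Splinter's actual code: it uses plain additive sharing over a small, known list of cities, whereas the function secret sharing scheme described in the 2015 paper produces far more compact shares. The names `split_query`, `run_on_server` and `recombine`, the modulus and the sample records are all hypothetical.

```python
import secrets

# Hypothetical modulus for the arithmetic; any large prime would do here.
MODULUS = 2**61 - 1


def split_query(target, domain):
    """Split the query "city == target" into two complementary shares.

    Each share maps every possible city to a number that looks random on
    its own. Added together, the two numbers give 1 for the target city
    and 0 for every other city.
    """
    share_a = {key: secrets.randbelow(MODULUS) for key in domain}
    share_b = {
        key: (int(key == target) - share_a[key]) % MODULUS for key in domain
    }
    return share_a, share_b


def run_on_server(share, records):
    """Apply one share to every record, updating a running value."""
    running = 0
    for city, price in records:
        running = (running + share[city] * price) % MODULUS
    return running


def recombine(value_a, value_b):
    """Only the user, who holds both partial values, can do this step."""
    return (value_a + value_b) % MODULUS


# Made-up sample data: (city, price) pairs held identically by both servers.
records = [("Boston", 120), ("Austin", 95), ("Boston", 80), ("Denver", 60)]
domain = ["Austin", "Boston", "Denver"]

share_a, share_b = split_query("Boston", domain)
partial_a = run_on_server(share_a, records)
partial_b = run_on_server(share_b, records)
total = recombine(partial_a, partial_b)  # sum of prices for Boston
```

Each server touches every record and sees only random-looking numbers, so neither learns which city was asked about, provided the two do not compare notes.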
In a similar attempt a year ago, researchers at MIT and Harvard University demonstrated a system called Sieve, designed to prevent apps from collecting user data indiscriminately. With Sieve, an online user would store all of his or her personal data, in encrypted form, in the cloud. Any app that wanted to use specific data items would send a request to the user and receive a secret key that decrypted only those items. If the user wanted to revoke the app’s access, Sieve would re-encrypt the data with a new key.
Sieve required the researchers to develop practical versions of two cutting-edge cryptographic techniques called attribute-based encryption and key homomorphism.
With attribute-based encryption, data items in a file are assigned different labels, or “attributes”. After encryption, secret keys can be generated that unlock only particular combinations of attributes: name and zip code but not street name, for instance, or zip code and date of birth but not name. Key homomorphism is what enables Sieve to revoke an app’s access to a user’s data.
With key homomorphism, the cloud server can re-encrypt the data it’s storing without decrypting it first, and without sending it to the user for decryption, re-encryption, and re-uploading.
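A toy Python sketch can show what re-encrypting without decrypting looks like. It is emphatically not Sieve's scheme and is not secure: it assumes a made-up "pad" that is linear in the key, which is exactly what makes it key-homomorphic and also what makes it easy to break. The functions `pad`, `encrypt`, `decrypt` and `rekey`, the prime and the sample values are hypothetical and for illustration only.

```python
import hashlib
import secrets

# Hypothetical prime modulus (a Mersenne prime) for the toy arithmetic.
PRIME = 2**127 - 1


def pad(key, label):
    """Toy pad that is linear in the key: pad(k1) + pad(k2) == pad(k1 + k2).

    That linearity is the key-homomorphic property. It is NOT a secure
    construction; it only illustrates the algebra.
    """
    digest = hashlib.sha256(label.encode()).digest()
    return (key * int.from_bytes(digest, "big")) % PRIME


def encrypt(key, label, message):
    return (message + pad(key, label)) % PRIME


def decrypt(key, label, ciphertext):
    return (ciphertext - pad(key, label)) % PRIME


def rekey(ciphertext, label, key_delta):
    """Run by the server: move the ciphertext to a new key.

    The server sees only the difference between the keys and never sees
    the message itself.
    """
    return (ciphertext + pad(key_delta, label)) % PRIME


old_key = secrets.randbelow(PRIME)
new_key = secrets.randbelow(PRIME)
key_delta = (new_key - old_key) % PRIME  # computed by the user

stored = encrypt(old_key, "zip_code", 2139)     # what the cloud holds
rotated = rekey(stored, "zip_code", key_delta)  # re-encrypted in place

recovered = decrypt(new_key, "zip_code", rotated)
```

After the rotation, only the new key recovers the stored value, so an app still holding the old key gets nothing useful from the re-encrypted data.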
Of course, a system like Sieve requires the participation of app developers. As for users, prevention remains better than cure.
Cutting Edge is a monthly column that explores the melding of science and technology.