University of Pennsylvania
To develop, analyze, and evaluate data science algorithms that provably protect privacy while avoiding overfitting and false discovery
This grant supports University of Pennsylvania computer scientist Aaron Roth in his work to develop, analyze, and evaluate "differentially private" algorithms for use in scientific discovery. First developed by mathematicians and computer scientists concerned about privacy, differentially private algorithms are ways of querying sensitive datasets. An algorithm or database query is "differentially private" if its results would be nearly the same, in a precise and provable sense, even if any individual record in the queried dataset were replaced by another. Because the results such algorithms return depend only negligibly on whether a given record is or is not included in the dataset, one cannot reverse engineer who is in the dataset from the results they generate. The privacy of the data is thereby protected.

As it happens, this privacy-protecting feature has uses beyond the concern to protect privacy. Differentially private algorithms also guard against data dredging and overfitting. Since they produce nearly the same results regardless of whether a given observation is replaced by another, it is difficult to use them to craft conclusions tailored to the particularities of the data one happens to have collected.

At present, however, differentially private algorithms are more exciting in theory than in practice. They tend to be laborious and slow. What is needed is further development and testing of such algorithms with scientific applications in mind. Dr. Roth is working on just such an approach, trying to develop practical applications of differentially private algorithms that are streamlined and reliable enough to be used in everyday scientific practice and analysis.
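To make the idea concrete, a standard textbook example of a differentially private query (not specific to Dr. Roth's project) is the Laplace mechanism applied to a counting query: the true count is computed, then random noise calibrated to the privacy parameter epsilon is added, so that changing any single record shifts the output distribution by only a provably small amount. The function names and the example predicate below are illustrative choices, not part of any particular library.

```python
import math
import random

def laplace_noise(scale):
    """Sample from a Laplace(0, scale) distribution via inverse-CDF sampling."""
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon):
    """Epsilon-differentially private count of records satisfying predicate.

    A counting query has sensitivity 1: adding, removing, or replacing one
    record changes the true count by at most 1. Adding Laplace noise with
    scale 1/epsilon therefore yields epsilon-differential privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical usage: count even-valued records with a modest privacy budget.
random.seed(0)
records = list(range(100))
noisy = private_count(records, lambda r: r % 2 == 0, epsilon=0.5)
```

Smaller values of epsilon give stronger privacy but noisier answers, which is one source of the accuracy and efficiency costs the paragraph above alludes to.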