Stanford University
To develop practical techniques that allow researchers to extract aggregate statistics from large datasets while protecting the privacy of information contained in individual entries
Datasets that contain private or proprietary information pose a vexing problem. On the one hand, we want to make these datasets available for statistical studies and other research. On the other, we want to protect the privacy of the people or firms referenced in the data. Effective solutions to the problem of maximizing the usefulness of data while ensuring privacy have proven elusive. Sequestering datasets in data enclaves does a good job of protecting proprietary information, but it severely restricts their availability for research and inhibits the reproducibility of results. Releasing "anonymized" versions of the data, scrubbed of private information, greatly increases its accessibility to researchers, but often results in data that is not useful, or that can be combined with other public data to "reverse engineer" the removed private information.

Funds from this grant support the work of Stanford University's Cynthia Dwork, who is developing methods for accessing data that both maximize its usefulness to researchers and ensure the privacy and confidentiality of the sensitive information it contains. Dwork's primary insight is a precise mathematical definition she calls "differential privacy": a data access system assures differential privacy if the outcome of any admissible analysis is essentially independent of whether any given individual's information is included in the dataset. Her work has already shown mathematically that several useful data release mechanisms can ensure privacy in this sense, and it has the potential to become the basis for new ways of exploring sensitive data that could revolutionize empirical research in the social sciences.
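For readers who want the formal statement, the standard definition makes "essentially independent" precise. A randomized release mechanism $M$ is $\varepsilon$-differentially private if, for every pair of datasets $D$ and $D'$ that differ in a single individual's record, and every set $S$ of possible outputs,

\[
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S],
\]

where $\varepsilon$ is a privacy parameter chosen by the data curator; smaller values mean the output distribution reveals less about any one individual.

One concrete release mechanism known to satisfy this definition is the Laplace mechanism, which perturbs a statistic with noise calibrated to how much a single person's record can change it. The sketch below is illustrative only, not Dwork's code; the function name and parameters are ours.

```python
import numpy as np

def laplace_release(true_value, sensitivity, epsilon, rng=None):
    """Return a noisy statistic satisfying epsilon-differential privacy.

    sensitivity: the most the statistic can change when one individual's
        record is added to or removed from the dataset (1 for a count).
    epsilon: the privacy parameter; smaller means stronger privacy.
    """
    rng = rng or np.random.default_rng()
    # Laplace noise with scale sensitivity/epsilon is the classic
    # calibration: it makes the output distribution nearly identical
    # whether or not any one individual's record is present.
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: release a count of matching records. One person can change
# a count by at most 1, so sensitivity = 1.
noisy_count = laplace_release(true_value=1234, sensitivity=1, epsilon=0.1)
```

Because the noise depends only on the statistic's sensitivity and the chosen $\varepsilon$, a researcher receives a usable aggregate answer while no individual entry can be pinned down from the output.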