Artificial intelligence (AI) algorithms are being built and trained to perform a wide variety of tasks: recognizing faces, identifying objects in photos, processing natural language by extracting concepts from text. Once a system is built and trained, however, how do we know how well it performs relative to other such systems? How do we know whether the data used to train the system reflect the context in which the system will be used? To answer these questions, we need to scrutinize both the training datasets used to construct AI systems and the benchmarking datasets against which those systems are assessed.

This grant supports work by Meredith Whittaker and Kate Crawford, the co-founders of the AI Now Institute at New York University, and NYU Law professor Jason Schultz. Over the course of three years, Whittaker, Crawford, Schultz, and their team will dig deeply into the history, design, and technical details of some of the most foundational AI datasets, investigating where they came from, how they have evolved, and how they have been used over time. They will use these findings to catalyze a broader conversation about how to understand and appropriately govern the AI systems informed by these datasets. The grant outputs will include multiple papers produced for both academic and lay audiences, visualizations of the provenance and uses of specific datasets, and workshops that will bring together the growing community of researchers studying the data that underpin AI research.