Anonymizing NYC taxi data: Does it matter?

Marie Douriez, Harish Doraiswamy, Juliana Freire, Claudio T. Silva

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The widespread use of location-based services has led to an increasing availability of trajectory data from urban environments. These data carry rich information that is useful for improving cities through traffic management and city planning. Yet they also contain information about individuals that can jeopardize their privacy. In this study, we work with the New York City (NYC) taxi trips data set publicly released by the Taxi and Limousine Commission (TLC). This data set contains information about every taxi cab ride that happened in NYC. Poor hashing of the medallion numbers (the IDs identifying each taxi) allowed all of the medallion numbers to be recovered and led to a privacy breach for the drivers, whose income could be easily extracted. In this work, we initiate a study to evaluate whether 'perfect' anonymity is possible and whether such an identity disclosure can be avoided, given the availability of diverse external data sets through which the hidden information can be recovered. This is accomplished through a spatio-temporal join-based attack that matches the taxi data with external medallion data that can be easily gathered by an adversary. Using a simulation of the medallion data, we show that our attack can re-identify over 91% of the taxis that ply in NYC, even when the medallion numbers are perfectly pseudonymized. We also explore the effectiveness of trajectory anonymization strategies and demonstrate that our attack can still identify a significant fraction of the taxis in NYC. Given the restrictions on how TLC publishes the taxi data, our results indicate that unless the utility of the data set is significantly compromised, it will not be possible to maintain the privacy of taxi medallion owners and drivers.
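To make the idea of a spatio-temporal join attack concrete, here is a minimal sketch, not the authors' algorithm. It assumes that the released trips carry only a pseudonymized taxi ID plus pickup position and time, and that the adversary's external data consists of "sightings" (a medallion number seen at a position and time, for example near a pickup). The `Trip`/`Sighting` types, the `reidentify` helper, the 100 m / 60 s tolerances, and the medallion value `5X55` are all hypothetical. Each sighting of a medallion narrows the set of pseudonyms that had a trip nearby in space and time. Intersecting those sets across sightings re-identifies a medallion once only one pseudonym remains.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    pseudonym: str  # hypothetical: perfectly pseudonymized taxi ID in the released data
    lat: float
    lon: float
    t: float  # pickup time, seconds


@dataclass(frozen=True)
class Sighting:
    medallion: str  # hypothetical external observation of a known medallion
    lat: float
    lon: float
    t: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def reidentify(trips, sightings, max_dist_m=100.0, max_dt_s=60.0):
    """Return {medallion: pseudonym} for medallions matched by exactly one pseudonym."""
    candidates = {}  # medallion -> pseudonyms consistent with every sighting so far
    for s in sightings:
        # Spatio-temporal join: pseudonyms with a trip close to this sighting.
        matches = {
            tr.pseudonym
            for tr in trips
            if abs(tr.t - s.t) <= max_dt_s
            and haversine_m(tr.lat, tr.lon, s.lat, s.lon) <= max_dist_m
        }
        if s.medallion in candidates:
            candidates[s.medallion] &= matches  # each sighting must be explained
        else:
            candidates[s.medallion] = matches
    return {m: next(iter(ps)) for m, ps in candidates.items() if len(ps) == 1}


# Toy example: two pseudonymized taxis are both near the first sighting,
# but only pseudonym "a" is also near the second one.
trips = [
    Trip("a", 40.7580, -73.9855, 1000),
    Trip("b", 40.7580, -73.9856, 1010),
    Trip("a", 40.7484, -73.9857, 5000),
    Trip("b", 40.7061, -74.0087, 5000),
]
sightings = [
    Sighting("5X55", 40.7580, -73.9855, 1005),
    Sighting("5X55", 40.7484, -73.9857, 5020),
]
result = reidentify(trips, sightings)
print(result)  # {'5X55': 'a'}
```

The nested loop is quadratic and is only meant to show the matching logic. A join over the full data set would typically bucket trips by time window and spatial grid cell first. Note that this sketch does not depend on how the IDs were hashed: it works just as well against a perfect pseudonymization, which is the point the abstract makes.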

Original language: English (US)
Title of host publication: Proceedings - 3rd IEEE International Conference on Data Science and Advanced Analytics, DSAA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 140-148
Number of pages: 9
ISBN (Electronic): 9781509052066
State: Published - Dec 22 2016
Event: 3rd IEEE International Conference on Data Science and Advanced Analytics, DSAA 2016 - Montreal, Canada
Duration: Oct 17 2016 to Oct 19 2016

Publication series

Name: Proceedings - 3rd IEEE International Conference on Data Science and Advanced Analytics, DSAA 2016

Other

Other: 3rd IEEE International Conference on Data Science and Advanced Analytics, DSAA 2016
Country/Territory: Canada
City: Montreal
Period: 10/17/16 to 10/19/16

Keywords

  • Privacy attacks
  • Spatio-Temporal data
  • Taxi data
  • Trajectory privacy

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management
  • Artificial Intelligence

