Database Matching Under Noisy Synchronization Errors

Serhat Bakirtas, Elza Erkip

Research output: Contribution to journal › Article › peer-review

Abstract

The re-identification or de-anonymization of users from anonymized data through matching with publicly available correlated user data has raised privacy concerns, leading to the complementary measure of obfuscation in addition to anonymization. Recent research provides a fundamental understanding of the conditions under which privacy attacks, in the form of database matching, are successful in the presence of obfuscation. Motivated by synchronization errors stemming from the sampling of time-indexed databases, this paper presents a unified framework considering both obfuscation and synchronization errors and investigates the matching of databases under noisy entry repetitions. By investigating different structures for the repetition pattern, replica detection and seeded deletion detection algorithms are devised and sufficient and necessary conditions for successful matching are derived. Finally, the impacts of some variations of the underlying assumptions, such as the adversarial deletion model, seedless database matching, and zero-rate regime, on the results are discussed. Overall, our results provide insights into the privacy-preserving publication of anonymized and obfuscated time-indexed data as well as the closely related problem of the capacity of synchronization channels.
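
As a rough illustration of what "noisy entry repetitions" could look like, the sketch below repeats or deletes the columns of a small database and then flips entries at random. The binary alphabet, the single column-wise pattern shared by all rows, and the names `apply_noisy_repetition`, `pattern`, and `flip_prob` are assumptions made here for illustration; they are not taken from the abstract and this is not the authors' algorithm.

```python
import random

def apply_noisy_repetition(db, pattern, flip_prob, rng):
    """Repeat column j of every row pattern[j] times (0 means deletion),
    then flip each retained binary entry independently with probability
    flip_prob. All names and modelling choices here are hypothetical."""
    out = []
    for row in db:
        new_row = []
        for value, reps in zip(row, pattern):
            for _ in range(reps):
                # Each replica is corrupted independently (obfuscation noise).
                noisy = value ^ 1 if rng.random() < flip_prob else value
                new_row.append(noisy)
        out.append(new_row)
    return out

rng = random.Random(0)
db = [[0, 1, 1, 0],
      [1, 1, 0, 0]]
# Column 0 kept, column 1 deleted, column 2 replicated twice, column 3 kept.
pattern = [1, 0, 2, 1]

noiseless = apply_noisy_repetition(db, pattern, 0.0, rng)
fully_flipped = apply_noisy_repetition(db, pattern, 1.0, rng)
```

With `flip_prob` set to 0.0 only the synchronization errors (the deletion and the replication) remain, which makes it easy to see how the column positions of the two databases fall out of alignment.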

Original language: English (US)
Pages (from-to): 4335-4367
Number of pages: 33
Journal: IEEE Transactions on Information Theory
Volume: 70
Issue number: 6
State: Published - Jun 1 2024

Keywords

  • Dataset
  • alignment
  • data
  • database
  • de-anonymization
  • matching
  • privacy
  • recovery
  • synchronization

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
