TY - GEN
T1 - Distribution-Agnostic Database De-Anonymization Under Synchronization Errors
AU - Bakirtas, Serhat
AU - Erkip, Elza
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - There has recently been an increased scientific interest in the de-anonymization of users in anonymized databases containing user-level microdata via multifarious matching strategies utilizing publicly available correlated data. Existing literature has either emphasized practical aspects where underlying data distribution is not required, with limited or no theoretical guarantees, or theoretical aspects with the assumption of complete availability of underlying distributions. In this work, we take a step towards reconciling these two lines of work by providing theoretical guarantees for the de-anonymization of random correlated databases without prior knowledge of data distribution. Motivated by time-indexed microdata, we consider database de-anonymization under both synchronization errors (column repetitions) and obfuscation (noise). By modifying the previously used replica detection algorithm to accommodate for the unknown underlying distribution, proposing a new seeded deletion detection algorithm, and employing statistical and information-theoretic tools, we derive sufficient conditions on the database growth rate for successful matching. Our findings demonstrate that a double-logarithmic seed size relative to row size ensures successful deletion detection. More importantly, we show that the derived sufficient conditions are the same as in the distribution-aware setting, negating any asymptotic loss of performance due to unknown underlying distributions.
AB - There has recently been an increased scientific interest in the de-anonymization of users in anonymized databases containing user-level microdata via multifarious matching strategies utilizing publicly available correlated data. Existing literature has either emphasized practical aspects where underlying data distribution is not required, with limited or no theoretical guarantees, or theoretical aspects with the assumption of complete availability of underlying distributions. In this work, we take a step towards reconciling these two lines of work by providing theoretical guarantees for the de-anonymization of random correlated databases without prior knowledge of data distribution. Motivated by time-indexed microdata, we consider database de-anonymization under both synchronization errors (column repetitions) and obfuscation (noise). By modifying the previously used replica detection algorithm to accommodate for the unknown underlying distribution, proposing a new seeded deletion detection algorithm, and employing statistical and information-theoretic tools, we derive sufficient conditions on the database growth rate for successful matching. Our findings demonstrate that a double-logarithmic seed size relative to row size ensures successful deletion detection. More importantly, we show that the derived sufficient conditions are the same as in the distribution-aware setting, negating any asymptotic loss of performance due to unknown underlying distributions.
KW - alignment
KW - database
KW - dataset
KW - de-anonymization
KW - distribution-agnostic
KW - matching
KW - obfuscation
KW - privacy
KW - synchronization
UR - http://www.scopus.com/inward/record.url?scp=85177466769&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85177466769&partnerID=8YFLogxK
U2 - 10.1109/WIFS58808.2023.10374831
DO - 10.1109/WIFS58808.2023.10374831
M3 - Conference contribution
AN - SCOPUS:85177466769
T3 - WIFS 2023 - IEEE Workshop on Information Forensics and Security
BT - WIFS 2023 - IEEE Workshop on Information Forensics and Security
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE International Workshop on Information Forensics and Security, WIFS 2023
Y2 - 4 December 2023 through 7 December 2023
ER -