TY - JOUR
T1 - User Behavior Fingerprinting with Multi-Item-Sets and Its Application in IPTV Viewer Identification
AU - Yang, Can
AU - Wang, Lan
AU - Cao, Houwei
AU - Yuan, Qihu
AU - Liu, Yong
N1 - Funding Information:
Manuscript received June 29, 2020; revised November 30, 2020 and January 16, 2021; accepted January 19, 2021. Date of publication January 29, 2021; date of current version March 12, 2021. This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant U1611461 and Grant 61876065, in part by the Guangzhou Science and Technology Program Key Projects, Guangdong, China, under Grant 201704030124, and in part by the Science and Technology Plan Project of Guangdong, China, under Grant 2014B010115002. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Pedro Comesana. (Corresponding author: Can Yang.) Can Yang and Lan Wang are with the College of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China.
Publisher Copyright:
© 2005-2012 IEEE.
PY - 2021
Y1 - 2021
N2 - User activities in cyberspace leave unique traces for user identification (UI). Individual users can be identified by their frequent activity items through statistical feature matching. However, such approaches face the data sparsity problem. In this paper, we propose to address this problem by multi-item-set fingerprinting that identifies users not only based on their frequent individual activity items, but also their frequent consecutive item sequences with different lengths. We also propose a new similarity metric between fingerprint vectors that combines the advantages of Jaccard distance and relative entropy distance. Furthermore, we develop a fusion decision scheme by consolidating matching candidates generated by different similarity metrics. It improves the precision at the price of extra rejection. Our proposed approaches can be used in both one-by-one matching and bipartite graph group matching. Through extensive experiments on three real user datasets, in particular a large-scale Internet Protocol Television (IPTV) viewer dataset, we demonstrate that the proposed approaches outperform the state-of-the-art methods. The average matching precision reaches 93.8% for a dataset of 1,000 users and 100% for a dataset of 100 users. This work is of significance for information forensics and raises a new challenge for human privacy protection in cyberspace.
KW - IPTV
KW - User identification
KW - deanonymization
KW - frequent item set
KW - pattern recognition
KW - statistical feature matching
KW - user behaviors
UR - http://www.scopus.com/inward/record.url?scp=85100510872&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100510872&partnerID=8YFLogxK
U2 - 10.1109/TIFS.2021.3055638
DO - 10.1109/TIFS.2021.3055638
M3 - Article
AN - SCOPUS:85100510872
SN - 1556-6013
VL - 16
SP - 2667
EP - 2682
JO - IEEE Transactions on Information Forensics and Security
JF - IEEE Transactions on Information Forensics and Security
M1 - 9340396
ER -