TY - GEN
T1 - Interactive Audience Expansion on Large Scale Online Visitor Data
AU - Chan, Gromit Yeuk Yin
AU - Mai, Tung
AU - Rao, Anup B.
AU - Rossi, Ryan A.
AU - Du, Fan
AU - Silva, Cláudio T.
AU - Freire, Juliana
N1 - Funding Information:
This work was supported in part by: NSF awards CNS-1229185, CCF-1533564, CNS-1544753, CNS-1730396, CNS-1828576, CNS-1626098; and the NVIDIA NVAIL at NYU. We also thank Nvidia Corporation for donating GPUs used in this research.
Publisher Copyright:
© 2021 ACM.
PY - 2021/8/14
Y1 - 2021/8/14
N2 - Online marketing platforms often store millions of website visitors' behavior as a large sparse matrix with rows as visitors and columns as behavior. These platforms allow marketers to conduct Audience Expansion, a technique to identify new audiences with similar behavior to the original target audiences. In this paper, we propose a method to achieve interactive Audience Expansion from millions of visitor data efficiently. Unlike other methods that undergo significant computations upon inputs, our approach provides interactive responses when a marketer inputs the target audiences and similarity measures. The idea is to apply data summarization technique on the large visitor matrix to obtain a small set of summaries representing the similarities in the matrix. We propose efficient algorithms to compute the data summaries on a distributed computing environment (i.e., Spark) and conduct the expansion using the summaries. Our experiment shows that our approach (1) provides 10 times more accurate and 27 times faster Audience Expansion results on real datasets and (2) achieves a 98% speed-up compared to straightforward data summarization implementations. We also present an interface to apply the algorithm for real-world scenarios.
AB - Online marketing platforms often store millions of website visitors' behavior as a large sparse matrix with rows as visitors and columns as behavior. These platforms allow marketers to conduct Audience Expansion, a technique to identify new audiences with similar behavior to the original target audiences. In this paper, we propose a method to achieve interactive Audience Expansion from millions of visitor data efficiently. Unlike other methods that undergo significant computations upon inputs, our approach provides interactive responses when a marketer inputs the target audiences and similarity measures. The idea is to apply data summarization technique on the large visitor matrix to obtain a small set of summaries representing the similarities in the matrix. We propose efficient algorithms to compute the data summaries on a distributed computing environment (i.e., Spark) and conduct the expansion using the summaries. Our experiment shows that our approach (1) provides 10 times more accurate and 27 times faster Audience Expansion results on real datasets and (2) achieves a 98% speed-up compared to straightforward data summarization implementations. We also present an interface to apply the algorithm for real-world scenarios.
KW - interactive audience expansion
KW - look-alike modeling
UR - http://www.scopus.com/inward/record.url?scp=85114947882&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85114947882&partnerID=8YFLogxK
U2 - 10.1145/3447548.3467179
DO - 10.1145/3447548.3467179
M3 - Conference contribution
AN - SCOPUS:85114947882
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 2621
EP - 2631
BT - KDD 2021 - Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2021
Y2 - 14 August 2021 through 18 August 2021
ER -