Making interval-based clustering rank-aware

Julia Stoyanovich, Sihem Amer-Yahia, Tova Milo

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    In online applications, such as online dating, users often query and rank large collections of structured items. Top results tend to be homogeneous, which hinders data exploration. For example, a dating website user who is looking for a partner between 20 and 40 years old, and who sorts the matches by income from higher to lower, will see a large number of matches in their late 30s who hold an MBA degree and work in the financial industry, before seeing any matches in different age groups and walks of life. An alternative to presenting results in a ranked list is to find clusters in the result space, identified by a combination of attributes that correlate with rank. Such clusters may describe matches between 35 and 40 with an MBA, matches between 25 and 30 who work in the software industry, etc., allowing for data exploration of ranked results. We refer to the problem of finding such clusters as rank-aware interval-based clustering and argue that it is not addressed by standard clustering algorithms. We formally define the problem and, to solve it, propose a novel measure of locality, together with a family of clustering quality measures appropriate for this application scenario. These ingredients may be used by a variety of clustering algorithms, and we present BARAC, a particular subspace-clustering algorithm that enables rank-aware interval-based clustering in domains with heterogeneous attributes. We validate the effectiveness of our approach with a large-scale user study, and perform an extensive experimental evaluation of efficiency, demonstrating that our methods are practical on the large scale. Our evaluation is performed on large datasets from Yahoo! Personals, a leading online dating site, and on restaurant data from Yahoo! Local.

    Original language: English (US)
    Title of host publication: Advances in Database Technology - EDBT 2011
    Subtitle of host publication: 14th International Conference on Extending Database Technology, Proceedings
    Pages: 437-448
    Number of pages: 12
    DOIs: https://doi.org/10.1145/1951365.1951417
    State: Published - 2011
    Event: 14th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2011 - Uppsala, Sweden
    Duration: Mar 22 2011 → Mar 24 2011

    Publication series

    Name: ACM International Conference Proceeding Series

    Other

    Other: 14th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2011
    Country: Sweden
    City: Uppsala
    Period: 3/22/11 → 3/24/11

    Keywords

    • Clustering
    • Data exploration
    • Ranking

    ASJC Scopus subject areas

    • Software
    • Human-Computer Interaction
    • Computer Vision and Pattern Recognition
    • Computer Networks and Communications

    Cite this

    Stoyanovich, J., Amer-Yahia, S., & Milo, T. (2011). Making interval-based clustering rank-aware. In Advances in Database Technology - EDBT 2011: 14th International Conference on Extending Database Technology, Proceedings (pp. 437-448). (ACM International Conference Proceeding Series). https://doi.org/10.1145/1951365.1951417