EXACTA: Explainable Column Annotation

Yikun Xian, Handong Zhao, Tak Yeon Lee, Sungchul Kim, Ryan Rossi, Zuohui Fu, Gerard De Melo, S. Muthukrishnan

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

    Abstract

    Column annotation, the process of annotating tabular columns with labels, plays a fundamental role in digital marketing data governance. It has a direct impact on how customers manage their data and facilitates compliance with regulations, restrictions, and policies applicable to data use. Despite the substantial accuracy gains brought by recent deep learning-driven column annotation methods, their inability to explain why columns are matched with particular target labels has drawn concern, owing to the black-box nature of deep neural networks. Such explainability is of particular importance in industrial marketing scenarios, where data stewards need to quickly verify and calibrate the annotation results to ascertain the correctness of downstream applications. This work sheds new light on the explainable column annotation problem, the first column annotation task of its kind. To address it, we propose a new approach called EXACTA, which conducts multi-hop knowledge graph reasoning using inverse reinforcement learning to find a path from a column to a potential target label while ensuring both annotation performance and explainability. We experiment on four benchmarks, including both publicly available and real-world datasets, and undertake a comprehensive analysis of explainability. The results suggest that our method not only provides competitive annotation performance compared with existing deep learning-based models but, more importantly, produces faithfully explainable paths for annotated columns to facilitate human examination.
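
The idea of explaining an annotation by a path from a column to a label can be illustrated with a small sketch. It is not the EXACTA method: the abstract describes a policy learned with inverse reinforcement learning, whereas this example substitutes a plain breadth-first search, and the toy graph, node names, relation names, and the `find_explainable_path` helper are all hypothetical.

```python
from collections import deque

# Toy knowledge graph: node -> list of (relation, neighbor).
# All node and relation names are hypothetical, for illustration only.
KG = {
    "column:cust_email": [("has_cell_value", "value:a@example.com"),
                          ("has_header_token", "token:email")],
    "value:a@example.com": [("matches_pattern", "pattern:email_address")],
    "pattern:email_address": [("indicates", "label:Email")],
    "token:email": [("synonym_of", "label:Email")],
}

def find_explainable_path(kg, column, labels, max_hops=3):
    """Breadth-first search from `column` to any node in `labels`.

    Returns the shortest path as a list of (source, relation, target)
    triples, or None if no label is reachable within `max_hops` hops.
    This is a stand-in for a learned reasoning policy, not the paper's method.
    """
    queue = deque([(column, [])])
    visited = {column}
    while queue:
        node, path = queue.popleft()
        if node in labels and path:
            return path
        if len(path) == max_hops:
            continue  # hop budget exhausted on this branch
        for relation, neighbor in kg.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None

path = find_explainable_path(KG, "column:cust_email", {"label:Email"})
```

The returned triples double as the explanation: a data steward can read each hop to see why the column was linked to the label, which is the kind of human examination the abstract refers to.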

    Original language: English (US)
    Title of host publication: KDD 2021 - Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    Publisher: Association for Computing Machinery
    Pages: 3775-3785
    Number of pages: 11
    ISBN (Electronic): 9781450383325
    State: Published - Aug 14 2021
    Event: 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2021 - Virtual, Online, Singapore
    Duration: Aug 14 2021 to Aug 18 2021

    Publication series

    Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

    Conference

    Conference: 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2021
    Country/Territory: Singapore
    City: Virtual, Online
    Period: 8/14/21 to 8/18/21

    Keywords

    • column annotation
    • explainability
    • knowledge graph reasoning

    ASJC Scopus subject areas

    • Software
    • Information Systems
