CrunchQA: A Synthetic Dataset for Question Answering over Crunchbase Knowledge Graph

Lifan Yu, Nadya Abdel Madjid, Djellel Difallah

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The digital transformation in the finance and enterprise sector has been driven by the advances made in big data and artificial intelligence technologies. For instance, data integration enables businesses to make better decisions by consolidating and mining heterogeneous data repositories. In particular, knowledge graphs (KGs) are used to facilitate the integration of disparate data sources and can be utilized to answer complex queries. This work proposes a new dataset for question-answering on knowledge graphs (KGQA) to reflect the challenges we identified in real-world applications which are not covered by existing benchmarks, namely, multi-hop constraints, numeric and literal embeddings, ranking, reification, and hyper-relations. To build the dataset, we create a new Knowledge Graph from the Crunchbase database using a lightweight schema to support high-quality entity embeddings in large graphs. Next, we create a Question Answering dataset based on natural language question generation using predefined multiple-hop templates and paraphrasing. Finally, we conduct extensive experiments with state-of-the-art KGQA models and compare their performance on CrunchQA. The results show that the existing models do not perform well, for example, on multi-hop constrained queries. Hence, CrunchQA can be used as a challenging benchmark dataset for future KGQA reasoning models. The dataset and scripts are available on the project repository.

Original languageEnglish (US)
Title of host publicationProceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
EditorsShusaku Tsumoto, Yukio Ohsawa, Lei Chen, Dirk Van den Poel, Xiaohua Hu, Yoichi Motomura, Takuya Takagi, Lingfei Wu, Ying Xie, Akihiro Abe, Vijay Raghavan
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages4635-4641
Number of pages7
ISBN (Electronic)9781665480451
DOIs
StatePublished - 2022
Event2022 IEEE International Conference on Big Data, Big Data 2022 - Osaka, Japan
Duration: Dec 17 2022Dec 20 2022

Publication series

NameProceedings - 2022 IEEE International Conference on Big Data, Big Data 2022

Conference

Conference2022 IEEE International Conference on Big Data, Big Data 2022
Country/TerritoryJapan
CityOsaka
Period12/17/2212/20/22

Keywords

  • Enterprise KB
  • Graph Embedding
  • Knowledge Graph
  • Question Answering

ASJC Scopus subject areas

  • Modeling and Simulation
  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Control and Optimization

Fingerprint

Dive into the research topics of 'CrunchQA: A Synthetic Dataset for Question Answering over Crunchbase Knowledge Graph'. Together they form a unique fingerprint.

Cite this