TY - GEN
T1 - ParaGraph
T2 - 2022 IEEE International Conference on Big Data, Big Data 2022
AU - Ostapuk, Natalia
AU - Difallah, Djellel
AU - Cudré-Mauroux, Philippe
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Bridging unstructured data with knowledge bases is an essential task in many problems related to natural language understanding. Traditionally, this task is considered in one direction only: linking entity mentions in a text to their counterparts in a knowledge base (also known as entity linking). In this paper, we tackle the problem from a different angle: linking entities from a knowledge base to paragraphs describing those entities. We argue that this new perspective can benefit several applications, including information retrieval, knowledge base population, and joint entity and word embedding. We present a transformer-based model, ParaGraph, which, given a Wikidata entity as input, retrieves its corresponding Wikipedia section. To perform this task, ParaGraph first generates an entity summary and compares it to sections to select an initial set of candidates. The candidates are then ranked using additional information from the entity's textual description and contextual information. Our experiments show that ParaGraph achieves 87% Hits@10 when ranking Wikipedia sections given a Wikidata entity as input. These results indicate that ParaGraph can reduce the information gap between Wikipedia-based entities and tail entities, and demonstrate the effectiveness of our approach to linking knowledge graph entities to their textual counterparts.
KW - Entity Linking
KW - Knowledge Graphs
KW - Linked Data
UR - http://www.scopus.com/inward/record.url?scp=85147957392&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147957392&partnerID=8YFLogxK
U2 - 10.1109/BigData55660.2022.10020207
DO - 10.1109/BigData55660.2022.10020207
M3 - Conference contribution
AN - SCOPUS:85147957392
T3 - Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
SP - 6008
EP - 6017
BT - Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
A2 - Tsumoto, Shusaku
A2 - Ohsawa, Yukio
A2 - Chen, Lei
A2 - Van den Poel, Dirk
A2 - Hu, Xiaohua
A2 - Motomura, Yoichi
A2 - Takagi, Takuya
A2 - Wu, Lingfei
A2 - Xie, Ying
A2 - Abe, Akihiro
A2 - Raghavan, Vijay
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 December 2022 through 20 December 2022
ER -