TY - JOUR
T1 - Probabilistic Belief Embedding for Large-Scale Knowledge Population
AU - Fan, Miao
AU - Zhou, Qiang
AU - Abel, Andrew
AU - Zheng, Thomas Fang
AU - Grishman, Ralph
N1 - Funding Information:
The paper is dedicated to all the members of CSLT (http://cslt.riit.tsinghua.edu.cn/) and Proteus Group (http://nlp.cs.nyu.edu/index.shtml). It was supported by the National Program on Key Basic Research Project (973 Program) under Grant 2013CB329304, the National Science Foundation of China (NSFC) under Grant Nos. 61433018 and 61373075, and the Chinese Scholarship Council, when the first author was a joint-supervision Ph.D. candidate of Tsinghua University and New York University.
Publisher Copyright:
© 2016, Springer Science+Business Media New York.
PY - 2016/12/1
Y1 - 2016/12/1
AB - Background: To populate knowledge repositories such as WordNet, Freebase and NELL, two branches of research have grown separately for decades. On the one hand, corpus-based methods that leverage unstructured free text have been explored for years; on the other hand, recently emerged embedding-based approaches use structured knowledge graphs to learn distributed representations of entities and relations. However, there are still few comprehensive and elegant models that can integrate these large-scale heterogeneous resources to support multiple subtasks of knowledge population, including entity inference, relation prediction and triplet classification. Methods: This paper contributes a novel embedding model that estimates the probability of each candidate belief in a large-scale knowledge repository by simultaneously learning distributed representations for entities (h and t), relations (r) and the words in relation mentions (m). It facilitates knowledge population by means of simple vector operations to discover new beliefs. Given an imperfect belief, we can not only infer the missing entities and predict the unknown relations, but also identify the plausibility of the belief, just by leveraging the learned embeddings of the remaining evidence. Results: To demonstrate the scalability and effectiveness of our model, experiments have been conducted on several large-scale repositories containing millions of beliefs from WordNet, Freebase and NELL, and the results are compared with other cutting-edge approaches on the tasks of entity inference, relation prediction and triplet classification, each assessed with its respective metrics. Extensive experimental results show that the proposed model outperforms the state of the art with significant improvements.
Conclusions: The improvements stem from our model's capability to encode not only structured knowledge graph information but also unstructured relation mentions into continuous vector spaces, so that we can move beyond one-hot representations and expect to discover relatedness among entities, relations and even the words in relation mentions.
KW - Belief embedding
KW - Entity inference
KW - Knowledge population
KW - Relation prediction
KW - Triplet classification
UR - http://www.scopus.com/inward/record.url?scp=84981163507&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84981163507&partnerID=8YFLogxK
U2 - 10.1007/s12559-016-9425-5
DO - 10.1007/s12559-016-9425-5
M3 - Article
AN - SCOPUS:84981163507
SN - 1866-9956
VL - 8
SP - 1087
EP - 1102
JO - Cognitive Computation
JF - Cognitive Computation
IS - 6
ER -