Probabilistic Belief Embedding for Large-Scale Knowledge Population

Miao Fan, Qiang Zhou, Andrew Abel, Thomas Fang Zheng, Ralph Grishman

Research output: Contribution to journal › Article › peer-review


Background: To populate knowledge repositories such as WordNet, Freebase and NELL, two branches of research have developed separately for decades. On the one hand, corpus-based methods that leverage unstructured free text have been explored for years; on the other hand, recently emerged embedding-based approaches use structured knowledge graphs to learn distributed representations of entities and relations. However, there are still few comprehensive and elegant models that integrate these large-scale heterogeneous resources to serve the multiple subtasks of knowledge population: entity inference, relation prediction and triplet classification.
Methods: This paper contributes a novel embedding model that estimates the probability of each candidate belief <h,r,t,m> in a large-scale knowledge repository by simultaneously learning distributed representations for entities (h and t), relations (r) and the words in relation mentions (m). The model facilitates knowledge population by using simple vector operations to discover new beliefs. Given an imperfect belief, we can not only infer the missing entities and predict the unknown relations, but also assess the plausibility of the belief, using only the learned embeddings of the remaining evidence.
Results: To demonstrate the scalability and effectiveness of our model, we conducted experiments on several large-scale repositories containing millions of beliefs from WordNet, Freebase and NELL, and compared the results with other cutting-edge approaches on the tasks of entity inference, relation prediction and triplet classification, each assessed with its respective metrics. Extensive experimental results show that the proposed model outperforms the state of the art with significant improvements.
Conclusions: The improvements stem from the model's capability to encode not only structured knowledge graph information but also unstructured relation mentions into continuous vector spaces, so that we can bridge the gap left by one-hot representations and expect to discover relevance among entities, relations and even the words in relation mentions.
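
As a rough illustration of what "simple vector operations" over learned embeddings could look like for entity inference, the sketch below ranks candidate tail entities for an incomplete belief <h,r,?>. The abstract does not state the scoring function, so the translation-style score (negative L1 distance between h + r and t), the softmax normalisation, the function names and the toy 2-dimensional vectors are all assumptions made for illustration, not the authors' model. Relation-mention words (m) are left out for brevity.

```python
import math

def score(h, r, t):
    """Assumed plausibility score: negative L1 distance between h + r and t."""
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def tail_probabilities(h, r, entity_vecs):
    """Softmax over the assumed score, taken across all candidate tail entities."""
    scores = {name: score(h, r, vec) for name, vec in entity_vecs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

# Hypothetical hand-picked embeddings; a real model would learn these from data.
entities = {
    "Paris": [1.0, 0.0],
    "France": [1.0, 1.0],
    "Tokyo": [0.0, 0.0],
    "Japan": [0.0, 1.0],
}
capital_of = [0.0, 1.0]  # hypothetical relation vector

# Entity inference for the incomplete belief <Paris, capital_of, ?>.
probs = tail_probabilities(entities["Paris"], capital_of, entities)
best_tail = max(probs, key=probs.get)
print(best_tail, round(probs[best_tail], 3))
```

Under the same assumptions, relation prediction would rank candidate relation vectors for a fixed entity pair, and triplet classification would threshold the score of a complete triplet.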

Original language: English (US)
Pages (from-to): 1087-1102
Number of pages: 16
Journal: Cognitive Computation
Issue number: 6
State: Published - Dec 1 2016


Keywords

  • Belief embedding
  • Entity inference
  • Knowledge population
  • Relation prediction
  • Triplet classification

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Cognitive Neuroscience

