Using word embeddings in abstracts to accelerate metallocene catalysis polymerization research

David Ho, Albert S. Shkolnik, Neil J. Ferraro, Benjamin A. Rizkin, Ryan L. Hartman

Research output: Contribution to journal › Article › peer-review


Natural language processing (NLP) and neural networks trained to produce word embeddings were investigated as a more efficient method of extracting useful information on catalytic polymerizations. Thousands of abstracts on metallocene-catalyzed polymerizations were accessed through journal Application Programming Interfaces (APIs). These abstracts were then used to create a group of related models that produce word embeddings using the word2vec algorithm, which turns vocabulary into high-dimensional vectors through unsupervised training. These vectors can then be used to show relationships between chemicals, suggest catalyst and activator combinations, interpret acronyms, and categorize chemical compounds by reagent classification. We hypothesize that one can determine which areas of metallocene catalysis are understudied by comparing the predicted catalyst and activator combinations with those found in existing abstracts, thereby guiding research toward major breakthroughs as the scientific literature continues to grow.
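The idea of turning words from abstracts into vectors and then looking for unreported catalyst and activator pairings can be illustrated with a minimal sketch. This is not the authors' code: it swaps word2vec for plain co-occurrence count vectors (a much simpler stand-in that needs no training), and the toy abstracts, word lists, and the function names `cooccurrence_vectors`, `cosine`, and `unseen_pairs` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy "abstracts"; the study itself used thousands of real ones.
ABSTRACTS = [
    "zirconocene activated with mao gave polyethylene",
    "hafnocene activated with mao gave polypropylene",
    "zirconocene activated with borate gave polyethylene",
    "titanocene activated with mao gave polystyrene",
]

def cooccurrence_vectors(docs, window=2):
    """Map each word to a sparse vector counting its neighbours within `window` tokens.

    Assumption: this count-based vector is a stand-in for a trained word2vec embedding.
    """
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = doc.split()
        for i, word in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def unseen_pairs(docs, catalysts, activators):
    """List (catalyst, activator) pairs that never appear together in one abstract."""
    seen = set()
    for doc in docs:
        tokens = set(doc.split())
        seen |= {(c, a) for c in catalysts for a in activators
                 if c in tokens and a in tokens}
    return sorted((c, a) for c in catalysts for a in activators
                  if (c, a) not in seen)

vectors = cooccurrence_vectors(ABSTRACTS)
catalysts = ["zirconocene", "hafnocene", "titanocene"]
activators = ["mao", "borate"]

# Words used in similar contexts end up with similar vectors.
print(cosine(vectors["zirconocene"], vectors["hafnocene"]))
# Pairings absent from the toy corpus are candidates for "understudied" areas.
print(unseen_pairs(ABSTRACTS, catalysts, activators))
```

In this toy corpus the two catalysts that share a context score close to 1.0, and the pairs never mentioned together are flagged. A trained embedding model would replace the count vectors, and its suggested combinations, rather than an exhaustive pair list, would be compared against the literature.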

Original language: English (US)
Article number: 107026
Journal: Computers and Chemical Engineering
State: Published - Oct 4 2020


Keywords

  • Machine learning
  • Metallocene catalysis
  • Natural language
  • Polymerization
  • Word embeddings

ASJC Scopus subject areas

  • General Chemical Engineering
  • Computer Science Applications

