Using word embeddings in abstracts to accelerate metallocene catalysis polymerization research

David Ho, Albert S. Shkolnik, Neil J. Ferraro, Benjamin A. Rizkin, Ryan L. Hartman

Research output: Contribution to journal › Article

Abstract

Natural language processing (NLP) and neural networks trained to produce word embeddings were investigated as a more efficient way to extract useful information on catalytic polymerizations. Thousands of abstracts on metallocene-catalyzed polymerizations were accessed through journal application programming interfaces (APIs). These abstracts were then used to train a group of related models that produce word embeddings using the word2vec algorithm, which turns vocabulary into high-dimensional vectors through unsupervised training. These vectors can then be used to show relationships between chemicals, suggest catalyst and activator combinations, interpret acronyms, and categorize chemical compounds by their reagent classification. We hypothesize that one can determine which areas of metallocene catalysis are understudied by comparing the predicted catalyst and activator combinations with those found in existing abstracts, thereby guiding research toward major breakthroughs as the scientific literature continues to grow.
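As a rough illustration of how embedding vectors of this kind might be queried, the sketch below ranks terms by cosine similarity and lists predicted catalyst and activator pairs that do not appear among reported ones. It is not the authors' code: the three-dimensional vectors, the vocabulary, the example pairs, and the helper names `cosine`, `most_similar`, and `unreported_pairs` are all assumptions made up for this example. Real word2vec vectors are learned from text and usually have hundreds of dimensions.

```python
import math

# Toy vectors invented purely for illustration (NOT results from the paper).
# A trained word2vec model would supply learned, much higher-dimensional vectors.
EMBEDDINGS = {
    "zirconocene": [0.9, 0.1, 0.0],
    "hafnocene": [0.8, 0.2, 0.1],
    "mao": [0.1, 0.9, 0.0],
    "borate": [0.2, 0.8, 0.1],
    "toluene": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, topn=2):
    """Return the topn other words ranked by cosine similarity to `word`."""
    query = embeddings[word]
    scores = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


def unreported_pairs(predicted, reported):
    """Predicted (catalyst, activator) pairs absent from the reported set."""
    return sorted(set(predicted) - set(reported))


if __name__ == "__main__":
    # Nearest neighbour of a catalyst term in the toy vector space.
    print(most_similar("zirconocene", EMBEDDINGS, topn=1))

    # Hypothetical pair lists: one suggested by a model, one found in abstracts.
    predicted = [("zirconocene", "mao"), ("hafnocene", "borate")]
    reported = [("zirconocene", "mao")]
    print(unreported_pairs(predicted, reported))
```

The set difference in `unreported_pairs` is one simple way to express the comparison the abstract hypothesizes about; how the predicted combinations were actually generated and compared is not specified here.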

Original language: English (US)
Article number: 107026
Journal: Computers and Chemical Engineering
Volume: 141
State: Published - Oct 4 2020

Keywords

  • Machine learning
  • Metallocene catalysis
  • Natural language
  • Polymerization
  • Word embeddings

ASJC Scopus subject areas

  • Chemical Engineering (all)
  • Computer Science Applications
