Unsupervised extractive summarization using pointwise mutual information

Vishakh Padmakumar, He He

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Unsupervised approaches to extractive summarization usually rely on a notion of sentence importance defined by the semantic similarity between a sentence and the document. We propose new metrics of relevance and redundancy using pointwise mutual information (PMI) between sentences, which can be easily computed by a pre-trained language model. Intuitively, a relevant sentence allows readers to infer the document content (high PMI with the document), and a redundant sentence can be inferred from the summary (high PMI with the summary). We then develop a greedy sentence selection algorithm to maximize relevance and minimize redundancy of extracted sentences. We show that our method outperforms similarity-based methods on datasets in a range of domains including news, medical journal articles, and personal anecdotes.

Original languageEnglish (US)
Title of host publicationEACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
PublisherAssociation for Computational Linguistics (ACL)
Pages2505-2512
Number of pages8
ISBN (Electronic)9781954085022
StatePublished - 2021
Event16th Conference of the European Chapter of the Associationfor Computational Linguistics, EACL 2021 - Virtual, Online
Duration: Apr 19 2021Apr 23 2021

Publication series

NameEACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference16th Conference of the European Chapter of the Associationfor Computational Linguistics, EACL 2021
CityVirtual, Online
Period4/19/214/23/21

ASJC Scopus subject areas

  • Software
  • Computational Theory and Mathematics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Unsupervised extractive summarization using pointwise mutual information'. Together they form a unique fingerprint.

Cite this