Discriminative topic segmentation of text and speech

Mehryar Mohri, Pedro Moreno, Eugene Weinstein

Research output: Contribution to journal › Conference article › peer-review


We explore automated discovery of topically coherent segments in speech or text sequences. We give two new discriminative topic segmentation algorithms that employ a new measure of text similarity based on word co-occurrence. Both algorithms function by finding extrema in the similarity signal over the text, with the latter algorithm using a compact support-vector-based description of a window of text or speech observations in word similarity space to overcome noise introduced by speech recognition errors and off-topic content. In experiments over speech and text news streams, we show that these algorithms outperform previous methods. We observe that topic segmentation of speech recognizer output is a more difficult problem than that of text streams; however, we demonstrate that by using a lattice of competing hypotheses rather than just the one-best hypothesis as input to the segmentation algorithm, the performance of the algorithm can be improved.
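
The general idea of segmenting by extrema in a similarity signal can be illustrated with a minimal sketch. This is not the authors' method: it assumes plain cosine similarity over bag-of-words counts instead of the co-occurrence-based measure, and it omits the support-vector window description and the lattice input. The function names, the `window` size, and the `threshold` value are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_signal(sentences, window=2):
    """Similarity between the `window` sentences before and after each gap."""
    signal = []
    for gap in range(1, len(sentences)):
        left = Counter(w for s in sentences[max(0, gap - window):gap] for w in s)
        right = Counter(w for s in sentences[gap:gap + window] for w in s)
        signal.append(cosine(left, right))
    return signal


def boundaries(signal, threshold=0.5):
    """Gaps whose similarity is a local minimum below `threshold` (assumed value)."""
    cuts = []
    for i, value in enumerate(signal):
        prev_v = signal[i - 1] if i > 0 else float("inf")
        next_v = signal[i + 1] if i + 1 < len(signal) else float("inf")
        if value < threshold and value <= prev_v and value <= next_v:
            cuts.append(i + 1)  # the boundary falls just before sentence i + 1
    return cuts


# Toy input: two sentences about finance followed by two about sport.
sentences = [
    "the market fell as stocks dropped".split(),
    "stocks and the market recovered later".split(),
    "the team won the football match".split(),
    "football fans cheered the team".split(),
]
signal = similarity_signal(sentences, window=1)
cuts = boundaries(signal)
```

On this toy input the similarity dips at the gap between the second and third sentences, so `cuts` holds a single boundary at index 2. A real system would need a similarity measure and smoothing that hold up against recognition errors, which is the problem the abstract's second algorithm addresses.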

Original language: English (US)
Pages (from-to): 533-540
Number of pages: 8
Journal: Journal of Machine Learning Research
State: Published - 2010
Event: 13th International Conference on Artificial Intelligence and Statistics, AISTATS 2010 - Sardinia, Italy
Duration: May 13 2010 → May 15 2010

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

