Speech segmentation and spoken document processing

Mari Ostendorf, Benoit Favre, Ralph Grishman, Dilek Hakkani-Tür, Mary Harper, Dustin Hillard, Julia Hirschberg, Heng Ji, Jeremy G. Kahn, Yang Liu, Sameer Maskey, Evgeny Matusov, Hermann Ney, Andrew Rosenberg, Elizabeth Shriberg, Wen Wang, Chuck Wooters

Research output: Contribution to journal › Article › peer-review

Abstract

Speech segmentation operates at many levels and is useful for improving automatic speech recognition (ASR) technology. Progress has also been made in sentence segmentation by combining lexical information from a word recognizer with spectral and prosodic cues. Sentence segmentation is relevant for speech understanding applications, particularly parsing and information extraction (IE), as well as for machine translation, summarization, and question answering at the application level. Segmentation algorithms build on audio diarization and structural segmentation. The goal of audio diarization is to segment an audio recording into acoustically homogeneous regions, given only features extracted from the audio signal. A related task is speaker diarization, which involves computing a generalized log likelihood ratio at candidate boundaries. Structural segmentation targets the detection of boundary events and whole-constituent modeling, both of which are applicable to speech recognition because they exploit the alignment between words.
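As an illustration of the generalized log likelihood ratio (GLR) mentioned above, the following sketch scores candidate speaker-change points by modeling each of two adjacent feature windows with a full-covariance Gaussian and comparing against a single Gaussian fit to their union. This is a minimal sketch, not the authors' implementation; the window length, step size, feature representation, and decision threshold are illustrative assumptions.

import numpy as np

def gaussian_loglik_term(x):
    """Return (N/2) * log|Sigma| for an ML-fitted full-covariance Gaussian."""
    n, d = x.shape
    cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)  # small regularizer for stability
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * n * logdet

def glr_score(left, right):
    """GLR between two feature windows; larger values suggest a speaker change."""
    merged = np.vstack([left, right])
    return (gaussian_loglik_term(merged)
            - gaussian_loglik_term(left)
            - gaussian_loglik_term(right))

def detect_changes(features, win=100, step=10, threshold=50.0):
    """Slide a candidate boundary through the feature sequence (frames x dims)
    and keep points whose GLR exceeds a data-dependent threshold (assumed here)."""
    changes = []
    for t in range(win, len(features) - win, step):
        score = glr_score(features[t - win:t], features[t:t + win])
        if score > threshold:
            changes.append((t, score))
    return changes

In practice, systems of this kind typically refine the raw change points with clustering or Bayesian information criterion style penalties; the threshold here simply stands in for that decision step.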

Original language: English (US)
Pages (from-to): 59-69
Number of pages: 11
Journal: IEEE Signal Processing Magazine
Volume: 25
Issue number: 3
DOIs
State: Published - May 2008

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Applied Mathematics
