TY - JOUR
T1 - Speech segmentation and spoken document processing
AU - Ostendorf, Mari
AU - Favre, Benoit
AU - Grishman, Ralph
AU - Hakkani-Tür, Dilek
AU - Harper, Mary
AU - Hillard, Dustin
AU - Hirschberg, Julia
AU - Ji, Heng
AU - Kahn, Jeremy G.
AU - Liu, Yang
AU - Maskey, Sameer
AU - Matusov, Evgeny
AU - Ney, Hermann
AU - Rosenberg, Andrew
AU - Shriberg, Elizabeth
AU - Wang, Wen
AU - Wooters, Chuck
PY - 2008/5
Y1 - 2008/5
N2 - Speech segmentation takes place at many levels, all of which are useful for improving automatic speech recognition (ASR) technology. Progress has also been made in sentence segmentation by combining lexical information from a word recognizer with spectral and prosodic cues. In addition, sentence segmentation is relevant to speech understanding applications, especially parsing and information extraction (IE), as well as machine translation, summarization, and question answering at the application level. Segmentation algorithms rely on audio diarization and structural segmentation. The goal of audio diarization is to segment an audio recording into acoustically homogeneous regions, given only features extracted from the audio signal. A related task is speaker diarization, which involves computing a generalized log likelihood ratio at candidate boundaries. Structural segmentation involves boundary event detection and whole-constituent modeling, both of which are applicable to speech recognition because they exploit the alignment between words.
AB - Speech segmentation takes place at many levels, all of which are useful for improving automatic speech recognition (ASR) technology. Progress has also been made in sentence segmentation by combining lexical information from a word recognizer with spectral and prosodic cues. In addition, sentence segmentation is relevant to speech understanding applications, especially parsing and information extraction (IE), as well as machine translation, summarization, and question answering at the application level. Segmentation algorithms rely on audio diarization and structural segmentation. The goal of audio diarization is to segment an audio recording into acoustically homogeneous regions, given only features extracted from the audio signal. A related task is speaker diarization, which involves computing a generalized log likelihood ratio at candidate boundaries. Structural segmentation involves boundary event detection and whole-constituent modeling, both of which are applicable to speech recognition because they exploit the alignment between words.
UR - http://www.scopus.com/inward/record.url?scp=85032751513&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032751513&partnerID=8YFLogxK
U2 - 10.1109/MSP.2008.918023
DO - 10.1109/MSP.2008.918023
M3 - Article
AN - SCOPUS:85032751513
SN - 1053-5888
VL - 25
SP - 59
EP - 69
JO - IEEE Signal Processing Magazine
JF - IEEE Signal Processing Magazine
IS - 3
ER -