TY - GEN
T1 - Punctuating speech for information extraction
AU - Favre, Benoit
AU - Grishman, Ralph
AU - Hillard, Dustin
AU - Ji, Heng
AU - Hakkani-Tür, Dilek
AU - Ostendorf, Mari
PY - 2008
Y1 - 2008
N2 - This paper studies the effect of automatic sentence boundary detection and comma prediction on entity and relation extraction in speech. We show that punctuating the machine-generated transcript according to the maximum F-measure of period and comma annotation results in suboptimal information extraction. Specifically, period and comma decision thresholds can be chosen to improve the entity value score and the relation value score by 4% relative. Error analysis shows that preventing noun-phrase splitting by generating longer sentences and fewer commas can be harmful for IE performance. Indeed, it seems that missed punctuation allows syntactic parsers to merge noun phrases and prevent the extraction of correct information.
AB - This paper studies the effect of automatic sentence boundary detection and comma prediction on entity and relation extraction in speech. We show that punctuating the machine-generated transcript according to the maximum F-measure of period and comma annotation results in suboptimal information extraction. Specifically, period and comma decision thresholds can be chosen to improve the entity value score and the relation value score by 4% relative. Error analysis shows that preventing noun-phrase splitting by generating longer sentences and fewer commas can be harmful for IE performance. Indeed, it seems that missed punctuation allows syntactic parsers to merge noun phrases and prevent the extraction of correct information.
KW - Information extraction
KW - Punctuation prediction
KW - Speech
UR - http://www.scopus.com/inward/record.url?scp=51449122781&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=51449122781&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2008.4518784
DO - 10.1109/ICASSP.2008.4518784
M3 - Conference contribution
AN - SCOPUS:51449122781
SN - 1424414849
SN - 9781424414840
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5013
EP - 5016
BT - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
T2 - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Y2 - 31 March 2008 through 4 April 2008
ER -