TY - GEN
T1 - Vapor engine
T2 - ACM Conference on Human Information Interaction and Retrieval, CHIIR 2016
AU - Oard, Douglas W.
AU - Sankepally, Rashmi
AU - White, Jerome
AU - Harman, Craig
PY - 2016/3/13
Y1 - 2016/3/13
N2 - Typical search engines for spoken content begin with some form of language-specific audio processing such as phonetic word recognition. Many languages, however, lack the language-tuned preprocessing tools that are needed to create indexing terms for speech. One approach in such cases is to rely on repetition, detected using acoustic features, to find terms that might be worth indexing. Experiments have shown that this approach yields term sets that might be sufficient for some applications in both spoken term detection and ranked retrieval experiments. Such approaches currently work only with spoken queries, however, and only when the searcher is able to speak in a manner similar to that of the speakers in the collection. This demonstration paper proposes Vapor Engine, a new tool for selectively transcribing repeated terms that can be automatically detected from spoken content in any language. These transcribed terms could then be matched to queries formulated using written terms. Vapor Engine is early in development: it currently supports only single-term queries and has not yet been formally evaluated. This paper introduces the interface and summarizes the challenges it seeks to address.
AB - Typical search engines for spoken content begin with some form of language-specific audio processing such as phonetic word recognition. Many languages, however, lack the language-tuned preprocessing tools that are needed to create indexing terms for speech. One approach in such cases is to rely on repetition, detected using acoustic features, to find terms that might be worth indexing. Experiments have shown that this approach yields term sets that might be sufficient for some applications in both spoken term detection and ranked retrieval experiments. Such approaches currently work only with spoken queries, however, and only when the searcher is able to speak in a manner similar to that of the speakers in the collection. This demonstration paper proposes Vapor Engine, a new tool for selectively transcribing repeated terms that can be automatically detected from spoken content in any language. These transcribed terms could then be matched to queries formulated using written terms. Vapor Engine is early in development: it currently supports only single-term queries and has not yet been formally evaluated. This paper introduces the interface and summarizes the challenges it seeks to address.
UR - http://www.scopus.com/inward/record.url?scp=84974530709&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84974530709&partnerID=8YFLogxK
U2 - 10.1145/2854946.2854987
DO - 10.1145/2854946.2854987
M3 - Conference contribution
AN - SCOPUS:84974530709
T3 - CHIIR 2016 - Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval
SP - 301
EP - 304
BT - CHIIR 2016 - Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval
PB - Association for Computing Machinery, Inc
Y2 - 13 March 2016 through 17 March 2016
ER -