TY - JOUR
T1 - Interlingual annotation of parallel text corpora
T2 - A new framework for annotation and evaluation
AU - Dorr, Bonnie J.
AU - Passonneau, Rebecca J.
AU - Farwell, David
AU - Green, Rebecca
AU - Habash, Nizar
AU - Helmreich, Stephen
AU - Hovy, Eduard
AU - Levin, Lori
AU - Miller, Keith J.
AU - Mitamura, Teruko
AU - Rambow, Owen
AU - Siddharthan, Advaith
PY - 2010/7
Y1 - 2010/7
AB - This paper focuses on an important step in the creation of a system of meaning representation and the development of semantically annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to annotate multiple translations of foreign-language texts with interlingual content. Three levels of representation are introduced: deep syntactic dependencies (IL0), intermediate semantic representations (IL1), and a normalized representation that unifies conversives, nonliteral language, and paraphrase (IL2). The resulting annotated, multilingually induced, parallel corpora will be useful as an empirical basis for a wide range of research, including the development and evaluation of interlingual NLP systems and paraphrase-extraction systems as well as a host of other research and development efforts in theoretical and applied linguistics, foreign language pedagogy, translation studies, and other related disciplines.
UR - http://www.scopus.com/inward/record.url?scp=78650044500&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78650044500&partnerID=8YFLogxK
U2 - 10.1017/S1351324910000070
DO - 10.1017/S1351324910000070
M3 - Article
AN - SCOPUS:78650044500
SN - 1351-3249
VL - 16
SP - 197
EP - 243
JO - Natural Language Engineering
JF - Natural Language Engineering
IS - 3
ER -