TY - GEN
T1 - The margarita dialogue corpus: A data set for time-offset interactions and unstructured dialogue systems
AU - Chierici, Alberto M.
AU - Habash, Nizar
AU - Bicec, Margarita
N1 - Funding Information:
We would to thank David Traum for helpful conversations. We also would like to thank the NYUAD TOIA team (Dana Abu Ali, Muaz Ahmad, Hayat Al Hassan, Paula Dozsa, Ming Hu, and Jose Varias) for making the TOIA software they created available to us.
Publisher Copyright:
© European Language Resources Association (ELRA), licensed under CC-BY-NC
PY - 2020
Y1 - 2020
N2 - Time-Offset Interaction Applications (TOIAs) are systems that simulate face-to-face conversations between humans and digital human avatars recorded in the past. Developing a well-functioning TOIA involves several research areas: artificial intelligence, human-computer interaction, natural language processing, question answering, and dialogue systems. The first challenges are to define a sensible methodology for data collection and to create useful data sets for training the system to retrieve the best answer to a user's question. In this paper, we present three main contributions: a methodology for creating the knowledge base for a TOIA, a dialogue corpus, and baselines for single-turn answer retrieval. We develop the methodology using a two-step strategy. First, we let the avatar maker list pairs by intuition, guessing what possible questions a user may ask to the avatar. Second, we record actual dialogues between random individuals and the avatar-maker. We make the Margarita Dialogue Corpus available to the research community. This corpus comprises the knowledge base in text format, the video clips for each answer, and the annotated dialogues.
AB - Time-Offset Interaction Applications (TOIAs) are systems that simulate face-to-face conversations between humans and digital human avatars recorded in the past. Developing a well-functioning TOIA involves several research areas: artificial intelligence, human-computer interaction, natural language processing, question answering, and dialogue systems. The first challenges are to define a sensible methodology for data collection and to create useful data sets for training the system to retrieve the best answer to a user's question. In this paper, we present three main contributions: a methodology for creating the knowledge base for a TOIA, a dialogue corpus, and baselines for single-turn answer retrieval. We develop the methodology using a two-step strategy. First, we let the avatar maker list pairs by intuition, guessing what possible questions a user may ask to the avatar. Second, we record actual dialogues between random individuals and the avatar-maker. We make the Margarita Dialogue Corpus available to the research community. This corpus comprises the knowledge base in text format, the video clips for each answer, and the annotated dialogues.
KW - Annotation
KW - Corpus Creation
KW - Dialogue
KW - Question Answering
UR - https://www.mendeley.com/catalogue/45acffc4-dccf-37ea-8314-31433c21032a/
UR - http://www.scopus.com/inward/record.url?scp=85096592430&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096592430&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9791095546344
T3 - LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
SP - 476
EP - 484
BT - LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
A2 - Calzolari, Nicoletta
A2 - Bechet, Frederic
A2 - Blache, Philippe
A2 - Choukri, Khalid
A2 - Cieri, Christopher
A2 - Declerck, Thierry
A2 - Goggi, Sara
A2 - Isahara, Hitoshi
A2 - Maegaard, Bente
A2 - Mariani, Joseph
A2 - Mazo, Helene
A2 - Moreno, Asuncion
A2 - Odijk, Jan
A2 - Piperidis, Stelios
PB - European Language Resources Association (ELRA)
ER -