Characterizing discussions in the Spanish Wikipedia

Johnny Torres, Alfonsina Ochoa, Alberto Jimenez, Sixto Garcia, Enrique Pelaez, Xavier Ochoa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Wikipedia, the largest online encyclopedia, is edited collaboratively by hundreds of users. The content of some articles is disputed, giving rise to discussions that are recorded in the corresponding talk pages. In this paper, we propose an annotation schema for Spanish Wikipedia talk pages to determine the types of opinions expressed in them. We apply the schema to a corpus of discussions on 148 topics drawn from 25 Spanish Wikipedia talk pages, and we make the resulting dataset publicly available for download on GitHub. Furthermore, we train and evaluate supervised machine learning models to identify the annotation labels automatically. A linear support vector classifier (LinearSVC) outperforms the other baseline models, achieving an F1 score of 0.71 in our experiments.
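As a rough illustration of the metric reported above, the sketch below computes per-label and macro-averaged F1 for a multi-class labelling task in plain Python. The abstract does not state which F1 averaging was used, so macro averaging is an assumption here, and the label names (`agree`, `disagree`, `question`) are hypothetical placeholders rather than the paper's annotation schema.

```python
from collections import Counter


def per_label_f1(gold, predicted):
    """Return {label: F1} for every label seen in gold or predicted."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true label but was missed
    scores = {}
    for label in set(gold) | set(predicted):
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of the per-label F1 scores (assumed averaging)."""
    scores = per_label_f1(gold, predicted)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical gold annotations and classifier predictions.
gold = ["agree", "disagree", "question", "agree", "disagree", "agree"]
pred = ["agree", "agree", "question", "agree", "disagree", "disagree"]

if __name__ == "__main__":
    print(per_label_f1(gold, pred))
    print(round(macro_f1(gold, pred), 4))
```

Macro averaging weights every label equally, which matters when some annotation labels are rare; a weighted or micro average would give a different number on the same predictions.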

Original language: English (US)
Title of host publication: 2017 IEEE 2nd Ecuador Technical Chapters Meeting, ETCM 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781538638941
State: Published - Jan 4 2018
Event: 2nd IEEE Ecuador Technical Chapters Meeting, ETCM 2017 - Salinas, Ecuador
Duration: Oct 16 2017 to Oct 20 2017

Publication series

Name: 2017 IEEE 2nd Ecuador Technical Chapters Meeting, ETCM 2017
Volume: 2017-January

Other

Other: 2nd IEEE Ecuador Technical Chapters Meeting, ETCM 2017
Country/Territory: Ecuador
City: Salinas
Period: 10/16/17 to 10/20/17

Keywords

  • Collaborative Writing
  • NLP
  • Wikipedia

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Control and Optimization
