Characterizing discussions in the Spanish Wikipedia

Johnny Torres, Alfonsina Ochoa, Alberto Jimenez, Sixto Garcia, Enrique Pelaez, Xavier Ochoa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Wikipedia, the largest online encyclopedia, is edited collaboratively by hundreds of users. The content of some articles is disputed, giving rise to discussions that are recorded in the corresponding talk pages. In this paper, we propose an annotation schema for Spanish Wikipedia talk pages in order to determine the type of opinions expressed in them. We apply the annotation schema to a corpus of discussions about 148 topics drawn from 25 Spanish Wikipedia talk pages. We make the resulting dataset publicly available for download on GitHub. Furthermore, we train and evaluate supervised machine learning models to automatically identify the annotation labels. A linear support vector classifier (LinearSVC) outperforms the other baseline models and achieves an F1 score of 0.71 in our experiments.
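The abstract reports classifier quality as an F1 score. As a reminder of how that metric is derived from gold and predicted labels, here is a minimal sketch in plain Python. The label names (`agree`, `disagree`, `neutral`) and the choice of macro averaging are assumptions made for illustration; the abstract states neither the paper's label set nor which averaging was used.

```python
from collections import Counter


def per_label_f1(y_true, y_pred):
    """Return {label: F1} computed from paired gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted label was wrong
            fn[gold] += 1  # gold label was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores (macro averaging)."""
    scores = per_label_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical annotations, not taken from the paper's dataset.
gold = ["agree", "disagree", "neutral", "agree", "disagree", "neutral"]
pred = ["agree", "disagree", "agree", "agree", "neutral", "neutral"]

print(per_label_f1(gold, pred))
print(round(macro_f1(gold, pred), 4))
```

Macro averaging weights every label equally, which matters when some opinion types are rare; a micro or weighted average would give a different number on the same predictions.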

Original language: English (US)
Title of host publication: 2017 IEEE 2nd Ecuador Technical Chapters Meeting, ETCM 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
Volume: 2017-January
ISBN (Electronic): 9781538638941
DOI: https://doi.org/10.1109/ETCM.2017.8247544
State: Published - Jan 4 2018
Event: 2nd IEEE Ecuador Technical Chapters Meeting, ETCM 2017 - Salinas, Ecuador
Duration: Oct 16 2017 → Oct 20 2017

Other

Other: 2nd IEEE Ecuador Technical Chapters Meeting, ETCM 2017
Country: Ecuador
City: Salinas
Period: 10/16/17 → 10/20/17

Fingerprint

Wikipedia
Annotation
Schema
Learning systems
Labels
Classifiers
Support Vector
Supervised Learning
Baseline
Machine Learning
Classifier
Evaluate
Experiments
Model
Experiment

Keywords

  • Collaborative Writing
  • NLP
  • Wikipedia

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Control and Optimization

Cite this

Torres, J., Ochoa, A., Jimenez, A., Garcia, S., Pelaez, E., & Ochoa, X. (2018). Characterizing discussions in the Spanish Wikipedia. In 2017 IEEE 2nd Ecuador Technical Chapters Meeting, ETCM 2017 (Vol. 2017-January, pp. 1-6). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ETCM.2017.8247544

@inproceedings{00effd4c2430495c8745f4359a49bbfd,
title = "Characterizing discussions in the Spanish Wikipedia",
abstract = "Wikipedia, the largest online encyclopedia, is edited collaboratively by hundreds of users. The content of some articles is disputed, giving rise to discussions that are recorded in the corresponding talk pages. In this paper, we propose an annotation schema for Spanish Wikipedia talk pages in order to determine the type of opinions expressed in them. We apply the annotation schema to a corpus of discussions about 148 topics drawn from 25 Spanish Wikipedia talk pages. We make the resulting dataset publicly available for download on GitHub. Furthermore, we train and evaluate supervised machine learning models to automatically identify the annotation labels. A linear support vector classifier (LinearSVC) outperforms the other baseline models and achieves an F1 score of 0.71 in our experiments.",
keywords = "Collaborative Writing, NLP, Wikipedia",
author = "Johnny Torres and Alfonsina Ochoa and Alberto Jimenez and Sixto Garcia and Enrique Pelaez and Xavier Ochoa",
year = "2018",
month = "1",
day = "4",
doi = "10.1109/ETCM.2017.8247544",
language = "English (US)",
volume = "2017-January",
pages = "1--6",
booktitle = "2017 IEEE 2nd Ecuador Technical Chapters Meeting, ETCM 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN
T1 - Characterizing discussions in the Spanish Wikipedia
AU - Torres, Johnny
AU - Ochoa, Alfonsina
AU - Jimenez, Alberto
AU - Garcia, Sixto
AU - Pelaez, Enrique
AU - Ochoa, Xavier
PY - 2018/1/4
Y1 - 2018/1/4
AB - Wikipedia, the largest online encyclopedia, is edited collaboratively by hundreds of users. The content of some articles is disputed, giving rise to discussions that are recorded in the corresponding talk pages. In this paper, we propose an annotation schema for Spanish Wikipedia talk pages in order to determine the type of opinions expressed in them. We apply the annotation schema to a corpus of discussions about 148 topics drawn from 25 Spanish Wikipedia talk pages. We make the resulting dataset publicly available for download on GitHub. Furthermore, we train and evaluate supervised machine learning models to automatically identify the annotation labels. A linear support vector classifier (LinearSVC) outperforms the other baseline models and achieves an F1 score of 0.71 in our experiments.
KW - Collaborative Writing
KW - NLP
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=85045744307&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045744307&partnerID=8YFLogxK
U2 - 10.1109/ETCM.2017.8247544
DO - 10.1109/ETCM.2017.8247544
M3 - Conference contribution
VL - 2017-January
SP - 1
EP - 6
BT - 2017 IEEE 2nd Ecuador Technical Chapters Meeting, ETCM 2017
PB - Institute of Electrical and Electronics Engineers Inc.
ER -