Mining for gold: Identifying content-related MOOC discussion threads across domains through linguistic modeling

Alyssa Friend Wise, Yi Cui, Wan Qi Jin, Jovita Vytasek

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

This study addresses overload and chaos in MOOC discussion forums by developing a model to categorize threads based on whether they are substantially related to course content. A linguistic model was built based on manually coded starting posts in threads from a statistics MOOC, and tested on the second offering of the course, another statistics MOOC, a psychology MOOC, a physiology MOOC, and a test set of reply posts. Results showed that content-related starting posts had distinct linguistic features that appeared unrelated to the domain. The model demonstrated good reliability for all starting posts in statistics and psychology as well as for reply posts (accuracy ranged from 0.80 to 0.85). Reliability for starting posts in physiology was lower but still provided reasonably good predictive ability (accuracy was 0.73). The classification model was useful across all time segments of the courses; the numbers of views and votes that threads received were not helpful for classification.

Original language: English (US)
Pages (from-to): 11-28
Number of pages: 18
Journal: Internet and Higher Education
Volume: 32
State: Published - Jan 1 2017

Keywords

  • Computer-mediated communication
  • Content analysis
  • Discussion forums
  • Massive open online courses
  • Natural language processing
  • Social interaction

ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Computer Networks and Communications
