Text classification for automatic detection of e-cigarette use and use for smoking cessation from twitter: A feasibility pilot

Yin Aphinyanaphongs, Armine Lulejian, Duncan Penfold Brown, Richard Bonneau, Paul Krebs

Research output: Contribution to journal › Conference article › peer-review

Abstract

Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take surveys, allow a breadth of questions to be answered, and enable geo-location analysis. Social media streams may provide this ideal system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks that use text classification to automatically identify Tweets indicating e-cigarette use and/or e-cigarette use for smoking cessation. We define and build both datasets and compare the performance of four state-of-the-art classifiers and a keyword search on each task. Our results show strong classifier performance, with areas under the curve of up to 0.90 and 0.94 for the two tasks. These promising initial results form the foundation for further studies to realize the ideal surveillance solution.
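The abstract compares classifiers against a keyword search and reports area under the curve (AUC). As a rough illustration of how such a baseline could be scored, the sketch below rates each Tweet by keyword hits and computes AUC by pairwise comparison. It is not the authors' pipeline: the keyword list, the example Tweets, and their labels are all invented for illustration, and the function names `keyword_score` and `roc_auc` are hypothetical.

```python
# Toy illustration only: the tweets, labels, and keywords below are invented
# and are NOT taken from the paper's datasets or methods.
KEYWORDS = ("vape", "vaping", "e-cig", "ecig")  # hypothetical keyword list


def keyword_score(text, keywords=KEYWORDS):
    """Score a tweet by how many keywords occur in it (case-insensitive)."""
    lowered = text.lower()
    return sum(1 for kw in keywords if kw in lowered)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is
    scored above a randomly chosen negative one; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# (text, label) pairs; label 1 means the tweet indicates e-cigarette use.
tweets = [
    ("Switched to vaping and haven't smoked in a month", 1),
    ("Just bought a new e-cig, loving it", 1),
    ("Been off cigarettes since I got my mod", 1),  # use, but no keyword hit
    ("New study on vaping regulation released today", 0),  # keyword, not use
    ("Great weather for a run this morning", 0),
    ("Traffic is terrible downtown", 0),
]
labels = [y for _, y in tweets]
scores = [keyword_score(t) for t, _ in tweets]
auc = roc_auc(labels, scores)
print(f"keyword baseline AUC = {auc:.3f}")
```

The two commented Tweets show why a keyword search alone can fall short: one reports use without any keyword, and one contains a keyword without reporting use. A trained classifier would be scored the same way, by passing its predicted scores to `roc_auc` in place of the keyword counts.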

Original language: English (US)
Pages (from-to): 480-491
Number of pages: 12
Journal: Pacific Symposium on Biocomputing
State: Published - 2016
Event: 21st Pacific Symposium on Biocomputing, PSB 2016 - Big Island, United States
Duration: Jan 4 2016 - Jan 8 2016

ASJC Scopus subject areas

  • Biomedical Engineering
  • Computational Theory and Mathematics
