TY - JOUR
T1 - Text classification for automatic detection of e-cigarette use and use for smoking cessation from Twitter
T2 - 21st Pacific Symposium on Biocomputing, PSB 2016
AU - Aphinyanaphongs, Yin
AU - Lulejian, Armine
AU - Brown, Duncan Penfold
AU - Bonneau, Richard
AU - Krebs, Paul
N1 - Publisher Copyright: © 2016, World Scientific Publishing Co. Pte Ltd. All rights reserved.
PY - 2016
Y1 - 2016
N2 - Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take the survey, allow a breadth of questions to answer, and enable geo-location analysis. Social media streams may provide this ideal system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks using text classification to automatically identify tweets that indicate e-cigarette use and/or e-cigarette use for smoking cessation. We build and define both datasets and compare the performance of four state-of-the-art classifiers and a keyword search for each task. Our results demonstrate excellent classifier performance of up to 0.90 and 0.94 area under the curve in each category. These promising initial results form the foundation for further studies to realize the ideal surveillance solution.
UR - http://www.scopus.com/inward/record.url?scp=85012206641&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85012206641&partnerID=8YFLogxK
U2 - 10.1142/9789814749411_0044
DO - 10.1142/9789814749411_0044
M3 - Conference article
C2 - 26776211
AN - SCOPUS:85012206641
SN - 2335-6928
SP - 480
EP - 491
JO - Pacific Symposium on Biocomputing
JF - Pacific Symposium on Biocomputing
Y2 - 4 January 2016 through 8 January 2016
ER -