TY - JOUR
T1 - Electronic medical record phenotyping using the anchor and learn framework
AU - Halpern, Yoni
AU - Horng, Steven
AU - Choi, Youngduck
AU - Sontag, David
N1 - Funding Information:
This work is partially supported by a Google Faculty Research Award, grant UL1 TR000038 from National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Eleanor and Miles Shore Foundation, and Center for Integration of Medicine and Innovative Technology (CIMIT) Award No. 12-1262 under US Army Medical Research Acquisition Activity Cooperative Agreement W81XWH-09-2-0001. Y.H. was supported by a postgraduate scholarship from the Natural Sciences and Engineering Research Council of Canada (NSERC). The information contained herein does not necessarily reflect the position or policy of the government, and no official endorsement should be inferred.
Publisher Copyright:
© The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2016/7
Y1 - 2016/7
N2 - Background: Electronic medical records (EMRs) hold a tremendous amount of information about patients that is relevant to determining the optimal approach to patient care. As medicine becomes increasingly precise, a patient's electronic medical record phenotype will play an important role in triggering clinical decision support systems that can deliver personalized recommendations in real time. Learning with anchors presents a method of efficiently learning statistically driven phenotypes with minimal manual intervention. Materials and Methods: We developed a phenotype library that uses both structured and unstructured data from the EMR to represent patients for real-time clinical decision support. Eight of the phenotypes were evaluated using retrospective EMR data on emergency department patients using a set of prospectively gathered gold standard labels. Results: We built a phenotype library with 42 publicly available phenotype definitions. Using information from triage time, the phenotype classifiers have an area under the ROC curve (AUC) of infection 0.89, cancer 0.88, immunosuppressed 0.85, septic shock 0.93, nursing home 0.87, anticoagulated 0.83, cardiac etiology 0.89, and pneumonia 0.90. Using information available at the time of disposition from the emergency department, the AUC values are infection 0.91, cancer 0.95, immunosuppressed 0.90, septic shock 0.97, nursing home 0.91, anticoagulated 0.94, cardiac etiology 0.92, and pneumonia 0.97. Discussion: The resulting phenotypes are interpretable and fast to build, and perform comparably to statistically learned phenotypes developed with 5000 manually labeled patients. Conclusion: Learning with anchors is an attractive option for building a large public repository of phenotype definitions that can be used for a range of health IT applications, including real-time decision support.
AB - Background: Electronic medical records (EMRs) hold a tremendous amount of information about patients that is relevant to determining the optimal approach to patient care. As medicine becomes increasingly precise, a patient's electronic medical record phenotype will play an important role in triggering clinical decision support systems that can deliver personalized recommendations in real time. Learning with anchors presents a method of efficiently learning statistically driven phenotypes with minimal manual intervention. Materials and Methods: We developed a phenotype library that uses both structured and unstructured data from the EMR to represent patients for real-time clinical decision support. Eight of the phenotypes were evaluated using retrospective EMR data on emergency department patients using a set of prospectively gathered gold standard labels. Results: We built a phenotype library with 42 publicly available phenotype definitions. Using information from triage time, the phenotype classifiers have an area under the ROC curve (AUC) of infection 0.89, cancer 0.88, immunosuppressed 0.85, septic shock 0.93, nursing home 0.87, anticoagulated 0.83, cardiac etiology 0.89, and pneumonia 0.90. Using information available at the time of disposition from the emergency department, the AUC values are infection 0.91, cancer 0.95, immunosuppressed 0.90, septic shock 0.97, nursing home 0.91, anticoagulated 0.94, cardiac etiology 0.92, and pneumonia 0.97. Discussion: The resulting phenotypes are interpretable and fast to build, and perform comparably to statistically learned phenotypes developed with 5000 manually labeled patients. Conclusion: Learning with anchors is an attractive option for building a large public repository of phenotype definitions that can be used for a range of health IT applications, including real-time decision support.
KW - Clinical decision support systems
KW - Electronic health records
KW - Knowledge representation
KW - Machine learning
KW - Natural language processing
UR - http://www.scopus.com/inward/record.url?scp=84981275873&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84981275873&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocw011
DO - 10.1093/jamia/ocw011
M3 - Article
AN - SCOPUS:84981275873
SN - 1067-5027
VL - 23
SP - 731
EP - 740
JO - Journal of the American Medical Informatics Association : JAMIA
JF - Journal of the American Medical Informatics Association : JAMIA
IS - 4
ER -