TY - JOUR
T1 - Differences between human and machine perception in medical diagnosis
AU - Makino, Taro
AU - Jastrzębski, Stanisław
AU - Oleszkiewicz, Witold
AU - Chacko, Celin
AU - Ehrenpreis, Robin
AU - Samreen, Naziya
AU - Chhor, Chloe
AU - Kim, Eric
AU - Lee, Jiyon
AU - Pysarenko, Kristine
AU - Reig, Beatriu
AU - Toth, Hildegard
AU - Awal, Divya
AU - Du, Linda
AU - Kim, Alice
AU - Park, James
AU - Sodickson, Daniel K.
AU - Heacock, Laura
AU - Moy, Linda
AU - Cho, Kyunghyun
AU - Geras, Krzysztof J.
N1 - Publisher Copyright: © 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
AB - Deep neural networks (DNNs) show promise in image-based medical diagnosis, but cannot be fully trusted since they can fail for reasons unrelated to underlying pathology. Humans are less likely to make such superficial mistakes, since they use features that are grounded on medical science. It is therefore important to know whether DNNs use different features than humans. Towards this end, we propose a framework for comparing human and machine perception in medical diagnosis. We frame the comparison in terms of perturbation robustness, and mitigate Simpson’s paradox by performing a subgroup analysis. The framework is demonstrated with a case study in breast cancer screening, where we separately analyze microcalcifications and soft tissue lesions. While it is inconclusive whether humans and DNNs use different features to detect microcalcifications, we find that for soft tissue lesions, DNNs rely on high frequency components ignored by radiologists. Moreover, these features are located outside of the region of the images found most suspicious by radiologists. This difference between humans and machines was only visible through subgroup analysis, which highlights the importance of incorporating medical domain knowledge into the comparison.
UR - http://www.scopus.com/inward/record.url?scp=85128912615&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128912615&partnerID=8YFLogxK
U2 - 10.1038/s41598-022-10526-z
DO - 10.1038/s41598-022-10526-z
M3 - Article
C2 - 35477730
AN - SCOPUS:85128912615
SN - 2045-2322
VL - 12
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 6877
ER -