TY - JOUR
T1 - Radiology Reports Improve Visual Representations Learned from Radiographs
AU - Huang, Haoxu
AU - Rawlekar, Samyak
AU - Chopra, Sumit
AU - Deniz, Cem M.
N1 - Publisher Copyright:
© 2023 CC-BY 4.0, H. Huang, S. Rawlekar, S. Chopra & C.M. Deniz.
PY - 2023
Y1 - 2023
N2 - Although the human ability to visually understand the structure of the world plays a crucial role in perceiving the world and making appropriate decisions, human perception does not rely solely on vision but amalgamates information from acoustic, verbal, and visual stimuli. An active area of research revolves around designing an efficient framework that adapts to multiple modalities and ideally improves the performance of existing tasks. While numerous frameworks have proved effective on natural datasets such as ImageNet, only a limited number of studies have been carried out in the biomedical domain. In this work, we extend frameworks developed for natural data to biomedical data by leveraging the abundant, unstructured multi-modal data available as radiology images and reports. We attempt to answer the question, "Among multi-modal learning, self-supervised learning, and joint learning using both strategies, which improves the visual representation for downstream chest radiograph classification tasks the most?" Our experiments indicated that in limited labeled data settings with 1% and 10% labeled data, joint training with multi-modal and self-supervised models outperforms self-supervised learning and is on par with multi-modal learning. Additionally, we found that multi-modal learning is generally more robust on out-of-distribution datasets. The code is publicly available online.
KW - Multi-Modal Learning
KW - Out-of-Distribution
KW - Radiology
KW - Self-Supervised Learning
UR - http://www.scopus.com/inward/record.url?scp=85189292558&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85189292558&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85189292558
SN - 2640-3498
VL - 227
SP - 1385
EP - 1405
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 6th International Conference on Medical Imaging with Deep Learning, MIDL 2023
Y2 - 10 July 2023 through 12 July 2023
ER -