TY - GEN
T1 - Efficient Medical Image Assessment via Self-supervised Learning
AU - Huang, Chun Yin
AU - Lei, Qi
AU - Li, Xiaoxiao
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - High-performance deep learning methods typically rely on large annotated training datasets, which are difficult to obtain in many clinical applications due to the high cost of medical image labeling. Existing data assessment methods commonly require knowing the labels in advance, which is not feasible given our goal of ‘knowing which data to label.’ To this end, we formulate and propose a novel and efficient data assessment strategy, the EXponentiAl Marginal sINgular valuE (EXAMINE) score, to rank the quality of unlabeled medical image data based on their useful latent representations extracted via Self-supervised Learning (SSL) networks. Motivated by the theoretical implications of the SSL embedding space, we leverage a Masked Autoencoder [8] for feature extraction. Furthermore, we evaluate data quality based on the marginal change of the largest singular value after excluding the data point from the dataset. We conduct extensive experiments on a pathology dataset. Our results indicate the effectiveness and efficiency of our proposed methods for selecting the most valuable data to label.
AB - High-performance deep learning methods typically rely on large annotated training datasets, which are difficult to obtain in many clinical applications due to the high cost of medical image labeling. Existing data assessment methods commonly require knowing the labels in advance, which is not feasible given our goal of ‘knowing which data to label.’ To this end, we formulate and propose a novel and efficient data assessment strategy, the EXponentiAl Marginal sINgular valuE (EXAMINE) score, to rank the quality of unlabeled medical image data based on their useful latent representations extracted via Self-supervised Learning (SSL) networks. Motivated by the theoretical implications of the SSL embedding space, we leverage a Masked Autoencoder [8] for feature extraction. Furthermore, we evaluate data quality based on the marginal change of the largest singular value after excluding the data point from the dataset. We conduct extensive experiments on a pathology dataset. Our results indicate the effectiveness and efficiency of our proposed methods for selecting the most valuable data to label.
UR - http://www.scopus.com/inward/record.url?scp=85140470276&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140470276&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-17027-0_11
DO - 10.1007/978-3-031-17027-0_11
M3 - Conference contribution
AN - SCOPUS:85140470276
SN - 9783031170263
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 102
EP - 111
BT - Data Augmentation, Labelling, and Imperfections - 2nd MICCAI Workshop, DALI 2022, Held in Conjunction with MICCAI 2022, Proceedings
A2 - Nguyen, Hien V.
A2 - Huang, Sharon X.
A2 - Xue, Yuan
PB - Springer Science and Business Media Deutschland GmbH
T2 - 2nd MICCAI Workshop on Data Augmentation, Labelling, and Imperfections, DALI 2022, held in conjunction with 25th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2022
Y2 - 22 September 2022 through 22 September 2022
ER -