TY - GEN
T1 - Fast Certification of Vision-Language Models Using Incremental Randomized Smoothing
AU - Nirala, Ashutosh
AU - Joshi, Ameya
AU - Sarkar, Soumik
AU - Hegde, Chinmay
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - A key benefit of deep vision-language models such as CLIP is that they enable zero-shot open vocabulary classification; the user has the ability to define novel class labels via natural language prompts at inference time. However, while CLIP-based zero-shot classifiers have demonstrated competitive performance across a range of domain shifts, they remain highly vulnerable to adversarial attacks. Therefore, ensuring the robustness of such models is crucial for their reliable deployment in the wild. In this work, we introduce Open Vocabulary Certification (OVC), a fast certification method designed for open-vocabulary models like CLIP via randomized smoothing techniques. Given a base "training" set of prompts and their corresponding certified CLIP classifiers, OVC relies on the observation that a classifier with a novel prompt can be viewed as a perturbed version of nearby classifiers in the base training set. Therefore, OVC can rapidly certify the novel classifier using a variation of incremental randomized smoothing. By using a caching trick, we achieve approximately two orders of magnitude acceleration in the certification process for novel prompts. To achieve further (heuristic) speedups, OVC approximates the embedding space at a given input using a multivariate normal distribution, bypassing the need for sampling via forward passes through the vision backbone. We demonstrate the effectiveness of OVC through experimental evaluation using multiple vision-language backbones on the CIFAR-10 and ImageNet test datasets.
AB - A key benefit of deep vision-language models such as CLIP is that they enable zero-shot open vocabulary classification; the user has the ability to define novel class labels via natural language prompts at inference time. However, while CLIP-based zero-shot classifiers have demonstrated competitive performance across a range of domain shifts, they remain highly vulnerable to adversarial attacks. Therefore, ensuring the robustness of such models is crucial for their reliable deployment in the wild. In this work, we introduce Open Vocabulary Certification (OVC), a fast certification method designed for open-vocabulary models like CLIP via randomized smoothing techniques. Given a base "training" set of prompts and their corresponding certified CLIP classifiers, OVC relies on the observation that a classifier with a novel prompt can be viewed as a perturbed version of nearby classifiers in the base training set. Therefore, OVC can rapidly certify the novel classifier using a variation of incremental randomized smoothing. By using a caching trick, we achieve approximately two orders of magnitude acceleration in the certification process for novel prompts. To achieve further (heuristic) speedups, OVC approximates the embedding space at a given input using a multivariate normal distribution, bypassing the need for sampling via forward passes through the vision backbone. We demonstrate the effectiveness of OVC through experimental evaluation using multiple vision-language backbones on the CIFAR-10 and ImageNet test datasets.
KW - certified robustness
KW - CLIP
KW - randomized smoothing
KW - Vision-language models
UR - http://www.scopus.com/inward/record.url?scp=85193791363&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85193791363&partnerID=8YFLogxK
U2 - 10.1109/SaTML59370.2024.00019
DO - 10.1109/SaTML59370.2024.00019
M3 - Conference contribution
AN - SCOPUS:85193791363
T3 - Proceedings - IEEE Conference on Safe and Trustworthy Machine Learning, SaTML 2024
SP - 252
EP - 271
BT - Proceedings - IEEE Conference on Safe and Trustworthy Machine Learning, SaTML 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE Conference on Safe and Trustworthy Machine Learning, SaTML 2024
Y2 - 9 April 2024 through 11 April 2024
ER -