TY - JOUR
T1 - SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized Sequence Representations
AU - Sedghamiz, Hooman
AU - Raval, Shivam
AU - Santus, Enrico
AU - Alhanai, Tuka
AU - Ghassemi, Mohammad M.
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics.
PY - 2021/9/1
Y1 - 2021/9/1
N2 - While contrastive learning is proven to be an effective training
strategy in computer vision, Natural Language Processing (NLP) has only
recently adopted it as a self-supervised alternative to Masked Language
Modeling (MLM) for improving sequence representations. This paper
introduces SupCL-Seq, which extends supervised contrastive learning
from computer vision to the optimization of sequence representations in
NLP. By altering the dropout mask probability in standard Transformer
architectures, we generate augmented views for every representation
(anchor). A supervised contrastive loss is then used to maximize the
system's capability of pulling together similar samples (e.g., anchors
and their augmented views) and pushing apart samples belonging to
other classes. Despite its simplicity, SupCL-Seq leads to large gains
in many sequence classification tasks on the GLUE benchmark compared
to a standard BERT-base, including 6% absolute improvement on CoLA,
5.4% on MRPC, 4.7% on RTE and 2.6% on STS-B. We also show consistent
gains over self-supervised contrastively learned representations,
especially in non-semantic tasks. Finally, we show that these gains
are not solely due to augmentation, but rather to a downstream
optimized sequence representation. Code:
https://github.com/hooman650/SupCL-Seq
KW - Computer Science - Computation and Language
KW - Computer Science - Machine Learning
UR - http://www.scopus.com/inward/record.url?scp=85129125774&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129125774&partnerID=8YFLogxK
M3 - Article
SP - 3398
EP - 3403
JO - Findings of the Association for Computational Linguistics: EMNLP 2021
JF - Findings of the Association for Computational Linguistics: EMNLP 2021
ER -