Show Your Work with Confidence: Confidence Bands for Tuning Curves

Nicholas Lourie, Kyunghyun Cho, He He

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The choice of hyperparameters greatly impacts performance in natural language processing. Often, it is hard to tell if a method is better than another or just better tuned. Tuning curves fix this ambiguity by accounting for tuning effort. Specifically, they plot validation performance as a function of the number of hyperparameter choices tried so far. While several estimators exist for these curves, it is common to use point estimates, which we show fail silently and give contradictory results when given too little data. Beyond point estimates, confidence bands are necessary to rigorously establish the relationship between different approaches. We present the first method to construct valid confidence bands for tuning curves. The bands are exact, simultaneous, and distribution-free, thus they provide a robust basis for comparing methods. Empirical analysis shows that while bootstrap confidence bands, which serve as a baseline, fail to approximate their target confidence, ours achieve it exactly. We validate our design with ablations, analyze the effect of sample size, and provide guidance on comparing models with our method. To promote confident comparisons in future work, we release opda: an easy-to-use library that you can install with pip. https://github.com/nicholaslourie/opda.

Original languageEnglish (US)
Title of host publicationLong Papers
EditorsKevin Duh, Helena Gomez, Steven Bethard
PublisherAssociation for Computational Linguistics (ACL)
Pages3455-3472
Number of pages18
ISBN (Electronic)9798891761148
StatePublished - 2024
Event2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 - Hybrid, Mexico City, Mexico
Duration: Jun 16 2024Jun 21 2024

Publication series

NameProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024
Volume1

Conference

Conference2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024
Country/TerritoryMexico
CityHybrid, Mexico City
Period6/16/246/21/24

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Software

Fingerprint

Dive into the research topics of 'Show Your Work with Confidence: Confidence Bands for Tuning Curves'. Together they form a unique fingerprint.

Cite this