AI as a Sport: On the Competitive Epistemologies of Benchmarking

Will Orr, Edward B. Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Artificial Intelligence (AI) systems are evaluated using competitive methods that rely on benchmark datasets to determine performance. These benchmark datasets, however, are often constructed through arbitrary processes that fall short in encapsulating the depth and breadth of the tasks they are intended to measure. In this paper, we interrogate the naturalization of benchmark datasets as veracious metrics by examining the historical development of benchmarking as an epistemic practice in AI research. Specifically, we highlight three key case studies that were crucial in establishing the existing reliance on benchmark datasets for evaluating the capabilities of AI systems: (1) the sharing of Highleyman's OCR dataset in the 1960s, which solidified a community of knowledge production around a shared benchmark dataset; (2) the Common Task Framework (CTF) of the 1980s, a state-led project to standardize benchmark datasets as legitimate indicators of technical progress; and (3) the Netflix Prize, which further solidified benchmarking as a competitive goal within the ML research community. This genealogy highlights how the contemporary dynamics and limitations of benchmarking developed from a longer history of collaboration, standardization, and competition. We end with reflections on how this history informs our understanding of benchmarking in the current era of generative artificial intelligence.

Original language: English (US)
Title of host publication: 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
Publisher: Association for Computing Machinery, Inc
Pages: 1875-1884
Number of pages: 10
ISBN (Electronic): 9798400704505
State: Published - Jun 3 2024
Event: 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024 - Rio de Janeiro, Brazil
Duration: Jun 3 2024 - Jun 6 2024

Conference

Conference: 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
Country/Territory: Brazil
City: Rio de Janeiro
Period: 6/3/24 - 6/6/24

Keywords

  • Benchmark datasets
  • Benchmarking for generative AI
  • History of benchmarking
  • Machine learning benchmarks
  • Machine learning competitions

ASJC Scopus subject areas

  • General Business, Management and Accounting
