Benchmarks for Automated Commonsense Reasoning: A Survey

Research output: Contribution to journal › Article › peer-review

Abstract

More than one hundred benchmarks have been developed to test the commonsense knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems. However, these benchmarks are often flawed, and many aspects of common sense remain untested. Consequently, there is currently no reliable way of measuring the extent to which existing AI systems have achieved these abilities. This article surveys the development and uses of AI commonsense benchmarks. It enumerates 139 commonsense benchmarks that have been developed: 102 text-based, 18 image-based, 12 video-based, and 7 based in simulated physical environments. It gives more detailed descriptions of twelve of these, three from each category. It surveys the various methods used to construct commonsense benchmarks. It discusses the nature of common sense, the role of common sense in AI, the goals served by constructing commonsense benchmarks, desirable features of commonsense benchmarks, and flaws and gaps in existing benchmarks. It concludes with a number of recommendations for future development of commonsense AI benchmarks; most importantly, that the creators of benchmarks invest the work needed to ensure that benchmark examples are consistently high quality.

Original language: English (US)
Article number: 81
Journal: ACM Computing Surveys
Volume: 56
Issue number: 4
DOIs
State: Published - Oct 23 2023

Keywords

  • Common sense
  • benchmarks
  • commonsense knowledge
  • commonsense reasoning
  • evaluation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
