TY - JOUR
T1 - How to write science questions that are easy for people and hard for computers
AU - Davis, Ernest
N1 - Publisher Copyright:
Copyright © 2016, Association for the Advancement of Artificial Intelligence. All rights reserved.
PY - 2016
Y1 - 2016
AB - As a challenge problem for AI systems, I propose the use of hand-constructed multiple-choice tests, with problems that are easy for people but hard for computers. Specifically, I discuss techniques for constructing such problems at the level of a fourth-grade child and at the level of a high school student. For the fourth-grade-level questions, I argue that questions that require the understanding of time, of impossible or pointless scenarios, of causality, of the human body, or of sets of objects, and questions that require combining facts or require simple inductive arguments of indeterminate length can be chosen to be easy for people, and are likely to be hard for AI programs, in the current state of the art. For the high school level, I argue that questions that relate the formal science to the realia of laboratory experiments or of real-world observations are likely to be easy for people and hard for AI programs. I argue that these are more useful benchmarks than existing standardized tests such as the SATs or New York Regents tests. Since the questions in standardized tests are designed to be hard for people, they often leave many aspects of what is hard for computers but easy for people untested.
UR - http://www.scopus.com/inward/record.url?scp=85020005543&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020005543&partnerID=8YFLogxK
U2 - 10.1609/aimag.v37i1.2637
DO - 10.1609/aimag.v37i1.2637
M3 - Article
AN - SCOPUS:85020005543
SN - 0738-4602
VL - 37
SP - 13
EP - 22
JO - AI Magazine
JF - AI Magazine
IS - 1
ER -