TY - JOUR
T1 - Vocal features
T2 - From voice identification to speech recognition by machine
AU - Li, Xiaochang
AU - Mills, Mara
N1 - Funding Information:
Xiaochang Li is a postdoctoral fellow in the “Epistemes of Modern Acoustics” research group at the Max Planck Institute for the History of Science, Berlin. Research and writing for this article were additionally aided by the generous support of the NYU Center for the Humanities and the Phyllis & Gerald LeBoff Research Fund. Mara Mills is Associate Professor of Media, Culture, and Communication at New York University, where she co-directs the Center for Disability Studies. Research and writing for this article were supported by the Alexander von Humboldt Foundation and the Max Planck Institute for the History of Science.
Funding Information:
36. Although automatic speech recognition had notable skeptics, it was not subject to outright backlash as was speaker ID. Famously, John R. Pierce, research director at Bell Labs and one of speech recognition’s early proponents, issued a scathing condemnation of the field in 1969: Pierce, “Whither Speech Recognition?” This did not prevent the Defense Advanced Research Projects Agency’s Information Processing Techniques Office (IPTO) from funding a five-year project in “Speech Understanding Research” in 1971. Lawrence Roberts, “Expanding AI Research,” 235.
37. R. K. Potter, “Visible Patterns of Sound,” 463–64.
38. Ibid., 464.
39. Pierce writing as J. J. Coupling, “Portrait of a Voice,” 101, 104, 105.
Publisher Copyright:
© 2019 by the Society for the History of Technology.
PY - 2019/4
Y1 - 2019/4
N2 - This article considers machine methods used in the collection, processing, and application of vocal recordings for speaker identification and speech recognition between 1908 and 1970. The first phonographic archives featured collections of “vocal portraits” that prompted international investigations into the essential features of human voices for individual identification. Visual records of speech later found the same applications, but as “voiceprint identification” via sound spectrography began to achieve legal and commercial success in the 1960s, the procedure attracted more widespread scientific attention, which ultimately discredited both its accuracy and its rationale. At the same time, spectrogram collections spurred a new application—speech recognition by machine. The changing status of the speech spectrogram, from a record of unique features of individual voices to a model of fundamental invariants in speech sounds, was rooted in the demands of automated processing and a corresponding shift from the sound archive to the acoustic database.
UR - http://www.scopus.com/inward/record.url?scp=85068553659&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068553659&partnerID=8YFLogxK
U2 - 10.1353/tech.2019.0066
DO - 10.1353/tech.2019.0066
M3 - Article
C2 - 31231075
AN - SCOPUS:85068553659
SN - 0040-165X
VL - 60
SP - S129-S160
JO - Technology and Culture
JF - Technology and Culture
IS - 2
ER -