TY - JOUR
T1 - Intelligible encoding of ASL image sequences at extremely low information rates
AU - Sperling, George
AU - Landy, Michael
AU - Cohen, Yoav
AU - Pavel, M.
N1 - Funding Information:
The work on the image processing of American Sign Language was supported by National Science Foundation, Science and Technology to Aid the Handicapped Grant No. PFR-80171189. The preparation of this article was supported by the NSF and by USAF, Life Sciences Directorate, Grant AFOSR 80-0279. Special thanks to August Vanderbeek whose knowledge of ASL, rapport with the deaf community, and hard work were an essential ingredient of these studies. We appreciate the help we have received from many persons in the deaf community and schools for the deaf, including Dr. Jerome Schein, Director of the Deafness Research and Training Center at NYU; the staff of Public School 47, and especially Mrs. O’Shay, Mr. Jeff Rothchild, and Ms. Pakula; Mr. Ziev of the New York Society for the Deaf; and Ms. Solomon of the Hebrew Association of the Deaf. Dr. Nancy Frishberg provided essential guidance in the construction of the intelligibility test materials. We would like to thank our patient deaf signers, Ellen Roth and Alec Naimen. We also thank O. R. Mitchell, who made available his computer programs for block truncation coding. Finally, we wish to acknowledge the skillful technical assistance of Thomas Riedl and Robert Picardi.
PY - 1985/9
Y1 - 1985/9
N2 - American Sign Language (ASL) is a gestural language used by the hearing impaired. This paper describes experimental tests with deaf subjects that compared the most effective known methods of creating extremely compressed ASL images. The minimum requirements for intelligibility were determined for three basically different kinds of transformations: (1) gray-scale transformations that subsample the images in space and time; (2) two-level intensity quantization that converts the gray-scale image into a black-and-white approximation; (3) transformations that convert the images into black-and-white outline drawings (cartoons). In Experiment 1, five subjects made quality ratings of 81 kinds of images that varied in spatial resolution, frame rate, and type of transformation. The most promising image size was 96 × 64 pixels (height × width). The 17 most promising image transformations were selected for formal intelligibility testing: 38 deaf subjects viewed 87 ASL sequences 1-2 s long of each transformation. The most effective code for gray-scale images is an analog raster code, which can produce images with 0.86 normalized intelligibility (I) at a bandwidth of 2,880 Hz and is therefore transmittable on ordinary 3 kHz telephone circuits. For the binary images, a number of coding schemes are described and compared, the most efficient being an extension of the quadtree method, here termed binquad coding, which yielded I = 0.68 at 7,500 bits per second (bps). For cartoons, an even more efficient polygonal transformation, which uses connected straight line segments, is proposed, together with a vectorgraph code yielding, for example, I = 0.56 at 3,900 bps and I = 0.70 at 6,000 bps. Polygonally transformed cartoons offer the possibility of telephonic ASL communication at 4,800 bps. Several combinations of binary image transformations and encoding schemes offer I > 80% at 9,600 bps.
AB - American Sign Language (ASL) is a gestural language used by the hearing impaired. This paper describes experimental tests with deaf subjects that compared the most effective known methods of creating extremely compressed ASL images. The minimum requirements for intelligibility were determined for three basically different kinds of transformations: (1) gray-scale transformations that subsample the images in space and time; (2) two-level intensity quantization that converts the gray-scale image into a black-and-white approximation; (3) transformations that convert the images into black-and-white outline drawings (cartoons). In Experiment 1, five subjects made quality ratings of 81 kinds of images that varied in spatial resolution, frame rate, and type of transformation. The most promising image size was 96 × 64 pixels (height × width). The 17 most promising image transformations were selected for formal intelligibility testing: 38 deaf subjects viewed 87 ASL sequences 1-2 s long of each transformation. The most effective code for gray-scale images is an analog raster code, which can produce images with 0.86 normalized intelligibility (I) at a bandwidth of 2,880 Hz and is therefore transmittable on ordinary 3 kHz telephone circuits. For the binary images, a number of coding schemes are described and compared, the most efficient being an extension of the quadtree method, here termed binquad coding, which yielded I = 0.68 at 7,500 bits per second (bps). For cartoons, an even more efficient polygonal transformation, which uses connected straight line segments, is proposed, together with a vectorgraph code yielding, for example, I = 0.56 at 3,900 bps and I = 0.70 at 6,000 bps. Polygonally transformed cartoons offer the possibility of telephonic ASL communication at 4,800 bps. Several combinations of binary image transformations and encoding schemes offer I > 80% at 9,600 bps.
UR - http://www.scopus.com/inward/record.url?scp=0022130687&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0022130687&partnerID=8YFLogxK
U2 - 10.1016/0734-189X(85)90034-9
DO - 10.1016/0734-189X(85)90034-9
M3 - Article
AN - SCOPUS:0022130687
SN - 0734-189X
VL - 31
SP - 335
EP - 391
JO - Computer Vision, Graphics and Image Processing
JF - Computer Vision, Graphics and Image Processing
IS - 3
ER -