TY - GEN
T1 - An FPGA-based stream processor for embedded real-time vision with convolutional networks
AU - Farabet, Clément
AU - Poulet, Cyril
AU - LeCun, Yann
PY - 2009
Y1 - 2009
N2 - Many recent visual recognition systems can be seen as being composed of multiple layers of convolutional filter banks, interspersed with various types of non-linearities. This includes Convolutional Networks, HMAX-type architectures, as well as systems based on dense SIFT features or Histogram of Gradients. This paper describes a highly compact and low-power embedded system that can run such vision systems at very high speed. A custom board built around a Xilinx Virtex-4 FPGA was built and tested. It measures 70 × 80 mm, and the complete system (FPGA, camera, memory chips, flash) consumes 15 watts at peak and is capable of more than 4 × 10^9 multiply-accumulate operations per second in real vision applications. This enables real-time implementations of object detection, object recognition, and vision-based navigation algorithms in small-size robots, micro-UAVs, and hand-held devices. Real-time face detection is demonstrated, with speeds of 10 frames per second at VGA resolution.
AB - Many recent visual recognition systems can be seen as being composed of multiple layers of convolutional filter banks, interspersed with various types of non-linearities. This includes Convolutional Networks, HMAX-type architectures, as well as systems based on dense SIFT features or Histogram of Gradients. This paper describes a highly compact and low-power embedded system that can run such vision systems at very high speed. A custom board built around a Xilinx Virtex-4 FPGA was built and tested. It measures 70 × 80 mm, and the complete system (FPGA, camera, memory chips, flash) consumes 15 watts at peak and is capable of more than 4 × 10^9 multiply-accumulate operations per second in real vision applications. This enables real-time implementations of object detection, object recognition, and vision-based navigation algorithms in small-size robots, micro-UAVs, and hand-held devices. Real-time face detection is demonstrated, with speeds of 10 frames per second at VGA resolution.
UR - http://www.scopus.com/inward/record.url?scp=77953224252&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77953224252&partnerID=8YFLogxK
U2 - 10.1109/ICCVW.2009.5457611
DO - 10.1109/ICCVW.2009.5457611
M3 - Conference contribution
AN - SCOPUS:77953224252
SN - 9781424444427
T3 - 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009
SP - 878
EP - 885
BT - 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009
T2 - 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009
Y2 - 27 September 2009 through 4 October 2009
ER -