An FPGA-based stream processor for embedded real-time vision with convolutional networks

Clément Farabet, Cyril Poulet, Yann LeCun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many recent visual recognition systems can be seen as being composed of multiple layers of convolutional filter banks, interspersed with various types of non-linearities. This includes Convolutional Networks, HMAX-type architectures, and systems based on dense SIFT features or Histograms of Oriented Gradients. This paper describes a highly compact, low-power embedded system that can run such vision systems at very high speed. A custom board based on a Xilinx Virtex-4 FPGA was built and tested. It measures 70 × 80 mm, and the complete system (FPGA, camera, memory chips, flash) consumes 15 W at peak while performing more than 4 × 10⁹ multiply-accumulate operations per second in a real vision application. This enables real-time implementations of object detection, object recognition, and vision-based navigation algorithms in small robots, micro-UAVs, and hand-held devices. Real-time face detection is demonstrated at 10 frames per second at VGA resolution.
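The sketch below is a plain-Python, software-only illustration of the generic computation the abstract describes. It shows one layer of a convolutional filter bank followed by a point-wise non-linearity, and it counts the multiply-accumulates (MACs) that such a layer costs. It is not the paper's FPGA dataflow. The function names are hypothetical, and the choice of `tanh` as the default non-linearity is an assumption. The last helper turns the quoted throughput (4 × 10⁹ MAC/s) into a per-pixel budget. It assumes, beyond what the abstract states, that the full throughput is spent on a single 640 × 480 stream at 10 frames per second.

```python
import math

def conv2d_valid(image, kernel):
    """'Valid' 2D correlation (no kernel flip, as is usual in ConvNet code).
    Returns the output map and the number of multiply-accumulates performed."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    macs = 0
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
                    macs += 1
            out[y][x] = acc
    return out, macs

def filter_bank_layer(inputs, connections, nonlinearity=math.tanh):
    """One filter-bank stage (hypothetical helper). connections[k] lists the
    (input_index, kernel) pairs that feed output map k. Each output map is the
    sum of its convolutions, passed through a point-wise non-linearity.
    Only the MACs inside the convolutions are counted."""
    outputs, total_macs = [], 0
    for pairs in connections:
        acc = None
        for idx, kernel in pairs:
            conv, macs = conv2d_valid(inputs[idx], kernel)
            total_macs += macs
            if acc is None:
                acc = conv
            else:
                acc = [[a + c for a, c in zip(ra, rc)] for ra, rc in zip(acc, conv)]
        outputs.append([[nonlinearity(v) for v in row] for row in acc])
    return outputs, total_macs

def macs_per_pixel_budget(macs_per_second, fps, width, height):
    """MACs available per input pixel if all throughput serves one video stream."""
    return macs_per_second / (fps * width * height)
```

As a usage example, a 5 × 5 input filtered by one 3 × 3 kernel yields a 3 × 3 map and costs 9 × 9 = 81 MACs. `macs_per_pixel_budget(4e9, 10, 640, 480)` gives roughly 1,302 MACs per pixel per frame, which is an upper bound on the work a full VGA pipeline could do at that frame rate under the stated assumption.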

Original language: English (US)
Title of host publication: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009
Pages: 878-885
Number of pages: 8
State: Published (2009)
Event: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009, Kyoto, Japan
Duration: Sep 27, 2009 to Oct 4, 2009

Publication series

Name: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009

Other

Other: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009
Country/Territory: Japan
City: Kyoto
Period: 9/27/09 to 10/4/09

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
