Audio-visual perception of a lecturer in a smart seminar room

R. Stiefelhagen, K. Bernardin, H. K. Ekenel, J. McDonough, K. Nickel, M. Voit, M. Wölfel

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper we present our work on audio-visual perception of a lecturer in a smart seminar room equipped with multiple cameras and microphones. We present a novel approach to tracking the lecturer based on visual and acoustic observations in a particle filter framework. This approach does not require explicit triangulation of observations to estimate the lecturer's 3D location, allowing for fast audio-visual tracking. We also show how automatic recognition of the lecturer's speech from far-field microphones can be improved using his or her tracked location in the room. Based on the tracked location, we can also detect the lecturer's face in the various camera views for further analysis, such as estimating head orientation and identity. The paper describes the overall system and its components (tracking, speech recognition, head orientation, identification) in detail and presents results on several multimodal recordings of seminars.
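The key idea sketched in the abstract — weighting each particle (a 3D location hypothesis) directly by per-sensor likelihoods instead of first triangulating observations — can be illustrated with a minimal 1-D particle filter. This is an illustrative sketch only, not the authors' implementation: the Gaussian sensor models, noise parameters, and random-walk motion model are all assumptions chosen for simplicity.

```python
import random
import math

def gauss_lik(obs, pred, sigma):
    """Gaussian likelihood of an observation given a predicted value."""
    return math.exp(-0.5 * ((obs - pred) / sigma) ** 2)

def pf_step(particles, audio_obs, video_obs,
            motion_sigma=0.1, audio_sigma=0.5, video_sigma=0.2):
    """One predict-weight-resample cycle fusing two modalities.

    Each particle is a position hypothesis; each sensor scores the
    hypothesis directly in its own observation space, so no explicit
    triangulation across sensors is required.
    """
    # Predict: propagate particles with a random-walk motion model.
    particles = [p + random.gauss(0.0, motion_sigma) for p in particles]
    # Weight: multiply the per-modality likelihoods.
    weights = [gauss_lik(audio_obs, p, audio_sigma) *
               gauss_lik(video_obs, p, video_sigma) for p in particles]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    # Resample particles proportionally to their weights.
    return random.choices(particles, weights=weights, k=len(particles))

# Toy scenario: a lecturer drifts along one axis; audio cues are
# noisier than visual cues (illustrative noise levels).
random.seed(0)
particles = [random.uniform(0.0, 4.0) for _ in range(500)]
true_pos = 0.0
for _ in range(30):
    true_pos += 0.05
    audio = true_pos + random.gauss(0.0, 0.5)  # noisy acoustic cue
    video = true_pos + random.gauss(0.0, 0.2)  # noisy visual cue
    particles = pf_step(particles, audio, video)
estimate = sum(particles) / len(particles)  # posterior mean position
```

The point mirrored from the paper is that fusion happens at the hypothesis level: adding another camera or microphone array only adds another likelihood factor per particle, which keeps the tracker fast.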

Original language: English (US)
Pages (from-to): 3518-3533
Number of pages: 16
Journal: Signal Processing
Volume: 86
Issue number: 12
State: Published - Dec 2006

Keywords

  • Audio-visual tracking
  • Face recognition
  • Far-field speech recognition
  • Head pose estimation
  • Multimodal-multisensor interfaces
  • Smart rooms

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
