TY - GEN
T1 - Kalman filters for audio-video source localization
AU - Gehrig, Tobias
AU - Nickel, Kai
AU - Ekenel, Hazim Kemal
AU - Klee, Ulrich
AU - McDonough, John
PY - 2005
Y1 - 2005
AB - In prior work, we proposed using an extended Kalman filter to directly update position estimates in a speaker localization system based on time delays of arrival. We found that such a scheme provided superior tracking quality compared with the conventional closed-form approximation methods. In this work, we enhance our audio localizer with video information. We propose an algorithm to incorporate detected face positions in different camera views into the Kalman filter without doing any explicit triangulation. This approach yields a robust source localizer that functions reliably both in segments where the speaker is silent, which would be detrimental to an audio-only tracker, and in segments where many faces appear, which would confuse a video-only tracker. We tested our algorithm on a data set consisting of seminars held by actual speakers. Our experiments revealed that the audio-video localizer functioned better than a localizer based solely on audio or solely on video features.
UR - http://www.scopus.com/inward/record.url?scp=33645672078&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33645672078&partnerID=8YFLogxK
U2 - 10.1109/ASPAA.2005.1540183
DO - 10.1109/ASPAA.2005.1540183
M3 - Conference contribution
AN - SCOPUS:33645672078
SN - 0780391543
SN - 9780780391543
T3 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
SP - 118
EP - 121
BT - 2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
T2 - 2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
Y2 - 16 October 2005 through 19 October 2005
ER -