Learning temporal structures for human activity recognition

Tiantian Xu, Edward K. Wong

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    We propose a hierarchical method for learning temporal structures for the recognition of complex human activities or actions in videos. Low-level features (HOG, HOF, MBHx and MBHy) are first computed from video snippets to form concatenated feature vectors. A novel segmentation algorithm based on K-means clustering is then used to divide the video into segments, with each segment corresponding to a sub-action with uniform motion characteristics. Using the low-level features as inputs, a many-to-one encoder is trained to extract generalized features for the snippets in each segment. A second many-to-one encoder is then used to compute higher-level features from the generalized features. The higher-level features from the individual segments are then concatenated and used to train a third many-to-one encoder, which extracts a high-level feature representation for the entire video. The final descriptor is the concatenation of the higher-level features from the individual segments and the high-level feature for the entire video. Using the proposed descriptor and a multi-class linear support vector machine (SVM), we achieved state-of-the-art results on the Olympic Sports and UCF50 datasets, and beat the state-of-the-art result on the challenging HMDB51 dataset by a wide margin of 17%.
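The data flow described in the abstract (snippets grouped into segments, one feature per segment, one feature for the whole video, then concatenation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `split_on_label_change`, `mean_pool` and `build_descriptor` are hypothetical, the cluster labels are assumed to be given (for example by K-means), cutting the video wherever the label changes is an assumed simplification of the segmentation step, and plain averaging stands in for the trained many-to-one encoders.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[Vector]], Vector]


def split_on_label_change(labels: Sequence[int]) -> List[range]:
    """Group consecutive snippets that share a cluster label into segments.

    Assumption: this label-change rule is only a stand-in for the paper's
    K-means-based segmentation algorithm, whose details are not given here.
    """
    segments: List[range] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current segment at the end or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append(range(start, i))
            start = i
    return segments


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Stand-in many-to-one map: average several vectors into one.

    Assumption: the paper trains encoders for this role; averaging is used
    here only to keep the sketch self-contained.
    """
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_descriptor(
    snippets: Sequence[Vector],
    labels: Sequence[int],
    segment_encoder: Encoder = mean_pool,
    video_encoder: Encoder = mean_pool,
) -> Vector:
    """Concatenate per-segment features with one video-level feature."""
    segments = split_on_label_change(labels)
    # One feature vector per segment, computed from that segment's snippets.
    segment_features = [
        segment_encoder([snippets[i] for i in segment]) for segment in segments
    ]
    # One feature vector for the whole video, computed from segment features.
    video_feature = video_encoder(segment_features)
    # Final descriptor: all segment features followed by the video feature.
    flat_segments = [x for feature in segment_features for x in feature]
    return flat_segments + video_feature


if __name__ == "__main__":
    toy_snippets = [[0.0, 0.0], [2.0, 2.0], [4.0, 6.0], [4.0, 6.0]]
    toy_labels = [0, 0, 1, 1]  # hypothetical cluster labels per snippet
    print(build_descriptor(toy_snippets, toy_labels))
```

In this sketch the descriptor length depends on the number of segments, so a fixed-length input for a linear SVM would require the same number of segments for every video; whether and how the paper enforces that is not stated in the abstract.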

    Original language: English (US)
    Title of host publication: British Machine Vision Conference 2017, BMVC 2017
    Publisher: BMVA Press
    ISBN (Electronic): 190172560X, 9781901725605
    DOIs: https://doi.org/10.5244/c.31.160
    State: Published - 2017
    Event: 28th British Machine Vision Conference, BMVC 2017 - London, United Kingdom
    Duration: Sep 4 2017 → Sep 7 2017

    Publication series

    Name: British Machine Vision Conference 2017, BMVC 2017

    Conference

    Conference: 28th British Machine Vision Conference, BMVC 2017
    Country: United Kingdom
    City: London
    Period: 9/4/17 → 9/7/17

    ASJC Scopus subject areas

    • Computer Vision and Pattern Recognition


  • Cite this

    Xu, T., & Wong, E. K. (2017). Learning temporal structures for human activity recognition. In British Machine Vision Conference 2017, BMVC 2017 (British Machine Vision Conference 2017, BMVC 2017). BMVA Press. https://doi.org/10.5244/c.31.160