Action and Gesture Recognition

Recognition of human gestures and actions has become a research area of great interest, as it has many potential application domains, including human-computer interfaces, sign language interpretation, interactive video games, and automated surveillance of parking lots and ATMs. In recent years, many approaches to human activity recognition have been presented. There are two major types of variation in the way people perform activities: temporal and spatial. Temporal variation is caused by differences in how long people take to perform an activity, and it is usually compensated effectively by Hidden Markov Models (HMMs). Spatial variation, on the other hand, arises because people perform activities in different ways and have different physiques. Compensating for spatial variation is more difficult and usually requires lengthy and non-optimal training.

In general, motion-based activity recognition systems are vulnerable to spatial variation caused by unintended motion. We believe this problem can be handled efficiently with an appearance-based approach. One appealing idea is to discard motion information and recognize activities purely from posture information, without any explicit human model. However, using exemplar poses becomes impractical as the number of activities grows.


Fig. 1: Activity recognition

Action occurring in an image sequence can be seen as the temporal evolution of deforming objects, which is well described by dynamic textures. Our method based on spatiotemporal dynamic texture descriptors therefore provides an excellent starting point for this research. We aim to find an effective method for recognizing activities in order to develop the solutions needed for emerging applications, for example helping to monitor elderly people who live alone at home. In such a setting it is likely to matter whether a person is sitting, standing (as shown in Fig. 1), or walking, as well as how he is using his hands, which way he is facing, whether he is talking, and whether his activity is normal or abnormal. Based on this kind of information, the environment may adapt to the activities and provide the desired responses to relevant events.

Early work based on sequences of postures:

In 2005, we proposed a silhouette-based method in which affine-invariant Fourier descriptors describe the human pose in each frame. A support vector machine (SVM) classifier recognizes the posture class, and Hidden Markov Models classify the resulting posture sequences.
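The pose description step can be illustrated with a minimal sketch of contour Fourier descriptors, assuming the silhouette boundary has already been extracted as an ordered list of points. Note that this sketch only achieves similarity invariance (translation, rotation, scale, starting point) by normalizing descriptor magnitudes; the affine-invariant descriptors used in the 2005 method require a more elaborate normalization that is not reproduced here. The function name `fourier_descriptors` and the parameter `n_harmonics` are hypothetical.

```python
import numpy as np

def fourier_descriptors(contour, n_harmonics=8):
    """Similarity-invariant Fourier descriptors of a closed 2-D contour.

    contour: (N, 2) float array of boundary points (x, y) in traversal order,
             with N > 2 * n_harmonics.
    Returns 2 * n_harmonics magnitudes of the lowest positive and negative
    harmonics, invariant to translation, rotation, scale and start point.
    """
    z = contour[:, 0] + 1j * contour[:, 1]   # boundary as a complex signal
    Z = np.fft.fft(z)
    Z[0] = 0.0                               # drop the DC term: translation invariance
    mags = np.abs(Z)                         # discard phase: rotation and start-point invariance
    mags = mags / np.linalg.norm(mags)       # unit norm: scale invariance
    return np.concatenate([mags[1:n_harmonics + 1], mags[-n_harmonics:]])

# Example: a contour and a rotated, scaled, translated copy with a shifted start point.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
shape = np.stack([3 * np.cos(t) + 0.5 * np.cos(3 * t), np.sin(t)], axis=1)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
moved = np.roll(2.5 * shape @ R.T + np.array([10.0, -4.0]), 5, axis=0)
print(np.allclose(fourier_descriptors(shape), fourier_descriptors(moved)))  # → True
```

In a full pipeline, descriptor vectors like these would be the input to the SVM posture classifier.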

Kellokumpu V, Pietikäinen M & Heikkilä J (2005) Human activity recognition using sequences of postures. Proc. IAPR Conference on Machine Vision Applications (MVA 2005), Tsukuba Science City, Japan, 570-573.
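The posture-sequence classification step can be sketched with the scaled forward algorithm for discrete HMMs: one HMM per activity scores the sequence of per-frame posture labels, and the activity with the highest log-likelihood wins. The left-to-right models, the three posture labels (0 = standing, 1 = bending, 2 = sitting), and all probabilities below are hypothetical toy values set by hand; in practice the parameters would be learned from training data (e.g. with Baum-Welch). Self-transitions let the same model accept both slow and fast executions of an activity, which is how HMMs absorb temporal variation.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: returns log P(obs | HMM) for a discrete-output HMM.

    obs: sequence of posture-class indices (one per frame)
    pi:  (S,) initial state probabilities
    A:   (S, S) transition matrix, A[i, j] = P(state j at t+1 | state i at t)
    B:   (S, K) emission matrix, B[i, k] = P(posture k | state i)
    """
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    alpha = alpha / scale
    log_p = np.log(scale)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate one step, then weight by emission
        scale = alpha.sum()             # rescale to avoid numerical underflow
        alpha = alpha / scale
        log_p += np.log(scale)
    return log_p

def classify(obs, models):
    """Return the activity whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Hypothetical toy models; postures: 0 = standing, 1 = bending, 2 = sitting.
PI = np.array([1.0, 0.0, 0.0])
A = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])        # left-to-right with self-loops
B_SIT = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])  # states emit standing -> bending -> sitting
B_STAND = B_SIT[:, ::-1]                # states emit sitting -> bending -> standing
MODELS = {"sit_down": (PI, A, B_SIT), "stand_up": (PI, A, B_STAND)}

slow = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
fast = [0, 1, 2]
print(classify(slow, MODELS), classify(fast, MODELS))  # → sit_down sit_down
```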

Recognition of actions with texture-based descriptors

Human motion can be seen as a type of texture pattern. We adopted the ideas of spatiotemporal analysis and the use of local features for motion description. Two methods for activity recognition were proposed, and the approach was then extended to gait recognition.

1) Static texture operators with temporal templates for activity recognition

This method uses temporal templates to capture movement dynamics and then uses texture features to characterize the observed movements (Fig. 2).


Fig. 2: Illustration of the formation of the feature histogram. In this frame, the top two subimages of the MHI (motion history image) have high weights compared to the bottom two.
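A minimal sketch of the temporal-template idea, assuming binary per-frame motion masks are already available (e.g. from frame differencing): the MHI is updated with the classic decay rule, basic 3x3 LBP codes are computed on the MHI, and the LBP histograms of its four subimages are concatenated. Weighting each subimage by its share of motion-energy image (MEI) pixels is an assumption made for this sketch, not necessarily the exact weighting of the published method; the helper names `update_mhi`, `lbp_codes` and `mhi_feature` are hypothetical.

```python
import numpy as np

# 8 neighbours at radius 1, in circular order
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def update_mhi(mhi, motion_mask, tau):
    """Temporal-template update: moving pixels are set to tau, others decay by 1 (floored at 0)."""
    return np.where(motion_mask, float(tau), np.maximum(mhi - 1.0, 0.0))

def lbp_codes(img):
    """Basic 8-bit LBP code for every interior pixel of a 2-D array."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes

def mhi_feature(mhi, mei):
    """Concatenate 256-bin LBP histograms of the MHI's 2x2 subimages.

    Each subimage histogram is weighted by that subimage's share of MEI pixels
    (an assumption made for this sketch).
    """
    codes = lbp_codes(mhi)
    energy = mei[1:-1, 1:-1].astype(float)   # align the MEI with the interior LBP codes
    h, w = codes.shape
    hists, weights = [], []
    for rows in (slice(0, h // 2), slice(h // 2, h)):
        for cols in (slice(0, w // 2), slice(w // 2, w)):
            hist = np.bincount(codes[rows, cols].ravel(), minlength=256).astype(float)
            hists.append(hist / hist.sum())
            weights.append(energy[rows, cols].sum())
    weights = np.array(weights) / max(sum(weights), 1.0)
    return np.concatenate([wgt * hist for wgt, hist in zip(weights, hists)])

# A small square moving right across the upper part of a 20x20 frame.
mhi = np.zeros((20, 20))
for t in range(8):
    mask = np.zeros((20, 20), dtype=bool)
    mask[3:6, 2 + t:5 + t] = True
    mhi = update_mhi(mhi, mask, tau=8)
feature = mhi_feature(mhi, mhi > 0)   # MEI taken as the support of the MHI
```

Because all motion in this toy example lies in the upper half of the frame, the bottom two subimages receive zero weight, mirroring the situation described in Fig. 2.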



2) Dynamic texture-based method for activity recognition

We extend the first method into a spatiotemporal space and describe human movements with dynamic texture features (Fig. 3). Following recent trends in computer vision, the method is designed to work with image data rather than silhouettes. The proposed methods are computationally simple and suitable for various applications.
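The LBP-TOP descriptor can be sketched as follows for a grey-level image volume: LBP codes are computed in every XY, XT and YT plane through the interior voxels, and the three normalized histograms are concatenated. For brevity this sketch uses a square 3x3 neighborhood with radius 1 on every plane and the full 256-bin code; the published descriptor samples neighbors on circles or ellipses, can use a different radius along the time axis, and typically uses uniform patterns. The names `lbp_codes` and `lbp_top` are hypothetical.

```python
import numpy as np

# 8 neighbours at radius 1, in circular order; (row, column) offsets within a plane
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(plane):
    """Basic 8-bit LBP code for every interior pixel of a 2-D plane."""
    h, w = plane.shape
    center = plane[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = plane[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes

def lbp_top(volume):
    """LBP-TOP sketch for a (T, H, W) volume: XY, XT and YT histograms concatenated.

    Only planes through interior voxels are used, so each of the three
    histograms counts the same (T-2)(H-2)(W-2) centre voxels.
    """
    T, H, W = volume.shape
    planes = {
        "xy": [volume[t] for t in range(1, T - 1)],         # appearance
        "xt": [volume[:, y, :] for y in range(1, H - 1)],   # horizontal motion
        "yt": [volume[:, :, x] for x in range(1, W - 1)],   # vertical motion
    }
    feats = []
    for name in ("xy", "xt", "yt"):
        codes = np.concatenate([lbp_codes(p).ravel() for p in planes[name]])
        hist = np.bincount(codes, minlength=256).astype(float)
        feats.append(hist / hist.sum())
    return np.concatenate(feats)

rng = np.random.default_rng(0)
frame = rng.random((10, 12))
static = np.repeat(frame[None], 6, axis=0)   # six identical frames: no motion
f = lbp_top(static)
```

For a static sequence the temporal neighbors in the XT and YT planes always equal the center pixel, so only codes with those two bits set receive any mass; this is a quick sanity check that motion is what populates the XT and YT histograms.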

3) Dynamic texture descriptor for gait recognition

We have also studied gait recognition, that is, identifying individuals in image sequences 'by the way they walk'. We presented a novel approach to human gait recognition that inherently combines appearance and motion. Dynamic texture descriptors, Local Binary Patterns from Three Orthogonal Planes (LBP-TOP), are used to describe human gait in a spatiotemporal way (Fig. 4). We also proposed a new coding of multi-resolution uniform Local Binary Patterns and used it to construct spatiotemporal LBP histograms. We showed the suitability of this representation for gait recognition and tested our method on the popular CMU MoBo dataset.


Fig. 4: Illustration of a person walking and the corresponding xt and yt planes from a single row and column. The different frames correspond to the xy planes.
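The uniform-pattern coding mentioned above can be illustrated with the standard u2 mapping: an 8-bit LBP code with at most two circular 0/1 transitions gets its own bin (58 such codes for P = 8), and all other codes share one bin, giving 59 bins. Gallery matching is shown with chi-square nearest-neighbor search, which is an assumption for this sketch (a common choice for LBP histograms). The multi-resolution coding proposed in the gait paper is not reproduced here, and the names `uniform_mapping`, `uniform_histogram`, `chi_square` and `nearest_subject` are hypothetical.

```python
import numpy as np

def uniform_mapping(P=8):
    """Lookup table from P-bit LBP codes to u2 bins; returns (table, number_of_bins)."""
    table = np.zeros(2 ** P, dtype=np.int64)
    next_bin = 0
    for code in range(2 ** P):
        rotated = (code >> 1) | ((code & 1) << (P - 1))   # circular shift by one bit
        transitions = bin(code ^ rotated).count("1")      # circular 0/1 transitions
        if transitions <= 2:
            table[code] = next_bin                        # uniform: own bin
            next_bin += 1
        else:
            table[code] = -1
    table[table == -1] = next_bin                         # non-uniform codes share one bin
    return table, next_bin + 1

def uniform_histogram(codes, table, n_bins):
    """Normalized histogram of LBP codes after u2 mapping."""
    hist = np.bincount(table[codes.ravel()], minlength=n_bins).astype(float)
    return hist / hist.sum()

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms."""
    return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def nearest_subject(query, gallery):
    """gallery: dict subject_id -> histogram; return the closest subject id."""
    return min(gallery, key=lambda s: chi_square(query, gallery[s]))

table, n_bins = uniform_mapping()
print(n_bins)  # → 59
```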


Kellokumpu V, Zhao G & Pietikäinen M (2010) Recognition of human actions using texture descriptors. Machine Vision and Applications, in press.

Kellokumpu V, Zhao G & Pietikäinen M (2008) Texture based description of movements for activity analysis. Proc. Third International Conference on Computer Vision Theory and Applications (VISAPP 2008), Madeira, Portugal, 1:206-213.

Kellokumpu V, Zhao G & Pietikäinen M (2008) Human activity recognition using a dynamic texture based method. Proc. British Machine Vision Conference (BMVC 2008), Leeds, UK, 10 p.

Kellokumpu V, Zhao G, Li SZ & Pietikäinen M (2009) Dynamic texture based gait recognition. In: Advances in Biometrics, ICB 2009 Proceedings, Lecture Notes in Computer Science 5558, 1000-1009.



CMV/Research/ActionAndGestureRecognition (last edited 2011-11-19 15:09:25 by WebMaster)