Recognizing the content of human actions is an important skill in our adaptation to the environment. We rely on our visual system to constantly interpret the actions, intentions, and physical and emotional states of the people around us. In this work, we provide a computational investigation of the visual representations that enable the visual system to decode this rich modality of information. Using biological motion as a starting point, we investigated the role of mid-level motion, disparity, and form cues in biological motion perception and in action recognition at large. Joint representations of motion and disparity appear to approximate the selectivity properties of cells in primate middle temporal area (MT). When we evaluated these mid-level visual features in an action recognition task, we found that the form cue carries sufficient diagnostic information for actions performed by realistic actors. However, form cues fail drastically for point-light displays, in which the actor is represented by only 10 to 12 moving markers. Our results suggest that learning the correspondence between motion and disparity cues might explain how people can readily recognize the actions shown in point-light displays without any training. Consistent with human psychophysical performance, the correspondence between motion and disparity also provides insight into how people perceive the under-constrained, ambiguous depth structure of point-light displays as a coherent figure. In conclusion, we argue that the joint learning of motion and disparity may be the basis of the structural representations used in action recognition.
"Exploring the Role of Motion and Depth in Action Perception"
Cognitive Sciences Theses and Dissertations.
Brown Digital Repository. Brown University Library.