Clustering Contextual Facial Display Sequences
Abstract
We describe a method for learning classes of facial motion patterns
from video of a human interacting with a computerized embodied agent.
The method also learns correlations between the uncovered motion classes and the
current interaction context. Our work is motivated by two hypotheses. First,
a computer user's facial displays will be context dependent,
especially in the presence of an embodied agent. Second, individual interactants
will use their faces in different ways and for different purposes.
Our method describes facial motion using optical flow over the entire face,
projected onto the complete orthogonal basis of Zernike polynomials.
A context-dependent mixture of hidden
Markov models (cmHMM) clusters the resulting temporal sequences of feature vectors
into facial display classes. We apply the clustering technique to sequences of
continuous video, in which a single face is tracked and spatially segmented.
We discuss the classes of patterns uncovered for a number of subjects.
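
As an illustration of the feature extraction step described above, the following is a minimal sketch of projecting a dense flow field onto a Zernike basis. It is not the authors' implementation. The function names (zernike_radial, zernike_basis, flow_to_features), the maximum order of 4, the square face crop mapped to the unit disk, and the least-squares fit are all assumptions made for illustration. The flow field is taken as given, since the abstract does not say how it is computed.

```python
import numpy as np
from math import factorial


def zernike_radial(n, m, rho):
    """Radial Zernike polynomial R_n^|m| evaluated at radii rho."""
    m = abs(m)
    out = np.zeros_like(rho)
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + m) // 2 - k)
                    * factorial((n - m) // 2 - k)))
        out += coeff * rho ** (n - 2 * k)
    return out


def zernike_basis(size, max_order):
    """Sample Zernike polynomials up to max_order on a size x size grid.

    Returns a (pixels_in_disk, n_basis) matrix and the boolean disk mask.
    Assumption: the face crop is square and mapped onto [-1, 1] x [-1, 1].
    """
    ys, xs = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    rho = np.hypot(xs, ys)
    theta = np.arctan2(ys, xs)
    mask = rho <= 1.0  # Zernike polynomials are defined on the unit disk
    columns = []
    for n in range(max_order + 1):
        for m in range(-n, n + 1, 2):  # n - |m| must be even
            radial = zernike_radial(n, m, rho)
            angular = np.cos(m * theta) if m >= 0 else np.sin(-m * theta)
            columns.append((radial * angular)[mask])
    return np.stack(columns, axis=1), mask


def flow_to_features(flow, max_order=4):
    """Project a (size, size, 2) flow field onto the Zernike basis.

    Each flow component is fitted separately by least squares; the two
    coefficient vectors are concatenated into one feature vector.
    """
    basis, mask = zernike_basis(flow.shape[0], max_order)
    coeffs_u, *_ = np.linalg.lstsq(basis, flow[..., 0][mask], rcond=None)
    coeffs_v, *_ = np.linalg.lstsq(basis, flow[..., 1][mask], rcond=None)
    return np.concatenate([coeffs_u, coeffs_v])
```

Applied to each frame of a tracked face, a function like flow_to_features would produce one fixed-length vector per frame, and the resulting sequence of vectors is the kind of input a mixture of hidden Markov models could cluster. Least squares is used in this sketch because sampled Zernike polynomials are only approximately orthogonal on a pixel grid.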