We present a semi-automated system for fabricating figurines with faces that are personalised to ... more We present a semi-automated system for fabricating figurines with faces that are personalised to the individual likeness of the customer. The efficacy of the system has been demonstrated by commercial deployments at Walt Disney World Resort and Star Wars Celebration VI in Orlando Florida. Although the system is semi automated, human intervention is limited to a few simple tasks to maintain the high throughput and consistent quality required for commercial application. In contrast to existing systems that fabricate custom heads that are assembled to pre-fabricated plastic bodies, our system seamlessly integrates 3D facial data with a predefined figurine body into a unique and continuous object that is fabricated as a single piece. The combination of state-of-the-art 3D capture, modelling, and printing that are the core of our system provide the flexibility to fabricate figurines whose complexity is only limited by the creativity of the designer.
Linear models, particularly those based on principal component analysis (PCA), have been used suc... more Linear models, particularly those based on principal component analysis (PCA), have been used successfully on a broad range of human face-related applications. Although PCA models achieve high compression, they have not been widely used for animation in a production environment because their bases lack a semantic interpretation. Their parameters are not an intuitive set for animators to work with. In
Abstract Systems that attempt to recover the spoken word from image sequences usually require com... more Abstract Systems that attempt to recover the spoken word from image sequences usually require complicated models of the mouth and its motions. Here we describe a new approach based on a fast mathematical morphology transform called the sieve. We form statistics of scale measurements in one and two dimensions and these are used as a feature vector for standard Hidden Markov Models (HMMs)
ABSTRACT This paper compares three methods of lipreading for visual and audio-visual speech recog... more ABSTRACT This paper compares three methods of lipreading for visual and audio-visual speech recognition. Lip shape information is obtained using an Active Shape Model (ASM) lip tracker but is not as effective as modelling the combined shape and enclosed greylevel surface using an Active Appearance Model (AAM). A nontracked alternative is a nonlinear transform of the image using a multiscale spatial analysis (MSA).
Two quite different strategies for characterising mouth shapes for visual speech recognition (lip... more Two quite different strategies for characterising mouth shapes for visual speech recognition (lipreading) are compared. The first strategy extracts the parameters required to fit an active shape model (ASM) to the outline of the lips. The second uses a feature derived from a one-dimensional multiscale spatial analysis (MSA) of the mouth region using a new processor derived from mathematical morphology and median filtering.
Abstract The multimodal nature of speech is often ignored in human-computer interaction, but lip ... more Abstract The multimodal nature of speech is often ignored in human-computer interaction, but lip deformations and other body motion, such as those of the head, convey additional information. We integrate speech cues from many sources and this improves intelligibility, especially when the acoustic signal is degraded. The paper shows how this additional, often complementary, visual speech information can be used for speech recognition.
2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010
Abstract Videos of multi-player team sports provide a challenging domain for dynamic scene analys... more Abstract Videos of multi-player team sports provide a challenging domain for dynamic scene analysis. Player actions and interactions are complex as they are driven by many factors, such as the short-term goals of the individual player, the overall team strategy, the rules of the sport, and the current context of the game. We show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and ...
Proceedings of the 10th European Conference on Visual Media Production - CVMP '13, 2013
ABSTRACT Animation of models often introduces distortions to their parameterisation, as these are... more ABSTRACT Animation of models often introduces distortions to their parameterisation, as these are typically optimised for a single frame. The net effect is that under deformation, the mapped features, i.e. UV texture maps, bump maps or displacement maps, may appear to stretch or scale in an undesirable way. Ideally, what we would like is for the appearance of such features to remain feasible given any underlying deformation. In this paper we introduce a real-time technique that reduces such distortions based on a distortion control (rigidity) map. In two versions of our proposed technique, the parameter space is warped in either an axis or a non-axis aligned manner based on the minimisation of a non-linear distortion metric. This in turn is solved using a highly optimised hybrid CPU-GPU strategy. The result is real-time dynamic content-aware texturing that reduces distortions in a controlled way. The technique can be applied to reduce distortions in a variety of scenarios, including reusing a low geometric complexity animated sequence with a multitude of detail maps, dynamic procedurally defined features mapped on deformable geometry and animation authoring previews on texture-mapped models.
IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2000
For facial expression recognition systems to be applicable in the real world, they need to be abl... more For facial expression recognition systems to be applicable in the real world, they need to be able to detect and track a previously unseen person's face and its facial movements accurately in realistic environments. A highly plausible solution involves performing a "dense" form of alignment, where 60-70 fiducial facial points are tracked with high accuracy. The problem is that, in practice, this type of dense alignment had so far been impossible to achieve in a generic sense, mainly due to poor reliability and robustness. Instead, many expression detection methods have opted for a "coarse" form of face alignment, followed by an application of a biologically inspired appearance descriptor such as the histogram of oriented gradients or Gabor magnitudes. Encouragingly, recent advances to a number of dense alignment algorithms have demonstrated both high reliability and accuracy for unseen subjects [e.g., constrained local models (CLMs)]. This begs the question: Aside from countering against illumination variation, what do these appearance descriptors do that standard pixel representations do not? In this paper, we show that, when close to perfect alignment is obtained, there is no real benefit in employing these different appearance-based representations (under consistent illumination conditions). In fact, when misalignment does occur, we show that these appearance descriptors do work well by encoding robustness to alignment error. For this work, we compared two popular methods for dense alignment-subject-dependent active appearance models versus subject-independent CLMs-on the task of action-unit detection. These comparisons were conducted through a battery of experiments across various publicly available data sets (i.e., CK+, Pain, M3, and GEMEP-FERA). We also report our performance in the recent 2011 Facial Expression Recognition and Analysis Challenge for the subject-independent task.
Described herein are methods, systems, apparatuses and products for utilizing motion fields to pr... more Described herein are methods, systems, apparatuses and products for utilizing motion fields to predict evolution in dynamic scenes. One aspect provides for accessing active object position data including positioning information of a plurality of individual active objects; extracting a plurality of individual active object motions from the active object position data; constructing a motion field using the plurality of individual active object motions; and using the motion field to predict one or more points of convergence at one or more spatial ...
We present a semi-automated system for fabricating figurines with faces that are personalised to ... more We present a semi-automated system for fabricating figurines with faces that are personalised to the individual likeness of the customer. The efficacy of the system has been demonstrated by commercial deployments at Walt Disney World Resort and Star Wars Celebration VI in Orlando Florida. Although the system is semi automated, human intervention is limited to a few simple tasks to maintain the high throughput and consistent quality required for commercial application. In contrast to existing systems that fabricate custom heads that are assembled to pre-fabricated plastic bodies, our system seamlessly integrates 3D facial data with a predefined figurine body into a unique and continuous object that is fabricated as a single piece. The combination of state-of-the-art 3D capture, modelling, and printing that are the core of our system provide the flexibility to fabricate figurines whose complexity is only limited by the creativity of the designer.
Linear models, particularly those based on principal component analysis (PCA), have been used suc... more Linear models, particularly those based on principal component analysis (PCA), have been used successfully on a broad range of human face-related applications. Although PCA models achieve high compression, they have not been widely used for animation in a production environment because their bases lack a semantic interpretation. Their parameters are not an intuitive set for animators to work with. In
Abstract Systems that attempt to recover the spoken word from image sequences usually require com... more Abstract Systems that attempt to recover the spoken word from image sequences usually require complicated models of the mouth and its motions. Here we describe a new approach based on a fast mathematical morphology transform called the sieve. We form statistics of scale measurements in one and two dimensions and these are used as a feature vector for standard Hidden Markov Models (HMMs)
ABSTRACT This paper compares three methods of lipreading for visual and audio-visual speech recog... more ABSTRACT This paper compares three methods of lipreading for visual and audio-visual speech recognition. Lip shape information is obtained using an Active Shape Model (ASM) lip tracker but is not as effective as modelling the combined shape and enclosed greylevel surface using an Active Appearance Model (AAM). A nontracked alternative is a nonlinear transform of the image using a multiscale spatial analysis (MSA).
Two quite different strategies for characterising mouth shapes for visual speech recognition (lip... more Two quite different strategies for characterising mouth shapes for visual speech recognition (lipreading) are compared. The first strategy extracts the parameters required to fit an active shape model (ASM) to the outline of the lips. The second uses a feature derived from a one-dimensional multiscale spatial analysis (MSA) of the mouth region using a new processor derived from mathematical morphology and median filtering.
Abstract The multimodal nature of speech is often ignored in human-computer interaction, but lip ... more Abstract The multimodal nature of speech is often ignored in human-computer interaction, but lip deformations and other body motion, such as those of the head, convey additional information. We integrate speech cues from many sources and this improves intelligibility, especially when the acoustic signal is degraded. The paper shows how this additional, often complementary, visual speech information can be used for speech recognition.
2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010
Abstract Videos of multi-player team sports provide a challenging domain for dynamic scene analys... more Abstract Videos of multi-player team sports provide a challenging domain for dynamic scene analysis. Player actions and interactions are complex as they are driven by many factors, such as the short-term goals of the individual player, the overall team strategy, the rules of the sport, and the current context of the game. We show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and ...
Proceedings of the 10th European Conference on Visual Media Production - CVMP '13, 2013
ABSTRACT Animation of models often introduces distortions to their parameterisation, as these are... more ABSTRACT Animation of models often introduces distortions to their parameterisation, as these are typically optimised for a single frame. The net effect is that under deformation, the mapped features, i.e. UV texture maps, bump maps or displacement maps, may appear to stretch or scale in an undesirable way. Ideally, what we would like is for the appearance of such features to remain feasible given any underlying deformation. In this paper we introduce a real-time technique that reduces such distortions based on a distortion control (rigidity) map. In two versions of our proposed technique, the parameter space is warped in either an axis or a non-axis aligned manner based on the minimisation of a non-linear distortion metric. This in turn is solved using a highly optimised hybrid CPU-GPU strategy. The result is real-time dynamic content-aware texturing that reduces distortions in a controlled way. The technique can be applied to reduce distortions in a variety of scenarios, including reusing a low geometric complexity animated sequence with a multitude of detail maps, dynamic procedurally defined features mapped on deformable geometry and animation authoring previews on texture-mapped models.
IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2000
For facial expression recognition systems to be applicable in the real world, they need to be abl... more For facial expression recognition systems to be applicable in the real world, they need to be able to detect and track a previously unseen person's face and its facial movements accurately in realistic environments. A highly plausible solution involves performing a "dense" form of alignment, where 60-70 fiducial facial points are tracked with high accuracy. The problem is that, in practice, this type of dense alignment had so far been impossible to achieve in a generic sense, mainly due to poor reliability and robustness. Instead, many expression detection methods have opted for a "coarse" form of face alignment, followed by an application of a biologically inspired appearance descriptor such as the histogram of oriented gradients or Gabor magnitudes. Encouragingly, recent advances to a number of dense alignment algorithms have demonstrated both high reliability and accuracy for unseen subjects [e.g., constrained local models (CLMs)]. This begs the question: Aside from countering against illumination variation, what do these appearance descriptors do that standard pixel representations do not? In this paper, we show that, when close to perfect alignment is obtained, there is no real benefit in employing these different appearance-based representations (under consistent illumination conditions). In fact, when misalignment does occur, we show that these appearance descriptors do work well by encoding robustness to alignment error. For this work, we compared two popular methods for dense alignment-subject-dependent active appearance models versus subject-independent CLMs-on the task of action-unit detection. These comparisons were conducted through a battery of experiments across various publicly available data sets (i.e., CK+, Pain, M3, and GEMEP-FERA). We also report our performance in the recent 2011 Facial Expression Recognition and Analysis Challenge for the subject-independent task.
Described herein are methods, systems, apparatuses and products for utilizing motion fields to pr... more Described herein are methods, systems, apparatuses and products for utilizing motion fields to predict evolution in dynamic scenes. One aspect provides for accessing active object position data including positioning information of a plurality of individual active objects; extracting a plurality of individual active object motions from the active object position data; constructing a motion field using the plurality of individual active object motions; and using the motion field to predict one or more points of convergence at one or more spatial ...
Uploads
Papers by I. Matthews