The recognition of actions from video sequences has many applications in health monitoring, assis... more The recognition of actions from video sequences has many applications in health monitoring, assisted living, surveillance, and smart homes. Despite advances in sensing, in particular related to 3D video, the methodologies to process the data are still subject to research. We demonstrate superior results by a system which combines recurrent neural networks with convolutional neural networks in a voting approach. The gated-recurrent-unit-based neural networks are particularly well-suited to distinguish actions based on long-term information from optical tracking data; the 3D-CNNs focus more on detailed, recent information from video data. The resulting features are merged in an SVM which then classifies the movement. In this architecture, our method improves recognition rates of state-of-the-art methods by 14% on standard data sets.
We introduce Deep Variational Bayes Filters (DVBF), a new method for unsupervised learning and id... more We introduce Deep Variational Bayes Filters (DVBF), a new method for unsupervised learning and identification of latent Markovian state space models. Leveraging recent advances in Stochastic Gradient Variational Bayes, DVBF can overcome intractable inference distributions via variational inference. Thus, it can handle highly nonlinear input data with temporal and spatial dependencies such as image sequences without domain knowledge. Our experiments show that enabling back-propagation through transitions enforces state space assumptions and significantly improves information content of the latent embedding. This also enables realistic long-term prediction.
Convolutional neural networks have been applied to a wide variety of computer vision tasks. Recen... more Convolutional neural networks have been applied to a wide variety of computer vision tasks. Recent advances in semantic segmentation have enabled their application to medical image segmentation. While most CNNs use two-dimensional kernels, recent CNN-based publications on medical image segmentation featured three-dimensional kernels, allowing full access to the three-dimensional structure of medical images. Though closely related to semantic segmentation, medical image segmentation includes specific challenges that need to be addressed, such as the scarcity of labelled data, the high class imbalance found in the ground truth and the high memory demand of three-dimensional images. In this work, a CNN-based method with three-dimensional filters is demonstrated and applied to hand and brain MRI. Two modifications to an existing CNN architecture are discussed, along with methods on addressing the aforementioned challenges. While most of the existing literature on medical image segmentation focuses on soft tissue and the major organs, this work is validated on data both from the central nervous system as well as the bones of the hand.
This short communication presents preliminary results from an extensive investigation of joint mo... more This short communication presents preliminary results from an extensive investigation of joint modelling for the human hand. We use finger and hand movement data recorded from both hands of 110 subjects using passive reflective markers on the skin. Furthermore, we use data which was recorded from a single Thiel-fixated cadaver hand using also passive reflective markers but fixed to the bone. Our data clearly demonstrate that, for wrist and finger joints, hinge joint models are suciently accurate to describe their movement in Cartesian space.
2014 IEEE International Conference on Robotics and Automation (ICRA), 2014
ABSTRACT Estimating human fingertip forces is required to understand force distribution in graspi... more ABSTRACT Estimating human fingertip forces is required to understand force distribution in grasping and manipulation. Human grasping behavior can then be used to develop force-and impedance-based grasping and manipulation strategies for robotic hands. However, estimating human grip force naturally is only possible with instrumented objects or unnatural gloves, thus greatly limiting the type of objects used. In this paper we describe an approach which uses images of the human fingertip to reconstruct grip force and torque at the finger. Our approach does not use finger-mounted equipment, but instead a steady camera observing the fingers of the hand from a distance. This allows for finger force estimation without any physical interference with the hand or object itself, and is therefore universally applicable. We construct a 3-dimensional finger model from 2D images. Convolutional Neural Networks (CNN) are used to predict the 2D image to a 3D model transformation matrix. Two methods of CNN are designed for separate and combined outputs of orientation and position. After learning, our system shows an alignment accuracy over 98% on unknown data. In the final step, a Gaussian process estimates finger force and torque from the aligned images based on color changes and deformations of the nail and its surrounding skin. Experimental results shows that the accuracy achieves about 95% in the force estimation and 90% in the torque.
Unsupervised feature learning has shown impressive results for a wide range of input modalities, ... more Unsupervised feature learning has shown impressive results for a wide range of input modalities, in particular for object classification tasks in computer vision. Using a large amount of unlabeled data, unsupervised feature learning methods are utilized to construct high-level representations that are discriminative enough for subsequently trained supervised classification algorithms. However, it has never been \emph{quantitatively} investigated yet how well unsupervised learning methods can find \emph{low-level representations} for image patches without any additional supervision. In this paper we examine the performance of pure unsupervised methods on a low-level correspondence task, a problem that is central to many Computer Vision applications. We find that a special type of Restricted Boltzmann Machines (RBMs) performs comparably to hand-crafted descriptors. Additionally, a simple binarization scheme produces compact representations that perform better than several state-of-the...
2014 IEEE International Conference on Robotics and Automation (ICRA), 2014
ABSTRACT Estimating human fingertip forces is required to understand force distribution in graspi... more ABSTRACT Estimating human fingertip forces is required to understand force distribution in grasping and manipulation. Human grasping behavior can then be used to develop force-and impedance-based grasping and manipulation strategies for robotic hands. However, estimating human grip force naturally is only possible with instrumented objects or unnatural gloves, thus greatly limiting the type of objects used. In this paper we describe an approach which uses images of the human fingertip to reconstruct grip force and torque at the finger. Our approach does not use finger-mounted equipment, but instead a steady camera observing the fingers of the hand from a distance. This allows for finger force estimation without any physical interference with the hand or object itself, and is therefore universally applicable. We construct a 3-dimensional finger model from 2D images. Convolutional Neural Networks (CNN) are used to predict the 2D image to a 3D model transformation matrix. Two methods of CNN are designed for separate and combined outputs of orientation and position. After learning, our system shows an alignment accuracy over 98% on unknown data. In the final step, a Gaussian process estimates finger force and torque from the aligned images based on color changes and deformations of the nail and its surrounding skin. Experimental results shows that the accuracy achieves about 95% in the force estimation and 90% in the torque.
ABSTRACT Estimating human fingertip forces is required to understand force distribution in graspi... more ABSTRACT Estimating human fingertip forces is required to understand force distribution in grasping and manipulation. Human grasping behavior can then be used to develop force-and impedance-based grasping and manipulation strategies for robotic hands. However, estimating human grip force naturally is only possible with instrumented objects or unnatural gloves, thus greatly limiting the type of objects used. In this paper we describe an approach which uses images of the human fingertip to reconstruct grip force and torque at the finger. Our approach does not use finger-mounted equipment, but instead a steady camera observing the fingers of the hand from a distance. This allows for finger force estimation without any physical interference with the hand or object itself, and is therefore universally applicable. We construct a 3-dimensional finger model from 2D images. Convolutional Neural Networks (CNN) are used to predict the 2D image to a 3D model transformation matrix. Two methods of CNN are designed for separate and combined outputs of orientation and position. After learning, our system shows an alignment accuracy over 98% on unknown data. In the final step, a Gaussian process estimates finger force and torque from the aligned images based on color changes and deformations of the nail and its surrounding skin. Experimental results shows that the accuracy achieves about 95% in the force estimation and 90% in the torque.
The recognition of actions from video sequences has many applications in health monitoring, assis... more The recognition of actions from video sequences has many applications in health monitoring, assisted living, surveillance, and smart homes. Despite advances in sensing, in particular related to 3D video, the methodologies to process the data are still subject to research. We demonstrate superior results by a system which combines recurrent neural networks with convolutional neural networks in a voting approach. The gated-recurrent-unit-based neural networks are particularly well-suited to distinguish actions based on long-term information from optical tracking data; the 3D-CNNs focus more on detailed, recent information from video data. The resulting features are merged in an SVM which then classifies the movement. In this architecture, our method improves recognition rates of state-of-the-art methods by 14% on standard data sets.
We introduce Deep Variational Bayes Filters (DVBF), a new method for unsupervised learning and id... more We introduce Deep Variational Bayes Filters (DVBF), a new method for unsupervised learning and identification of latent Markovian state space models. Leveraging recent advances in Stochastic Gradient Variational Bayes, DVBF can overcome intractable inference distributions via variational inference. Thus, it can handle highly nonlinear input data with temporal and spatial dependencies such as image sequences without domain knowledge. Our experiments show that enabling back-propagation through transitions enforces state space assumptions and significantly improves information content of the latent embedding. This also enables realistic long-term prediction.
Convolutional neural networks have been applied to a wide variety of computer vision tasks. Recen... more Convolutional neural networks have been applied to a wide variety of computer vision tasks. Recent advances in semantic segmentation have enabled their application to medical image segmentation. While most CNNs use two-dimensional kernels, recent CNN-based publications on medical image segmentation featured three-dimensional kernels, allowing full access to the three-dimensional structure of medical images. Though closely related to semantic segmentation, medical image segmentation includes specific challenges that need to be addressed, such as the scarcity of labelled data, the high class imbalance found in the ground truth and the high memory demand of three-dimensional images. In this work, a CNN-based method with three-dimensional filters is demonstrated and applied to hand and brain MRI. Two modifications to an existing CNN architecture are discussed, along with methods on addressing the aforementioned challenges. While most of the existing literature on medical image segmentation focuses on soft tissue and the major organs, this work is validated on data both from the central nervous system as well as the bones of the hand.
This short communication presents preliminary results from an extensive investigation of joint mo... more This short communication presents preliminary results from an extensive investigation of joint modelling for the human hand. We use finger and hand movement data recorded from both hands of 110 subjects using passive reflective markers on the skin. Furthermore, we use data which was recorded from a single Thiel-fixated cadaver hand using also passive reflective markers but fixed to the bone. Our data clearly demonstrate that, for wrist and finger joints, hinge joint models are suciently accurate to describe their movement in Cartesian space.
2014 IEEE International Conference on Robotics and Automation (ICRA), 2014
ABSTRACT Estimating human fingertip forces is required to understand force distribution in graspi... more ABSTRACT Estimating human fingertip forces is required to understand force distribution in grasping and manipulation. Human grasping behavior can then be used to develop force-and impedance-based grasping and manipulation strategies for robotic hands. However, estimating human grip force naturally is only possible with instrumented objects or unnatural gloves, thus greatly limiting the type of objects used. In this paper we describe an approach which uses images of the human fingertip to reconstruct grip force and torque at the finger. Our approach does not use finger-mounted equipment, but instead a steady camera observing the fingers of the hand from a distance. This allows for finger force estimation without any physical interference with the hand or object itself, and is therefore universally applicable. We construct a 3-dimensional finger model from 2D images. Convolutional Neural Networks (CNN) are used to predict the 2D image to a 3D model transformation matrix. Two methods of CNN are designed for separate and combined outputs of orientation and position. After learning, our system shows an alignment accuracy over 98% on unknown data. In the final step, a Gaussian process estimates finger force and torque from the aligned images based on color changes and deformations of the nail and its surrounding skin. Experimental results shows that the accuracy achieves about 95% in the force estimation and 90% in the torque.
Unsupervised feature learning has shown impressive results for a wide range of input modalities, ... more Unsupervised feature learning has shown impressive results for a wide range of input modalities, in particular for object classification tasks in computer vision. Using a large amount of unlabeled data, unsupervised feature learning methods are utilized to construct high-level representations that are discriminative enough for subsequently trained supervised classification algorithms. However, it has never been \emph{quantitatively} investigated yet how well unsupervised learning methods can find \emph{low-level representations} for image patches without any additional supervision. In this paper we examine the performance of pure unsupervised methods on a low-level correspondence task, a problem that is central to many Computer Vision applications. We find that a special type of Restricted Boltzmann Machines (RBMs) performs comparably to hand-crafted descriptors. Additionally, a simple binarization scheme produces compact representations that perform better than several state-of-the...
2014 IEEE International Conference on Robotics and Automation (ICRA), 2014
ABSTRACT Estimating human fingertip forces is required to understand force distribution in graspi... more ABSTRACT Estimating human fingertip forces is required to understand force distribution in grasping and manipulation. Human grasping behavior can then be used to develop force-and impedance-based grasping and manipulation strategies for robotic hands. However, estimating human grip force naturally is only possible with instrumented objects or unnatural gloves, thus greatly limiting the type of objects used. In this paper we describe an approach which uses images of the human fingertip to reconstruct grip force and torque at the finger. Our approach does not use finger-mounted equipment, but instead a steady camera observing the fingers of the hand from a distance. This allows for finger force estimation without any physical interference with the hand or object itself, and is therefore universally applicable. We construct a 3-dimensional finger model from 2D images. Convolutional Neural Networks (CNN) are used to predict the 2D image to a 3D model transformation matrix. Two methods of CNN are designed for separate and combined outputs of orientation and position. After learning, our system shows an alignment accuracy over 98% on unknown data. In the final step, a Gaussian process estimates finger force and torque from the aligned images based on color changes and deformations of the nail and its surrounding skin. Experimental results shows that the accuracy achieves about 95% in the force estimation and 90% in the torque.
ABSTRACT Estimating human fingertip forces is required to understand force distribution in graspi... more ABSTRACT Estimating human fingertip forces is required to understand force distribution in grasping and manipulation. Human grasping behavior can then be used to develop force-and impedance-based grasping and manipulation strategies for robotic hands. However, estimating human grip force naturally is only possible with instrumented objects or unnatural gloves, thus greatly limiting the type of objects used. In this paper we describe an approach which uses images of the human fingertip to reconstruct grip force and torque at the finger. Our approach does not use finger-mounted equipment, but instead a steady camera observing the fingers of the hand from a distance. This allows for finger force estimation without any physical interference with the hand or object itself, and is therefore universally applicable. We construct a 3-dimensional finger model from 2D images. Convolutional Neural Networks (CNN) are used to predict the 2D image to a 3D model transformation matrix. Two methods of CNN are designed for separate and combined outputs of orientation and position. After learning, our system shows an alignment accuracy over 98% on unknown data. In the final step, a Gaussian process estimates finger force and torque from the aligned images based on color changes and deformations of the nail and its surrounding skin. Experimental results shows that the accuracy achieves about 95% in the force estimation and 90% in the torque.
Uploads
Papers by Patrick van der Smagt