German I. Parisi
  • Hamburg, Germany


Human-Robot Interaction (HRI) has become an increasingly interesting area of study among developmental roboticists, since robot learning can be sped up with the use of parent-like trainers who deliver useful advice, allowing robots to learn a specific task in less time than a robot exploring the environment autonomously. In this regard, the parent-like trainer guides the apprentice robot with actions that enhance its performance, in the same manner as external caregivers may support infants in the accomplishment of a given task, with the provided support frequently decreasing over time. This teaching technique has become known as parental scaffolding. When interacting with their caregivers, infants are subject to different environmental stimuli which can be present in various modalities. In general terms, it is possible to think of some of those stimuli as guidance that the parent-like trainer delivers to the apprentice agent. Nevertheless, when more modalities are considered, issues can emerge regarding the interpretation and integration of multi-modal information, especially when multiple sources are conflicting or ambiguous. As a consequence, the advice may be unclear or misunderstood and hence may lead the apprentice agent to decreased performance when solving a task.
Robots in domestic environments are receiving more attention, especially in scenarios where they should interact with parent-like trainers for dynamically acquiring and refining knowledge. A prominent paradigm for dynamically learning new tasks has been reinforcement learning. However, due to the excessive time needed for the learning process, a promising extension has been made by incorporating an external parent-like trainer into the learning cycle in order to scaffold and speed up the apprenticeship using advice about what actions should be performed for achieving a goal. In interactive reinforcement learning, different uni-modal control interfaces have been proposed that are often quite limited and do not take into account multiple sensor modalities. In this paper, we propose the integration of audiovisual patterns to provide advice to the agent using multi-modal information. In our approach, advice can be given using either speech, gestures, or a combination of both. We introduce a neural network-based approach to integrate multi-modal information from uni-modal modules based on their confidence. Results show that multi-modal integration leads to better performance of interactive reinforcement learning, with the robot learning faster and obtaining greater rewards compared to uni-modal scenarios.
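As a rough illustration of the confidence-based integration described above, the sketch below fuses speech and gesture advice and lets the fused advice override an epsilon-greedy policy; the action set, thresholds, and fusion rule are illustrative assumptions rather than the exact scheme used in the paper.

```python
# Hedged sketch: confidence-based fusion of uni-modal advice in interactive RL.
# Module names, thresholds, and the fusion rule are illustrative assumptions,
# not the exact scheme used in the paper.
import random

ACTIONS = ["left", "right", "pick", "drop"]

def fuse_advice(speech_advice, speech_conf, gesture_advice, gesture_conf,
                min_conf=0.6):
    """Return a single piece of advice, or None if sources are unreliable
    or conflicting at low confidence."""
    if speech_advice == gesture_advice and max(speech_conf, gesture_conf) >= min_conf:
        return speech_advice                      # congruent advice: accept
    if speech_conf >= min_conf and speech_conf >= gesture_conf:
        return speech_advice                      # trust the more confident module
    if gesture_conf >= min_conf:
        return gesture_advice
    return None                                   # ambiguous: fall back to autonomy

def select_action(q_values, state, advice, epsilon=0.1):
    """Epsilon-greedy action selection, overridden by trainer advice when given."""
    if advice is not None:
        return advice
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_values.get((state, a), 0.0))

# Example: speech says "pick" (0.9), gesture says "drop" (0.4) -> follow speech.
advice = fuse_advice("pick", 0.9, "drop", 0.4)
print(select_action({}, state=0, advice=advice))  # -> "pick"
```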
Lifelong learning is fundamental in autonomous robotics for the incremental acquisition of knowledge through experience. However, most of the current deep neural models for action recognition from videos do not account for lifelong learning, but rather learn a batch of training actions. Consequently, there is the need to design learning systems with the ability to incrementally process available perceptual cues and to adapt their behavioral responses over time. We propose a self-organizing neural network architecture for incrementally learning action sequences from videos. The architecture comprises growing self-organizing networks equipped with recurrent connectivity for dealing with time-varying patterns. We use a set of hierarchically-arranged recurrent networks for the unsupervised learning of action representations with increasingly large spatiotemporal receptive fields. The recurrent dynamics modulating neural growth drive the adaptation of the networks to the non-stationary input distribution during the learning phase. We show that our model accounts for an action classification task on a benchmark dataset, also in the case of occasionally missing or incorrect sample labels.
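A minimal sketch of the growing-when-required idea underlying such networks is given below: a new neuron is inserted whenever the best-matching unit responds too weakly to the current input. Thresholds and learning rates are illustrative, and the recurrent context descriptors used for time-varying patterns are omitted.

```python
# Minimal sketch of a Growing-When-Required (GWR)-style insertion rule:
# a new neuron is added when the best-matching unit responds too weakly
# to the current input. Thresholds and learning rates are illustrative;
# the recurrent context used for temporal sequences is omitted here.
import numpy as np

class GWRSketch:
    def __init__(self, dim, activity_threshold=0.85, eps_b=0.1):
        self.weights = [np.random.rand(dim), np.random.rand(dim)]
        self.a_t = activity_threshold
        self.eps_b = eps_b

    def train_step(self, x):
        dists = [np.linalg.norm(x - w) for w in self.weights]
        b = int(np.argmin(dists))                 # best-matching unit
        activity = np.exp(-dists[b])              # activation in (0, 1]
        if activity < self.a_t:
            # input is poorly represented: grow a new neuron between x and the BMU
            self.weights.append((x + self.weights[b]) / 2.0)
        else:
            # otherwise adapt the BMU (neighbour updates omitted for brevity)
            self.weights[b] += self.eps_b * (x - self.weights[b])

net = GWRSketch(dim=3)
for _ in range(100):
    net.train_step(np.random.rand(3))
print("neurons:", len(net.weights))
```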
Spatial attention in humans and animals involves the visual pathway and the superior colliculus, which integrate multimodal information. Recent research has shown that affective stimuli play an important role in attentional mechanisms, and behavioral studies show that the focus of attention in a given region of the visual field is increased when affective stimuli are present. This work proposes a neurocomputational model that learns to attend to emotional expressions and to modulate emotion recognition. Our model consists of a deep architecture which implements convolutional neural networks to learn the location of emotional expressions in a cluttered scene. We performed a number of experiments for detecting regions of interest, based on emotion stimuli, and show that the attention model improves emotion expression recognition when used as emotional attention modulator. Finally, we analyze the internal representations of the learned neural filters and discuss their role in the performance of our model.
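The toy sketch below illustrates the general idea of attention modulation, with a spatial attention map re-weighting feature activations before classification; the features, map, and linear classifier are placeholders, not the trained convolutional filters of the model.

```python
# Hedged sketch of attention modulation: a spatial attention map re-weights
# feature activations before classification. The maps, features, and the
# softmax classifier here are toy placeholders, not the trained CNN filters.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(features, weights):
    """Average-pool features over space, then apply a linear classifier."""
    pooled = features.reshape(features.shape[0], -1).mean(axis=1)
    return softmax(weights @ pooled)

rng = np.random.default_rng(0)
features = rng.random((8, 16, 16))        # 8 feature maps of a cluttered scene
attention = np.zeros((16, 16))
attention[4:10, 4:10] = 1.0               # region where a face was localized
weights = rng.random((6, 8))              # 6 emotion classes

plain = classify(features, weights)
modulated = classify(features * attention, weights)  # attended region only
print(plain.argmax(), modulated.argmax())
```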
The integration of multisensory information plays a crucial role in autonomous robotics for forming robust and meaningful representations of the environment. In this work, we investigate how robust multimodal representations can naturally develop in a self-organizing manner from co-occurring multisensory inputs. We propose a hierarchical architecture with growing self-organizing neural networks for learning human actions from audiovisual inputs. The hierarchical processing of visual inputs yields progressively specialized neurons encoding latent spatiotemporal dynamics of the input, consistent with neurophysiological evidence for increasingly large temporal receptive windows in the human cortex. Associative links to bind unimodal representations are incrementally learned by a semi-supervised algorithm with bidirectional connectivity. Multimodal representations of actions are obtained using the co-activation of action features from video sequences and labels from automatic speech recognition. Experimental results on a dataset of 10 full-body actions show that our system achieves state-of-the-art classification performance without requiring the manual segmentation of training samples, and that congruent visual representations can be retrieved from recognized speech in the absence of visual stimuli. Together, these results show that our hierarchical neural architecture accounts for the development of robust multimodal representations from dynamic audiovisual inputs.
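The sketch below illustrates, under simplifying assumptions, how co-activation can build associative links between visual action prototypes and word labels so that a congruent visual representation can be retrieved from speech alone; the Hebbian-style update is a stand-in for the semi-supervised bidirectional connectivity described above.

```python
# Hedged sketch of co-activation-based associative links between visual action
# prototypes and speech labels. The Hebbian-style update and retrieval rule are
# simplified stand-ins for the bidirectional connectivity described above.
import numpy as np

n_visual, n_words = 20, 10
assoc = np.zeros((n_visual, n_words))     # bidirectional association matrix

def bind(visual_unit, word_unit, lr=1.0):
    """Strengthen the link when a visual prototype and a word co-occur."""
    assoc[visual_unit, word_unit] += lr

def word_from_visual(visual_unit):
    return int(np.argmax(assoc[visual_unit]))

def visual_from_word(word_unit):
    """Retrieve the most strongly associated visual prototype from speech alone."""
    return int(np.argmax(assoc[:, word_unit]))

# Example: visual prototype 3 repeatedly co-occurs with word label 7.
for _ in range(5):
    bind(3, 7)
print(word_from_visual(3), visual_from_word(7))   # -> 7 3
```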
Open-ended learning is fundamental in autonomous robotics for the incremental acquisition of knowledge through experience. However, most of the proposed computational models for action recognition do not account for incremental learning, but rather learn a batch of training actions without adapting to new inputs presented after training sessions. Therefore, there is a need to provide robots with the ability to incrementally process a set of available perceptual cues and to adapt their behavioural responses over time. In this work, we propose a neural network architecture with multilayer predictive processing for incrementally learning action sequences. Our architecture comprises a hierarchy of self-organizing networks that progressively learn the spatiotemporal structure of the input using Hebbian-like plasticity. Along the hierarchical flow with increasingly larger temporal receptive fields, feedback connections from higher-order networks carry predictions of lower-level neural activation patterns, whereas feedforward connections convey residual errors between the predictions and the lower-level activity. This mechanism is used to modulate the amount of learning necessary to adapt to the dynamic input distribution and develop robust action representations. We present a simplified hierarchical architecture with two layers and describe a number of planned experiments for classifying human actions in an open-ended learning scenario.
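The following sketch conveys the predictive-processing mechanism in miniature: a higher layer predicts lower-level activity through feedback weights, and the residual error fed forward scales how strongly the lower layer adapts. Layer sizes, the linear predictor, and the gain function are illustrative assumptions rather than the actual model.

```python
# Hedged sketch of the predictive-processing idea: a higher layer predicts the
# lower layer's activity; the residual error is fed forward and scales how much
# the lower layer adapts. Layer sizes and the linear predictor are assumptions.
import numpy as np

rng = np.random.default_rng(1)
W_pred = rng.normal(scale=0.1, size=(16, 8))   # top-down prediction weights
prototype = rng.random(16)                      # a lower-layer prototype vector

def step(x_low, h_high, base_lr=0.05):
    global prototype, W_pred
    prediction = W_pred @ h_high                # feedback: predicted low-level activity
    error = x_low - prediction                  # feedforward: residual error
    gain = np.tanh(np.linalg.norm(error))       # larger error -> more plasticity
    prototype += base_lr * gain * (x_low - prototype)
    W_pred += base_lr * np.outer(error, h_high) # improve the top-down prediction
    return np.linalg.norm(error)

# Toy loop with random activity, just to exercise the update rules.
for t in range(200):
    err = step(rng.random(16) * 0.1 + 0.5, rng.random(8))
print("prediction error at last step:", round(float(err), 3))
```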
The recognition of actions that involve the use of objects has remained a challenging task. In this paper, we present a hierarchical self-organizing neural architecture for learning to recognize transitive actions from RGB-D videos. We process separately body poses extracted from depth map sequences and object features from RGB images. These cues are subsequently integrated to learn action–object mappings in a self-organized manner in order to overcome the visual ambiguities introduced by the processing of body postures alone. Experimental results on a dataset of daily actions show that the integration of action–object pairs significantly increases classification performance.
The correct execution of well-defined movements plays a crucial role in physical rehabilitation and sports. While there is an extensive number of well-established approaches for human action recognition, the task of assessing the quality of actions and providing feedback for correcting inaccurate movements has remained an open issue in the literature. We present a learning-based method for efficiently providing feedback on a set of training movements captured by a depth sensor. We propose a novel recursive neural network that uses growing self-organization for the efficient learning of body motion sequences. The quality of actions is then computed in terms of how much a performed movement matches the correct continuation of a learned sequence. The proposed system provides visual assistance to the person performing an exercise by displaying real-time feedback, thus enabling the user to correct inaccurate postures and motion intensity. We evaluate our approach with a data set containing 3 powerlifting exercises performed by 17 athletes. Experimental results show that our novel architecture outperforms our previous approach for the correct prediction of routines and the detection of mistakes in both single- and multiple-subject scenarios.
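A simplified sketch of the quality measure is shown below: each observed frame is compared with the expected continuation of a stored movement template, and large deviations are flagged. The template, predictor, and threshold are toy stand-ins, not the trained recursive network or the values used on the powerlifting data.

```python
# Hedged sketch of the quality measure: compare each observed frame with the
# expected continuation of a learned sequence and flag large deviations.
# The predictor here is a trivial lookup into a stored template; the threshold
# is illustrative, not a value used with the powerlifting data set.
import numpy as np

template = np.cumsum(np.ones((30, 12)) * 0.1, axis=0)   # learned joint sequence

def feedback(observed, threshold=0.5):
    """Return per-frame quality flags: True where the movement deviates."""
    flags = []
    for t, frame in enumerate(observed):
        predicted = template[min(t, len(template) - 1)]  # expected continuation
        deviation = np.linalg.norm(frame - predicted)
        flags.append(deviation > threshold)
    return flags

performed = template + np.random.default_rng(2).normal(0, 0.05, template.shape)
performed[12:15] += 1.0                                  # an inaccurate segment
print([t for t, bad in enumerate(feedback(performed)) if bad])  # ~ frames 12-14
```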
Affordances are a useful method to anticipate the effect of an action performed by an agent. In this work, we present a robotic-cleaning task using contextual affordances implemented through a self-organizing neural network to predict the effect of the performed actions and avoid failed states. Current results on a simulated robot environment show that our architecture is able to predict future states with high accuracy.
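In spirit, contextual affordances map a (state, object, action) triple to a predicted effect that can be used to veto actions leading to failed states; the lookup table below is a hypothetical stand-in for the self-organizing predictor.

```python
# Hedged sketch of contextual affordances as a mapping
# (state, object, action) -> predicted effect, used to veto actions that
# would lead to a failed state. The table entries are invented examples;
# the paper uses a self-organizing neural network as the predictor.
AFFORDANCES = {
    ("table_dirty", "sponge", "wipe"): "table_clean",
    ("table_dirty", "cup", "wipe"):    "failed",
    ("cup_full", "cup", "move"):       "spilled",
}

def predict_effect(state, obj, action):
    return AFFORDANCES.get((state, obj, action), "unknown")

def is_safe(state, obj, action):
    """Avoid actions whose predicted effect is a failed state."""
    return predict_effect(state, obj, action) not in ("failed", "spilled")

print(predict_effect("table_dirty", "sponge", "wipe"))   # -> table_clean
print(is_safe("table_dirty", "cup", "wipe"))             # -> False
```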
The integration of multisensory information plays a crucial role in autonomous robotics. In this work, we investigate how robust multimodal representations can naturally develop in a self-organized manner from co-occurring multisensory inputs.
We propose a hierarchical learning architecture with growing self-organizing neural networks for learning human actions from audiovisual inputs. Associative links between unimodal representations are incrementally learned by a semi-supervised algorithm with bidirectional connectivity that takes into account inherent spatiotemporal dynamics of the input. Experiments on a dataset of 10 full-body actions show that our architecture is able to learn action-word mappings without the need of segmenting training samples for ground-truth labelling. Instead, multimodal representations of actions are obtained using the co-activation of action features from video sequences and labels from automatic speech recognition. Promising experimental results encourage the extension of our architecture in several directions.
Falls represent a major problem in the public health care domain, especially among the elderly population. Therefore, there is a motivation to provide technological solutions for assisted living in home environments. We introduce a neurocognitive robot assistant that monitors a person in a household environment. In contrast to the use of a static-view sensor, a mobile humanoid robot will keep the moving person in view and track his/her position and body motion characteristics. A learning neural system is responsible for processing the visual information from a depth sensor and denoising the live video stream to reliably detect fall events in real time. Whenever a fall event occurs, the humanoid will approach the person and ask whether assistance is required. The robot will then take an image of the fallen person that can be sent to the person’s caregiver for further human evaluation and agile intervention. In this paper, we present a number of experiments with a mobile robot in a home-like environment along with an evaluation of our fall detection framework. The experimental results show the promising contribution of our system to assistive robotics for fall detection of the elderly at home.
The correct execution of well-defined movements in sport disciplines may increase the body’s mechanical efficiency and reduce the risk of injury. While there exists an extensive number of learning-based approaches for the recognition of human actions, the task of computing and providing feedback for correcting inaccurate movements has received significantly less attention in the literature. We present a learning system for automatically providing feedback on a set of learned movements captured with a depth sensor. The proposed system provides visual assistance to the person performing an exercise by displaying real-time feedback to correct possible inaccurate postures and motion. The learning architecture uses recursive neural network self-organization extended for predicting the correct continuation of the training movements. We introduce three mechanisms for computing feedback on the correctness of the overall movement and of individual body joints. For evaluation purposes, we collected a data set with 17 athletes performing 3 powerlifting exercises. Our results show promising system performance for the detection of mistakes in movements on this data set.
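As an illustration of per-joint feedback, the sketch below compares each observed joint position with its expected position in the learned movement and reports joints that drift beyond a tolerance; the joint names and the tolerance are assumptions, not the three mechanisms as implemented in the paper.

```python
# Hedged sketch of per-joint feedback: compare each observed joint position with
# its expected position in the learned movement and report which joints drift
# beyond a tolerance. Joint names and the tolerance are illustrative assumptions.
import numpy as np

JOINTS = ["hip", "knee_l", "knee_r", "shoulder_l", "shoulder_r", "head"]

def joint_feedback(expected, observed, tolerance=0.15):
    """Return a message for every joint whose deviation exceeds the tolerance."""
    messages = []
    for j, name in enumerate(JOINTS):
        deviation = float(np.linalg.norm(observed[j] - expected[j]))
        if deviation > tolerance:
            messages.append(f"adjust {name} (off by {deviation:.2f})")
    return messages

rng = np.random.default_rng(3)
expected = rng.random((len(JOINTS), 3))
observed = expected.copy()
observed[1] += 0.3                          # left knee out of position
print(joint_feedback(expected, observed))   # -> ["adjust knee_l (off by 0.52)"]
```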
Gesture recognition is an important task in Human-Robot Interaction (HRI) and the research effort towards robust and high-performance recognition algorithms is increasing. In this work, we present a neural network approach for learning an arbitrary number of labeled training gestures to be recognized in real time. The representation of gestures is hand-independent and gestures with both hands are also considered. We use depth information to extract salient motion features and encode gestures as sequences of motion patterns. Preprocessed sequences are then clustered by a hierarchical learning architecture based on self-organizing maps. We present experimental results on two different data sets: command-like gestures for HRI scenarios and communicative gestures that include cultural peculiarities, often excluded in gesture recognition research. For better recognition rates, noisy observations introduced by tracking errors are detected and removed from the training sets. Obtained results motivate further investigation of efficient neural network methodologies for gesture-based communication.
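A minimal sketch of the underlying self-organizing map idea follows: frames of a preprocessed motion-feature sequence are mapped to best-matching units of a trained map, and the gesture is classified by majority vote over the labels attached to those units. The tiny random map stands in for the trained hierarchical architecture.

```python
# Hedged sketch of the SOM idea: motion-feature frames are mapped to their
# best-matching units on a trained map, and a gesture is classified by the
# labels attached to the units it activates. The tiny random "map" below is a
# placeholder for the trained hierarchy, not the actual model.
import numpy as np

rng = np.random.default_rng(4)
som_weights = rng.random((25, 6))                 # 5x5 map, 6-D motion features
unit_labels = rng.integers(0, 3, size=25)         # labels attached after training

def best_matching_unit(x):
    return int(np.argmin(np.linalg.norm(som_weights - x, axis=1)))

def classify_gesture(sequence):
    """Majority vote over the labels of the BMUs activated by the sequence."""
    votes = [unit_labels[best_matching_unit(frame)] for frame in sequence]
    return int(np.bincount(votes).argmax())

gesture = rng.random((20, 6))                     # one preprocessed sequence
print("predicted gesture class:", classify_gesture(gesture))
```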
In this video we introduce a robot assistant that monitors a person in a household environment to promptly detect fall events. In contrast to the use of a fixed sensor, the humanoid robot will track and keep the moving person in the scene while performing daily activities. For this purpose, we extended the humanoid Nao with a depth sensor attached to its head. The tracking framework implemented with OpenNI segments and tracks the person’s position and body posture. We use a learning neural framework for processing the extracted body features and detecting abnormal behaviors, e.g. a fall event. The neural architecture consists of a hierarchy of self-organizing neural networks for attenuating noise caused by tracking errors and detecting fall events from the video stream in real time. The tracking application, the neural framework, and the humanoid actuators communicate over Robot Operating System (ROS). We use communication over the ROS network implemented with publisher-subscriber nodes. When a fall event is detected, Nao will approach the person and ask whether assistance is needed. In any case, Nao will take a picture of the scene that can be sent to the caregiver or a relative for further human evaluation and agile intervention. The combination of this sensor technology with our neural network approach enables the robust detection of falls independently of the background surroundings and in the presence of noise (tracking errors and occlusions) introduced by a real-world scenario. The video shows experiments run in a home-like environment.
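A hedged sketch of the ROS wiring is given below, with the neural framework publishing fall events on a topic and a behaviour node subscribing to it; the topic name, message type, and node layout are assumptions for illustration, not the exact interfaces of the system.

```python
# Hedged sketch of ROS publisher-subscriber communication for fall events.
# The topic name '/fall_event' and the String message type are assumptions
# for illustration; the actual node layout in the system may differ.
import rospy
from std_msgs.msg import String

def on_fall_event(msg):
    # In the full system this would trigger Nao approaching the person
    # and taking a picture of the scene for the caregiver.
    rospy.loginfo("Fall event received: %s", msg.data)

if __name__ == "__main__":
    rospy.init_node("fall_detection_demo")
    pub = rospy.Publisher("/fall_event", String, queue_size=10)
    rospy.Subscriber("/fall_event", String, on_fall_event)
    rate = rospy.Rate(1)                      # 1 Hz demo loop
    while not rospy.is_shutdown():
        pub.publish(String(data="fall detected in living room"))
        rate.sleep()
```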
We propose a novel biologically inspired framework for the recognition of human full-body actions. First, we extract body pose and motion features from depth map sequences. We then cluster pose-motion cues with a two-stream hierarchical architecture based on growing neural gas (GNG). Multi-cue trajectories are finally combined to provide prototypical action dynamics in the joint feature space. We extend the unsupervised GNG with two labelling functions for classifying clustered trajectories. Noisy samples are automatically detected and removed from the training and the testing set. Experiments on a set of 10 human actions show that the use of multi-cue learning leads to substantially increased recognition accuracy over the single-cue approach and the learning of joint pose-motion vectors.
The visual recognition of complex, articulated human movements is fundamental for a wide range of artificial systems oriented toward human-robot communication, action classification, and action-driven perception. These challenging tasks may generally involve the processing of a huge amount of visual information and learning-based mechanisms for generalizing a set of training actions and classifying new samples. To operate in natural environments, a crucial property is the efficient and robust recognition of actions, also under noisy conditions caused by, for instance, systematic sensor errors and temporarily occluded persons. Studies of the mammalian visual system and its remarkable ability to process biological motion information suggest separate neural pathways for the distinct processing of pose and motion features at multiple levels and the subsequent integration of these visual cues for action perception. We present a neurobiologically-motivated approach to achieve noise-tolerant action recognition in real time. Our model consists of self-organizing Growing When Required (GWR) networks that obtain progressively generalized representations of sensory inputs and learn inherent spatio-temporal dependencies. During the training, the GWR networks dynamically change their topological structure to better match the input space. We first extract pose and motion features from video sequences and then cluster actions in terms of prototypical pose-motion trajectories. Multi-cue trajectories from matching action frames are subsequently combined to provide action dynamics in the joint feature space. Reported experiments show that our approach outperforms previous results on a dataset of full-body actions captured with a depth sensor, and ranks among the best results for a public benchmark of domestic daily actions.