The purpose of this thesis is to propose and evaluate a markerless system for capturing hand movements in real time to allow 3D interaction in virtual environments (VEs). Tools such as the keyboard and mouse are not sufficient for interacting in 3D VEs, and current motion capture systems are expensive and require worn equipment. Systems based on cameras and image processing partially fill these gaps, but do not yet allow accurate, efficient, real-time 3D motion capture. Our system addresses this problem with a 3D camera. We implemented modalities that allow more natural interaction with objects and the VE. The goal of our system is to achieve performance at least equal to that of common virtual reality tools while providing better overall acceptability (i.e., usefulness, usability, immersion). To this end, we conducted three experimental studies involving over 100 participants. In the first study, we compared the first version of our system (based on a MESA SwissRanger 3D camera) to a traditional mouse on a selection task. The second experiment focused on object-manipulation tasks (position, orientation, scaling) and navigation tasks in a VE. For this study, we compared the improved version of our system (based on the Microsoft Kinect) with data gloves combined with magnetic sensors. A third study evaluated new interaction modalities implemented based on participants' feedback from the second study. Keywords: hand motion capture, 3D interaction, bimanual interaction, markerless device, real time, 3D camera.
This study was designed to test the relationship between matching and mirroring (MM) and perceived homophily (PHM) in leadership socialization. Elevated PHM levels were hypothesized to affect workplace acceptance levels. The need for testing leadership socialization skills was magnified by the current demographic shift known as the leadership succession crisis, which creates problems for onboarding strategies. The theoretical foundations of the study were social identity theory, social presence theory, leader-member exchange theory, and the similarity-attraction paradigm. The study, conducted at Workforce Solutions North Texas in Wichita Falls, Texas, was sampled based on the effect size calculated in a pilot study. Test-group participants engaged in MM-enhanced social conversation with a coached candidate, while control-group participants held normal conversations with an uncoached participant from the general population. MM processes were differentiated from natural synchronic tendencies using specialized software and Kinect® sensors. A contrasted-group quasi-experiment was examined with an analysis of covariance. No statistically significant difference was found between groups in PHM levels after correcting for age, gender, ethnicity, height, glasses, hobbies, and profession. However, the relationship between PHM and coworker acceptance was statistically significant, again with no difference between groups. Further research is needed to test PHM as a metric for rapport in socialization strategies. Nevertheless, the homophily lens, rather than the rapport lens, can help organizational development and human resource professionals quantify and develop more effective socialization strategies aimed at solving problems associated with the leadership succession crisis.
New low-cost sensors and free, open-source libraries for 3D image processing are enabling important advances in robot vision applications, such as three-dimensional object recognition, semantic mapping, robot navigation and localization, and human detection and/or gesture recognition for human-machine interaction. In this paper, a novel method for recognizing and tracking the fingers of a human hand is presented. The method is based on point clouds from range images captured by an RGBD sensor. It works in real time and requires no visual markers, camera calibration, or prior knowledge of the environment. Moreover, it works successfully even when multiple objects appear in the scene or the ambient light changes. Furthermore, the method was designed to support a human interface for remotely controlling domestic or industrial devices. In this paper, the method was tested by operating a robotic hand. First, the human hand was recognized and the fingers were detected; second, the movement of the fingers was analysed and mapped so that it could be imitated by the robotic hand.
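A common first step in markerless hand detection of this kind is to isolate the hand as the object nearest to the RGBD sensor. The sketch below illustrates that idea on a toy point cloud; the function name and the 10 cm depth band are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch: depth-based hand segmentation from a point cloud,
# assuming the hand is the object closest to the RGBD sensor.
# Names and thresholds are illustrative, not taken from the paper.

def segment_hand(points, band=0.10):
    """Keep points within `band` metres of the closest point to the camera."""
    z_min = min(p[2] for p in points)          # nearest depth in the cloud
    return [p for p in points if p[2] <= z_min + band]

# Toy cloud: three 'hand' points near 0.5 m and background points near 2 m.
cloud = [(0.0, 0.0, 0.50), (0.01, 0.02, 0.52), (0.02, 0.01, 0.55),
         (0.5, 0.5, 2.00), (0.6, 0.4, 2.10)]
hand = segment_hand(cloud)
print(len(hand))  # 3 points survive the 10 cm depth band
```

In a real system the surviving cluster would then be analysed for finger geometry before mapping the motion onto the robotic hand.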
Agricultural soil roughness is relevant to important agricultural phenomena such as evaporation, infiltration, and compression. Monitoring roughness variations would make it possible to improve tillage operations. In the present work, the use of the Microsoft Kinect™ RGB-depth camera for dynamic characterization of soil microrelief is proposed and discussed. The metrological performance and the effect of operating conditions on three-dimensional reconstruction were analyzed in both laboratory tests on calibrated reference surfaces and field tests on different agricultural soil surfaces. The data sets were analysed in terms of the surface roughness parameters defined by the ISO 25178 (2012) series: average roughness, root-mean-square roughness, skewness, and kurtosis. The correlation between different tillage conditions and the roughness parameters describing soil morphology is finally discussed.
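The four ISO 25178 parameters named above have simple statistical definitions over the height deviations of a surface. A minimal sketch, assuming a flattened height map in millimetres (the synthetic values are toy data, not the paper's measurements):

```python
# Illustration of the ISO 25178 areal roughness parameter definitions:
# Sa (average roughness), Sq (RMS roughness), Ssk (skewness), Sku (kurtosis),
# computed over a small synthetic height map. Not the authors' pipeline.

def roughness_params(heights):
    n = len(heights)
    mean = sum(heights) / n
    dev = [h - mean for h in heights]                  # height deviations
    sa = sum(abs(d) for d in dev) / n                  # average roughness Sa
    sq = (sum(d * d for d in dev) / n) ** 0.5          # RMS roughness Sq
    ssk = sum(d ** 3 for d in dev) / (n * sq ** 3)     # skewness Ssk
    sku = sum(d ** 4 for d in dev) / (n * sq ** 4)     # kurtosis Sku
    return sa, sq, ssk, sku

surface = [0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5, 0.0]  # heights in mm
sa, sq, ssk, sku = roughness_params(surface)
print(round(sa, 3), round(sq, 3))  # 0.875 1.146
```

A rougher (e.g. freshly tilled) surface would show larger Sa and Sq; Ssk and Sku describe the asymmetry and peakedness of the height distribution.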
It is widely known that traditional Interactive Whiteboards (IWBs) are very expensive. At the University of Trento, several teams are working on possible solutions to this problem, examining new technologies and new interaction paradigms, such as the projects described in Sections 2.3.3 and 2.3.4. These projects are based on the introduction of gaming devices as concrete tools to support teaching and learning activities, and they can be considered forerunners of this approach to the problem. They constitute the ...
This paper presents a human action recognition system that runs in real time and uses a depth camera and an inertial sensor simultaneously, based on a previously developed sensor fusion method. Computationally efficient depth-image features and inertial-signal features are fed into two computationally efficient collaborative representation classifiers, and decision-level fusion is then performed. The developed real-time system is evaluated on a publicly available multimodal human action recognition dataset covering a comprehensive set of human actions. The overall classification rate of the system is shown to be more than 97%, at least 9% higher than when each sensing modality is used individually. Results from both offline and real-time experiments demonstrate the effectiveness of the system and its real-time throughput.
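Decision-level fusion of this kind combines each modality's per-class scores before taking the final decision. The sketch below uses a simple score-averaging rule as a stand-in; the paper's actual classifiers are collaborative representation classifiers, and the scores here are made-up numbers.

```python
# Hedged sketch of decision-level fusion: each sensing modality produces a
# per-class score vector; averaging (one common fusion rule) selects the
# class with the highest combined score. Scores are illustrative only.

def fuse_decisions(scores_depth, scores_inertial):
    fused = [(d + i) / 2 for d, i in zip(scores_depth, scores_inertial)]
    return max(range(len(fused)), key=fused.__getitem__)

# Depth alone favours class 0, inertial alone favours class 2,
# but the combined evidence points to class 1.
depth    = [0.40, 0.38, 0.22]
inertial = [0.10, 0.42, 0.48]
print(fuse_decisions(depth, inertial))  # 1
```

This complementarity between modalities is exactly why the fused system outperforms either sensor alone.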
ABSTRACT In 3D human motion pose-based analysis, the main problem is how to classify multi-class activities from primitive action (pose) inputs efficiently, in terms of both accuracy and processing time, because a pose is not unique and the same pose can appear in different activity classes. In this paper, we evaluate the effectiveness of the Extreme Learning Machine (ELM) in 3D human motion analysis based on pose clusters. ELM has a reputation as an eager classifier with fast training and testing times, but its classification accuracy remains low even when the number of hidden nodes is increased and more training data are added. To achieve better accuracy, we pursue a feature selection method that reduces the dimension of the pose-cluster training data over the time sequence. We propose using the frequency of pose occurrence. This method is similar to bag of words: a sparse vector of occurrence counts of poses forms a histogram used as the feature for training (a bag of poses). By using the bag of poses as the feature selection, ELM performance can be improved without adding network complexity (number of hidden nodes and training data).
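The bag-of-poses feature described above reduces a variable-length sequence of frame-level pose-cluster labels to a fixed-length occurrence histogram. A minimal sketch, with a made-up label sequence standing in for real pose-cluster assignments:

```python
# Sketch of the "bag of poses" feature: a sequence of per-frame pose-cluster
# labels is reduced to a histogram of pose occurrences, analogous to
# bag-of-words. Labels and counts here are illustrative toy data.

from collections import Counter

def bag_of_poses(pose_sequence, n_clusters):
    counts = Counter(pose_sequence)
    return [counts.get(k, 0) for k in range(n_clusters)]

# A toy clip: pose clusters 0, 1, 2 alternate; cluster 3 never occurs.
sequence = [0, 1, 2, 1, 0, 1, 2, 1]
print(bag_of_poses(sequence, 4))  # [2, 4, 2, 0]
```

The fixed-length histogram is what would then be fed to the ELM classifier, regardless of how long the original clip was.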
This paper describes the development of a low-cost mini-robot that is controlled by visual gestures. The prototype allows a person with disabilities to perform visual inspections indoors and in domestic spaces. Such a device could be used as the operator's eyes, obviating the need for him or her to move about. The robot is equipped with a motorised webcam that is also controlled by visual gestures. This camera is used to monitor tasks in the home using the mini-robot while the operator remains still. The prototype was evaluated through several experiments testing the ability of the mini-robot's kinematics and communication systems to make it follow certain paths. The mini-robot can be programmed with specific orders and can be tele-operated by means of 3D hand gestures, enabling the operator to perform movements and monitor tasks from a distance.
This paper presents a new method for human activity recognition using depth sequences. Each depth sequence is represented by three depth motion maps (DMMs) from three projection views (front, side, and top) to capture motion cues. A feature extraction method utilizing spatial and orientational auto-correlations of local image gradients is introduced to extract features from the DMMs. The gradient local auto-correlation (GLAC) method employs second-order statistics (i.e., auto-correlations) to capture richer information from images than histogram-based methods (e.g., histogram of oriented gradients), which use first-order statistics (i.e., histograms). Based on the extreme learning machine, a fusion framework that incorporates feature-level fusion into decision-level fusion is proposed to effectively combine the GLAC features from the DMMs. Experiments on the MSRAction3D and MSRGesture3D datasets demonstrate the effectiveness of the proposed activity recognition algorithm.
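A depth motion map for one projection view is typically built by accumulating absolute frame-to-frame differences of the projected depth maps over the sequence. The sketch below illustrates that accumulation on tiny 3×3 toy frames; a real DMM would operate on full projected depth images.

```python
# Minimal sketch of a depth motion map (DMM) for one projection view:
# absolute frame-to-frame differences are accumulated over the sequence.
# The 3x3 'frames' are toy data standing in for projected depth images.

def depth_motion_map(frames):
    h, w = len(frames[0]), len(frames[0][0])
    dmm = [[0] * w for _ in range(h)]
    for prev, curr in zip(frames, frames[1:]):
        for r in range(h):
            for c in range(w):
                dmm[r][c] += abs(curr[r][c] - prev[r][c])
    return dmm

# A small blob moving right, then down, across three frames.
frames = [
    [[0, 0, 0], [0, 5, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 5], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 5, 0]],
]
print(depth_motion_map(frames))  # [[0, 0, 0], [0, 5, 10], [0, 5, 0]]
```

The resulting map highlights where motion energy concentrated, which is what the GLAC features are then extracted from.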
We present SpeeG, a multimodal speech- and body-gesture-based text input system targeting media centres, set-top boxes, and game consoles. Our controller-free zoomable user interface combines speech input with gesture-based real-time correction of the recognised voice input. While the open-source CMU Sphinx voice recogniser transforms speech input into written text, Microsoft's Kinect sensor is used for hand gesture tracking. A modified version of the zoomable Dasher interface combines the input from Sphinx and the Kinect sensor. In contrast to existing speech error correction solutions, which make a clear distinction between a detection phase and a correction phase, our SpeeG text input system enables continuous real-time error correction. An evaluation of the SpeeG prototype revealed that low error rates at a text input speed of about six words per minute can be achieved after a minimal learning phase. Moreover, in a user study SpeeG was perceived as the fastest of all evaluated user interfaces and therefore represents a promising candidate for future controller-free text input.
Kinect is the de facto standard for real-time depth-sensing and motion-capture cameras. Here the sensor is proposed for body tracking during driving operations. The motion capture system was developed using the Microsoft software development kit (SDK) and implemented for real-time monitoring of the body movements of a beginner and an expert tractor driver, on different tracks (straight and curved) and under different driving conditions (manual and assisted steering). Tests show how analyses can be performed not only in terms of absolute movements, but also in terms of relative shifts, allowing quantification of angular displacements or rotations.
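Angular displacements like those mentioned above can be quantified directly from tracked skeleton joints. A hedged sketch, with made-up joint coordinates standing in for positions read from the Kinect SDK skeleton stream:

```python
# Illustrative sketch: computing the angle at a joint from three tracked
# skeleton joint positions, as one way to quantify angular displacement.
# The coordinates are toy values, not real Kinect SDK output.

import math

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    return math.degrees(math.acos(dot / (n1 * n2)))

# A right-angle elbow: shoulder above the elbow, wrist out to the side.
shoulder, elbow, wrist = (0, 1, 0), (0, 0, 0), (1, 0, 0)
print(round(joint_angle(shoulder, elbow, wrist)))  # 90
```

Tracking such angles frame by frame yields the relative-shift curves that distinguish, for example, a beginner's steering posture from an expert's.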
In this paper, I describe a series of Max/MSP interfaces (Kinect-Via-) for composers who want to route and map user-tracking data from the Xbox Kinect. The series complements four different OpenNI applications, namely OSCeleton, Synapse, Processing's simple-openni library, and Delicode's NIMate. All the Max/MSP interfaces communicate using OSC (Open Sound Control) messages and are performance-ready, meaning that all routing and system options may be changed in real time. The Kinect-Via- interfaces offer a tangible solution for anyone wishing to explore user tracking with the Kinect for creative applications. The aim of the paper is to discuss the features of the four OpenNI applications, to address potential issues and challenges when working with the OpenNI framework, and to outline formative interface issues revolving around video tracking technology.
This paper presents a fingertip tracking system based on a particle filter using the Microsoft Kinect. Tracking is performed in two separate modules: image-based 2D position tracking and calibrated 1D depth tracking. The two modules are then combined to provide full 3D fingertip tracking. The separation aims at faster tracking by putting as much processing as possible in 2D, while the 3D position is obtained simply by incorporating the available depth information. Moreover, errors in 2D tracking can be corrected using information from depth tracking, and vice versa. Fingertips are obtained by a 2D convex-hull algorithm on binary images extracted after setting a depth threshold, under the assumption that the hand and fingers are the part of the scene nearest to the camera during interaction. To deal with partial occlusion, we devised a simple retrace algorithm based on linear time-series analysis of the fingertip trajectories. To evaluate the effectiveness of the system, we designed several scenarios involving contactless user interaction for zooming and motion capture for 3D animation creation. Evaluation is measured in terms of tracking accuracy and computational overhead. The 2D+1D tracking is experimentally shown to be more robust to occlusion, with reasonably little overhead in computation time.
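The 2D convex-hull step above keeps only the extremal points of the hand silhouette, where fingertips lie. A self-contained sketch using Andrew's monotone-chain algorithm (one standard 2D hull method; the point set is toy data standing in for thresholded hand-contour pixels):

```python
# Sketch of the 2D convex-hull step used for fingertip candidates:
# Andrew's monotone-chain hull over contour points. Toy data only.

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                         # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):               # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Interior points are discarded; only extremal (fingertip-like) points remain.
contour = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
print(sorted(convex_hull(contour)))  # [(0, 0), (0, 4), (4, 0), (4, 4)]
```

In the full system, each hull vertex would then be lifted to 3D by reading its depth value from the calibrated 1D depth module.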
This paper presents a fusion approach for improving human action recognition based on two differing-modality sensors: a depth camera and an inertial body sensor. Computationally efficient action features are extracted from the depth images provided by the depth camera and from the accelerometer signals provided by the inertial body sensor. These features consist of depth motion maps and statistical signal attributes. For action recognition, both feature-level fusion and decision-level fusion are examined using a collaborative representation classifier. In feature-level fusion, the features generated by the two sensors are merged before classification, while in decision-level fusion, Dempster-Shafer theory is used to combine the classification outcomes from two classifiers, each corresponding to one sensor. The introduced fusion framework is evaluated on the Berkeley Multimodal Human Action Database. The results indicate that, due to the complementary nature of the data from the two sensors, the introduced fusion approaches lead to recognition rate improvements of 2% to 23%, depending on the action, over using each sensor individually.
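The Dempster-Shafer decision-level fusion named above combines each sensor's belief masses with Dempster's rule, renormalising away conflicting evidence. A minimal sketch with toy mass values (the real system derives masses from classifier outputs):

```python
# Hedged sketch of Dempster's rule of combination: two mass functions
# (one per sensor) over the same frame of discernment are combined,
# with the conflicting mass renormalised out. Masses are toy values.

def dempster_combine(m1, m2):
    combined, conflict = {}, 0.0
    for s1, v1 in m1.items():
        for s2, v2 in m2.items():
            inter = s1 & s2
            if inter:
                combined[inter] = combined.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2                    # contradictory evidence
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Depth camera and inertial sensor each assign mass to singleton actions.
m_depth    = {frozenset({'wave'}): 0.7, frozenset({'clap'}): 0.3}
m_inertial = {frozenset({'wave'}): 0.6, frozenset({'clap'}): 0.4}
fused = dempster_combine(m_depth, m_inertial)
print(round(fused[frozenset({'wave'})], 2))  # 0.78
```

Because both sensors lean toward 'wave', the fused belief in 'wave' (0.78) exceeds either sensor's individual mass, illustrating the complementarity the paper exploits.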
This work presents an application of the Microsoft Kinect camera to an autonomous mobile robot. To drive autonomously, one main requirement is the ability to recognize signalling panels positioned overhead. The Kinect camera is well suited to this task thanks to its two integrated sensors: vision and distance. The vision sensor is used to perceive the signalling panel, while the distance sensor is applied as a segmentation filter, eliminating background pixels by their depth. The approach adopted to perceive the symbol on the signalling panel consists of: a) applying the Kinect depth-image filter; b) applying morphological operators to segment the image; c) classifying the segmented image with a simple Multilayer Perceptron artificial neural network. By exploiting the Kinect depth sensor, this filter avoids computationally heavy algorithms for locating the signalling panels and simplifies the subsequent image segmentation and classification tasks. A mobile autonomous robot using this camera was used to recognize the signalling panels on a competition track of the Portuguese Robotics Open.
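Step a) of the pipeline above, the depth-image filter, amounts to masking out every pixel whose depth falls outside a band around the expected panel distance. A toy sketch (image values and the 1-2 m band are illustrative assumptions):

```python
# Sketch of the depth-filter step: pixels whose depth (metres) falls
# outside [near, far] are zeroed, leaving the overhead signalling panel
# for the later morphological segmentation. Toy 2x2 image data.

def depth_filter(gray, depth, near, far):
    """Zero every pixel whose depth is outside the [near, far] band."""
    return [[g if near <= d <= far else 0
             for g, d in zip(grow, drow)]
            for grow, drow in zip(gray, depth)]

gray  = [[200, 180], [160, 140]]        # intensity image
depth = [[1.5, 4.0], [1.6, 5.0]]        # panel ~1.5 m, background farther
print(depth_filter(gray, depth, 1.0, 2.0))  # [[200, 0], [160, 0]]
```

Only the near pixels survive, so the Multilayer Perceptron classifier then operates on a clean, pre-segmented panel region.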
This work surveys different ways of communicating science through interactive art: physics, with an unconventional ping-pong game that explores various kinetic concepts; music, expressing the great difference between static sound and sound in motion; linguistics, drawing an analogy between algebraic geometry and words; and body movement, where the user creates figures and shapes with the hands. The authors explore different types of user interaction and show possibilities for presenting scientific content in an attractive form to a general audience.