In this paper, we present a practical vision-based Simultaneous Localization and Mapping (SLAM) system for highly dynamic environments. We adopt a multibody Structure from Motion (SfM) approach, which generalizes classical SfM to dynamic scenes with multiple rigidly moving objects. The proposed multibody visual SLAM framework allows choosing between full 3D reconstruction and simple tracking of the moving objects, which adds flexibility to the system for scenes containing non-rigid objects or objects with insufficient features for reconstruction. The solution demands a motion segmentation framework that can segment feature points belonging to different motions and maintain the segmentation over time. We propose a real-time incremental motion segmentation algorithm for this purpose. The motion segmentation is robust and is capable of segmenting difficult degenerate motions, where a moving object is followed by a moving camera in the same direction. This robu...
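The abstract's incremental motion segmentation algorithm is not detailed here; as an illustrative stand-in, the sketch below groups feature tracks by the similarity of their frame-to-frame displacement vectors. All function and parameter names are hypothetical, not the authors' method:

```python
def segment_by_motion(prev_pts, curr_pts, tol=2.0):
    """Greedily group features whose displacement vectors agree within tol pixels.

    prev_pts, curr_pts: matched (x, y) positions of the same features in two frames.
    Returns an integer motion label per feature (0 = first group found, etc.).
    """
    disps = [(cx - px, cy - py) for (px, py), (cx, cy) in zip(prev_pts, curr_pts)]
    labels = [-1] * len(disps)
    groups = []  # representative displacement per motion group
    for i, (dx, dy) in enumerate(disps):
        for g, (gx, gy) in enumerate(groups):
            if abs(dx - gx) <= tol and abs(dy - gy) <= tol:
                labels[i] = g
                break
        else:
            labels[i] = len(groups)
            groups.append((dx, dy))
    return labels

# Static background points plus an object moving ~5 px right fall into two groups:
prev = [(0, 0), (10, 0), (0, 10), (50, 50), (60, 50)]
curr = [(0.1, 0), (10, 0.2), (0, 10), (55, 50), (65, 50)]
```

A real system would maintain these labels over time and re-seed groups as objects appear, which is where the incremental aspect of the paper's algorithm comes in.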
Mobile robot vision-based navigation has been the source of countless research contributions, from the domains of both vision and control. Vision is becoming more and more common in applications such as localization, automatic map construction, autonomous navigation, path following, inspection, monitoring, and risky-situation detection. This survey presents those pieces of work, from the nineties until today, which constitute a wide body of progress in visual navigation techniques for land, aerial, and autonomous underwater vehicles. The paper deals with two major approaches: map-based navigation and mapless navigation. Map-based navigation is in turn subdivided into metric map-based navigation and topological map-based navigation. Our outline of mapless navigation includes reactive techniques based on qualitative characteristic extraction, appearance-based localization, optical flow, feature tracking, plane ground detection/tracking, and more. The recent concept of visual sonar is also reviewed.
In this paper, we present the vision-aided inertial navigation (VISINAV) algorithm that enables precision planetary landing. The vision front-end of the VISINAV system extracts 2-D-to-3-D correspondences between descent images and a surface map (mapped landmarks), as well as 2-D-to-2-D feature tracks through a sequence of descent images (opportunistic features). An extended Kalman filter (EKF) tightly integrates both types of visual feature observations with measurements from an inertial measurement unit. The filter computes accurate estimates of the lander's terrain-relative position, attitude, and velocity in a resource-adaptive and hence real-time-capable fashion. In addition to the technical analysis of the algorithm, the paper presents validation results from a sounding-rocket test flight, showing estimation errors of only 0.16 m/s for velocity and 6.4 m for position at touchdown. These results vastly improve on the current state of the art for terminal descent navigation without visual updates, and meet the requirements of future planetary exploration missions.
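The VISINAV filter is a full 6-DOF EKF; as a much-reduced illustration of the same predict/update cycle, the sketch below fuses an IMU-style velocity prediction with vision-style position fixes in one dimension. This is a scalar Kalman filter under made-up noise parameters, not the authors' filter:

```python
def kalman_1d(z_list, u_list, dt, q=0.01, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter: position state x, IMU velocity u as the control input,
    vision position fix z as the measurement. q, r are process/measurement variances."""
    x, p = x0, p0
    estimates = []
    for z, u in zip(z_list, u_list):
        # Predict: integrate the IMU velocity over one step; uncertainty grows.
        x = x + u * dt
        p = p + q
        # Update: blend in the vision fix, weighted by the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates
```

With consistent measurements (constant velocity 1 m/s, exact fixes), the estimate tracks the true position; the real VISINAV filter does the same in 6-DOF with attitude states and tightly coupled landmark observations.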
This paper presents an adaptation of a vision and inertial-based state estimation algorithm for use in an underwater robot. The proposed approach combines information from an Inertial Measurement Unit (IMU) in the form of linear accelerations and angular velocities, depth data from a pressure sensor, and feature tracking from a monocular downward facing camera to estimate the 6DOF pose of the vehicle. To validate the approach, we present extensive experimental results from field trials conducted in underwater environments with varying lighting and visibility conditions, and we demonstrate successful application of the technique underwater.
We present an overview on the history of tracking for mobile phone Augmented Reality. We present popular approaches using marker tracking, natural feature tracking or offloading to nearby servers. We then outline likely future work.
Recent advances in 3D scanning technology have enabled the development of interesting applications of 3D human body modelling and shape analysis, especially in the areas of virtual shopping, custom clothing and sizing surveys for the clothing industry. Most of the current applications have so far been concerned with automatic tape measurement extraction, i.e. simulation of the manual procedure for extracting
This paper presents an implementation of a markerless tracking technique targeted at the Windows Mobile Pocket PC platform. The primary aim of this work is to allow the development of standalone augmented reality applications for handheld devices based on natural feature tracking. In order to achieve this goal, a subset of two computer vision libraries was ported to the Pocket PC platform. They were also adapted to use fixed-point math, with the purpose of improving the overall performance of the routines. The port of these libraries opens up the possibility of having other computer vision tasks executed on mobile platforms. A model-based tracking approach that relies on edge information was adopted. Since it does not require high processing power, it is suitable for constrained devices such as handhelds. The OpenGL ES graphics library was used to perform computer vision tasks, taking advantage of existing graphics hardware acceleration. An augmented reality application was created using the implemented technique, and evaluations were performed regarding tracking performance and accuracy.
An algorithm that categorises animal locomotive behaviour by combining detection and tracking of animal faces in wildlife videos is presented. As an example, the algorithm is applied to lion faces. The detection algorithm is based on a human face detection method, utilising Haar-like features and AdaBoost classifiers. The face tracking is implemented by applying a specific interest model that combines low-level feature tracking with the detection algorithm. By combining the two methods in a specific tracking model, reliable and temporally coherent detection/tracking of animal faces is achieved. The information generated by the tracker is used to automatically annotate the animal's locomotive behaviour. The annotation classes of locomotive processes for a given animal species are predefined by a large semantic taxonomy on the wildlife domain. Experimental results are presented.
This paper proposes vision-based techniques for localizing an unmanned aerial vehicle (UAV) by means of an on-board camera. Only natural landmarks provided by a feature tracking algorithm are considered, without the help of visual beacons or landmarks with known positions. First, a monocular visual odometer is described, which can be used as a backup system when GPS accuracy is reduced to critical levels. Homography-based techniques are used to compute the UAV's relative translation and rotation from the images gathered by the on-board camera. The analysis of the problem takes into account the stochastic nature of the estimation and practical implementation issues. The visual odometer is then integrated into a simultaneous localization and mapping (SLAM) scheme in order to reduce the impact of cumulative errors in odometry-based position estimation approaches. Novel prediction and landmark initialization methods for SLAM in UAVs are presented. The paper is supported by extensive experimental work in which the proposed algorithms have been tested and validated using real UAVs.
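The paper's odometer recovers motion from homographies; under the simplifying assumption of pure in-plane motion, a least-squares 2D rigid transform (rotation plus translation) between tracked point sets has a closed form, sketched below. This is an illustrative stand-in, not the authors' homography pipeline:

```python
import math

def rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src points onto dst points.
    Closed-form 2D Procrustes: center both sets, then theta = atan2(cross, dot)."""
    n = len(src)
    cx_s = sum(x for x, _ in src) / n; cy_s = sum(y for _, y in src) / n
    cx_d = sum(x for x, _ in dst) / n; cy_d = sum(y for _, y in dst) / n
    num = den = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        px, py = sx - cx_s, sy - cy_s   # centered source point
        qx, qy = dx - cx_d, dy - cy_d   # centered destination point
        num += px * qy - py * qx        # sum of 2D cross products
        den += px * qx + py * qy        # sum of dot products
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, tx, ty
```

Chaining such per-frame estimates accumulates drift, which is exactly the cumulative error the paper's SLAM back-end is introduced to bound.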
This paper introduces a system that can recognize different types of paper folding by users. The system allows users to register and use their desired paper in the interaction, and detects the folding using the Speeded-Up Robust Features (SURF) algorithm. The paper also describes a paper-based tower-defense game which has been developed as a proof of concept of our method. This method can be considered an initial step toward seamlessly migrating the meaningful traditional art of origami into the digital world as part of interactive media.
In this paper, we present three techniques for 6DOF natural feature tracking in real time on mobile phones. We achieve interactive frame rates of up to 30 Hz for natural feature tracking from textured planar targets on current-generation phones. We use an approach based on heavily modified state-of-the-art feature descriptors, namely SIFT and Ferns, plus a template-matching-based tracker. While SIFT is known to be a strong but computationally expensive feature descriptor, Ferns classification is fast but requires large amounts of memory. This renders both original designs unsuitable for mobile phones. We give detailed descriptions of how we modified both approaches to make them suitable for mobile phones. The template-based tracker further increases the performance and robustness of the SIFT- and Ferns-based approaches. We present evaluations of robustness and performance and discuss their appropriateness for Augmented Reality applications.
The removal of unwanted, parasitic vibrations in a video sequence induced by camera motion is an essential part of video acquisition in industrial, military and consumer applications. In this paper, a new robust online framework for video stabilization is proposed to remove such vibrations and reconstruct a video sequence free from unstable motion. Camera motion estimation combined with motion separation determines the undesired motion, which is compensated for to produce a stable video sequence. The proposed framework uses Speeded-Up Robust Features (SURF) to select stable, consistent feature points from a few frames and match them against the SURF feature points of each subsequent frame for motion estimation. Different measures are taken to select the most consistent feature points. A discrete Kalman filter is used to smooth the estimated motion vectors, and its prediction is fed back whenever feature points go missing. The resulting stabilized video is obtained by compensating for the unstable motion.
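The separation of intentional camera motion from jitter can be illustrated with a far simpler smoother than the paper's Kalman filter: accumulate per-frame motion into a camera path, smooth that path with a moving average, and output the per-frame correction. A hedged sketch of the idea, not the proposed framework:

```python
def stabilize(dx_per_frame, window=5):
    """Per-frame corrections for a 1D camera path.

    dx_per_frame: estimated inter-frame motion (e.g. horizontal shift in pixels).
    Returns smoothed_path - raw_path per frame: applying each correction removes
    jitter while keeping the low-frequency (intentional) motion.
    """
    # Integrate inter-frame motion into an absolute camera path.
    path, acc = [], 0.0
    for d in dx_per_frame:
        acc += d
        path.append(acc)
    # Moving-average smoothing; the correction is the difference to the raw path.
    corrections = []
    for i in range(len(path)):
        lo = max(0, i - window // 2)
        hi = min(len(path), i + window // 2 + 1)
        smooth = sum(path[lo:hi]) / (hi - lo)
        corrections.append(smooth - path[i])
    return corrections
```

For a perfectly linear path (pure intentional motion) the interior corrections are zero, as desired; a Kalman smoother like the paper's plays the same role online, without needing future frames.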
We present a method for segmenting and tracking vehicles on highways using a camera that is relatively low to the ground. At such low angles, 3-D perspective effects cause significant changes in appearance over time, as well as severe occlusions by vehicles in neighboring lanes. Traditional approaches to occlusion reasoning assume that the vehicles initially appear well separated in the image; however, in our sequences, it is not uncommon for vehicles to enter the scene partially occluded and remain so throughout. By utilizing a 3-D perspective mapping from the scene to the image, along with a plumb line projection, we are able to distinguish a subset of features whose 3-D coordinates can be accurately estimated. These features are then grouped to yield the number and locations of the vehicles, and standard feature tracking is used to maintain the locations of the vehicles over time. Additional features are then assigned to these groups and used to classify vehicles as cars or trucks. Our technique uses a single grayscale camera beside the road, incrementally processes image frames, works in real time, and produces vehicle counts with over 90% accuracy on challenging sequences.
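Under the flat-road assumption that such systems exploit, the depth of a feature's ground-contact point (found via the plumb line) follows directly from its image row. A minimal sketch; the camera parameters below are illustrative, not from the paper:

```python
def ground_depth(y_img, y_horizon, focal_px, cam_height_m):
    """Depth of a ground-plane point under a flat-road, zero-tilt pinhole model:
    a pixel y_img pixels below the horizon row y_horizon lies at
    Z = focal_px * cam_height_m / (y_img - y_horizon) metres ahead."""
    dy = y_img - y_horizon
    if dy <= 0:
        raise ValueError("point at or above the horizon is not on the ground plane")
    return focal_px * cam_height_m / dy
```

For example, with an 800 px focal length and a camera 1 m above the road, a contact point 100 rows below the horizon lies 8 m ahead; features whose plumb-line projections yield consistent depths can then be grouped into vehicles.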
In this paper we describe a fully integrated system for detecting, localizing, and tracking pedestrians from a moving vehicle. The system can reliably detect upright pedestrians to a range of 40 m in lightly cluttered urban environments. The system uses range data from stereo vision to segment the scene into regions of interest, from which shape features are extracted and used to classify pedestrians. The regions are tracked using shape and appearance features. Tracking is used to temporally filter classifications to improve performance and to estimate the velocity of pedestrians for use in path planning. The end-to-end system runs at 5 Hz on 1024 × 768 imagery using a standard 2.4 GHz Intel Core 2 Quad processor, and has been integrated and tested on multiple ground vehicles and environments. We show performance on a diverse set of datasets with ground truth in outdoor environments with varying degrees of pedestrian density and clutter. In highly cluttered urban environments, the detection rates are on a par with those of state-of-the-art, but significantly slower, systems.
Enabling computer systems to recognize facial expressions and infer emotions from them in real time presents a challenging research topic. In this paper, we present a real-time approach to emotion recognition through facial expression in live video. We employ an automatic facial feature tracker to perform face localization and feature extraction. The facial feature displacements in the video stream are used as input to a Support Vector Machine classifier. We evaluate our method in terms of recognition accuracy for a variety of interaction and classification scenarios. Our person-dependent and person-independent experiments demonstrate the effectiveness of a support vector machine and feature tracking approach to fully automatic, unobtrusive expression recognition in live video. We conclude by discussing the relevance of our work to affective and intelligent man-machine interfaces and exploring further improvements.
In this paper we present the computer vision component of a 6DOF pose estimation algorithm to be used by an underwater robot. Our goal is to evaluate which feature trackers enable us to accurately estimate the 3D positions of features, as quickly as possible. To this end, we perform an evaluation of available detectors, descriptors, and matching schemes over different underwater datasets. We are interested in identifying combinations in this search space that are suitable for use in structure from motion algorithms, and more generally, vision-aided localization algorithms that use a monocular camera. Our evaluation includes frame-by-frame statistics of desired attributes, as well as measures of robustness expressed as the length of tracked features. We compare the fit of each combination based on the following attributes: number of extracted keypoints per frame, length of feature tracks, average tracking time per frame, and number of false-positive matches between frames. Several datasets were used, collected in different underwater locations and under different lighting and visibility conditions.
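One of the robustness measures named above, track length, can be computed from per-frame match links. The sketch below assumes matches are given as dictionaries mapping a feature id in frame t to its id in frame t+1; this data layout is hypothetical, not the authors' code:

```python
def track_lengths(transitions):
    """Lengths (in frames) of all feature tracks implied by per-frame match links.

    transitions[t]: dict mapping a feature id in frame t to its id in frame t+1;
    a feature absent from the dict is considered lost after frame t.
    """
    active = {}    # feature id in the current frame -> track length so far
    finished = []
    for link in transitions:
        new_active = {}
        for fid, length in active.items():
            if fid in link:
                new_active[link[fid]] = length + 1   # track continues
            else:
                finished.append(length)              # track lost
        for fid, nxt in link.items():
            if nxt not in new_active:
                new_active[nxt] = 2                  # fresh match spans two frames
        active = new_active
    finished.extend(active.values())                 # tracks alive at the end
    return finished
```

Summary statistics over the returned lengths (mean, median, maximum) give exactly the kind of per-combination robustness score the evaluation compares.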
Computation in artificial perceptual systems assumes that appropriate and reliable sensory information about the environment is available. However, today's sensors cannot guarantee optimal information at all times. For example, when an image from a CCD camera saturates, the entire vision system fails regardless of how 'algorithmically' sophisticated it is. The principal goal of sensory computing is to extract useful information about the environment from 'imperfect' sensors. This paper attempts to generalize our experience with smart vision sensors and to provide a direction and illustrations for exploiting the complex spatio-temporal interaction of image formation, signal detectors, and on-chip processing to extract a surprising amount of useful information from on-chip systems. The examples presented include: VLSI sensory computing systems for adaptive imaging, ultra-fast feature tracking with attention, and ultra-fast range imaging. Using these examples, we illus...
A number of existing models for surface wave phase speeds (linear and non-linear, breaking and non-breaking waves) are reviewed and tested against phase speed data from a large-scale laboratory experiment. The results of these tests are utilized in the context of assessing the potential improvement gained by incorporating wave non-linearity in phase speed based depth inversions. The analysis is focused on the surf zone, where depth inversion accuracies are known to degrade significantly. The collected data includes very high-resolution remote sensing video and surface elevation records from fixed, in-situ wave gages. Wave phase speeds are extracted from the remote sensing data using a feature tracking technique, and local wave amplitudes are determined from the wave gage records and used for comparisons to non-linear phase speed models and for nonlinear depth inversions. A series of five different regular wave conditions with a range of non-linearity and dispersion characteristics are analyzed and results show that a composite dispersion relation, which includes both non-linearity and dispersion effects, best matches the observed phase speeds across the domain and hence, improves surf zone depth estimation via depth inversions. Incorporating non-linearity into the phase speed model reduces errors to O(10%), which is a level previously found for depth inversions with small amplitude waves in intermediate water depths using linear dispersion. Considering the controlled conditions and extensive ground truth, this appears to be a practical limit for phase speed-based depth inversions. Finally, a phase speed sensitivity analysis is performed that indicates that typical nearshore sand bars should be resolvable using phase speed depth inversions. However, increasing wave steepness degrades the sensitivity of this inversion method.
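For the linear (small-amplitude) limit that the non-linear models are compared against, the dispersion relation ω² = g·k·tanh(k·h) can be inverted for depth in closed form once the phase speed c = ω/k is measured. A sketch of that linear inversion only, not the composite non-linear relation the study favors:

```python
import math

def depth_from_phase_speed(c, omega, g=9.81):
    """Invert the linear dispersion relation omega^2 = g*k*tanh(k*h) for depth h,
    given a measured phase speed c = omega/k at angular frequency omega (rad/s)."""
    k = omega / c                 # wavenumber from the measured phase speed
    ratio = omega * c / g         # equals omega^2 / (g*k) = tanh(k*h)
    if not 0 < ratio < 1:
        raise ValueError("phase speed inconsistent with linear dispersion")
    return math.atanh(ratio) / k
```

In the surf zone, amplitude effects make waves travel faster than this linear relation predicts, so the linear inversion overestimates depth there; that bias is precisely what adding non-linearity to the phase speed model reduces to O(10%).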
This study investigated the degree to which speed of stereoscopic translational motion (i.e. moving binocular disparity information) can be discriminated in a display that minimizes position information. Observers viewed dynamic random-element stereograms depicting arrays of randomly positioned stereoscopic dots that moved bidirectionally. Two tasks were performed: a speed discrimination task and a displacement discrimination task. Across a range of conditions, speed could be discriminated under conditions in which displacement could not. Thus, speed of stereoscopic motion can be discriminated when position information is minimal. This result indicates that stereoscopic motion is sensed in a way that cannot be explained by feature tracking or by inferring the motion from memory of position and time.
We present an efficient natural feature tracking pipeline implemented solely in JavaScript. It is embedded in a web-technology-based Augmented Reality system running plugin-free in web browsers. The evaluation shows that real-time frame rates are achieved on desktop computers, while interactive frame rates are achieved on smartphones.
Eye movement analysis is of importance in clinical studies and in research. Monitoring eye movements using video cameras has the advantage of being nonintrusive, inexpensive, and automated. The main objective of this paper is to propose an efficient approach for real-time eye feature tracking from a sequence of eye images. To this end, we first formulate a dynamic model for eye feature tracking, which relates the measurements from the eye images to the tracking parameters. In our model, the center of the iris is chosen as the tracking parameter vector and the gray-level centroid of the eye is chosen as the measurement vector. In our procedure for evaluating the gray-level centroid, the preprocessing steps, such as edge detection and curve fitting, need to be performed only for the first frame of the image sequence. A discrete Kalman filter is then constructed for the recursive estimation of the eye features, while taking into account the measurement noise. Experimental results are presented to demonstrate the accuracy and the real-time applicability of the proposed approach.
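The measurement vector here is the gray-level centroid of the eye region; for a grayscale image given as nested lists of pixel intensities, that is simply the intensity-weighted mean pixel position. A minimal sketch (the image representation is an assumption for illustration):

```python
def gray_centroid(image):
    """Intensity-weighted centroid (x, y) of a grayscale image,
    given as a list of rows of non-negative pixel values."""
    total = sx = sy = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            total += v
            sx += x * v     # accumulate column index weighted by intensity
            sy += y * v     # accumulate row index weighted by intensity
    if total == 0:
        raise ValueError("image has zero total intensity")
    return sx / total, sy / total
```

Feeding this centroid to the Kalman filter each frame is what lets the method skip per-frame edge detection and curve fitting after initialization.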
Factorization algorithms for recovering structure and motion from an image stream have many advantages, but they usually require a set of well-tracked features. Such a set is generally not available in practical applications. There is thus a need for making factorization algorithms deal effectively with errors in the tracked features. We propose a new and computationally efficient algorithm for applying an arbitrary error function in the factorization scheme. This algorithm enables the use of robust statistical techniques and arbitrary noise models for the individual features. These techniques and models enable the factorization scheme to deal effectively with mismatched features, missing features, and noise on the individual features. The proposed approach further includes a new method for Euclidean reconstruction that significantly improves convergence of the factorization algorithms. The proposed algorithm has been implemented as a modification of the Christy-Horaud factorization scheme, which yields a perspective reconstruction. Based on this implementation, a considerable increase in error tolerance is demonstrated on real and synthetic data. The proposed scheme can, however, be applied to most other factorization algorithms.
The solution to the camera registration and tracking problem serves Augmented Reality, in order to provide an enhancement to the user's cognitive perception of the real world and his/her situational awareness. By analyzing the five most representative tracking and feature detection techniques, we have concluded that the Camera Pose Initialization (CPI) problem, a relevant sub-problem in the overall camera tracking problem, is still far from being solved using straightforward and non-intrusive methods. The assessed techniques often use user inputs (e.g. mouse clicking) or auxiliary artifacts (e.g. fiducial markers) to solve the CPI problem. This paper presents a novel approach to real-time scale-, rotation- and luminance-invariant natural feature tracking, in order to solve the CPI problem using totally automatic procedures. The technique is applicable to planar objects with arbitrary topologies and natural textures, and can be used in Augmented Reality. We also present a heuristic method for feature clustering, which has proven to be efficient and reliable. The presented work uses this novel feature detection technique as the baseline for a real-time and robust planar texture tracking algorithm, which combines optical flow, backprojection and template matching techniques. The paper also presents performance and precision results of the proposed technique.
This paper explores the possibility of using robust object tracking algorithms based on visual model features as a generator of visual references for UAV control. A Scale Invariant Feature Transform (SIFT) algorithm is used to detect the salient points in every processed image; a projective transformation for evaluating the visual references is then obtained using a version of the RANSAC algorithm, in which matched key-point pairs that fulfill the transformation equations are selected and corrupted data are rejected. The system has been tested using diverse image sequences, showing its capability to track objects that change significantly in scale, position, and rotation, while generating velocity references for the UAV flight controller. The robustness of our approach has also been validated using images taken from real flights exhibiting noise and lighting distortions. The results presented are promising for use as a reference generator for the control system.
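The RANSAC stage can be illustrated on a simplified model: instead of a full projective transformation, the sketch below estimates a dominant 2D translation from noisy matches, keeping the hypothesis with the largest consensus set. A toy version of the consensus idea, not the paper's estimator:

```python
import random

def ransac_translation(matches, tol=3.0, iters=100, seed=0):
    """Dominant 2D translation from (src_pt, dst_pt) matches, rejecting outliers.
    Each hypothesis is the displacement of one randomly drawn match; the hypothesis
    with the most matches within tol pixels wins, then inliers are averaged."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (sx, sy), (dx, dy) = rng.choice(matches)
        tx, ty = dx - sx, dy - sy
        inliers = [m for m in matches
                   if abs((m[1][0] - m[0][0]) - tx) <= tol
                   and abs((m[1][1] - m[0][1]) - ty) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine on the consensus set only, so outliers cannot bias the estimate.
    tx = sum(q[0] - p[0] for p, q in best_inliers) / len(best_inliers)
    ty = sum(q[1] - p[1] for p, q in best_inliers) / len(best_inliers)
    return tx, ty
```

The paper's version fits an 8-parameter projective transformation per hypothesis rather than a 2-parameter translation, but the sample-score-refine loop is the same.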
Tracking features and exploring their temporal dynamics can aid scientists in identifying interesting time intervals in a simulation and serve as basis for performing quantitative analyses of temporal phenomena. In this paper, we develop a novel approach for tracking subsets of isosurfaces, such as burning regions in simulated flames, which are defined as areas of high fuel consumption on a temperature isosurface. Tracking such regions as they merge and split over time can provide important insights into the impact of turbulence on the combustion process. However, the convoluted nature of the temperature isosurface and its rapid movement make this analysis particularly challenging. Our approach tracks burning regions by extracting a temperature isovolume from the four-dimensional space-time temperature field. It then obtains isosurfaces for the original simulation time steps and labels individual connected "burning" regions based on the local fuel consumption value. Based on this information, a boundary surface between burning and non-burning regions is constructed. The Reeb graph of this boundary surface is the tracking graph for burning regions.
A robot-assisted system for medical diagnostic ultrasound has been developed by the authors. This paper presents key features of the user interface used in this system. While the ultrasound transducer is positioned by a robot, the operator, the robot controller, and an ultrasound image processor have shared control over its motion. Ultrasound image features that can be selected by the operator are recognized and tracked by a variety of techniques. Based on feature tracking, ultrasound image servoing in three axes has been incorporated in the interface and can be enabled to automatically compensate, through robot motions, for unwanted motions in the plane of the ultrasound beam. The stability and accuracy of the system are illustrated through a 3D reconstruction of an ultrasound phantom.
Making new technologies accessible to everyone is a task still to be accomplished. One way to contribute to this aim is to create new interfaces based on computer vision using low-cost devices such as webcams. In this paper a face-based perceptual user interface is presented. Our approach is divided into four steps: automatic face detection, best face feature detection, feature tracking ...
We examine the surges of five glaciers in the Pakistan Karakoram using satellite remote sensing to investigate the dynamic nature of surges in this region and how they may be affected by climate. Surface velocity maps derived by feature-tracking quantify the surge development spatially in relation to the terminus position, and temporally with reference to seasonal weather. We find that the season of surge initiation varies, that each surge develops gradually over several years, and that maximum velocities are recorded within the lowermost 10 km of the glacier. Measured peak surge velocities are between one and two orders of magnitude greater than during quiescence. We also note that two of the glaciers are of a type not previously reported to surge. The evidence points towards recent Karakoram surges being controlled by thermal rather than hydrological conditions, coinciding with high-altitude warming from long-term precipitation and accumulation patterns.
This paper explores the possibility of using robust object tracking algorithms based on visual model features as generators of visual references for UAV control. A Scale Invariant Feature Transform (SIFT) algorithm is used to detect the salient points in every processed image; a projective transformation for evaluating the visual references is then obtained using a version of the RANSAC algorithm, in which matched key-point pairs that fulfill the transformation equations are selected while corrupted data are rejected. The system has been tested on diverse image sequences, showing its capability to track objects that change significantly in scale, position, and rotation while generating velocity references for the UAV flight controller. The robustness of our approach has also been validated using images taken from real flights exhibiting noise and lighting distortions. The results presented are promising for use as a reference generator for the control system.
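The RANSAC step described above, fitting a projective transformation to matched key-point pairs while rejecting corrupted data, can be sketched as follows. This is a generic numpy-only illustration that assumes the SIFT correspondences are already given; the paper's actual RANSAC variant and thresholds may differ:

```python
import numpy as np

def fit_homography(src, dst):
    """Direct Linear Transform: estimate a 3x3 homography H with
    dst ~ H @ src from >= 4 point correspondences (Nx2 arrays)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)           # null-space vector = vec(H)
    return H / H[2, 2]

def project(H, pts):
    """Apply homography H to Nx2 points (homogeneous divide)."""
    p = np.c_[pts, np.ones(len(pts))] @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=500, thresh=2.0, seed=0):
    """RANSAC: repeatedly fit H to random 4-point samples, keep the
    hypothesis with the largest inlier set, then refit on all inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return fit_homography(src[best], dst[best]), best
```

In a tracking loop, the refined homography between consecutive frames would then be converted into the image-space velocity references mentioned in the abstract.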
In this paper, we present three techniques for 6DOF natural feature tracking in real time on mobile phones. We achieve interactive frame rates of up to 30 Hz for natural feature tracking from textured planar targets on current generation phones. We use an approach based on heavily modified state-of-the-art feature descriptors, namely SIFT and Ferns, plus a template-matching-based tracker. While SIFT is known to be a strong but computationally expensive feature descriptor, Ferns classification is fast but requires large amounts of memory. This renders both original designs unsuitable for mobile phones. We give detailed descriptions of how we modified both approaches to make them suitable for mobile phones. The template-based tracker further increases the performance and robustness of the SIFT- and Ferns-based approaches. We present evaluations on robustness and performance and discuss their appropriateness for Augmented Reality applications.
This paper addresses robust feature tracking. We extend the well-known Shi-Tomasi-Kanade tracker by introducing an automatic scheme for rejecting spurious features. We employ a simple and efficient outlier rejection rule, called X84, and prove that its theoretical assumptions are satisfied in the feature tracking scenario. Experiments with real and synthetic images confirm that our algorithm makes good features track better; we show a quantitative example of the benefits introduced by the algorithm for the case of fundamental matrix estimation. The complete code of the robust tracker is available via ftp.
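The X84 rejection rule mentioned above is a robust statistics device: a feature is discarded when its residual deviates from the median residual by more than k times the median absolute deviation (MAD), with k = 5.2 corresponding to roughly 3.5 standard deviations under Gaussian noise. A minimal sketch of the rule itself (the tracker it plugs into is not reproduced here):

```python
import numpy as np

def x84_inliers(residuals, k=5.2):
    """X84 outlier rejection: keep samples whose absolute deviation from
    the median is within k times the MAD. With Gaussian residuals,
    k = 5.2 * MAD is approximately a 3.5-sigma cutoff."""
    r = np.asarray(residuals, float)
    med = np.median(r)
    mad = np.median(np.abs(r - med))
    if mad == 0:                       # degenerate: all residuals identical
        return np.abs(r - med) == 0
    return np.abs(r - med) <= k * mad
```

Because the median and MAD are themselves robust to a large fraction of outliers, the rule needs no assumption about the scale of the noise, which is what makes it attractive for rejecting spurious feature tracks.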
The understanding of the phenomena involved in ventricular flow is becoming more and more important for two main reasons: the continuous improvements in diagnostic techniques and the increasing popularity of prosthetic devices. More accurate investigation techniques give the chance to better diagnose diseases before they become dangerous to the health of the patient. On the other hand, the diffusion of prosthetic devices requires very detailed assessment both of their effectiveness and of their possible side effects - e.g. haemolysis - and these evaluations are deeply linked to the fluid dynamics. The present work is focused on the experimental investigation of the flow in the left ventricle of the human heart, as this flow is crucial for the effective pumping of the blood through the circulatory system and its anomalies are recognised to be a reliable precursor of major heart diseases. To study the ventricular flow a conceptu...
This paper addresses the problem of computing the fundamental matrix, which describes a geometric relationship between a pair of stereo images: the epipolar geometry. In the uncalibrated case, epipolar geometry captures all the 3D information available from the scene. It is of central importance for problems such as 3D reconstruction, self-calibration and feature tracking. Hence, the computation of the fundamental matrix is of great interest. The existing methods [10] use two steps: a linear step followed by a nonlinear one. But the linear step rarely gives a closed-form solution for the fundamental matrix, resulting in more iterations for the nonlinear step, which is not guaranteed to converge to the correct solution. In this paper, a novel method based on virtual parallax is proposed. The problem is formulated differently: instead of computing the 3x3 fundamental matrix directly, we compute a homography with one epipole position, and show that this is equivalent to computing the fundamental matrix. Simple equations are derived by reducing the number of parameters to estimate. As a consequence, we obtain an accurate fundamental matrix of rank 2 with a stable linear computation. Experiments with simulated and real images validate our method and clearly show the improvement over the existing methods.
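For context on the linear step the abstract criticizes, here is a minimal sketch of the classical normalized eight-point estimate of the fundamental matrix, with the rank-2 constraint enforced by SVD. This is the baseline method, not the virtual-parallax formulation proposed in the paper:

```python
import numpy as np

def normalize(pts):
    """Hartley normalization: translate points to their centroid and scale
    the mean distance from it to sqrt(2). Returns homogeneous points and T."""
    c = pts.mean(0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    return np.c_[pts, np.ones(len(pts))] @ T.T, T

def eight_point(x1, x2):
    """Estimate F with x2_h^T F x1_h = 0 from >= 8 correspondences
    (Nx2 arrays), enforcing rank 2 on the linear solution."""
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # Each correspondence gives one row kron(p2_i, p1_i) . vec(F) = 0.
    A = np.array([np.kron(q, p) for p, q in zip(p1, p2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt2 = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0]) @ Vt2   # project onto rank-2 matrices
    F = T2.T @ F @ T1                        # undo the normalization
    return F / F[2, 2]
```

The rank-2 projection here is exactly the step the paper's homography-plus-epipole parameterization avoids, since that parameterization yields a rank-2 matrix by construction.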
This paper proposes vision-based techniques for localizing an unmanned aerial vehicle (UAV) by means of an on-board camera. Only natural landmarks provided by a feature tracking algorithm are considered, without the help of visual beacons or landmarks with known positions. First, a monocular visual odometer is described which could be used as a backup system when the accuracy of GPS is reduced to critical levels. Homography-based techniques are used to compute the UAV's relative translation and rotation from the images gathered by the onboard camera. The analysis of the problem takes into account the stochastic nature of the estimation and practical implementation issues. The visual odometer is then integrated into a simultaneous localization and mapping (SLAM) scheme in order to reduce the impact of cumulative errors in odometry-based position estimation approaches. Novel prediction and landmark initialization for SLAM in UAVs are presented. The paper is supported by extensive experimental work in which the proposed algorithms have been tested and validated using real UAVs.
This paper introduces a system that can recognize different types of paper folding by users. The system allows users to register and use their desired paper in the interaction, and detects the folding using the Speeded-Up Robust Features (SURF) algorithm. The paper also describes a paper-based tower defense game which has been developed as a proof of concept of our method. This method can be considered an initial step toward seamlessly migrating the traditional art of origami into the digital world as part of interactive media.
The domain of vision and navigation often includes applications for feature tracking as well as simultaneous localization and mapping (SLAM). As these problems require computationally demanding solutions, it is challenging to achieve high performance without sacrificing the fidelity of results or otherwise consuming excessive amounts of energy. Our goal then is to accelerate the applications in this domain to meet real-time performance constraints while simultaneously reducing energy consumption and avoiding degradation in the quality of results. To achieve this domain-specific acceleration, we model a customizable hardware platform based on the 3D integration of a Field-Programmable Gate Array (FPGA) atop a standard chip multiprocessor (CMP) with Through-Silicon Vias (TSVs) used for communication between the two layers. Furthermore, partial automation of accelerator creation using C-to-RTL tools allows for analysis of a wide range of candidates. In this work, we mathematically characterize viable accelerator candidates, describe ideal application code for acceleration, and outline a dynamic-programming-based methodology for selecting an optimal set of candidates. Our results yield an overall speedup and energy reduction of 9.56X along with a 94X EDP reduction for the domain. Finally, we investigate the effects of various interconnect models on our performance improvements. Overall, our proposed system is shown to be highly efficient in both accelerating performance and saving energy for compute-intensive applications in this domain.
This paper presents a mobile Augmented Reality (AR) system supporting architects in visualizing 3D models in real-time on site. We describe how vision based feature tracking techniques can help architects make decisions on site concerning visual impact assessment. The AR system consists only of a tablet PC, a webcam and a custom software component. No preparation of the site is required, and accurate initialization, which has not been addressed by previous papers on real-time feature tracking, is achieved by a combination of automatic and manual techniques.
Minimally invasive surgery has been established as an important way forward in surgery for reducing patient trauma and hospitalization with improved prognosis. The introduction of robotic assistance enhances the manual dexterity and accuracy of instrument manipulation. Further development of the field in using pre- and intra-operative imaging guidance requires the integration of the general anatomy of the patient with clear pathologic indications and geometrical information for preoperative planning and intra-operative manipulation. It also requires effective visualization and the recreation of haptic and tactile sensing with dynamic active constraints to improve consistency and safety of the surgical procedures. This paper describes key technical considerations of tissue deformation tracking, 3D reconstruction, subject-specific modeling, image guidance and augmented reality for robotic assisted minimally invasive surgery. It highlights the importance of adapting preoperative surgical planning according to intra-operative data and illustrates how dynamic information such as tissue deformation can be incorporated into the surgical navigation framework. Some of the recent trends are discussed in terms of instrument design and the usage of dynamic active constraints and human-robot perceptual docking for robotic assisted minimally invasive surgery.
A vision based navigation system is a basic tool for autonomous operation of unmanned vehicles. For off-road navigation this means that a vehicle equipped with a stereo vision system and perhaps a laser ranging device shall be able to maintain a high level of autonomy under various illumination conditions and with little a priori information about the underlying scene.
In this paper we present a natural feature tracking algorithm based on on-line boosting used for localizing a mobile computer. Mobile augmented reality requires highly accurate and fast six-degrees-of-freedom tracking in order to provide registered graphical overlays to a mobile user. With advances in mobile computer hardware, vision-based tracking approaches have the potential to provide efficient solutions that are non-invasive, in contrast to the currently dominant marker-based approaches. We propose a tracking approach which can be used in an unknown environment, i.e. the target does not have to be known beforehand. The core of the tracker is an on-line learning algorithm which updates the tracker as new data becomes available. This is suitable for many mobile augmented reality applications. We demonstrate the applicability of our approach on tasks where the target objects are not known beforehand, e.g. interactive planning.
In this paper we present a real-time simultaneous localization and mapping system which uses a stereo camera as its only input. We combine the benefits of KLT feature tracking, which include high speed and robustness to repetitive features, with wide baseline features, which allow for feature matching after large camera motions. Updating the map of feature locations and camera poses is considerably more expensive than performing KLT tracking. For this reason we use the optical flow measured by the KLT tracker to adaptively select key frames for which we do a full map and camera pose update. In this way we limit the processing to only "interesting" parts of the video sequence. Additionally, we maintain a consistent scene scale at low cost by using a GPU implementation of multi-camera scene flow, a generalization of KLT to the motion of image features in three dimensions. The system uses multiple sub-maps, scalable bag-of-features recognition and geometric verification to recover from motion estimation failure or "kidnapping". This architecture allows the robot to grow the existing map online and in real time while storing all of the data necessary for an off-line optimization to complete loops. We demonstrate the robustness of our system in a challenging indoor environment that includes semi-reflective glass walls and people moving in the scene.
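The flow-driven keyframe selection described above can be sketched very simply: accumulate the per-frame KLT flow magnitude and declare a new keyframe (triggering the expensive map update) once it exceeds a threshold. The threshold value and the use of the median displacement are assumptions for illustration, not the paper's exact policy:

```python
import numpy as np

def select_keyframes(flows, flow_thresh=15.0):
    """Adaptive keyframe selection driven by optical flow.

    flows[i] is the median KLT feature displacement (in pixels) between
    frames i-1 and i. Frame 0 is always a keyframe; a new keyframe is
    declared whenever the flow accumulated since the last keyframe
    reaches flow_thresh, and the accumulator is then reset."""
    keyframes = [0]
    acc = 0.0
    for i, f in enumerate(flows[1:], start=1):
        acc += f
        if acc >= flow_thresh:
            keyframes.append(i)
            acc = 0.0
    return keyframes
```

The effect is that a stationary camera produces almost no keyframes (and thus almost no map updates), while fast motion produces them at the rate needed to keep the baseline between keyframes roughly constant.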
Recent advances in technology enable portable, even wearable, computers to be equipped with wireless interfaces, which allows data transactions even while mobile. Combined with mixed reality (MR), mobile computing opens a promising field for wearable computers. Natural and nonobtrusive means of interaction call for new devices, which should be simple to use, and provide effective tracking methods in unprepared environments for MR. In this paper, a new interaction device, the tilt pad, designed using accelerometers and wireless devices, is introduced. This is combined with two new natural feature-tracking algorithms based on geometrical image constraints. The first is based on epipolar geometry and provides a general description of the constraints on image flow between two static scenes. The second is based on the calculation of a homography relationship between the current frame and a stored representation of the scene. We assessed these algorithms against the current optical flow calculation algorithm across a number of criteria including robustness, speed, and accuracy. Finally, we demonstrated an MR computer game application combining the new tracking method and the tilt pad. Videos of the tilt pad and of the tilt-pad Pacman game application can be found at: http://www.mixedrealitylab.org.
In this paper we present a novel approach to robust visual servoing. This method removes the feature tracking step from a typical visual servoing algorithm. We do not need correspondences of the features for deriving the control signal. This is achieved by modeling the image features as a Mixture of Gaussians in the current as well as the desired image. Using Lyapunov theory, a control signal is derived to minimize a distance function between the two Gaussian mixtures. The distance function is given in a closed form, and its gradient can be efficiently computed and used to control the system. For simplicity, we first consider the 2D motion case. Then, the general case is presented by introducing the depth distribution of the features to control the six degrees of freedom. Experiments are conducted within a simulation framework to validate our proposed method. Visual servo control or visual servoing is the process of positioning a robot end effector with respect to an object or a set ...
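The closed-form distance between Gaussian mixtures that the abstract refers to can be illustrated with the squared L2 distance, which for Gaussians reduces to sums of pairwise Gaussian overlap integrals. This sketch assumes equally weighted isotropic components with a shared bandwidth; the paper's actual distance function and the Lyapunov-derived control law are not reproduced:

```python
import numpy as np

def gauss_overlap(mu_a, s2_a, mu_b, s2_b):
    """Closed-form integral of a product of two isotropic Gaussians:
    int N(x; mu_a, s2_a I) N(x; mu_b, s2_b I) dx = N(mu_a; mu_b, (s2_a+s2_b) I)."""
    d = len(mu_a)
    s2 = s2_a + s2_b
    diff = np.asarray(mu_a) - np.asarray(mu_b)
    return (2 * np.pi * s2) ** (-d / 2) * np.exp(-diff @ diff / (2 * s2))

def gmm_l2_distance(mus_f, mus_g, s2=1.0):
    """Squared L2 distance int (f - g)^2 dx between two equally weighted
    isotropic Gaussian mixtures (one component per image feature).
    Expands to int f^2 - 2 int f g + int g^2, each a mean of overlaps."""
    def term(A, B):
        return np.mean([gauss_overlap(a, s2, b, s2) for a in A for b in B])
    return term(mus_f, mus_f) - 2 * term(mus_f, mus_g) + term(mus_g, mus_g)
```

Because every overlap integral is differentiable in the component means, the gradient of this distance with respect to the camera pose is also available in closed form, which is what makes the correspondence-free control signal computable.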
▶ Maximally Stable Volumes are automatically detected features.
▶ Maximally Stable Volumes allow stable tracking through time.
▶ Robust to nonuniform illumination and large changes in eye position.
▶ Faster than cross-correlation.
This paper addresses the problem of computing the three-dimensional (3-D) path of a moving rigid object using a calibrated stereoscopic vision setup. The proposed system begins by detecting feature points on the moving object. By tracking these points over time, it produces clouds of 3-D points that can be registered, thus giving information about the underlying camera motion. A novel correction scheme that compensates for the accumulated error in the computed positions by automatic detection of loop-back points in the movement of the object is also proposed. An application to object modeling is presented in which a handheld object is moved in front of a camera and is reconstructed using silhouette intersection.
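The registration step described above, aligning clouds of tracked 3-D points to recover the underlying motion, is classically solved in closed form. Here is a minimal sketch using the Kabsch algorithm (SVD-based least-squares rigid alignment); the paper's loop-back correction scheme is not reproduced:

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid registration: find rotation R and translation t
    such that Q ~ R @ P + t for two Nx3 point clouds P, Q in
    corresponding order (e.g. the same tracked features at two times)."""
    cp, cq = P.mean(0), Q.mean(0)
    H = (P - cp).T @ (Q - cq)          # 3x3 cross-covariance of centered clouds
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # d guards against reflections
    t = cq - R @ cp
    return R, t
```

Chaining the (R, t) estimates between successive point clouds gives the object's 3-D path; the accumulated drift in that chain is what the paper's loop-back detection is designed to correct.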