Correlation Filter-based trackers have recently achieved excellent performance, showing great robustness to challenging situations exhibiting motion blur and illumination changes. However, since the model that they learn depends strongly on the spatial layout of the tracked object, they are notoriously sensitive to deformation. Models based on colour statistics have complementary traits: they cope well with variation in shape, but suffer when illumination is not consistent throughout a sequence. Moreover, colour distributions alone can be insufficiently discriminative. In this paper, we show that a simple tracker combining complementary cues in a ridge regression framework can operate faster than 80 FPS and outperform not only all entries in the popular VOT14 competition, but also recent and far more sophisticated trackers according to multiple benchmarks.
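The combination of the two complementary cues can be sketched as a simple linear blend of the template (correlation filter) response map and the per-pixel colour score map. The `alpha` weight and the toy maps below are illustrative assumptions; the actual method learns its parameters via ridge regression rather than using a fixed blend.

```python
import numpy as np

def fuse_responses(cf_response, color_response, alpha=0.3):
    """Blend a correlation-filter response map with a per-pixel colour
    score map. alpha is the weight given to the colour cue (a free
    parameter here, not the paper's learned value)."""
    assert cf_response.shape == color_response.shape
    return (1.0 - alpha) * cf_response + alpha * color_response

# Toy example: the template cue peaks at (2, 2), the colour cue at (2, 3);
# with a small alpha the fused map keeps the template peak.
cf = np.zeros((5, 5)); cf[2, 2] = 1.0
col = np.zeros((5, 5)); col[2, 3] = 1.0
fused = fuse_responses(cf, col, alpha=0.3)
peak = np.unravel_index(np.argmax(fused), fused.shape)
```

The new target location is then taken as the argmax of the fused map, so the colour cue can veto the template only when it is sufficiently confident.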
We present a novel solution to the problem of recovering and tracking the 3D position, orientation and full articulation of a human hand from markerless visual observations obtained by a Kinect sensor. We treat this as an optimization problem, seeking for the hand model parameters that minimize the discrepancy between the appearance
and 3D structure of hypothesized instances of a hand model and actual hand observations. This optimization problem is effectively solved using a variant of Particle Swarm
Optimization (PSO). The proposed method does not require special markers and/or a complex image acquisition setup. Being model based, it provides continuous solutions
to the problem of tracking hand articulations. Extensive experiments with a prototype GPU-based implementation of the proposed method demonstrate that accurate and ro-
bust 3D tracking of hand articulations can be achieved in near real-time (15Hz).
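The core optimization step can be illustrated with a minimal PSO over a generic objective. This is a textbook sketch, not the authors' exact variant; the objective below is a toy stand-in for the model-vs-observation discrepancy, and all hyperparameters (`w`, `c1`, `c2`) are illustrative.

```python
import numpy as np

def pso_minimize(objective, dim, n_particles=30, iters=100,
                 bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal Particle Swarm Optimization sketch: each particle keeps a
    velocity and a personal best, and is pulled toward the swarm's
    global best."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros((n_particles, dim))
    pbest = x.copy()
    pbest_f = np.array([objective(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + cognitive pull (personal best) + social pull (global best)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

# Toy quadratic discrepancy standing in for the rendering-vs-observation error:
best, best_f = pso_minimize(lambda p: np.sum((p - 1.0) ** 2), dim=4)
```

In the paper's setting the search space is the hand model's pose parameters and the objective compares rendered hand hypotheses against the Kinect observations; only the objective changes, the swarm update stays the same.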
We propose a method that relies on markerless visual observations to track the full articulation of two hands that interact with each other in a complex, unconstrained manner. We formulate this as an optimization problem whose 54-dimensional parameter space represents all possible configurations of two hands, each represented as a kinematic structure with 26 Degrees of Freedom (DoFs). To solve this problem, we employ Particle Swarm Optimization (PSO), an evolutionary, stochastic optimization method, with the objective of finding the two-hands configuration that best explains the RGB-D observations provided by a Kinect sensor. To the best of our knowledge, the proposed method is the first to attempt and achieve the articulated motion tracking of two strongly interacting hands. Extensive quantitative and qualitative experiments with simulated and real world image sequences demonstrate that an accurate and efficient solution of this problem is indeed feasible.
We propose a new tracking framework with an attentional mechanism that chooses a subset of the associated correlation filters for increased robustness and computational efficiency. The subset of filters is adaptively selected by a deep attentional network according to the dynamic properties of the tracking target. Our contributions are manifold, and are summarised as follows: (i) Introducing the Attentional Correlation Filter Network which allows adaptive tracking of dynamic targets. (ii) Utilising an attentional network which shifts the attention to the best candidate modules, as well as predicting the estimated accuracy of currently inactive modules. (iii) Enlarging the variety of correlation filters which cover target drift, blurriness, occlusion, scale changes, and flexible aspect ratio. (iv) Validating the robustness and efficiency of the attentional mechanism for visual tracking through a number of experiments. Our method achieves similar performance to non real-time trackers, and state-of-the-art performance amongst real-time trackers.
A crucial prerequisite for recycling, which forms an integral part of municipal solid waste (MSW) management, is the sorting of useful materials from source-separated MSW. Researchers have been exploring automated sorting techniques to improve the overall efficiency of the recycling process. This paper reviews recent advances in the physical processes, sensors, and actuators used, as well as control and autonomy related issues, in the area of automated sorting and recycling of source-separated MSW. We believe that this paper will provide a comprehensive overview of the state of the art and will help future system designers in the area. In this paper, we also present research challenges in the field of automated waste sorting and recycling.
We propose a new context-aware correlation filter based tracking framework to achieve both high computational speed and state-of-the-art performance among real-time trackers. The major contribution to the high computational speed lies in the proposed deep feature compression that is achieved by a context-aware scheme utilizing multiple expert auto-encoders; a context in our framework refers to the coarse category of the tracking target according to appearance patterns. In the pre-training phase, one expert auto-encoder is trained per category. In the tracking phase, the best expert auto-encoder is selected for a given target, and only this auto-encoder is used. To achieve high tracking performance with the compressed feature map, we introduce extrinsic denoising processes and a new orthogonality loss term for pre-training and fine-tuning of the expert auto-encoders. We validate the proposed context-aware framework through a number of experiments, where our method achieves performance comparable to state-of-the-art trackers which cannot run in real-time, while running at over 100 fps.
This paper addresses the problem of automatically localizing dominant objects as spatio-temporal tubes in a noisy collection of videos with minimal or even no supervision. We formulate the problem as a combination of two complementary processes: discovery and tracking. The first one establishes correspondences between prominent regions across videos, and the second one associates similar object regions within the same video. Interestingly, our algorithm also discovers the implicit topology of frames associated with instances of the same object class across different videos, a role normally left to supervisory information in the form of class labels in conventional image and video understanding methods. Indeed, as demonstrated by our experiments, our method can handle video collections featuring multiple object classes, and substantially outperforms the state of the art in colocalization, even though it tackles a broader problem with much less supervision.
Content-Based Image Retrieval (CBIR) locates, retrieves and displays images similar to one given as a query, using a set of features. It demands accessible data in medical archives and from medical equipment, in order to infer meaning after some processing. Retrieving cases that are similar in some sense to the target image can aid clinicians. CBIR complements text-based retrieval and improves evidence-based diagnosis, administration, teaching, and research in healthcare. It facilitates visual/automatic diagnosis and decision-making in real-time remote consultation/screening, store-and-forward tests, home care assistance and overall patient surveillance. Suitable metrics help compare visual data and improve diagnosis. Architectures specially designed for the application scenario can be beneficial. The use of CBIR calls for file storage standardization, querying procedures, efficient image transmission, realistic databases, global availability, access simplicity, and Internet-based structures. This chapter discusses the important and complex aspects required to handle visual content in healthcare.
This work considers the classification of power quality disturbances based on Variational Mode Decomposition (VMD) and the Empirical Wavelet Transform (EWT), using a Support Vector Machine (SVM). The performance of VMD is compared against that of EWT for producing feature vectors that capture the salient and unique nature of these disturbances. In this paper, these two adaptive signal processing methods are used to produce three Intrinsic Mode Function (IMF) components of power quality signals. Feature vectors, produced by taking the sines and cosines of a statistical parameter vector of the three IMF candidates, are used for training the SVM. Validation on six different classes of power quality signals (normal sinusoid, sag, swell, harmonics, sag with harmonics, and swell with harmonics) is performed using synthetic data in MATLAB. The classification results show that VMD outperforms EWT in the feature extraction process, and the classification accuracies are tabulated.
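The feature construction step described above can be sketched as follows. The particular statistics used here (mean, standard deviation, peak, RMS) and the mock IMF components are assumptions for illustration; the abstract only specifies that sines and cosines of a statistical parameter vector are taken.

```python
import numpy as np

def imf_statistics(imf):
    """Statistical parameters of one Intrinsic Mode Function component
    (the exact set of statistics is an assumption): mean, std, peak, RMS."""
    return np.array([imf.mean(), imf.std(), np.abs(imf).max(),
                     np.sqrt(np.mean(imf ** 2))])

def feature_vector(imfs):
    """Concatenate the sines and cosines of each IMF's statistic vector,
    following the description in the abstract."""
    stats = np.concatenate([imf_statistics(m) for m in imfs])
    return np.concatenate([np.sin(stats), np.cos(stats)])

# Toy power-quality signal decomposed into three mock IMF components
# (a real pipeline would obtain these from VMD or EWT):
t = np.linspace(0, 0.2, 2000)
imfs = [np.sin(2 * np.pi * 50 * t),             # fundamental
        0.3 * np.sin(2 * np.pi * 150 * t),      # 3rd harmonic
        0.05 * np.random.default_rng(0).standard_normal(t.size)]  # residue
fv = feature_vector(imfs)   # 3 IMFs x 4 stats x (sin, cos) = 24 features
```

The resulting bounded feature vector would then be fed to an SVM classifier (e.g. scikit-learn's `SVC`) trained on labelled disturbance examples.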
This paper presents the state-of-the-art and reviews the state-of-research of acoustic sensors used for a variety of navigation and guidance applications on air and surface vehicles. In particular, this paper focuses on echolocation, which is widely utilized in nature by certain mammals (e.g., cetaceans and bats). Although acoustic sensors have been extensively adopted in various engineering applications, their use in navigation and guidance systems is yet to be fully exploited. This technology has clear potential for applications in air and surface navigation/guidance for intelligent transport systems (ITS), especially considering air and surface operations indoors and in other environments where satellite positioning is not available. Propagation of sound in the atmosphere is discussed in detail, with all potential attenuation sources taken into account. The errors introduced in echolocation measurements due to Doppler, multipath and atmospheric effects are discussed, and an uncertainty analysis method is presented for ranging error budget prediction in acoustic navigation applications. Considering the design challenges associated with monostatic and multi-static sensor implementations, and looking at the performance predictions for different possible configurations, acoustic sensors show clear promise in navigation and proximity sensing, as well as in obstacle detection and tracking. The integration of acoustic sensors in multi-sensor navigation systems is also considered towards the end of the paper, and a low Size, Weight and Power, and Cost (SWaP-C) sensor integration architecture is presented for possible introduction in air and surface navigation systems.
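One term of the ranging error budget discussed above — the sensitivity of the range estimate to the assumed speed of sound — can be worked through numerically. The formulas below use the standard temperature dependence of the speed of sound in dry air; the numeric scenario is illustrative, not taken from the paper.

```python
import math

def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air (m/s) at temperature temp_c
    in degrees Celsius: c = 331.3 * sqrt(1 + T/273.15)."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)

def echo_range(round_trip_s, temp_c=20.0):
    """One-way range from a monostatic echo's round-trip time."""
    return speed_of_sound(temp_c) * round_trip_s / 2.0

def range_error_from_temp(round_trip_s, temp_c, temp_err_c):
    """Ranging error caused by an error in the assumed air temperature,
    one contribution to an acoustic ranging error budget."""
    return abs(echo_range(round_trip_s, temp_c + temp_err_c)
               - echo_range(round_trip_s, temp_c))

# A 58 ms round trip at 20 degrees C corresponds to roughly 10 m;
# a 5 degree temperature error shifts that estimate by several centimetres.
r = echo_range(0.058, 20.0)
err = range_error_from_temp(0.058, 20.0, 5.0)
```

Doppler and multipath contributions would enter the same budget as additional terms, typically combined in a root-sum-square fashion.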
We present a novel probabilistic framework that jointly models individuals and groups for tracking. Managing groups is challenging, primarily because of their nonlinear dynamics and complex layout which lead to repeated splitting and merging events. The proposed approach assumes a tight relation of mutual support between the modeling of individuals and groups, promoting the idea that groups are better modeled if individuals are considered and vice versa. This concept is translated into a mathematical model using a decentralized particle filtering framework which deals with a joint individual-group state space. The model factorizes the joint space into two dependent subspaces, where individuals and groups share the knowledge of the joint individual-group distribution. The assignment of people to the different groups (and thus group initialization, split and merge) is implemented by two alternative strategies: using classifiers trained beforehand on statistics of group configurations, and through online learning of a Dirichlet process mixture model, assuming that no training data is available before tracking. These strategies lead to two different methods that can be used on top of any person detector (simulated using the ground truth in our experiments). We provide convincing results on two recent challenging tracking benchmarks.
A method is investigated for the motion estimation and the model identification of a free-floating target in space. Motion estimation relies on range data measurements, which are here simulated for analyzing the method. The work is motivated by the necessity to provide an efficient long-term motion prediction algorithm, on the order of 100 seconds, to support planning of complex maneuvers or tracking during long phases without observation. The method is evaluated for different scenarios of range measurements and for a series of target motions, including translation and rotation or pure rotation about different axes, which may represent typical scenarios for tumbling objects in orbit.
Tracking-by-detection is a widely used paradigm for multi-person tracking but is affected by variations in crowd density, obstacles in the scene, varying illumination, human pose variation, scale changes, etc. We propose an improved tracking-by-detection framework for multi-person tracking where the appearance model is formulated as a template ensemble updated online given detections provided by a pedestrian detector. We employ a hierarchy of trackers to select the most effective tracking strategy and an algorithm to adapt the conditions for trackers’ initialization and termination. Our formulation is online and does not require calibration information. In experiments with four pedestrian tracking benchmark datasets, our formulation attains accuracy that is comparable to, or better than, the state-of-the-art pedestrian trackers that must exploit calibration information and operate offline.
In vitro motility assays, in which fluorescently labeled actin filaments are propelled by myosin molecules adhered to a glass coverslip, require that actin filament velocity be determined. We have developed a computer-assisted filament tracking system that reduced the analysis time, minimized investigator bias, and provided greater accuracy in locating actin filaments in video images. The tracking routine successfully tracked filaments under experimental conditions where filament density, size, and extent of photobleaching varied dramatically. Videotaped images of actin filament motility were digitized and processed to enhance filament image contrast relative to background. Once processed, filament images were cross correlated between frames and a filament path was determined. The changes in filament centroid or center position between video frames were then used to calculate filament velocity. The tracking routine performance was evaluated and the sources of noise that contributed to errors in velocity were identified and quantified. Errors originated in algorithms for filament centroid determination and in the choice of sampling interval between video frames. With knowledge of these error sources, the investigator can maximize the accuracy of the velocity calculation through access to user-definable computer program parameters.
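The velocity step described above — converting frame-to-frame centroid displacements into filament speed — can be sketched directly. The intensity-weighted centroid and the toy single-pixel "filament" below are illustrative; the actual system locates filaments via cross-correlation between frames.

```python
import numpy as np

def centroid(frame):
    """Intensity-weighted centroid (row, col) of a filament image."""
    total = frame.sum()
    rows, cols = np.indices(frame.shape)
    return np.array([(rows * frame).sum() / total,
                     (cols * frame).sum() / total])

def velocities(frames, dt, pixel_size):
    """Frame-to-frame filament speed from centroid displacement,
    mirroring the velocity calculation described in the abstract."""
    cs = [centroid(f) for f in frames]
    return [np.linalg.norm(b - a) * pixel_size / dt
            for a, b in zip(cs, cs[1:])]

# Toy sequence: a bright one-pixel "filament" moving 2 pixels per frame.
frames = []
for i in range(4):
    f = np.zeros((32, 32))
    f[10, 5 + 2 * i] = 1.0
    frames.append(f)

# Assumed calibration: 0.1 um per pixel, 30 fps video.
v = velocities(frames, dt=1 / 30, pixel_size=0.1)
# Each step: 2 px * 0.1 um / (1/30 s) = 6 um/s
```

The choice of `dt` (the sampling interval between analysed frames) trades centroid noise against temporal resolution, which is exactly the error source the abstract identifies.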
Current multi-person localisation and tracking systems have an over-reliance on appearance models for target re-identification, and almost no approaches employ a complete deep learning solution for both objectives. We present a novel, complete deep learning framework for multi-person localisation and tracking. In this context we first introduce a lightweight sequential Generative Adversarial Network architecture for person localisation, which overcomes issues related to occlusions and noisy detections typically found in a multi-person environment. In the proposed tracking framework we build upon recent advances in pedestrian trajectory prediction approaches and propose a novel data association scheme based on predicted trajectories. This removes the need for computationally expensive person re-identification systems based on appearance features and generates human-like trajectories with minimal fragmentation. The proposed method is evaluated on multiple public benchmarks, including both static and dynamic cameras, and achieves outstanding performance, especially among other recently proposed deep neural network based approaches.
The kernelized correlation filter (KCF) is one of the state-of-the-art object trackers. However, it does not reasonably model the distribution of the correlation response during the tracking process, which might cause the drifting problem, especially when targets undergo significant appearance changes due to occlusion, camera shaking, and/or deformation. In this paper, we propose an output constraint transfer (OCT) method that models the distribution of the correlation response in a Bayesian optimization framework and is thereby able to mitigate the drifting problem. OCT builds upon the reasonable assumption that the correlation response to the target image follows a Gaussian distribution, which we exploit to select training samples and reduce model uncertainty. OCT is rooted in a new theory which transfers the data distribution to a constraint on the optimized variable, leading to an efficient framework to calculate correlation filters. Extensive experiments on a commonly used tracking benchmark show that the proposed method significantly improves KCF, and achieves better performance than other state-of-the-art trackers. To encourage further developments, the source code is made available.
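The sample-selection idea — trusting a frame only when its correlation peak is consistent with a Gaussian model of past peaks — can be illustrated with a simple gate. This is a simplified stand-in for the constraint in the abstract, not the paper's actual OCT formulation; the warm-up length and threshold `k` are assumptions.

```python
import numpy as np

class GaussianResponseGate:
    """Keep a running Gaussian model of the correlation peak value and
    accept a frame as a training sample only if its peak lies within
    k standard deviations of the running mean."""

    def __init__(self, k=3.0):
        self.k = k
        self.peaks = []

    def accept(self, response):
        peak = float(np.max(response))
        if len(self.peaks) < 5:          # warm-up: always accept
            self.peaks.append(peak)
            return True
        mu = np.mean(self.peaks)
        sigma = np.std(self.peaks) + 1e-8
        ok = bool(abs(peak - mu) <= self.k * sigma)
        if ok:                           # only clean frames update the model
            self.peaks.append(peak)
        return ok

rng = np.random.default_rng(0)
gate = GaussianResponseGate()
# Normal frames: strong, stable correlation peaks.
normal = [gate.accept(rng.normal(0.8, 0.02, (17, 17))) for _ in range(20)]
# Occluded frame: the response collapses, so the gate should reject it.
occluded = gate.accept(rng.normal(0.2, 0.02, (17, 17)))
```

Rejected frames simply skip the filter update, so a brief occlusion cannot corrupt the learned appearance model.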
In several hand-object(s) interaction scenarios, the change in the objects' state is a direct consequence of the hand's motion. This has a straightforward representation in Newtonian dynamics. We present the first approach that exploits this observation to perform model-based 3D tracking of a table-top scene comprising passive objects and an active hand. Our forward modelling of 3D hand-object(s) interaction regards both the appearance and the physical state of the scene and is parameterized over the hand motion (26 DoFs) between two successive instants in time. We demonstrate that our approach manages to track the 3D pose of all objects and the 3D pose and articulation of the hand by only searching for the parameters of the hand motion. In the proposed framework, covert scene state is inferred by connecting it to the overt state, through the incorporation of physics. Thus, our tracking approach treats a variety of challenging observability issues in a principled manner, without the need to resort to heuristics.
In spite of over two decades of intense research, illumination and pose invariance remain prohibitively challenging aspects of face recognition for most practical applications. The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. The central contribution is an illumination invariant, which we show to be suitable for recognition from video of loosely constrained head motion. In particular there are three contributions: (i) we show how a photometric model of image formation can be combined with a statistical model of generic face appearance variation to exploit the proposed invariant and generalize in the presence of extreme illumination changes; (ii) we introduce a video sequence "re-illumination" algorithm to achieve fine alignment of two video sequences; and (iii) we use the smoothness of geodesically local appearance manifold structure and a robust same-identity likelihood to achieve robustness to unseen head poses. We describe a fully automatic recognition system based on the proposed method and an extensive evaluation on 323 individuals and 1474 video sequences with extreme illumination, pose and head motion variation. Our system consistently achieved a nearly perfect recognition rate (over 99.7% on all four databases).
We present a novel method for on-line, joint object tracking and segmentation in a monocular video captured by a possibly moving camera. Our goal is to integrate tracking and fine segmentation of a single, previously unseen, potentially non-rigid object of unconstrained appearance, given its segmentation in the first frame of an image sequence as the only prior information. To this end, we tightly couple an existing kernel-based object tracking method with Random Walker-based image segmentation. Bayesian inference mediates between tracking and segmentation, enabling effective data fusion of pixel-wise spatial and color visual cues. The fine segmentation of an object at a certain frame provides tracking with reliable initialization for the next frame, closing the loop between the two building blocks of the proposed framework. The effectiveness of the proposed methodology is evaluated experimentally by comparing it to a large collection of state of the art tracking and video-based object segmentation methods on the basis of a data set consisting of several challenging image sequences for which ground truth data is available.
We propose a framework for querying a distributed database of video surveillance data in order to retrieve a set of likely paths of a person moving in the area under surveillance. In our framework, each camera of the surveillance system locally processes the data and stores video sequences in a storage unit and the metadata for each detected person in the distributed database. A pedestrian’s path is formulated as a dynamic Bayesian network (DBN) to model the dependencies between subsequent observations of the person as he makes his way through the camera network. We propose a tool by which the analyst can pose queries about where a certain person appeared while moving in the site during a specified temporal window. The DBN is used in an algorithm that finds potentially relevant metadata records from the distributed databases and then assembles these into probable paths that the person took in the camera network. Finally, the system presents the analyst with the retrieved set of likely paths in ranked order. The computational complexity of our method is quadratic in the number of camera nodes and linear in the number of moving persons. Experiments were carried out on simulated data to test the system with large distributed databases, and in a real setting in which six databases store the data from six video cameras. The simulations confirm that our method provides good results with varying numbers of cameras and persons moving in the network. In a real setting, the method reconstructs paths across the camera network with approximately 75% accuracy at rank 1.
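The ranking step can be sketched in its simplest Markov form: chain pairwise camera-to-camera transition probabilities along each candidate path and sort. The topology and probability values below are hypothetical illustrations; the actual system uses a full DBN over timed observations rather than a bare transition table.

```python
# Hypothetical topology: probability that a person last seen at camera a
# next appears at camera b (values are illustrative, not from the paper).
TRANSITION = {
    ("c1", "c2"): 0.6, ("c1", "c3"): 0.3,
    ("c2", "c3"): 0.7, ("c2", "c4"): 0.2,
    ("c3", "c4"): 0.8,
}

def path_score(path):
    """Chain the pairwise transition probabilities along a candidate path;
    an unmodeled transition scores zero."""
    score = 1.0
    for a, b in zip(path, path[1:]):
        score *= TRANSITION.get((a, b), 0.0)
    return score

def rank_paths(candidates):
    """Return candidate paths sorted most-likely first, as presented
    to the analyst."""
    return sorted(candidates, key=path_score, reverse=True)

candidates = [("c1", "c2", "c3", "c4"),
              ("c1", "c3", "c4"),
              ("c1", "c2", "c4")]
ranked = rank_paths(candidates)
```

Enumerating candidates per query over the camera graph, rather than over all persons jointly, is what keeps the stated complexity quadratic in cameras and linear in persons.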
In this paper, we propose a new tractable Bernoulli filter based on the random matrix framework to track an extended target in an ultra-wideband (UWB) sensor network. The resulting filter jointly tracks the kinematic and shape parameters of the target and is called the extended target Gaussian inverse Wishart Bernoulli (ET-GIW-Ber) filter. Closed-form expressions for the ET-GIW-Ber filter recursions are presented. A clustering step is inserted into the measurement update stage in order to have a computationally tractable filter. In addition, a new method that is consistent with the applied clustering method is embedded into the filter recursions in order to adaptively estimate the time-varying number of measurements of the extended target. The simulation results demonstrate the robust and effective performance of the proposed filter. Furthermore, real data collected from a UWB sensor network are used to assess the performance of the proposed filter. It is shown that the proposed filter yields a very promising performance in estimation of the kinematic and shape parameters of the target.
We explore how to track people and furniture based on a high-resolution pressure-sensitive floor. Gravity pushes people and objects against the floor, causing them to leave imprints of pressure distributions across the surface. While the sensor is limited to sensing direct contact with the surface, we can sometimes conclude what takes place above the surface, such as users’ poses or collisions with virtual objects. We demonstrate how to extend the range of this approach by sensing through passive furniture that propagates pressure to the floor. To explore our approach, we have created an 8 m² back-projected floor prototype, termed GravitySpace, a set of passive touch-sensitive furniture, as well as algorithms for identifying users, furniture, and poses. Pressure-based sensing on the floor offers four potential benefits over camera-based solutions: (1) it provides consistent coverage of rooms wall-to-wall, (2) is less susceptible to occlusion between users, (3) allows for the use of simpler recognition algorithms, and (4) intrudes less on users’ privacy.
The objective of this work is to recognize faces using video sequences both for training and novel input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. There are three major areas of novelty: (i) illumination generalization is achieved by combining coarse histogram correction with fine illumination manifold-based normalization; (ii) pose robustness is achieved by decomposing each appearance manifold into semantic Gaussian pose clusters, comparing the corresponding clusters and fusing the results using an RBF network; (iii) a fully automatic recognition system based on the proposed method is described and extensively evaluated on 600 head motion video sequences with extreme illumination, pose and motion pattern variation. On this challenging data set our system consistently demonstrated a very high recognition rate (95% on average), significantly outperforming state-of-the-art methods from the literature.
Due to the widespread use of cameras, it is very common to collect thousands of personal photos. A proper organization is needed to make the collection usable and to enable an easy photo retrieval. In this paper, we present a method to organize personal photo collections based on “who” is in the picture. Our method consists in detecting the faces in the photo sequence and arranging them in groups corresponding to the probable identities. This problem can be conveniently modeled as a multi-target visual tracking where a set of on-line trained classifiers is used to represent the identity models. In contrast to other works where clustering methods are used, our method relies on a probabilistic framework; it does not require any prior information about the number of different identities in the photo album. To enable future comparison, we present experimental results on a public dataset and on a photo collection generated from a public face dataset.
It is much easier to divide attention across the left and right visual hemifields than within the same visual hemifield. Here we investigate whether this benefit of dividing attention across separate visual fields is evident at early cortical processing stages. We measured the steady-state visual evoked potential, an oscillatory response of the visual cortex elicited by flickering stimuli, of moving targets and distractors while human observers performed a tracking task. The amplitude of responses at the target frequencies was larger than that of the distractor frequencies when participants tracked two targets in separate hemifields, indicating that attention can modulate early visual processing when it is divided across hemifields. However, these attentional modulations disappeared when both targets were tracked within the same hemifield. These effects were not due to differences in task performance, because accuracy was matched across the tracking conditions by adjusting target sp...
BACKGROUND: Liver injuries induced by carbon tetrachloride are the best-characterized system of xenobiotic-induced hepatotoxicity and a commonly used model for the screening of hepatoprotective activities of drugs. The present study...
Presentation outline: Motivation & Objective; Problem Formulation & Challenges; Original Contributions; Related Work; System Description; Simulation & Experimental Results; Conclusions.
This paper addresses the task of time-separated aerial image registration. The ability to solve this problem accurately and reliably is important for a variety of subsequent image understanding applications. The principal challenge lies in the extent and nature of transient appearance variation that a land area can undergo, such as that caused by changes in illumination conditions, seasonal variations, or occlusion by non-persistent objects (people, cars). Our work introduces several major novelties: (i) unlike previous work on aerial image registration, we approach the problem using a set-based paradigm; (ii) we show how local, pair-wise constraints in image space can be used to enforce a globally good registration using a constraints graph structure; (iii) we show how a simple holistic representation derived from raw aerial images can be used as a basic building block of the constraints graph in a manner which achieves both high registration accuracy and speed; (iv) lastly, we introduce a new and, to the best of our knowledge, the only data corpus suitable for the evaluation of set-based aerial image registration algorithms. Using this data set, we demonstrate: (i) that the proposed method already outperforms the state of the art for pair-wise registration, achieving greater accuracy and reliability while at the same time reducing the computational cost of the task; and (ii) that increasing the number of available images in a set consistently reduces the average registration error, with a marked improvement even from a single additional image.
Long-term tracking of an object, given only a single instance in an initial frame, remains an open problem. We propose a visual tracking algorithm robust to many of the difficulties which often occur in real-world scenes. Correspondences of edge-based features are used to overcome the reliance on the texture of the tracked object and to improve invariance to lighting. Furthermore, we address long-term stability, enabling the tracker to recover from drift and to provide redetection following object disappearance or occlusion. The two-module principle is similar to that of the successful state-of-the-art long-term TLD tracker; however, our approach extends to cases of low-textured objects.
This work presents a system to perform autonomous landing of a small fixed-wing Unmanned Aerial Vehicle (UAV) on a Fast Patrol Boat (FPB). We propose a ground-based vision system with the camera, image capture and processing equipment installed on the ship, thus reducing the UAV's size, weight and power requirements. The system observes the UAV and computes the control commands to send to it via radio. This approach also makes it possible to use standard UAVs equipped with commercial autopilots. The developed system uses the captured image as input and a Particle Filter (PF) structure to estimate the UAV trajectory. An Unscented Kalman Filter (UKF) is also used for translational motion filtering, together with an Unscented Bingham Filter (UBiF) for rotational motion filtering. This filtering structure is reminiscent of the Unscented Particle Filter (UPF). The obtained tracking error is compatible with automatic landing requirements.
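For readers unfamiliar with the filtering machinery, a minimal bootstrap particle filter over a scalar random-walk state is sketched below. This is only an illustration of the PF layer: the actual system couples the PF with a UKF (translation) and a UBiF (rotation) over a full pose state, and every name and parameter here is an assumption rather than the paper's implementation.

```python
import numpy as np

def bootstrap_pf(observations, n_particles=500, proc_std=0.5, obs_std=1.0, seed=0):
    """Minimal bootstrap particle filter for a scalar random-walk state.

    Illustrative sketch only: propagate particles with the motion model,
    weight them by the observation likelihood, estimate by the weighted
    mean, then resample to keep the particle set focused.
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(observations[0], obs_std, n_particles)
    estimates = []
    for z in observations:
        particles = particles + rng.normal(0.0, proc_std, n_particles)  # motion model
        w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)             # Gaussian likelihood
        w /= w.sum()
        estimates.append(np.sum(w * particles))                         # weighted-mean estimate
        idx = rng.choice(n_particles, size=n_particles, p=w)            # resampling step
        particles = particles[idx]
    return np.array(estimates)

# Noisy observations of a slowly rising trajectory.
truth = np.linspace(0.0, 10.0, 100)
obs = truth + np.random.default_rng(1).normal(0.0, 1.0, truth.size)
est = bootstrap_pf(obs)
```

On this toy sequence the filtered estimate should be noticeably smoother than the raw observations, which is the kind of behaviour an automatic landing pipeline relies on.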
Illumination invariance remains one of the most researched, yet most challenging, aspects of automatic face recognition. In this paper the discriminative power of colour-based invariants is investigated in the presence of large illumination changes between training and query data, when appearance changes due to cast shadows and non-Lambertian effects are significant. Specifically, there are three main contributions: (i) a general photometric model of the camera is described and it is shown how its parameters can be estimated from realistic video input of pseudo-random head motion, (ii) several novel colour-based face invariants are derived for different special instances of the camera model, and (iii) the performance of the largest number of colour-based representations in the literature is evaluated and analysed on a database of 700 video sequences. The reported results suggest that: (i) colour invariants do have a substantial discriminative power which may increase the robustness and accuracy of recognition from low-resolution images under extreme illumination, and (ii) the non-linearities of the general photometric camera model have a significant effect on recognition performance. This highlights the limitations of previous work and emphasizes the need to assess face recognition performance using training and query data captured by different acquisition equipment.
In this paper we present two adaptive algorithms for the detection of small changes/targets (of the order of one pixel) in a pair of images in a low signal-to-clutter-plus-noise ratio (SCNR) environment (SCNR of the order of -14.5 dB). They both have the ability to track the non-stationary image signals and suppress the clutter-plus-noise background. Both detectors are based on an adaptive correlation-canceling technique. The first uses an order recursive least squares (ORLS) lattice filter, while the second is based on the two-dimensional least mean squares (TDLMS) algorithm. The only a priori information required is that the background clutter plus noise in the pair of images is spatially correlated. An analytical expression for the improvement factor of the suggested change detectors is presented. The influence of the order of the ORLS lattice filter and of the algorithm parameters of the TDLMS on their detection performance is also studied. The performance of the two algorithms is evaluated using an optical satellite image with a computer-generated target and noise added to it.
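As an illustration of the correlation-cancelling idea, a minimal two-dimensional LMS clutter canceller is sketched below: an adaptive kernel predicts the current image from the reference image, so that spatially correlated background cancels out and the residual highlights genuine changes. The kernel size, step size, and synthetic data are assumptions for demonstration, not the authors' ORLS or TDLMS implementation.

```python
import numpy as np

def tdlms_change_map(ref, cur, ksize=3, mu=1e-3, n_iter=2):
    """Two-dimensional LMS clutter canceller (minimal sketch).

    A single FIR kernel, adapted by stochastic gradient steps, predicts
    the current image from a neighbourhood of the reference image.
    Correlated background is predicted and cancelled; the residual is
    large only where the two images genuinely differ.
    """
    h = ksize // 2
    w = np.zeros((ksize, ksize))
    w[h, h] = 1.0                          # start from identity prediction
    rows, cols = ref.shape
    for _ in range(n_iter):
        for i in range(h, rows - h):
            for j in range(h, cols - h):
                patch = ref[i - h:i + h + 1, j - h:j + h + 1]
                e = cur[i, j] - np.sum(w * patch)   # prediction error
                w += mu * e * patch                  # LMS weight update
    res = np.zeros_like(cur)                         # final residual pass
    for i in range(h, rows - h):
        for j in range(h, cols - h):
            patch = ref[i - h:i + h + 1, j - h:j + h + 1]
            res[i, j] = cur[i, j] - np.sum(w * patch)
    return np.abs(res)

# Synthetic pair: spatially correlated clutter, one-pixel target in `cur` only.
rng = np.random.default_rng(0)
clutter = rng.normal(size=(64, 64))
for _ in range(3):                                   # crude smoothing -> spatial correlation
    clutter = (clutter + np.roll(clutter, 1, 0) + np.roll(clutter, 1, 1)) / 3.0
ref = clutter
cur = 0.9 * clutter + 0.01 * rng.normal(size=clutter.shape)
cur[20, 20] += 5.0                                   # the small target to detect
cmap = tdlms_change_map(ref, cur)
```

The change map should peak at the injected target pixel even though the raw intensity difference between the two images is dominated by clutter.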
In contrast to most scientific disciplines, sports science research has been characterized by comparatively little investment in the development of relevant phenomenological models. Scarcer yet is the application of such models in the practical domain. This paper presents a framework which allows resistance training practitioners (coaches or athletes themselves) to employ a recently proposed model of neuromuscular engagement and adaptation in actual training program design. The first novelty concerns the monitoring aspect of coaching: a method is described for extracting training performance characteristics from loosely constrained video sequences, effortlessly and with minimal human input, using computer vision. The extracted data is subsequently used to infer the values of the parameters of the underlying neuromuscular model. This is achieved using the known differential equations describing the motion of the resistance during a particular exercise, in what is usually termed the inverse dynamics problem. Lastly, a computer simulation of hypothetical training bouts, using athlete-specific capability parameters, is used to predict the effected adaptation and, with it, the changes in performance. The software developed, and described here for the first time, allows the practitioner to manipulate hypothetical training parameters and immediately see their effect on the predicted adaptation of a specific athlete. Thus, this work presents a holistic view of the monitoring-assessment-adjustment loop. By bridging the gap between the theoretical and applied aspects of sports science, the present contribution serves to strongly encourage further research focus on the development of more general, comprehensive models in this field.
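The inverse-dynamics step can be illustrated for the simplest case of a vertically moving resistance (e.g. a barbell): given the bar height x(t) recovered from video, Newton's second law gives the applied force as F(t) = m(x''(t) + g). The sketch below, with a central finite-difference second derivative, is a hedged illustration of this one-dimensional case; the function name and all numbers are assumptions, not taken from the paper.

```python
import numpy as np

G_ACC = 9.81  # gravitational acceleration, m/s^2

def bar_force(positions, dt, mass):
    """Inverse dynamics for a vertically moving resistance (sketch).

    Given bar height samples x(t), the applied force follows from
    Newton's second law, F(t) = m * (x''(t) + g), with acceleration
    estimated by central finite differences.
    """
    x = np.asarray(positions, dtype=float)
    acc = np.gradient(np.gradient(x, dt), dt)   # numerical second derivative
    return mass * (acc + G_ACC)

# Synthetic lift: a 100 kg bar raised 0.5 m along a smooth profile over 1 s.
dt = 0.01
t = np.arange(0.0, 1.0, dt)
x = 0.25 * (1.0 - np.cos(np.pi * t))
F = bar_force(x, dt, mass=100.0)
```

Since the bar starts and ends near rest, the mean force over the lift is close to the bar's weight, while the peak force during the initial acceleration exceeds it.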
Figure 1. Creepy Tracker is an open-source toolkit that provides spatial information about people and interactive surfaces. To do this, it resorts to multiple depth-sensing cameras (A, B). It helps the design of systems that handle, for instance, (C) interactive tabletops, (D) vertical surfaces, (E) floor projections, and even capture avatars for (F) telepresence or (G) virtual reality. ABSTRACT Context-aware pervasive applications can improve user experiences by tracking people in their surroundings. Such systems use multiple sensors to gather information regarding people and devices. However, when developing novel user experiences, researchers are left building foundation code to support multiple network-connected sensors, a major hurdle to rapidly developing and testing new ideas. We introduce Creepy Tracker, an open-source toolkit to ease prototyping with multiple commodity depth cameras. It automatically selects the best sensor to follow each person, handling occlusions and maximizing interaction space, while providing full-body tracking in a scalable and extensible manner. It also keeps track of the position and orientation of stationary interactive surfaces while offering continuously updated point-cloud user representations combining both depth and color data. Our performance evaluation shows that, although slightly less precise than marker-based optical systems, Creepy Tracker provides reliable multi-joint tracking without any wearable markers or special devices. Furthermore, implemented representative scenarios show that Creepy Tracker is well suited for deploying spatial and context-aware interactive experiences.
Conventional tracking solutions are unable to deal with abrupt motion, as they are based on a smooth motion assumption or an accurate motion model; abrupt motion is not subject to motion continuity and smoothness. We address this problem by casting tracking as an optimisation problem and propose a novel abrupt motion tracker based on swarm intelligence: the SwATrack. Unlike existing swarm-based filtering methods, we first introduce an optimised swarm-based sampling strategy that trades off exploration and exploitation of the state space in the search for the optimal proposal distribution. Secondly, we propose Dynamic Acceleration Parameters (DAP) that allow on-the-fly tuning of the best mean and variance of the distribution for sampling. Combining the two strategies within the Particle Swarm Optimisation framework represents a novel method to address abrupt motion; to the best of our knowledge, this has not been done before. Thirdly, we introduce a new dataset, the Malaya Abrupt Motion (MAMo) dataset, which consists of 12 videos with ground truth. Finally, both quantitative and qualitative experiments have shown the effectiveness of the proposed method in terms of robustness to dataset bias, invariance to object size, and fast recovery when tracking abrupt motions.
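To make the optimisation view of tracking concrete, a vanilla PSO loop over a toy 2D "appearance similarity" surface is sketched below. This is the textbook algorithm with fixed inertia and acceleration constants, i.e. exactly the constants that the proposed DAP would instead tune on the fly; all names and values here are illustrative assumptions, not the SwATrack sampler.

```python
import numpy as np

def pso_maximise(score, bounds, n_particles=30, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Vanilla Particle Swarm Optimisation (illustrative only).

    Each particle's velocity is pulled toward its personal best and the
    swarm's global best; the balance of the three terms is the
    exploration/exploitation trade-off discussed above.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    x = rng.uniform(lo, hi, size=(n_particles, lo.size))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([score(p) for p in x])
    g = pbest[pbest_val.argmax()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)  # velocity update
        x = np.clip(x + v, lo, hi)                             # stay inside the state space
        vals = np.array([score(p) for p in x])
        better = vals > pbest_val                              # refresh personal bests
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmax()].copy()                   # refresh global best
    return g

# Toy "tracking" objective: similarity peaks at the true target position.
target = np.array([37.0, 12.0])
sim = lambda p: -np.sum((p - target) ** 2)
est = pso_maximise(sim, ([0.0, 0.0], [100.0, 100.0]))
```

On this smooth toy surface the swarm converges to the similarity peak; the abrupt-motion setting is hard precisely because the real surface is multi-modal and the target can jump anywhere in the state space.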
Monitoring coral reef populations as part of environmental assessment is essential. Recently, many marine science researchers have been employing low-cost and power-efficient Autonomous Underwater Vehicles (AUVs) to survey coral reefs. While the counting problem in general has a rich literature, little work has focused on estimating the density of coral populations using AUVs. This paper proposes a novel approach to identify, count, and estimate coral populations. A Convolutional Neural Network (CNN) is utilized to detect and identify the different corals, and a tracking mechanism provides a total count for each coral species per transect. Experimental results from an Aqua2 underwater robot and a stereo hand-held camera validated the proposed approach across different image qualities.
Significant effort has been devoted within the visual tracking community to rapid learning of object properties on the fly. However, state-of-the-art approaches still often fail in cases such as rapid out-of-plane rotation, when the appearance changes suddenly. One of the major contributions of this work is a radical rethinking of the traditional wisdom of modelling 3D motion as appearance change during tracking. Instead, 3D motion is modelled as 3D motion. This intuitive but previously unexplored approach provides new possibilities in visual tracking research. Firstly, 3D tracking is more general, as large out-of-plane motion is often fatal for 2D trackers, but helps 3D trackers to build better models. Secondly, the tracker's internal model of the object can be used in many different applications and it could even become the main motivation, with tracking supporting reconstruction rather than vice versa. This effectively bridges the gap between visual tracking and Structure from Motion. A new benchmark dataset of sequences with extreme out-of-plane rotation is presented and an online leader-board offered to stimulate new research in the relatively underdeveloped area of 3D tracking. The proposed method, provided as a baseline, is capable of successfully tracking these sequences, all of which pose a considerable challenge to 2D trackers (error reduced by 46%).
During the past years, camera-equipped Unmanned Aerial Vehicles (UAVs) have revolutionized aerial cinematography, allowing easy acquisition of impressive footage. In this context, autonomous functionalities based on machine learning and computer vision modules are gaining ground. During live coverage of outdoor events, an autonomous UAV may visually track and follow a specific target of interest, under a specific desired shot type, mainly adjusted by choosing appropriate focal length and UAV/camera trajectory relative to the target. However, the selected UAV/camera trajectory and the object tracker requirements (which impose limits on the maximum allowable focal length) affect the range of feasible shot types, thus constraining cinematography planning. Therefore, this paper explores the interplay between cinematography and computer vision in the area of autonomous UAV filming. UAV target-tracking trajectories are formalized and geometrically modeled, so as to analytically compute maximum allowable focal length per scenario, to avoid 2D visual tracker failure. Based on this constraint, formulas for estimating the appropriate focal length to achieve the desired shot type in each situation are extracted, so as to determine shot feasibility. Such rules can be embedded into practical UAV intelligent shooting systems, in order to enhance their robustness by facilitating on-the-fly adjustment of the cinematography plan. ACCESSIBLE AT: https://www.researchgate.net/publication/335011312_Shot_Type_Constraints_in_UAV_Cinematography_For_Autonomous_Target_Tracking
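The kind of focal-length rule referred to above can be sketched under the ideal pinhole model: an object of width W at distance d projects to an image of width f·W/d on the sensor, so a desired frame-coverage fraction for the target determines f. The function name, the sensor width, and the example numbers below are illustrative assumptions, not formulas taken from the paper.

```python
def focal_length_for_shot(target_width_m, distance_m, frame_fraction,
                          sensor_width_mm=36.0):
    """Pinhole-model focal length (mm) that makes a target of width
    `target_width_m`, at distance `distance_m`, cover `frame_fraction`
    of the sensor width. Illustrative sketch only."""
    return frame_fraction * sensor_width_mm * distance_m / target_width_m

# A 0.5 m-wide target, 20 m away, covering 30% of a full-frame sensor:
f_mm = focal_length_for_shot(0.5, 20.0, 0.3)
```

In a planning loop, this value would then be compared against the tracker-imposed maximum allowable focal length to decide whether the desired shot type is feasible from the given UAV/camera trajectory.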
The problem of recursively approximating motion resulting from the Optical Flow (OF) in video through Total Least Squares (TLS) techniques is addressed. The TLS method solves an inconsistent system Gu=z, with both G and z in error due to temporal/spatial derivatives and nonlinearity, while the Ordinary Least Squares (OLS) model has noise only in z. Sources of difficulty include the non-stationarity of the field, the ill-posedness, and the existence of noise in the data. Three ways of applying TLS under different noise assumptions to the problem at hand are examined. First, the classical TLS (cTLS) is introduced, where the entries of the error matrices of each row of the augmented matrix [G;z] have zero mean and the same standard deviation. Next, the Generalized Total Least Squares (GTLS) is defined to provide a more stable solution, but it still has some problems. The Generalized Scaled TLS (GSTLS) assumes G and z are tainted by different sources of additive zero-mean Gaussian noise, and scales [G;z] by nonsingular D and E; that is, D[G;z]E makes the errors i.i.d. with zero mean and a diagonal covariance matrix. The scaling is computed from some knowledge of the error distribution to improve the GTLS estimate. For moderate levels of additive noise, GSTLS outperforms the OLS and GTLS approaches. Although any TLS variant requires more computation than OLS, it remains applicable with proper scaling of the data matrix.
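The classical TLS estimate mentioned above has a standard closed form via the SVD of the augmented matrix [G;z]: the solution is read off the right singular vector associated with the smallest singular value. A minimal sketch on synthetic optical-flow-style data (the data and all parameter values are illustrative):

```python
import numpy as np

def tls_solve(G, z):
    """Classical TLS solution of the inconsistent system G u ~ z.

    Both G and z are treated as noisy; the estimate comes from the right
    singular vector of the augmented matrix [G | z] corresponding to its
    smallest singular value, normalised so the last entry equals -1.
    """
    A = np.column_stack([G, z])     # augmented matrix [G; z]
    _, _, Vt = np.linalg.svd(A)
    v = Vt[-1]                      # singular vector of the smallest singular value
    return -v[:-1] / v[-1]

# Toy example: 2 unknowns (the flow components), many constraints,
# with noise added to BOTH sides, matching the TLS error model.
rng = np.random.default_rng(0)
u_true = np.array([0.5, -1.2])
G = rng.normal(size=(100, 2))
z = G @ u_true
G_noisy = G + 0.01 * rng.normal(size=G.shape)
z_noisy = z + 0.01 * rng.normal(size=z.shape)
u_hat = tls_solve(G_noisy, z_noisy)
```

OLS would attribute all the error to z; TLS distributes it over the augmented matrix, which is why it suits the derivative-based optical flow constraints where G is itself estimated from data.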
The problem of object recognition is of immense practical importance and potential, and the last decade has witnessed a number of breakthroughs in the state of the art. Most past object recognition work focuses on textured objects and local appearance descriptors extracted around salient points in an image. These methods fail when matching smooth, untextured objects, for which salient point detection does not produce robust results. The recently proposed bag of boundaries (BoB) method is the first to directly address this problem. Since the texture of smooth objects is largely uninformative, BoB focuses on describing and matching objects based on their post-segmentation boundaries. Herein we address three major weaknesses of this work. The first of these is the uniform treatment of all boundary segments. Instead, we describe a method for detecting the locations and scales of salient boundary segments. Secondly, while the BoB method uses an image-based elementary descriptor (HoGs + occupancy matrix), we propose a more compact descriptor based on the local profile of boundary normals' directions. Lastly, we conduct a far more systematic evaluation, both of the bag of boundaries method and of the method proposed here. Using a large public database, we demonstrate that our method exhibits greater robustness while at the same time achieving a major computational saving: object representation is extracted from an image in only 6% of the time needed to extract a bag of boundaries, and the storage requirement is similarly reduced to less than 8%.