
1 Introduction

Video object tracking is an important topic in computer vision, with a wide range of applications such as vehicle navigation and human-computer interaction. Various approaches to object tracking have been proposed. Reference [1] proposed a tracking method based on mean shift, which iteratively maximizes the similarity between color histograms of the object; its advantages are the elimination of a brute-force search and low computational cost. Reference [2] extended mean shift to the 3D domain, combining color and spatial information to handle orientation changes and small scale changes. Reference [3] used a stochastic meta-descent optimization method that can track fast-moving objects with significant scale change in low-frame-rate video.

Template matching is a common and direct tracking method: it finds the position of the object by minimizing the error with respect to a predefined object template. Reference [4] used the previous frame to adapt the object template, which handles appearance changes during movement. Reference [5] proposed a template-updating algorithm that avoids the drift inherent in the naive algorithm.

The particle filter is based on Monte Carlo methods. It estimates the state from the posterior probability and is commonly used in pattern recognition and object tracking, as in [6]. Building on [7], this paper proposes an adaptive particle filter tracking scheme with exquisite resampling (AERPF), which we introduce in the following section.

2 Adaptive Exquisite Resampling Particle Filter

In this section, we describe the proposed algorithm in detail. Figure 1 is the flow chart of AERPF, with the three basic stages: prediction, importance sampling, and resampling.

Fig. 1. The flow chart of the AERPF algorithm

2.1 Particle Filter

Extended from the Kalman filter, the particle filter can be applied to both linear and non-linear problems. Suppose we have a system described by the equations:

$$ x_{t} = f_{t} \left( {x_{t - 1} ,w_{t} } \right) $$
(1)
$$ y_{t} = h_{t} \left( {x_{t} ,v_{t} } \right) $$
(2)

The location of the tracked object is a state vector x, which cannot be observed directly. We use a dynamic model of the state to predict how it evolves over time. The vector y represents the features observed at location x; we use this observation to correct the estimate of the state.
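The generic predict-update cycle implied by (1) and (2) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the transition function f, the observation likelihood, and the noise handling are placeholders supplied by the application.

```python
import numpy as np

def particle_filter_step(particles, weights, f, likelihood, y, rng):
    """One predict-update cycle of a generic particle filter.

    particles  : (N, d) array of state hypotheses
    weights    : (N,) normalized importance weights
    f          : transition function, x_t = f(x_{t-1}, noise), as in Eq. (1)
    likelihood : p(y | x), how well a particle explains observation y, Eq. (2)
    """
    # Prediction: propagate each particle through the dynamic model
    particles = np.array([f(x, rng) for x in particles])
    # Correction: reweight each particle by the observation likelihood
    weights = weights * np.array([likelihood(y, x) for x in particles])
    weights = weights / weights.sum()
    return particles, weights
```

A concrete f and likelihood (e.g. a constant-velocity model with a color-histogram likelihood, as in this paper) plug into this loop unchanged.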

2.2 Prediction

The first stage of the particle filter is prediction. When the object disappears, instead of spreading particles randomly, we spread them radially from the point of disappearance, on the assumption that the object will not immediately move far away. If the object is only temporarily occluded, this spreading scheme finds the target more efficiently than a global search, while under long-term occlusion the particles have already spread globally, which avoids losing the object.

Next, the motion vector obtained from optical flow is used to adjust the diffusion range. A high standard deviation of the motion vectors indicates that the object moves erratically, so we enlarge the diffusion range, as in Fig. 2(a). A low standard deviation indicates consistent motion, so the diffusion range can be shrunk, as in Fig. 2(b).

Fig. 2. (a) Diffusion range for a high standard deviation of the motion vector; (b) diffusion range for a low standard deviation of the motion vector

In addition to the diffusion range, the motion vector also predicts the moving direction. It is reasonable to assume the object keeps moving in the same direction as in the last few seconds; accordingly, if the moving direction is consistent, we spread the particles toward that direction.
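The prediction stage described above can be sketched as follows. The scaling constants (`base_sigma`, `gain`) and the Gaussian diffusion are illustrative assumptions; the paper specifies only that the range grows with the motion-vector standard deviation and that particles are biased toward the consistent direction.

```python
import numpy as np

def diffusion_sigma(motion_vectors, base_sigma=10.0, gain=0.5):
    """Adapt the particle diffusion range to the motion-vector spread.

    motion_vectors : (N, 2) optical-flow vectors (u, v)
    A high standard deviation means erratic motion, so the range widens;
    a low one means consistent motion, so the range shrinks.
    base_sigma and gain are illustrative tuning constants, not from the paper.
    """
    std = np.linalg.norm(np.std(motion_vectors, axis=0))
    return base_sigma + gain * std

def spread_particles(center, n, sigma, mean_motion, rng):
    """Diffuse n particles around the center shifted by the mean motion
    vector, biasing the spread toward the object's recent direction."""
    predicted = np.asarray(center) + np.asarray(mean_motion)
    return predicted + rng.normal(scale=sigma, size=(n, 2))
```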

2.3 Importance Sampling

The second stage of the particle filter is importance sampling. The objective at this stage is to give each particle a weight, preserving the more important particles and eliminating the less important ones according to their weights.

The color histogram of the target model is used as the feature for determining the weights. Histograms are calculated in RGB space using 8 × 8 × 8 bins. We establish a target model \( q = \left\{ {q^{(u)} } \right\} \) with \( \sum_{u = 1}^{N} q^{(u)} \; = \;1 \) and a candidate model \( p(x)\; = \;\{ p^{(u)} (x)\} \) with \( \sum_{u = 1}^{N} \,p^{(u)} \; = \;1 \). Because boundary pixels may belong to the background or be occluded easily, the center of the object is considered more important than the boundary. Hence a kernel function assigns smaller weights to pixels farther from the region center:

$$ K_{E} \left( x \right) = \begin{cases} c\left( 1 - \left\| x \right\|^{2} \right), & \left\| x \right\| \le 1 \\ 0, & \text{otherwise} \end{cases} $$
(3)

where x is the distance of the pixel location from the region center.
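A kernel-weighted color histogram along these lines could be computed as below. The 8 × 8 × 8 RGB binning and the Epanechnikov profile of (3) follow the text; the rectangular region parameterization and the distance normalization are illustrative assumptions.

```python
import numpy as np

def kernel_weighted_histogram(patch):
    """Kernel-weighted 8x8x8 RGB histogram of a rectangular patch.

    patch : (H, W, 3) uint8 image region centered on the candidate.
    Pixels near the region center get full weight; the weight falls off
    quadratically toward the boundary (Epanechnikov profile, Eq. 3).
    """
    h, w, _ = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Normalized distance of every pixel from the region center
    r = np.sqrt(((ys - (h - 1) / 2) / (h / 2)) ** 2 +
                ((xs - (w - 1) / 2) / (w / 2)) ** 2)
    k = np.where(r <= 1.0, 1.0 - r ** 2, 0.0)   # Epanechnikov kernel
    bins = (patch // 32).astype(int)             # 256 / 32 = 8 bins per channel
    idx = bins[..., 0] * 64 + bins[..., 1] * 8 + bins[..., 2]
    hist = np.bincount(idx.ravel(), weights=k.ravel(), minlength=512)
    return hist / hist.sum()                     # normalized: sum_u q^(u) = 1

def bhattacharyya(p, q):
    """Similarity between candidate and target histograms."""
    return np.sum(np.sqrt(p * q))
```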

After obtaining the initial weights from the Bhattacharyya coefficients, we take two steps to refine them. Optical flow [8] is the apparent motion of brightness patterns in the image; ideally it coincides with the motion field. By averaging the motion vectors (4) obtained from optical flow, we can predict a new center from the last one. The first step is to promote the weights of particles around the center predicted by optical flow.

$$ \left\{ \begin{aligned} u^{n} &= \bar{u}^{n} - I_{x} \left( I_{x} \bar{u}^{n} + I_{y} \bar{v}^{n} + I_{t} \right) / \left( \alpha^{2} + I_{x}^{2} + I_{y}^{2} \right) \\ v^{n} &= \bar{v}^{n} - I_{y} \left( I_{x} \bar{u}^{n} + I_{y} \bar{v}^{n} + I_{t} \right) / \left( \alpha^{2} + I_{x}^{2} + I_{y}^{2} \right) \end{aligned} \right. $$
(4)
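One iteration of (4) in array form might look like the following sketch. The image derivatives Ix, Iy, It are assumed precomputed, and the 4-neighbor average used for ū and v̄ is a simplification of the usual Horn-Schunck weighted average.

```python
import numpy as np

def neighborhood_mean(a):
    """4-neighbor average used for the u-bar / v-bar terms (a simplified
    stand-in for the standard Horn-Schunck weighted local average)."""
    p = np.pad(a, 1, mode='edge')
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0

def horn_schunck_step(u, v, Ix, Iy, It, alpha=1.0):
    """One iteration of the flow update in Eq. (4)."""
    ub, vb = neighborhood_mean(u), neighborhood_mean(v)
    common = (Ix * ub + Iy * vb + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
    return ub - Ix * common, vb - Iy * common
```

In practice this step is repeated until the flow field converges, then the per-pixel vectors are averaged to predict the new object center.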

The second step is to set a threshold. Low-weight particles decrease accuracy, so we eliminate the less important ones. A suitable measure of the degeneracy of the algorithm is the effective sample size Neff introduced in [6]. We use (5) to obtain the effective sample size, then set the threshold to the lowest weight among the effective samples. Any weight below the threshold is set to zero (6).

$$ N_{eff} = \frac{1}{\sum_{i = 1}^{N} \left( w_{t}^{i} \right)^{2} } $$
(5)
$$ \hat{w}_{t}^{i} = \begin{cases} 0, & \text{if } w_{t}^{i} < T \\ w_{t}^{i}, & \text{if } w_{t}^{i} \ge T \end{cases} $$
(6)

When all the weights are small and set to zero, no particle in the frame resembles the target; in other words, the object is not present.

2.4 Resampling

The goal of this stage is to eliminate particles with small weights and concentrate on particles with large weights for the next prediction stage. However, the original resampling algorithm has a defect, which Fig. 3 illustrates.

Fig. 3. A demonstration of the resampling algorithm

N is the total number of particles. \( \{ C_{i} \}_{i = 1}^{N} \) represents the cumulative sum of weights, and \( \{ U_{i} \}_{i = 1}^{N} \) is a sequence of random variables uniformly distributed in the interval [0,1]. We view Ui as a threshold: the particle whose CDF value crosses it is considered the more important one. As shown in Fig. 3, because C2, C4, and C5 cross the thresholds U2, U4, and U5, those particles are preserved for the next prediction stage. But the weight of particle 2 is clearly smaller than that of particle 1, and the weight of particle 4 smaller than that of particle 3. This defect decreases the estimation accuracy.

We adopt exquisite resampling [9] to overcome this defect. First, we run the same procedure as above until a cumulative sum pierces the threshold. Then we go back, check all the particles in that interval, and preserve the one with the highest weight, so C1 and C3 are preserved instead. In this way the true state of the pdf is reflected more accurately.
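A sketch of this idea: thresholds are drawn over the cumulative weights as in standard resampling, but within each crossed interval the highest-weight particle is kept instead of the one whose cumulative sum happens to cross. Details such as the random-number scheme and the interval bookkeeping are illustrative assumptions, not the exact procedure of [9].

```python
import numpy as np

def exquisite_resample(weights, rng):
    """Resampling that, at each threshold crossing, preserves the
    highest-weight particle in the crossed interval rather than the
    crossing particle itself. Returns indices of preserved particles."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(w)
    C = np.cumsum(w)                      # cumulative sum of weights, {C_i}
    U = np.sort(rng.uniform(size=n))      # ordered thresholds {U_i} in [0, 1]
    picked, start, i = [], 0, 0
    for u in U:
        while i < n - 1 and C[i] < u:     # advance until the CDF crosses u
            i += 1
        # look back over the particles inside the crossed interval and
        # keep the one with the highest individual weight
        best = start + int(np.argmax(w[start:i + 1]))
        picked.append(best)
        start = i
    return np.asarray(picked)
```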

To demonstrate the improvement, we use the following nonlinear model as an example. The dynamic state-space model is given by:

$$ x_{k + 1} = \alpha x_{k} + \beta \frac{x_{k} }{1 + x_{k}^{2} } + \gamma \cos \left( 1.2k \right) + v_{k} $$
(7)
$$ y_{k} = \frac{x_{k}^{2} }{20} + w_{k}, \quad k = 1, 2, \ldots, t_{f} $$
(8)

where vk and wk are nonzero-mean Gaussian random variables, x0 = 1, α = 0.5, β = 25, γ = 8, the number of samples is 100, and the simulation runs for 50 time steps.
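The benchmark of (7) and (8) can be simulated as below. The unit-variance, zero-mean noise shown here is an illustrative choice, since the paper gives the noise distributions only as Gaussian.

```python
import numpy as np

def simulate_benchmark(tf=50, x0=1.0, alpha=0.5, beta=25.0, gamma=8.0,
                       noise_std=1.0, seed=0):
    """Simulate the nonlinear benchmark of Eqs. (7)-(8).

    The Gaussian noise parameters (mean 0, std 1) are illustrative; the
    paper does not fix them."""
    rng = np.random.default_rng(seed)
    xs, ys = [x0], []
    x = x0
    for k in range(1, tf + 1):
        x = (alpha * x + beta * x / (1 + x ** 2)
             + gamma * np.cos(1.2 * k) + rng.normal(scale=noise_std))  # Eq. (7)
        y = x ** 2 / 20 + rng.normal(scale=noise_std)                  # Eq. (8)
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
```

Running any particle filter variant on the observations ys against the ground-truth states xs gives the estimation-error comparison the section refers to.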

3 Experimental Results

Table 1 compares AERPF with the original PF. Experiments 1-3 use sequences from a fixed camera, and experiments 4-6 from an active camera. The tracking results show improvements in accuracy and F-score. At the same time, AERPF decreases the mean error, which means it enhances the recognition of the target object.

Table 1. Comparison of AERPF with original PF