OBJECT TRACKING BY KALMAN FILTER WITH MOTION DETECTION

Xuan Tan*   Instructor: Professor Yao Wang   2014.5.9

Abstract

This report reviews an object tracking strategy that can track objects robustly in a fixed environment. There are two primary components in this method. Foreground segmentation is the fundamental first processing step of our tracking system: we first review different background subtraction methods, then introduce in detail the method proposed by P. KaewTraKulPong [1] and Chris Stauffer [2], an application of the Gaussian Mixture Model (GMM). The other cornerstone of the tracking task is a robust tracking system (keeping track of objects even without foreground information), implemented with the Kalman Filter proposed by Kalman [3]. Finally we apply the combination of these two methods in a real-time application. The implementation of the approach has proven to be efficient and robust.

Keywords: Tracking, GMM, Kalman Filter

1 Overview and Schedule

1.1 Overview

With the success of video sensors and high-performance video processing hardware, exciting possibilities are emerging for tackling massive video-based understanding problems. One of the most popular themes is object tracking, especially when approached with state-of-the-art methods. It is essential to build a robust and efficient system for real-time environments, and this idea applies to both military and civilian areas, such as ballistic missile defense, air defense, traffic control, ocean surveillance, and battlefield surveillance. The core technology of object tracking involves object detection and associating measurements with the appropriate tracks. Appropriate measurement becomes more crucial as the prediction step becomes the major factor we must consider. In this report we mainly address these two issues: object detection and a robust tracking system. We apply a motion-based method, an extension of the Gaussian Mixture Model, to fulfill the detection task.
Secondly, the Kalman Filter is the cornerstone of the tracking system. We review these two methods separately in the following sections.

1.2 Schedule

1. Background Reading and Research: We did a literature search first. After acquiring sufficient relevant knowledge by reading articles, papers and references found through Google Scholar and the IEEE database, we decided on the Gaussian Mixture Model and the Kalman Filter as the fundamental theories to support the tracking task. Responsible member: Xuan Tan (04/06 to 04/23)
2. System Design for Single Moving Object Tracking: After the theoretical research, we designed an object detection and tracking system and implemented it in Python with the OpenCV library. Responsible member: Xuan Tan (04/23 to 04/30)
3. System Testing and Debugging: Tested the system and fixed bugs. Responsible member: Xuan Tan (05/01 to 05/07)
4. System Improvement: Improved the system to make it robust and user-friendly. Responsible member: Xuan Tan (05/08 to 05/10)
5. Final Report Writing and Preparation of Final Presentation. Responsible member: Xuan Tan (05/11 to 05/14)

* xuantan.chuck@nyu.edu

2 Tracking System Framework

As shown in Figure 1, we divide our task into two major parts: foreground segmentation/detection and foreground object tracking. Foreground detection: this subtask aims to segment all pixels that do not belong to the background, yielding reasonable foreground regions for each frame. Foreground object tracking: after foreground segmentation, a tracking system maintains object consistency between frames. The segmentation process does not use prior information about tracked objects, so it is important to build the tracking system so that its output is robust and efficient. The details are given in sections 3 and 4.

Figure 1: Tracking System (taken from [4])

3 Foreground Segmentation

We discuss motion-based object detection methods in this section, then present the Gaussian Mixture Model method in detail.
3.1 Introduction

Foreground detection/segmentation is the foremost step of vision systems that monitor real-world activity, and it is of great interest in many applications. For example, in video conferencing, once the foreground and the background are separated, the background can be replaced by other images or videos, which beautifies the video and protects users' privacy. In our task, which is video surveillance, foreground segmentation enables correct object identification and tracking. In the past several years, many techniques have been developed for foreground segmentation. The earliest and simplest technique is the Temporal Median Filter, proposed by Lo and Velastin [5]. As shown in Figure 2, this method takes the last N frames as training samples to obtain an initial background reference. Each pixel p(i, j) in the image has an N-dimensional vector denoted B(i, j). For each new incoming frame, the system analyzes every pixel p(i, j) against the median of the last N frames and a preset threshold. If the pixel value p(i, j) is within the threshold, the pixel is set as background and the background vector B(i, j) is updated; otherwise the pixel is set as foreground and no update is done. This method is easy to understand and easy to implement. However, the segmentation result is not reliable under illumination changes, such as weather and light-source changes. Another important technique is the Running Gaussian Average, proposed by Wren et al. [6]. This method models every background pixel p(i, j) as a variable subject to a Gaussian distribution with mean µ and variance σ². The segmentation process is similar to the Temporal Median Filter: after obtaining an initial background reference during a training period, for every pixel p(i, j) in a new frame, if the probability of belonging to the background is within the threshold, p(i, j) is set as background and that pixel's Gaussian distribution is updated; otherwise p(i, j) is labeled as foreground and no update is done.
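The Running Gaussian Average just described can be sketched in a few lines of Python/NumPy. This is only an illustration: the class name, the initial variance, and the values of the learning rate alpha and the threshold k are our assumptions, not parameters from [6].

```python
import numpy as np

class RunningGaussianBackground:
    """Per-pixel single-Gaussian background model in the spirit of Wren et al. [6].

    A pixel is background when |x - mu| <= k * sigma; only background pixels
    update their mean/variance with learning rate alpha. The initial variance,
    alpha and k below are illustrative guesses, not values from the paper.
    """

    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mu = first_frame.astype(np.float64)
        self.var = np.full(first_frame.shape, 15.0 ** 2)  # assumed initial variance
        self.alpha = alpha
        self.k = k

    def apply(self, frame):
        """Return a boolean mask that is True where the frame is foreground."""
        x = frame.astype(np.float64)
        d2 = (x - self.mu) ** 2
        background = d2 <= (self.k ** 2) * self.var
        a = self.alpha * background          # zero learning rate for foreground pixels
        self.mu = (1 - a) * self.mu + a * x
        self.var = (1 - a) * self.var + a * d2
        return ~background
```

A Temporal Median Filter background model can be sketched analogously by keeping the last N frames per pixel and comparing each new value against their median.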
The improvement over the previous method is the introduction of probability and information theory, so the system can be analyzed quantitatively using information measures such as entropy.

Figure 2: Example of a Temporal Median Filter background model (taken from [4])

However, it is not appropriate to model a pixel with a single Gaussian distribution. The experimental results are not robust, especially when the content of the video is complex; for example, ripples on water are dynamic and unstable. Oliver et al. [7] proposed a new method borrowing from face recognition: a subspace analysis technique based on PCA (Principal Component Analysis), called Eigenbackgrounds. It has two steps, a learning step and a classification step. In the learning step, the system computes the covariance matrix of the differences between each frame and the frame mean, then keeps the M eigenvectors corresponding to the M largest eigenvalues to realize dimensionality reduction. Classification based on these M eigenvectors is reasonable, because the eigenspace is a good model for static regions of the image, but not for moving objects. However, this method is computationally more costly than the methods above, because of the matrix operations needed to obtain the image eigenvectors and eigenvalues (the covariance matrix is MN by MN for an M-by-N image). Compared with the techniques mentioned above, the method proposed by Chris Stauffer [2], the Gaussian Mixture Model, is efficient, reliable and easy to implement. After analyzing these methods, we chose it as our fundamental technique for the foreground segmentation task. The details are reviewed in the next section.

3.2 Gaussian Mixture Model

Rather than explicitly modeling the values of all pixels as one particular type of distribution, the method proposed by P.
KaewTraKulPong [1] models the values of a particular pixel as a mixture of Gaussians. The method considers the values of a particular pixel over time as a "pixel process": a time series of pixel values, e.g. scalars for gray values. At any time t, what is known about a particular pixel (x_0, y_0) is its history:

{x_1, ..., x_t} = {I(x_0, y_0, i) : 1 ≤ i ≤ t},   (1)

where I is the image sequence. The recent history of each pixel, {x_1, ..., x_t}, is modeled by a mixture of K Gaussian distributions. Thus the probability of observing the current pixel value is:

f(x_t) = Σ_{i=1}^{K} ω_{i,t} · η(x_t, µ_{i,t}, σ_{i,t}),   (2)

where K is the number of distributions, ω_{i,t} is the weight of the i-th Gaussian in the mixture at time t, µ_{i,t} is its mean, σ_{i,t} is its variance, and η is the Gaussian probability density function:

η(x_t, µ_{i,t}, σ_{i,t}) = (2π)^{−n/2} |σ_{i,t}|^{−1/2} exp(−(x_t − µ_{i,t})² / (2σ_{i,t})),   (3)

where σ_{i,t} is the variance of the i-th Gaussian at time t (n = 1 for gray values). We used K = 3 in our implementation. The weight ω_{i,t} is normalized by:

ω_{i,t} ← ω_{i,t} / Σ_{k=1}^{K} ω_{k,t}.   (4)

The K distributions are then ordered by the fitness value ω_{i,t}/σ_{i,t}, and the background model B consists of the first b Gaussian distributions with the highest fitness values (b can be 1, 2 or 3 and differs from pixel to pixel at each time step):

B = argmin_b ( Σ_{j=1}^{b} ω_j > T ),   (5)

where T is the decision threshold (0.6 in our experiment). B is the minimum number of Gaussian distributions that must be included in the summation to satisfy the inequality; in other words, B reflects the minimum prior probability that the background is in the scene. The inequality above only gives an understanding of the probabilistic background model.
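Equations (2)-(5) for a single pixel can be illustrated numerically. The following Python/NumPy sketch is ours, not from [1]: it evaluates the density of eq. (3) for gray values, the mixture probability of eq. (2), and the background-component selection of eq. (5), assuming the weights are already normalized per eq. (4).

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """1-D Gaussian density of eq. (3), for scalar gray values (n = 1)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mixture_probability(x, weights, mus, variances):
    """Probability of observing pixel value x under the mixture, eq. (2)."""
    return float(np.sum(weights * gaussian_pdf(x, mus, variances)))

def background_components(weights, sigmas, T=0.6):
    """Eq. (5): sort components by fitness w/sigma (descending) and return the
    indices of the smallest prefix whose cumulative weight exceeds T."""
    order = np.argsort(-(weights / sigmas))            # descending fitness
    cum = np.cumsum(weights[order])
    B = int(np.searchsorted(cum, T, side="right")) + 1  # minimum b with sum > T
    return order[:B]
```

For one pixel with K = 3, normalized weights (0.7, 0.2, 0.1) and standard deviations (3, 10, 20), the first component alone already exceeds T = 0.6, so it forms the background model by itself.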
However, we classify each pixel (as background or foreground) by deciding whether the input pixel value matches any Gaussian distribution (matching strategy), using the following test:

|x_t − µ_{i,t}| ≤ φ σ_{i,t},   (6)

where x_t is the pixel value at time t, φ is a constant threshold (2.5 in our experiment), µ_{i,t} is the i-th Gaussian's mean and σ_{i,t} is its standard deviation. There are two matching cases:

Case 1: If the pixel value matches one or more Gaussians, label it as background and mark the most probable matching Gaussian as matched. Update the matched Gaussian by equations (7)-(11). The remaining Gaussians are marked as unmatched, and only their weights are decreased, using equation (7).

Case 2: If no Gaussian distribution matches the pixel value, the least probable component is replaced by a NEW Gaussian distribution with µ_{i,t} = x_t, σ_{i,t} = 1, ω_{i,t} = 0.1.

The online update for the Gaussians of pixels marked as background is:

ω_{i,t+1} = (1 − α) ω_{i,t} + α p(ω_i | x_{t+1})   (7)
µ_{i,t+1} = (1 − ρ) µ_{i,t} + ρ x_{t+1}   (8)
σ_{i,t+1} = (1 − ρ) σ_{i,t} + ρ (x_{t+1} − µ_{i,t+1})²   (9)
ρ = α η(x_{t+1}, µ_{i,t}, σ_{i,t})   (10)
p(ω_i | x_{t+1}) = 1 if ω_i is the best matched Gaussian component, 0 otherwise   (11)

where α is the learning rate (0.02 in our experiment). Finally we obtain the foreground mask M_t, Figure 3.

3.3 Post-processing

The GMM method described above allows us to identify foreground pixels in each new frame while updating the model. As shown in Figure 3, there are many motion candidates and numerous false positives caused by lighting changes or undesired motion. The labeled foreground pixels are segmented by performing morphological operations and blob analysis to eliminate the false positives. The steps are described below:

1. Apply the morphological closing operation to the foreground mask M_t first, then opening, to eliminate noise.
2. Connected Component Analysis: merge the labeled pixels into reasonably connected regions.
Then obtain the list of regions with their center positions and sizes.
3. Find the largest region, which must contain more than 400 pixels, and return its center position and size.

The morphological operations and Connected Component Analysis are done with built-in functions in OpenCV. This process is efficient in determining the whole moving object, since moving-object detection depends not only on motion characteristics but also on size and shape information.

4 Kalman Filter

We discuss the tracking system in this section, based on the Kalman Filter.

Figure 3: Foreground detection by GMM

4.1 Introduction

The Kalman Filter was proposed in 1960 by Kalman [3] and has been applied widely and successfully ever since. Recently it has been applied in a wide variety of computer applications, such as building tracking systems, extracting lip motion from video sequences of speakers, and fitting spline surfaces over collections of points. The Kalman Filter is an optimal estimator for a large class of problems, and with a few basic conceptual tools it is easy to use without advanced mathematical knowledge. The Kalman Filter is a recursive state-space estimator. If all noise is Gaussian, it minimizes the mean square error of the estimated parameters. When the noise is not Gaussian the result is no longer optimal, since the Kalman Filter is a linear estimator; a non-linear estimator may then perform better. The word "filter" is used because the process aims to find the best estimate from noisy data, which is similar to the function of a filter. Details are discussed in the next section.

4.2 Kalman Filter Derivation

In this section, we introduce the Kalman Filter process directly for our application, in a simple way.
The key point in applying the Kalman Filter to our application is to build a system that realizes Bayes' rule:

P(X_t | X_{t−1}) = P(X_{t−1} | X_t) P(X_t) / P(X_{t−1}),   (12)

where X_t and X_{t−1} are states of the object (X_t = [L_t, v_t]^T, where L_t is the center position of the tracked object and v_t is its velocity). P(X_t | X_{t−1}) is the estimate of the current state based on the state at t − 1, which allows us to predict X_t; we can then use X_t to predict the state at t + 1. It is a recursive process. Recalling basic linear-systems knowledge, the linear stochastic difference equation is:

X_t = A X_{t−1} + B u_t + ε_x,   (13)

where X_t is the current state, an n-by-1 vector, and X_{t−1} is the previous state. u is a driving function (artificial manipulation), which is zero in our experiment. ε_x is process noise, modeled by a single Gaussian distribution, that needs to be eliminated. The n-by-n matrix A relates the state at the previous step t − 1 to the state at the current step t, in the absence of either a driving function u or process noise ε_x. The matrix B relates to the optional control input u; in this application B = [dt²/2, dt]^T with dt = 1/30 s. The observation is then:

Z_t = H X_t + ε_z,   (14)

where Z_t is the current observation, which in our experiment is the center position L_t of the tracked object, and ε_z is measurement noise, also modeled by a single Gaussian distribution. For Z_t = L_t, H = [1 0]. The noise is modeled as:

p(ε_x) ∼ N(0, Q),   (15)
p(ε_z) ∼ N(0, R).   (16)

In practice, the process noise covariance and measurement noise covariance matrices might change at each step; here, however, we assume they are constant. We define the estimate error as:

e_t = X_t − X̂_t.   (17)

Then the estimate error covariance is:

P_t = E[e_t e_t^T].   (18)

The current state X̂_t is corrected as:

X̂_t = X̂_t^− + K_t (Z_t − H X̂_t^−),   (19)

where the difference (Z_t − H X̂_t^−) is called the residual.
The residual reflects the discrepancy between the predicted measurement H X̂_t^− and the actual measurement Z_t; a residual of zero means the two are in complete agreement. K is the famous Kalman gain, chosen to minimize the error covariance. The optimal choice of K is given by [9]:

K_t = P_t^− H^T (H P_t^− H^T + R)^{−1}.   (20)

Understanding the Kalman gain K_t is the crucial point in understanding the whole Kalman Filter system. Recall (14) and (16): the noise can be described as ε_z = Z_t − H X_t, and R is the assumed variance of ε_z. When the measurement error covariance R approaches zero, the gain K weights the residual more heavily:

lim_{R → 0} K_t = H^{−1}.   (21)

More interestingly, when the estimate error covariance P_t^− approaches zero, the gain K weights the residual less heavily:

lim_{P_t^− → 0} K_t = 0.   (22)

Examining these two limits, we find that as the measurement error covariance R approaches zero, the actual measurement Z_t is trusted more and more, while the predicted measurement H X̂_t^− is trusted less and less. On the other hand, as the estimate error covariance P_t^− approaches zero, the actual measurement Z_t is trusted less and less, while the predicted measurement H X̂_t^− is trusted more and more. The proof and more details are given in [3]. All in all, the goal of the Kalman Filter is to make the estimated output more trustworthy and robust. The recursive update system is described in Figure 4. The whole system is divided into two parts: the time update step, Table 1, and the measurement update step, Table 2. In the next section we apply this method to our tracking system; all the parameter settings and related matrices are discussed there.
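The predict/correct cycle just described can be sketched for a single coordinate as follows. This is a minimal Python/NumPy illustration of a constant-velocity filter; the class name and the Q and R values are our own illustrative choices, not the parameters of our experiment.

```python
import numpy as np

class KalmanTracker1D:
    """Constant-velocity Kalman filter sketch for one coordinate.

    State x = [position, velocity]; measurement z = position.
    A and H follow the report; Q, R and the initial P are assumed values.
    """

    def __init__(self, dt=1 / 30.0, q=1e-2, r=1.0):
        self.A = np.array([[1.0, dt], [0.0, 1.0]])
        self.H = np.array([[1.0, 0.0]])
        self.Q = q * np.eye(2)
        self.R = np.array([[r]])
        self.x = np.zeros((2, 1))   # state estimate
        self.P = np.eye(2)          # error covariance

    def predict(self):
        # Time update: project the state and error covariance ahead.
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x[0, 0]

    def correct(self, z):
        # Measurement update: compute the gain, correct with the residual.
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.array([[z]]) - self.H @ self.x)
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x[0, 0]
```

Fed a sequence of positions from an object moving at constant velocity, the estimate converges to the true trajectory, and the filter keeps extrapolating via predict() when a measurement is missing.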
Figure 4: The ongoing discrete Kalman filter cycle

Table 1: Time Update
X̂_t^− = A X̂_{t−1} + B u_t
P_t^− = A P_{t−1} A^T + Q

Table 2: Measurement Update
K_t = P_t^− H^T (H P_t^− H^T + R)^{−1}
X̂_t = X̂_t^− + K_t (Z_t − H X̂_t^−)
P_t := (I − K_t H) P_t^−

4.3 Kalman Filter Implementation

In this section, we derive the particular Kalman Filter process for our experiment. The state equation is:

X̂_t = [L_t; v_t] = A X_{t−1} + ε_x = [[1, dt], [0, 1]] [L_{t−1}; v_{t−1}] + ε_x = [L_{t−1} + v_{t−1} dt; v_{t−1}] + ε_x,

where L and v are the position and velocity of the tracked object. ε_x is the process noise (e.g., the tracked object goes missing or is hit by something), with covariance matrix assumed to be Q. The measurement equation is:

Z_t = [L_t] = H X_t + ε_z = [1 0] [L_t; v_t] + ε_z,

where ε_z is the measurement noise, with covariance matrix R. The Kalman Filter algorithm is then:

X̂_t^− = A X̂_{t−1}
P_t^− = A P_{t−1} A^T + Q
K_t = P_t^− H^T (H P_t^− H^T + R)^{−1}
X̂_t = X̂_t^− + K_t (Z_t − H X̂_t^−)
P_t := (I − K_t H) P_t^−

where X̂_t = [L̂_t; v̂_t] and A = [[1, dt], [0, 1]]. X̂_t^− is the prediction result and X̂_t is the correction result. P_t is the error covariance matrix, initialized as P_0 = (X_1 − X_0)(X_1 − X_0)^T and updated by P_t = (X_t − X̂_t)(X_t − X̂_t)^T. v_t is initialized as 0 and then updated by v_t = (L_t − L_{t−1}) × 30 (30 frames per second). H = [1 0]. R is the measurement covariance matrix:

R = [[σ_p², σ_p σ_v], [σ_p σ_v, σ_v²]] = [[dt⁴/4, dt³/2], [dt³/2, dt²]], where dt = 1/30.

We bring all the techniques mentioned above together to build the tracking system in the next section.

5 Tracking System Algorithm

The overall algorithm is described in Algorithm 4; the details of each sub-algorithm are described in Algorithms 1-3.

Algorithm 1: Video pre-processing and system initialization
Task: Pre-process the video for the foreground segmentation step.
Input source reading:
• The frame set is read from WebCam software or directly from video: Frames = x_1, x_2, ..., x_t. The frame size is assumed to be M by N, and x_t = [x_t(0, 0), ..., x_t(M, N)].
• Video denoising and stabilization are applied when necessary.
• The background light source should be nearly stable and the illumination should be in an appropriate range (neither too bright nor too dark).

Model and system initialization:
Gaussian Mixture Model initialization:
• Use the first 30 frames to obtain the initial GMM model.
• For every pixel x_t(i, j) (every pixel in the frame is independently modeled as a GMM) do:
  • Compute the initialized Gaussian Mixture Model f(i, j) = Σ_{l=1}^{K} ω_{l,t} G_l(i, j) = Σ_{l=1}^{K} ω_{l,t} · η(x_t(i, j), µ_{l,t}, σ_{l,t}) for pixel x(i, j), using the PyMix module (a Python module which solves GMM problems). The number of Gaussians, K, is chosen to be 3.
• EndFor
Tracking system initialization:
• Initialize the Kalman Filter system parameters: center position L_t = (0, 0), velocity v = [0, 0], Kalman gain K_t = 0.

Algorithm 2: Foreground segmentation processing
Task: Segment the foreground moving object out from the background.
Require input: Nice, clean video!
Foreground extraction:
• For every pixel x_t(i, j) in the new incoming frame do:
  • Test whether the new input pixel matches any current Gaussian model:
  • Case 1: the pixel x_t(i, j) matches one or more Gaussian distributions under the matching strategy:
    • Mark the current pixel x_t(i, j) as background, pick the most probable Gaussian as matched and mark the others as unmatched. Then update all Gaussian distributions with the update equations.
  • Case 2: no Gaussian is marked as matched:
    • Mark the current pixel x_t(i, j) as foreground. Reset the least probable Gaussian in the mixture: mean = x_t; variance set to a high value (1); weight set to a low value. Re-normalize all weights.
• EndFor
• Return the binary mask M_t: black (0) represents background, white (1) represents foreground.

Connected Component Analysis object segmentation:
• Perform morphological operations and blob analysis on M_t to eliminate false positives.
• Perform Connected Component Analysis to segment the foreground object (the final region chosen must be the largest foreground region, and it must contain more than 400 pixels).
• Return the size S_t, position L_t (center of the region) and velocity v_t of the foreground moving object, with v_t = (L_t − L_{t−1}) × 30.

Algorithm 3: Foreground object tracking
Task: Robustly track the object based on its recent motion history.
Require input: Position L_{t−1} (center of the region) and velocity v_{t−1} of the foreground moving object.
Foreground object tracking:
Time update:
• X̂_t^− = A X̂_{t−1}
• P_t^− = A P_{t−1} A^T
Measurement update:
• K_t = P_t^− H^T (H P_t^− H^T + R)^{−1}
• X̂_t = X̂_t^− + K_t (Z_t − H X̂_t^−)
• P_t := (I − K_t H) P_t^−
where A = [[1, dt], [0, 1]]; P_t is the error covariance matrix, initialized as P_0 = (X_1 − X_0)(X_1 − X_0)^T and updated by P_t = (X_t − X̂_t)(X_t − X̂_t)^T; H is the observation matrix, [1 0]; R is the measurement error covariance matrix, assumed to be R = [[σ_p², σ_p σ_v], [σ_p σ_v, σ_v²]] = [[dt⁴/4, dt³/2], [dt³/2, dt²]], where dt = 1/30.
Return: L̂_t = Ẑ_t = H X̂_t

Algorithm 4: Overall tracking algorithm (motion-based object tracking by Kalman Filter)
1. Video pre-processing and system initialization.
2. Track and display:
• while a new input frame I_new arrives do
  • Foreground segmentation processing, generating the tracked object position L_{t−1} (the center of the region) and velocity v_{t−1} history.
  • Foreground object tracking based on the previous state information (the L_{t−1} and v_{t−1} history generated in the foreground segmentation step), achieving a robust prediction of the tracked object's position, L̂_t.
  • Mark the position L̂_t and display the output frames.

6 Experiments and Results

We applied the techniques (GMM for foreground segmentation and the Kalman Filter for object tracking) both to recorded video and to real-time video from a WebCam.
Experiment 1: Ball Trajectory Tracking. As shown in Figure 5(a), the trajectory of the green ball is hidden by a box at some locations. Figure 5(b) shows the object detection by the method we implemented, i.e. the background model by GMM. When the ball enters the box, we can no longer detect it from its motion information. Figure 5(c) shows the final tracking result from the combination of GMM and the Kalman Filter: red crosses are the detections by GMM, and blue points are the predictions by the Kalman Filter. It still allows us to track the ball robustly even without its current motion information.

Experiment 2: Real-time Object Tracking. As shown in Figure 6, we built the real-time tracker successfully, with a nice GUI. The left window displays the original WebCam video; the central window displays the final result by the Kalman Filter; the right window displays the object detection result by GMM, where the red square indicates the size of the moving object.

Figure 5: Object tracking on recorded video. (a) Trajectory of the moving object; (b) motion-based object detection using GMM; (c) object tracking using Kalman Filter prediction.

Figure 6: Real-time tracking

7 Conclusion and Future Work

In this project, we have reviewed the basic knowledge of the Gaussian Mixture Model and the Kalman Filter. We then implemented the combination of these two methods in a real-time tracking application, based on the algorithm descriptions in references [1], [2] and [3]. The experimental results showed that these two methods give good performance even in their basic forms. We plan to study their extended forms to achieve better tracking results. Besides, we will try using more Gaussians to model the background in the foreground segmentation step, and using more motion information (such as acceleration) in the tracking step. What's more, we plan to introduce global camera motion into our tracking system, which may allow us to track even when the camera is moving.
Acknowledgements

My most sincere gratitude to Professor Yao Wang for directing this project with attention, dedication and crucial suggestions at every moment. Thank you.

Bibliography

[1] P. KaewTraKulPong and R. Bowden, "An Improved Adaptive Background Mixture Model for Real-time Tracking with Shadow Detection", http://info.ee.surrey.ac.uk/CVSSP/Publications/papers/KaewTraKulPong-AVBS01.pdf
[2] Chris Stauffer and W.E.L. Grimson, "Adaptive background mixture models for real-time tracking", Computer Vision and Pattern Recognition, 1999, IEEE Computer Society Conference on, Volume 2.
[3] R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems", http://www.cs.unc.edu/~welch/kalman/media/pdf/Kalman1960.pdf
[4] MERIT, "Reviews of Tracking based on Foreground and Background Modeling Techniques", http://upcommons.upc.edu/pfc/bitstream/2099.1/7006/1/Proyecto%20MERIT%20Master%20Jaume%20Gallego%20Final%20Version%2013Febrero09%202.pdf
[5] B.P.L. Lo and S.A. Velastin, "Automatic Congestion Detection System for Underground Platforms", Intelligent Multimedia, Video and Speech Processing, 2001, Proceedings of the 2001 International Symposium on.
[6] C. Wren et al., "Pfinder: Real-Time Tracking of the Human Body", Automatic Face and Gesture Recognition, 1996, Proceedings of the Second International Conference on.
[7] Nuria M. Oliver, Barbara Rosario, and Alex P. Pentland, "A Bayesian Computer Vision System for Modeling Human Interactions", IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 22, Issue 8.
[8] Thou-Ho (Chao-Ho) Chen, Yu-Feng Lin, and Tsong-Yi Chen, "Intelligent Vehicle Counting Method Based on Blob Analysis in Traffic Surveillance", Innovative Computing, Information and Control, 2007 (ICICIC '07), Second International Conference on.
[9] O. L. R. Jacobs, "Introduction to Control Theory" (Second ed.), Oxford University Press, 1993.