Abstract
Point processes constitute a natural extension of Markov random fields (MRF), designed to handle parametric objects. They have proven efficient and competitive for object extraction problems in vision. Simulating these stochastic models is, however, a difficult task. The performance of existing samplers is limited in terms of computation time and convergence stability, especially on large scenes. We propose a new sampling procedure based on a Monte Carlo formalism. Our algorithm exploits the Markovian property of point processes to perform the sampling in parallel. This procedure is embedded into a data-driven mechanism so that the points are distributed in the scene according to spatial information extracted from the input data. The performance of the sampler is analyzed through a set of experiments on various object detection problems in large scenes, including comparisons with existing algorithms. The sampler is also tested as an optimization algorithm for MRF-based labeling problems.
Notes
GCO C++ library (http://vision.csd.uwo.ca/code/).
References
Baddeley, A. J., & Lieshout, M. V. (1993). Stochastic geometry models in high-level vision. Journal of Applied Statistics, 20(5–6), 231–256.
Benchmark (2013). Datasets, results and evaluation tools. http://www-sop.inria.fr/members/Florent.Lafarge/benchmark/evaluation.html.
Besag, J. E. (1986). On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, 48(3), 259–302.
Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.
Byrd, J., Jarvis, S., & Bhalerao, A. (2010). On the parallelisation of MCMC-based image processing. IEEE International Symposium on Parallel and Distributed Processing. Atlanta, US.
Chai, D., Forstner, W., & Lafarge, F. (2013). Recovering line-networks in images by junction-point processes. Computer Vision and Pattern Recognition, Portland.
Chai, D., Forstner, W., & Yang, M. Y. (2012). Combine Markov random fields and marked point processes to extract building from remotely sensed images. International Society for Photogrammetry and Remote Sensing Congress. Melbourne, Australia.
Descombes, X. (2011). Stochastic geometry for image analysis. Oxford: Wiley.
Descombes, X., Minlos, R., & Zhizhina, E. (2009). Object extraction using a stochastic birth-and-death dynamics in continuum. Journal of Mathematical Imaging and Vision, 33(3), 347–359.
Earl, D., & Deem, M. (2005). Parallel tempering: Theory, applications, and new perspectives. Physical Chemistry Chemical Physics, 7(23), 3910–3916.
Ge, W., & Collins, R. (2009). Marked point processes for crowd counting. Computer Vision and Pattern Recognition. Miami.
Gonzalez, J., Low, Y., Gretton, A., & Guestrin, C. (2011). Parallel Gibbs sampling: From colored fields to thin junction trees. Journal of Machine Learning Research, 15, 324–332.
Green, P. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4), 711–732.
Grenander, U., & Miller, M. (1994). Representations of knowledge in complex systems. Journal of the Royal Statistical Society, 56(4), 549–603.
Han, F., Tu, Z. W., & Zhu, S. (2004). Range image segmentation by an effective jump-diffusion method. Pattern Analysis and Machine Intelligence, 26(9), 1138–1153.
Harkness, M., & Green, P. (2000). Parallel chains, delayed rejection and reversible jump MCMC for object recognition. British Machine Vision Conference. Bristol, United Kingdom.
Hastings, W. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109.
Lacoste, C., Descombes, X., & Zerubia, J. (2005). Point processes for unsupervised line network extraction in remote sensing. Pattern Analysis and Machine Intelligence, 27(10), 1568–1579.
Lafarge, F., Gimel’farb, G., & Descombes, X. (2010). Geometric feature extraction by a multi-marked point process. Pattern Analysis and Machine Intelligence, 32(9), 1597–1609.
Lafarge, F., & Mallet, C. (2012). Creating large-scale city models from 3D-point clouds: A robust approach with hybrid representation. International Journal of Computer Vision, 99(1), 69–85.
Lehmussola, A., Ruusuvuori, P., Selinummi, J., Huttunen, H., & Yli-Harja, O. (2007). Computational framework for simulating fluorescence microscope images with cell populations. IEEE Transactions on Medical Imaging, 26(7), 1010–1016.
Lempitsky, V., & Zisserman, A. (2010). Learning to count objects in images. Conference on Neural Information Processing Systems. Vancouver, Canada.
Li, S. (2001). Markov random field modeling in image analysis. Berlin: Springer.
Lieshout, M. V. (2008). Depth map calculation for a variable number of moving objects using Markov sequential object processes. Pattern Analysis and Machine Intelligence, 30(7), 1308–1312.
Liu, J. (2001). Monte Carlo strategies in scientific computing. New York: Springer.
Mallet, C., Lafarge, F., Roux, M., Soergel, U., Bretar, F., & Heipke, C. (2010). A marked point process for modeling lidar waveforms. IEEE Transactions on Image Processing, 19(12), 3204–3221.
Nguyen, H.-G., Fablet, R., & Bouchet, J. (2010). Spatial statistics of visual keypoints for texture recognition. European Conference on Computer Vision. Heraklion, Greece.
Ortner, M., Descombes, X., & Zerubia, J. (2008). A marked point process of rectangles and segments for automatic analysis of digital elevation models. Pattern Analysis and Machine Intelligence, 30(1), 105–119.
Rochery, M., Jermyn, I., & Zerubia, J. (2006). Higher order active contours. International Journal of Computer Vision, 69(3), 335–351.
Salamon, P., Sibani, P., & Frost, R. (2002). Facts, Conjectures, and Improvements for Simulated Annealing. Philadelphia: SIAM Monographs on Mathematical Modeling and Computation.
Srivastava, A., Grenander, U., Jensen, G., & Miller, M. (2002). Jump-Diffusion Markov processes on orthogonal groups for object pose estimation. Journal of Statistical Planning and Inference, 103(1–2), 15–27.
Stoica, R. S., Martinez, V., & Saar, E. (2007). A three dimensional object point process for detection of cosmic filaments. Journal of the Royal Statistical Society, 56(4), 459.
Sun, K., Sang, N., & Zhang, T. (2007). Marked point process for vascular tree extraction on angiogram. Energy Minimization Methods in Computer Vision and Pattern Recognition. Ezhou, China.
Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., et al. (2008). Comparative study of energy minimization methods for Markov random fields with smoothness-based priors. Pattern Analysis and Machine Intelligence, 30(6), 1068.
Tu, Z., & Zhu, S. (2002). Image segmentation by data-driven Markov chain Monte Carlo. Pattern Analysis and Machine Intelligence, 24(5), 657–673.
Utasi, A., & Benedek, C. (2011). A 3-D marked point process model for multi-view people detection. Conference on Computer Vision and Pattern Recognition. Colorado Springs, US.
Verdie, Y., & Lafarge, F. (2012). Efficient Monte Carlo sampler for detecting parametric objects in large scenes. European Conference on Computer Vision. Firenze, Italy.
Weiss, Y., & Freeman, W. (2001). On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs. IEEE Transactions on Information Theory, 47(2), 736–744.
Zhu, S., Guo, C., Wang, Y., & Xu, Z. (2005). What are textons? International Journal of Computer Vision, 62(1–2), 121–143.
Acknowledgments
This work was partially funded by the European Research Council (ERC Starting Grant “Robust Geometry Processing”, Grant agreement 257474). The authors thank A. Lehmussola, V. Lempitsky, H. Bischof, R. Ehrich, the French Mapping Agency (IGN), the Tour du Valat, and the BRGM for providing the datasets, as well as the reviewers for their valuable comments.
Appendices
Appendix 1: Population counting model
Let \(x\) denote a configuration of ellipses for which the center of mass of an ellipse is contained in the compact set \(K\) supporting the input image (see Fig. 19). The energy follows the form specified by Eq. 5. The unitary data term \(D(x_i)\) and the potential \(V(x_i,x_j)\) are given by:
where
- \(d(x_i)\) represents the Bhattacharyya distance between the radiometry inside and outside the object \(x_i\):
$$\begin{aligned} d(x_i) = \frac{(m_{in}-m_{out})^2}{4(\sigma _{in}^2+\sigma _{out}^2)} - \frac{1}{2} \ln \left( \frac{2\sigma _{in}\sigma _{out}}{\sigma _{in}^2+\sigma _{out}^2}\right) \end{aligned} \tag{22}$$
where \(m_{in}\) and \(\sigma _{in}\) (respectively \(m_{out}\) and \(\sigma _{out}\)) are the intensity mean and standard deviation in \(S_{in}\) (respectively in \(S_{out}\)).
- \(d_0\) is a coefficient fixing the sensitivity of the object fitting. The higher the value of \(d_0\), the more selective the object fitting. In particular, \(d_0\) has to be high when the input images are corrupted by a significant amount of noise.
- \(A(x_i)\) is the area of object \(x_i\).
- \(\beta \) is a coefficient weighting the non-overlapping constraint with respect to the data term.
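As an illustration, the Bhattacharyya distance of Eq. 22 can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the function name and the argument layout are assumptions made for the example, and the intensity statistics of \(S_{in}\) and \(S_{out}\) are assumed to have been computed beforehand.

```python
import math


def bhattacharyya_distance(m_in: float, sigma_in: float,
                           m_out: float, sigma_out: float) -> float:
    """Evaluate d(x_i) of Eq. 22 from the intensity statistics of an object.

    m_in, sigma_in   : intensity mean and standard deviation inside the object
    m_out, sigma_out : intensity mean and standard deviation outside the object
    (hypothetical signature, chosen for this example only)
    """
    if sigma_in <= 0.0 or sigma_out <= 0.0:
        # The logarithm in Eq. 22 is undefined for a zero standard deviation.
        raise ValueError("standard deviations must be strictly positive")

    var_sum = sigma_in ** 2 + sigma_out ** 2
    # First term of Eq. 22: separation of the two means.
    mean_term = (m_in - m_out) ** 2 / (4.0 * var_sum)
    # Second term of Eq. 22: mismatch between the two spreads (always >= 0).
    spread_term = -0.5 * math.log(2.0 * sigma_in * sigma_out / var_sum)
    return mean_term + spread_term
```

The distance is zero when the inside and outside statistics coincide and grows as the two radiometries separate, which is why a larger \(d_0\) makes the fitting more selective.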
Note that, in practice, a basic morphological dilation is used to roughly extract the class of interest from the image of birds in order to create a space-partitioning tree.
Appendix 2: Line-network extraction model
A line-segment is defined by five parameters, including the 2D point corresponding to the center of mass of the object (Fig. 19). Similarly to the population counting model detailed in Appendix 1, the fitting quality with respect to the data is based on the Bhattacharyya distance: the unitary data term \(D(x_i)\) of the energy is given by Eq. 20. The potential \(V(x_i,x_j)\) penalizes strong object overlaps (see Eq. 21), but also takes into account a connection interaction in order to favor the linking of the line-segments. The potential term is thus given by:
where
- \(\beta _1\) and \(\beta _2\) are two coefficients weighting respectively the non-overlapping and connection constraints with respect to the data term.
- \(\sim _{nc}\) is the non-connection relationship between two objects. \(x_i \sim _{nc} x_j\) if the anchor areas of \(x_i\) and \(x_j\) (see Fig. 19) do not overlap.
- \( \mathbf 1 _{condition}\) is the indicator function returning one when the condition is valid, and zero otherwise.
- \(f(x_i,x_j)\) is a symmetric function weighting the penalization of two non-connected objects \(x_i\) and \(x_j\) with respect to their average fitting quality. The function \(f\) is introduced to slightly relax the connection constraint when the two objects are of very good quality.
As for the bird counting problem, a basic morphological dilation has been used to roughly extract the class of interest from the aerial image shown in Fig. 14. Indeed, the pixels corresponding to the class road in this image are relatively bright compared to the background. The segmented result is obviously not optimal, but is sufficient to create an efficient space-partitioning tree.
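The rough extraction of the class of interest can be illustrated as follows. The text does not specify the exact pipeline, so this is only a sketch under stated assumptions: bright pixels are first selected with a fixed threshold, and the resulting binary mask is then dilated with a square structuring element. The function names, the threshold `level` and the `radius` parameter are hypothetical.

```python
from typing import List

Image = List[List[int]]


def threshold(image: Image, level: int) -> Image:
    """Binary mask of the pixels at least as bright as `level` (assumed step)."""
    return [[1 if value >= level else 0 for value in row] for row in image]


def dilate(mask: Image, radius: int = 1) -> Image:
    """Morphological dilation with a (2*radius+1) x (2*radius+1) square."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            # Every foreground pixel switches on its whole square neighborhood,
            # clipped to the image borders.
            for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                    out[rr][cc] = 1
    return out


def rough_class_mask(image: Image, level: int, radius: int = 1) -> Image:
    """Coarse mask of the class of interest: threshold, then dilate."""
    return dilate(threshold(image, level), radius)
```

The dilation deliberately over-estimates the class of interest, which is consistent with the remark that the segmentation only needs to be good enough to guide the space-partitioning tree.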
Appendix 3: Tree recognition model formulation
Let \(x\) represent a configuration of 3D-models of trees from a template library described in Fig. 20. The center of mass \(p\) of a tree is contained in the compact set \(K\) supporting the 3D bounding box of the input point cloud (Fig. 19). We denote by \(\partial x_i\) the surface of the object \(x_i\), and by \(\mathcal C x_i\) the cylindrical volume having a vertical axis passing through the center of mass of \(x_i\), in which the input points are considered to measure the quality of \(x_i\). The unitary data term \(D(x_i)\) and the pairwise potential \(V(x_i,x_j)\) are given by:
where
- \(|\mathcal C x_i|\) is a coefficient normalizing the unitary data term with respect to the number of input points contained in \(\mathcal C x_i\).
- \(d(p_c, \partial x_i)\) is a distance measuring the coherence of the point \(p_c\) with respect to the object surface \(\partial x_i\). \(d\) is not the traditional orthogonal point-to-surface distance: since real trees are not exactly ellipsoidal or conoidal, the input points are not homogeneously distributed on the object surface. Here, \(d\) is defined as the combination of the planimetric distance, i.e. the projection in the plane of equation \(z=0\) of the Euclidean distance, and the altimetric variation, such that points outside the object are penalized more than points inside it. Note that \(d\) is invariant under rotation around the Z-axis.
- \(\gamma (.) \in [-1,1]\) is a quality function which is strictly increasing.
- \(V_{overlap}\) is the pairwise potential penalizing strong overlapping between two objects, and given by:
$$\begin{aligned} V_{overlap}(x_i,x_j) = \frac{A(x_i \cap x_j)}{\min (A(x_i),A(x_j))} \end{aligned} \tag{26}$$
where \(A(x_i)\) is the area of the object \(x_i\) projected onto the plane of equation \(z=0\).
- \(V_{competition}\) is the pairwise potential favoring a similar tree type \(t\) in a local neighborhood:
$$\begin{aligned} V_{competition}(x_i,x_j) = \mathbf 1 _{t_i \ne t_j} \end{aligned} \tag{27}$$
where \(\mathbf 1 _{.}\) is the indicator function.
- \(\beta _1\) and \(\beta _2\) are two coefficients weighting respectively the non-overlapping constraint and the competition term with respect to the data term.
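The two pairwise terms of Eqs. 26 and 27 can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes that the projection of a tree onto the plane \(z=0\) is a disk, so that \(A(x_i \cap x_j)\) reduces to the area of a circle-circle intersection. The `Tree` class, its fields and the tree type labels are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tree:
    center: Tuple[float, float]  # planimetric position of the center of mass
    radius: float                # radius of the footprint (assumed to be a disk)
    tree_type: str               # template label t, e.g. "conoidal" (hypothetical)


def footprint_intersection_area(a: Tree, b: Tree) -> float:
    """Area of the intersection of two disk footprints in the plane z = 0."""
    d = math.hypot(a.center[0] - b.center[0], a.center[1] - b.center[1])
    r1, r2 = a.radius, b.radius
    if d >= r1 + r2:                      # disjoint footprints
        return 0.0
    if d <= abs(r1 - r2):                 # one footprint contains the other
        return math.pi * min(r1, r2) ** 2
    # General case: sum of two circular sectors minus the kite they share.
    cos1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2.0 * d * r1)))
    cos2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2.0 * d * r2)))
    kite = 0.5 * math.sqrt(max(0.0, (-d + r1 + r2) * (d + r1 - r2)
                               * (d - r1 + r2) * (d + r1 + r2)))
    return r1 * r1 * math.acos(cos1) + r2 * r2 * math.acos(cos2) - kite


def v_overlap(a: Tree, b: Tree) -> float:
    """Eq. 26: intersection area normalized by the smaller footprint area."""
    smaller_area = math.pi * min(a.radius, b.radius) ** 2
    return footprint_intersection_area(a, b) / smaller_area


def v_competition(a: Tree, b: Tree) -> int:
    """Eq. 27: indicator equal to one when the two tree types differ."""
    return 1 if a.tree_type != b.tree_type else 0
```

With this normalization, \(V_{overlap}\) lies in \([0,1]\): it is zero for disjoint footprints and one when the smaller footprint is fully covered by the larger one.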
In order to roughly extract the class of interest from the point clouds, the scatter descriptor proposed by Lafarge and Mallet (2012) is used to identify the points which potentially correspond to trees.
Cite this article
Verdié, Y., Lafarge, F. Detecting parametric objects in large scenes by Monte Carlo sampling. Int J Comput Vis 106, 57–75 (2014). https://doi.org/10.1007/s11263-013-0641-0
DOI: https://doi.org/10.1007/s11263-013-0641-0