Cascade of phase transitions for multiscale clustering

Tony Bonnaire, Aurélien Decelle, and Nabila Aghanim
Phys. Rev. E 103, 012105 – Published 6 January 2021

Abstract

We present a framework exploiting the cascade of phase transitions occurring during a simulated annealing of the expectation-maximization algorithm to cluster datasets with multiscale structures. Using the weighted local covariance, we can extract, a posteriori and without any prior knowledge, information on the number of clusters at different scales together with their size. We also study the linear stability of the iterative scheme to derive the threshold at which the first transition occurs and show how to approximate the next ones. Finally, we combine simulated annealing together with recent developments of regularized Gaussian mixture models to learn a principal graph from spatially structured datasets that can also exhibit many scales.
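
The annealing idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an isotropic Gaussian mixture with uniform weights and a shared variance `sigma2` that is imposed by a geometric schedule rather than learned, and it takes the largest eigenvalue of each component's responsibility-weighted local covariance, divided by `sigma2`, as the monitored ratio. The function names `responsibilities` and `annealed_em`, the schedule, the jitter amplitude, and all default parameters are hypothetical choices made for illustration. The regularized mixture model and the principal graph are not covered.

```python
import numpy as np


def responsibilities(X, centers, sigma2):
    """Soft assignment of N points to K isotropic Gaussians sharing variance sigma2."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    logp = -0.5 * d2 / sigma2
    logp -= logp.max(axis=1, keepdims=True)  # stabilise the softmax
    R = np.exp(logp)
    return R / R.sum(axis=1, keepdims=True)


def annealed_em(X, K=4, sigma2_end=0.05, n_steps=60, n_em=30, seed=0):
    """Anneal sigma2 downwards and record, per step, the centers and the ratios
    (max eigenvalue of the weighted local covariance) / sigma2 (assumed definition)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    lam_max = np.linalg.eigvalsh(np.atleast_2d(np.cov(X.T))).max()
    # Assumption: start at 4x the largest data variance so all centers sit on the mean.
    schedule = np.geomspace(4.0 * lam_max, sigma2_end, n_steps)
    centers = X.mean(axis=0) + 1e-3 * rng.standard_normal((K, D))
    history = []
    for sigma2 in schedule:
        # Small jitter so that coinciding centers can separate once they become unstable.
        centers = centers + 1e-4 * np.sqrt(sigma2) * rng.standard_normal((K, D))
        for _ in range(n_em):
            R = responsibilities(X, centers, sigma2)   # E-step
            Nk = R.sum(axis=0) + 1e-12
            centers = (R.T @ X) / Nk[:, None]          # M-step (centers only)
        # Weighted local covariance of each component at the current scale.
        R = responsibilities(X, centers, sigma2)
        Nk = R.sum(axis=0) + 1e-12
        ratios = np.empty(K)
        for k in range(K):
            diff = X - centers[k]
            cov_k = (R[:, k, None] * diff).T @ diff / Nk[k]
            ratios[k] = np.linalg.eigvalsh(cov_k).max() / sigma2
        history.append((sigma2, centers.copy(), ratios))
    return history
```

Under these assumptions, each entry of `history` holds the current `sigma2`, the center positions, and one ratio per component, so plotting the ratios against `sigma2` gives a curve of the kind described in the figure captions. While `sigma2` stays above the largest eigenvalue of the data covariance, the centers remain merged at the data mean and every ratio stays below 1.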

  • Received 13 August 2020
  • Revised 15 October 2020
  • Accepted 2 December 2020

DOI: https://doi.org/10.1103/PhysRevE.103.012105

©2021 American Physical Society

Physics Subject Headings (PhySH)

Statistical Physics & Thermodynamics

Authors & Affiliations

Tony Bonnaire (1,2), Aurélien Decelle (2,3), and Nabila Aghanim (1)

  • (1) Université Paris-Saclay, CNRS, Institut d'Astrophysique Spatiale, 91405 Orsay, France
  • (2) Université Paris-Saclay, TAU Team INRIA Saclay, CNRS, Laboratoire de Recherche en Informatique, 91190 Gif-sur-Yvette, France
  • (3) Departamento de Física Teórica I, Universidad Complutense, 28040 Madrid, Spain

Issue

Vol. 103, Iss. 1 — January 2021

Images

  • Figure 1

    (a) Displacement of K=25 centers during the annealing procedure for a dataset with five spherical Gaussian clusters. Colors indicate in which final cluster the center ends. (b) Evolution of the ratio Γ_k/σ² as a function of σ². Black stars correspond to the scales of successive transitions, the black vertical line to T_c^hard, and colored ones indicate the size of the clusters as defined by the maximum eigenvalue of the empirical covariance. The black dashed curve shows the evolution of Q as defined in Eq. (4) that we identify as an order parameter. This quantity is not represented for σ1 since the number of physical clusters K_r begins to be higher than the number of generated clusters q.
  • Figure 2

    Ratio between the estimated variances σ̂² obtained when freezing the K=25 centers when Γk/σ21 and the empirical ones σ²_true from data of Fig. 1 for several values of ρ, the proportion of datapoints left for the computation.
  • Figure 3

    (a) Displacement of K=25 centers during the annealing procedure for a dataset made of ten 5D spherical Gaussian clusters visualized in the plane of the two first principal components. Colors depend on the macrocluster the component stands in at the last iteration. (b) Evolution of the ratio Γ_k/σ² as a function of σ², the hard annealing parameter. Colored vertical lines indicate the actual size of the corresponding Gaussian cluster or macrocluster (gray dashed lines, first from the right in each panel) as defined by the maximum eigenvalue of the empirical covariance.
  • Figure 4

    (a) Arrows indicate the displacement of K=25 centers during the soft annealing procedure for a dataset made of six spherical Gaussian clusters (black points). Colors relate to the cluster in which the component ends. Red crosses and gray dashed circles respectively indicate the positions and variances fixed a posteriori when the center undergoes its last split before remaining above the Γ_k/σ_k²=1 line. (b) Evolution of the ratio Γ_k/σ_k² as a function of σ². The vertical black line corresponds to T_c^soft. The inset figure shows the evolution of the ratios max Q/Q_th and σ̂²/σ²_true when varying the contrast between the two nested clusters.
  • Figure 5

    (a) Left: Displacement of K=100 components during the hard annealing of a tree branches dataset with different sampling standard deviations. The black dashed line corresponds to the first principal direction. Colors refer to branches in which centers end. Right: Learned structure when stopping the annealing for components reaching the temperature σ2γk. Red lines are edges of the graph and gray shaded areas are 1σ_k circles. (b) Evolution of the ratio γ_k/σ² as a function of σ². Vertical lines indicate the variance used for the generation of branches. The black vertical line corresponds to the value of T_c^graph.