Phase-transition component discovery by deterministic annealing
Source: R/anneal.R, gmm_anneal_path.Rd
Tracks the number of distinct mixture centroids as a function of temperature
under mass-constrained deterministic annealing (Rose, Gurewitz and Fox 1990),
a physics-derived alternative to information-criterion model selection. The
system starts at a high temperature where all k_max centroids collapse to
the data centroid (a single effective component) and is cooled along a
geometric schedule; at each critical temperature a centroid bifurcates, so the
number of distinct centroids grows in steps. The temperatures at which it
grows are the phase transitions, and the count occupying the widest
temperature range is the discovered component number.
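The procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code. The Gibbs weights are assumed to be exp(-||x - y||^2 / (2 sigma^2 T)), which is the convention under which the first transition sits at lambda_max(C) / sigma^2. The "data scale" is assumed to be sqrt(lambda_max), and distinct centroids are counted by single-linkage merging at merge_tol times that scale. The function name da_path_sketch is hypothetical.

```r
# Hypothetical sketch of a mass-constrained deterministic annealing path with
# an isotropic reference covariance sigma^2 * I. Not the package's implementation.
da_path_sketch <- function(x, k_max = 4L, sigma = 1, n_steps = 40L,
                           n_inner = 30L, perturb = 0.02, merge_tol = 0.1,
                           seed = 1L) {
  set.seed(seed)                                 # note: resets the global RNG
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  mu <- colMeans(x)
  C <- crossprod(sweep(x, 2, mu)) / n            # ML data covariance
  lambda_max <- max(eigen(C, symmetric = TRUE, only.values = TRUE)$values)
  t_c <- lambda_max / sigma^2                    # analytic first transition
  scale <- sqrt(lambda_max)                      # assumed "data scale"
  temps <- exp(seq(log(3 * t_c), log(0.05 * t_c), length.out = n_steps))

  # Count groups of centroids closer than tol (single-linkage merging).
  count_distinct <- function(y, tol) {
    if (nrow(y) < 2) return(nrow(y))
    length(unique(cutree(hclust(dist(y), method = "single"), h = tol)))
  }

  y <- matrix(mu, k_max, p, byrow = TRUE)        # all centroids start at the mean
  rho <- rep(1 / k_max, k_max)                   # centroid masses
  n_eff <- integer(n_steps)
  for (s in seq_len(n_steps)) {
    # Symmetry-breaking noise so coincident centroids can separate.
    y <- y + matrix(rnorm(k_max * p, sd = perturb * scale), k_max, p)
    for (it in seq_len(n_inner)) {
      d2 <- outer(rowSums(x^2), rowSums(y^2), "+") - 2 * x %*% t(y)
      logw <- sweep(-d2 / (2 * sigma^2 * temps[s]), 2, log(rho), "+")
      logw <- logw - apply(logw, 1, max)         # stabilise before exp()
      post <- exp(logw)
      post <- post / rowSums(post)               # Gibbs association probabilities
      mass <- colSums(post)
      rho <- mass / sum(mass)                    # mass-constrained update
      live <- mass > 0                           # skip centroids with no mass
      y[live, ] <- crossprod(post[, live, drop = FALSE], x) / mass[live]
    }
    # Only centroids that still carry mass are counted.
    n_eff[s] <- count_distinct(y[rho > 0, , drop = FALSE], merge_tol * scale)
  }
  up <- which(diff(n_eff) > 0) + 1               # steps where the count grew
  list(path = data.frame(temperature = temps, n_effective = n_eff),
       critical_temperatures = temps[up],
       first_critical_temperature = if (length(up)) temps[up[1]] else NA_real_,
       t_critical_analytic = t_c)
}

# Usage: two tight, well-separated clusters in the plane.
set.seed(42)
x <- rbind(cbind(rnorm(50, -5, 0.1), rnorm(50, 0, 0.1)),
           cbind(rnorm(50,  5, 0.1), rnorm(50, 0, 0.1)))
res <- da_path_sketch(x, k_max = 4L)
table(res$path$n_effective)
```

Above the analytic temperature the perturbations are damped and all centroids stay merged. Below it, the split along the leading principal axis grows, so for these two tight clusters the count should settle at 2 with k_max = 4.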
Usage
gmm_anneal_path(
  x,
  k_max = 8L,
  sigma = NULL,
  t_high = NULL,
  t_low = NULL,
  n_steps = 80L,
  n_inner = 30L,
  w = NULL,
  perturb = 0.02,
  merge_tol = 0.1,
  ridge_eps = 1e-06,
  seed = 1L
)
Arguments
- x
A numeric n by p matrix of samples, or a gmm_target carrying a samples matrix. For regime (iii) targets, pass an importance-resampled draw.
- k_max
Maximum number of centroids tracked (the discovered count is at most k_max).
- sigma
Reference scale: the shared covariance is sigma^2 * I. When NULL (the default), sigma is taken as 1, so the first critical temperature is the largest eigenvalue of the data covariance.
- t_high, t_low
Top and bottom of the temperature schedule. When NULL, they default to 3 * t_critical_analytic and 0.05 * t_critical_analytic, bracketing the bifurcation cascade.
- n_steps
Number of temperatures on the geometric schedule.
- n_inner
Fixed-point iterations run at each temperature.
- w
Optional length-n vector of non-negative observation weights (e.g. importance weights). Defaults to uniform.
- perturb
Symmetry-breaking perturbation, as a fraction of the data scale, applied to the centroids at each temperature.
- merge_tol
Two centroids count as distinct when their distance exceeds merge_tol times the data scale.
- ridge_eps
Ridge added to the reference covariance for stability.
- seed
Optional integer seed for the perturbations (the result is deterministic given a seed).
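The default schedule described under t_high, t_low and n_steps can be sketched as follows. It assumes that "geometric" means a constant ratio between consecutive temperatures; the helper name geometric_schedule is hypothetical.

```r
# Hypothetical helper: geometric cooling schedule with the documented
# defaults t_high = 3 * t_c and t_low = 0.05 * t_c.
geometric_schedule <- function(t_c, n_steps = 80L, t_high = NULL, t_low = NULL) {
  if (is.null(t_high)) t_high <- 3 * t_c
  if (is.null(t_low))  t_low  <- 0.05 * t_c
  # Equal spacing in log-temperature gives a constant cooling ratio.
  exp(seq(log(t_high), log(t_low), length.out = n_steps))
}

# With an analytic critical temperature of 25, the schedule runs from
# 3 * 25 down to 0.05 * 25 in 80 steps.
temps <- geometric_schedule(t_c = 25)
```

Spacing the steps evenly in log-temperature spends as many steps per decade near t_low as near t_high, which suits a cascade whose later transitions crowd together at low temperature.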
Value
A list with elements path (a data frame of temperature,
n_effective and free_energy), critical_temperatures (the temperatures
at which the count increased), first_critical_temperature (the first such,
or NA if none was detected), t_critical_analytic
(\(\lambda_{\max}(\Sigma^{-1} C)\)), k_selected (the widest-plateau
component count), lambda_max and sigma.
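A sketch of how critical_temperatures and k_selected could be read off path. It assumes plateau width is measured in schedule steps (on a geometric schedule this is proportional to the width in log-temperature) and that ties go to the hotter plateau; select_plateau is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: derive the transition temperatures and the
# widest-plateau count from a path data frame (temperature, n_effective).
select_plateau <- function(path) {
  # Order from hot to cold, the direction of cooling.
  path <- path[order(path$temperature, decreasing = TRUE), ]
  runs <- rle(path$n_effective)                  # runs of equal counts
  # Temperatures where the count increased relative to the previous step.
  up <- which(diff(path$n_effective) > 0) + 1
  list(
    critical_temperatures = path$temperature[up],
    # which.max() returns the first maximum, i.e. the hotter plateau on ties.
    k_selected = runs$values[which.max(runs$lengths)]
  )
}

path <- data.frame(
  temperature = exp(seq(log(75), log(1.25), length.out = 10)),
  n_effective = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3)
)
res <- select_plateau(path)
```

Here the count of 2 spans five of the ten steps, so it is the one selected.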
Details
The first bifurcation has a closed-form critical temperature
\(T_c = \lambda_{\max}(\Sigma^{-1} C)\), where \(C\) is the (weighted)
data covariance and \(\Sigma = \sigma^2 I\) the shared reference covariance.
This value is returned as t_critical_analytic and serves as an independent
analytic check on the empirically detected first transition. Subsequent
transitions have no comparably simple closed form, and the count is a
diagnostic rather than a guarantee.
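The closed-form value can be checked directly. This sketch assumes C is the maximum-likelihood weighted covariance (normalized by the total weight) and ignores ridge_eps; analytic_tc is a hypothetical name.

```r
# Hypothetical helper: T_c = lambda_max(Sigma^{-1} C) with Sigma = sigma^2 I
# and C the weighted (ML) data covariance.
analytic_tc <- function(x, w = NULL, sigma = 1) {
  x <- as.matrix(x)
  if (is.null(w)) w <- rep(1, nrow(x))           # uniform weights by default
  w <- w / sum(w)
  mu <- colSums(w * x)                           # weighted mean
  xc <- sweep(x, 2, mu)
  C <- crossprod(xc * sqrt(w))                   # sum_i w_i (x_i - mu)(x_i - mu)^T
  Sigma <- diag(sigma^2, ncol(x))
  # Sigma^{-1} C has real eigenvalues; Re() guards against complex storage.
  max(Re(eigen(solve(Sigma, C), only.values = TRUE)$values))
}

set.seed(1)
x <- matrix(rnorm(200), 100, 2) %*% diag(c(3, 1))
analytic_tc(x)             # largest eigenvalue of the data covariance
analytic_tc(x, sigma = 2)  # a quarter of the value above
```

Because Sigma is a multiple of the identity, sigma only rescales the result by 1 / sigma^2; a general Sigma would also rotate the direction of the first split.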
Annealing fixes the component covariance to the reference \(\Sigma\), so the
temperature is the only scale; this is the clean isotropic regime in which the
analytic critical temperature is exact. For robust fitting with free covariances,
use anneal = TRUE in fit_em_samples() or fit_kld_em() instead.
References
Rose, K., Gurewitz, E. and Fox, G. C. (1990) Statistical mechanics and phase transitions in clustering. Physical Review Letters 65(8), 945–948. doi:10.1103/PhysRevLett.65.945