Implements regime (iii) of Hoek and Elliott (2024). Minimises
KL(f || g_theta) where f is supplied as an evaluable log-density on
the target, via expectation-maximisation against importance-sampled
draws from a user-chosen proposal q.
Usage
fit_kld_em(
  target,
  N = 3L,
  proposal = NULL,
  is_size = 5000L,
  init = NULL,
  max_iter = 100L,
  tol = 1e-05,
  ridge_eps = 1e-06,
  min_ess = 50,
  on_low_ess = c("warn", "abort"),
  seed = NULL,
  validation_size = NULL,
  validation_proposal = NULL,
  validation_seed = NULL,
  support_warn = TRUE,
  adapt = c("none", "pmc"),
  refresh_every = 5L,
  defensive_gamma = 0.15,
  inflate = 1.5,
  anneal = FALSE,
  temp_schedule = NULL,
  canonicalise = TRUE
)
Arguments
- target
A gmm_target with a non-NULL log_density.
- N
Number of mixture components.
- proposal
An is_proposal. When NULL (the default) the proposal is chosen automatically: a support-matched is_uniform() when the target declares a bounded or one-sided support, otherwise a multivariate-t with df = 5 in target@n_dim dimensions. The automatic choice is announced with a one-line message so it is never silent.
- is_size
Number of importance-sampling draws used for fitting.
- init
A gmm initialisation, or NULL to use a kmeans pass on the importance-resampled draws.
- max_iter
Maximum number of EM iterations.
- tol
Convergence tolerance on the relative change in the importance-weighted EM objective Q(theta) = sum_n W_n log g(x_n). Q is invariant to the target's normalising constant, so the stopping rule behaves identically for normalised and unnormalised targets (the importance-sampled KLD estimate carries an additive -log Z(f) offset and is therefore never used for stopping).
- ridge_eps
Ridge added to each component covariance at every M-step.
- min_ess
Minimum effective sample size below which the fit is flagged as degenerate: a classed warning (proxymix_low_ess) is issued (or, with on_low_ess = "abort", a classed error proxymix_degenerate_fit), the fit's converged flag is forced to FALSE, and degenerate = TRUE is recorded in the diagnostics and the quality certificate.
- on_low_ess
What to do when the effective sample size falls below min_ess: "warn" (the default) flags and continues, "abort" refuses to return a degenerate fit.
- seed
Optional integer seed. When supplied, the fit is reproducible end-to-end: the fitting IS draw, the initialisation resample and kmeans pass, and any empty-component reseed draws are all derived from it. When NULL, those draws consume the ambient random-number stream.
- validation_size
Number of independent importance-sampling draws to use for held-out validation. The default NULL uses ceiling(is_size / 4), so the overfit-vs-generalise diagnostic (validation_kld and the certificate's validation_gap) exists by default; set 0L to disable the validation split.
- validation_proposal
Optional is_proposal for the validation sample. Defaults to the same proposal used for fitting.
- validation_seed
Optional integer seed used when drawing the validation sample. Defaults to seed + 1L when seed is supplied, NULL otherwise.
- support_warn
Logical. If TRUE (the default), issue a warning when more than 5% of IS draws receive non-finite weights (typically because the proposal does not dominate the target's support).
- adapt
Proposal adaptation: "none" (the default; one fixed IS draw, the historical behaviour) or "pmc" (population-Monte-Carlo refresh of the proposal from the current iterate; see Details).
- refresh_every
With adapt = "pmc", refresh the proposal after this many EM iterations on the current batch. Default 5L.
- defensive_gamma
With adapt = "pmc", the mass kept on the original proposal as a heavy-tailed defensive anchor at every refresh (bounds the importance-weight variance). Default 0.15.
- inflate
With adapt = "pmc", the factor inflating the current iterate's covariances inside the refreshed proposal. Default 1.5.
- anneal
Logical. If TRUE, a deterministic-annealing warm-start (see gmm_anneal_path()) replaces the kmeans initialisation: components are annealed from a high temperature down to one on the importance-weighted draws, and the resulting parameters seed the (unchanged) cold KLD-EM loop. This attacks the local-optima sensitivity of cold EM. Defaults to FALSE.
- temp_schedule
Optional numeric vector of descending temperatures for the annealing warm-start. NULL (the default) uses a geometric schedule from 10 down to 1 in covariance-whitened units. Ignored when anneal = FALSE.
- canonicalise
Logical. If TRUE (the default), the fitted mixture is post-processed by gmm_canonicalise().
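The role of ridge_eps can be illustrated with a minimal base-R sketch of a weighted M-step for a single Gaussian component. The helper weighted_m_step() is hypothetical and not part of the package; the actual M-step may differ in detail. It shows why a small ridge keeps the covariance invertible even when one draw carries all the weight.

```r
# Hypothetical helper (an assumption, not a package function):
# weighted M-step for one Gaussian component.
# x:   n x d matrix of importance-sampling draws
# w:   length-n non-negative weights (IS weight times responsibility)
# eps: ridge added to the diagonal of the covariance
weighted_m_step <- function(x, w, eps = 1e-06) {
  w <- w / sum(w)                        # normalise the weights
  mu <- colSums(x * w)                   # weighted mean (row i scaled by w[i])
  xc <- sweep(x, 2, mu)                  # centre the draws
  sigma <- crossprod(xc * sqrt(w)) + eps * diag(ncol(x))
  list(mean = mu, sigma = sigma)
}

set.seed(1)
x <- matrix(rnorm(400), ncol = 2)
w <- c(1, rep(0, 199))                   # degenerate: one draw holds all the weight
step <- weighted_m_step(x, w)
step$sigma                               # only the ridge remains on the diagonal
chol(step$sigma)                         # still positive definite, so this succeeds
```

Without the ridge the covariance in this degenerate case would be exactly zero and any Cholesky-based density evaluation would fail.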
Value
A gmm_fit with regime = "kld". The diagnostics list
contains, among others, kld_trace, kld_final,
kld_is_shifted, kld_final_absolute (when computable), ess,
ess_relative (ess / is_size), max_weight, support_fraction,
mc_se_kld, validation_kld, validation_ess, and
validation_max_weight.
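As a rough guide to how the weight diagnostics relate to one another, the sketch below computes them from raw log-weights in base R. The helper is_weight_diagnostics() is hypothetical, and the definitions used (Kish effective sample size, share of draws with finite log-weight) are assumptions; the package's exact formulas may differ.

```r
# Hypothetical helper (an assumption, not a package function).
# log_w: unnormalised log-weights, log f(x_n) - log q(x_n)
is_weight_diagnostics <- function(log_w) {
  finite <- is.finite(log_w)
  lw <- log_w[finite]
  w <- exp(lw - max(lw))          # subtract the max for numerical stability
  w <- w / sum(w)                 # self-normalised weights
  ess <- 1 / sum(w^2)             # Kish effective sample size
  list(
    ess = ess,
    ess_relative = ess / length(log_w),   # ess / is_size
    max_weight = max(w),                  # largest weight share
    support_fraction = mean(finite)       # share of draws with usable weight
  )
}

# Four equally weighted draws plus one draw outside the target's support.
d <- is_weight_diagnostics(c(0, 0, 0, 0, -Inf))
d$ess               # 4
d$ess_relative      # 0.8
d$max_weight        # 0.25
d$support_fraction  # 0.8
```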
Details
With adapt = "none" (the default) the Monte Carlo draws from q are
taken once at the start and the resulting self-normalised
importance-sampling weights are reused at every EM iteration. With
adapt = "pmc" the proposal is refreshed every refresh_every
iterations with a defensive mixture built from the current iterate,
following the population-Monte-Carlo scheme. In the refreshed proposal
the fitted mixture (covariances inflated by inflate) carries
1 - defensive_gamma of the mass and the original proposal q keeps
defensive_gamma as a heavy-tailed anchor. A fresh IS batch is then
drawn and EM continues on the refreshed weights. Because the refreshed proposal tracks the target,
the effective sample size recovers from a poor initial proposal and the
usable dimension range extends well beyond what a fixed proposal
reaches; the per-batch ESS trace is reported as
diagnostics$ess_history. While a batch is degenerate (its effective
sample size is below min_ess), the refresh fires every iteration
with an escalating covariance inflation floored at a growing fraction
of the batch's sample covariance, so a collapsed iterate walks back
out toward the target instead of freezing. Convergence is only
accepted on an adapted batch, so a run that stabilises on the original
proposal's draw is refreshed at least once before it is allowed to
stop. The scheme is the mixture population-Monte-Carlo idea of Cappé
et al. (2008) with the defensive-mixture safeguard of Owen and Zhou
(2000); it re-draws rather than recycles batches (compare the adaptive
multiple importance sampling of Cornuet et al., 2012).
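The defensive-mixture refresh described above can be sketched in one dimension with base R. Everything here is illustrative: the helper names, the toy target, the scale of the original proposal and the reduction of the current iterate to a single Gaussian are all assumptions, and "inflate" is taken to multiply the variance.

```r
# Hypothetical 1-D illustration; none of these helpers are package functions.
defensive_gamma <- 0.15   # mass kept on the original proposal
inflate <- 1.5            # variance inflation applied to the current iterate
q0_scale <- 2             # scale of the original proposal (assumed)

# Original proposal: scaled Student-t with 5 degrees of freedom.
d_q0 <- function(x) dt(x / q0_scale, df = 5) / q0_scale
r_q0 <- function(n) q0_scale * rt(n, df = 5)

# Current iterate, reduced to a single Gaussian for brevity.
mu_fit <- 1
sd_fit <- 0.5
sd_inf <- sd_fit * sqrt(inflate)

# Refreshed proposal: defensive mixture density and sampler.
d_mix <- function(x) {
  (1 - defensive_gamma) * dnorm(x, mu_fit, sd_inf) + defensive_gamma * d_q0(x)
}
r_mix <- function(n) {
  from_q0 <- runif(n) < defensive_gamma
  ifelse(from_q0, r_q0(n), rnorm(n, mu_fit, sd_inf))
}

# Toy target (standard normal) and the refreshed importance weights.
f <- function(x) dnorm(x)
set.seed(1)
x <- r_mix(2000)
w_mix <- f(x) / d_mix(x)

# The anchor keeps the mixture above gamma * q0 everywhere, so each
# refreshed weight is at most f / (gamma * q0).
all(d_mix(x) >= defensive_gamma * d_q0(x))
max(w_mix)
```

The pointwise bound is what the Owen and Zhou safeguard buys: however badly the current iterate is placed, the refreshed weights cannot exceed those of the original proposal by more than a factor of 1 / defensive_gamma.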
Since v0.1.1 the function also draws an independent validation IS
sample when validation_size > 0 and reports its own KLD estimate,
effective sample size, and largest weight share. This lets users tell
the difference between in-sample EM overfit to one particular IS draw
and a fit that generalises across independent IS draws.
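A toy version of the in-sample versus held-out comparison is sketched below in base R. It is not the package's implementation: the target, the proposal, the helper names and the one-component closed-form fit (a weighted moment match, which is what minimising the weighted objective gives for a single Gaussian) are all assumptions chosen to keep the example short.

```r
# Hypothetical toy setup; none of these helpers are package functions.
log_f <- function(x) dnorm(x, mean = 1, sd = 0.7, log = TRUE)   # toy target
r_q <- function(n) 2 * rt(n, df = 5)                            # proposal sampler
log_q <- function(x) dt(x / 2, df = 5, log = TRUE) - log(2)     # proposal log-density

# Self-normalised importance weights for a batch of draws.
snis_weights <- function(x) {
  lw <- log_f(x) - log_q(x)
  w <- exp(lw - max(lw))
  w / sum(w)
}

# Plug-in KLD estimate: sum_n W_n * (log f(x_n) - log g(x_n)).
kld_hat <- function(x, mu, sigma) {
  w <- snis_weights(x)
  sum(w * (log_f(x) - dnorm(x, mu, sigma, log = TRUE)))
}

set.seed(1)
x_fit <- r_q(200)   # fitting batch
x_val <- r_q(50)    # independent held-out batch, ceiling(200 / 4) draws

# One-component fit: weighted moment match on the fitting batch only.
w_fit <- snis_weights(x_fit)
mu_hat <- sum(w_fit * x_fit)
sigma_hat <- sqrt(sum(w_fit * (x_fit - mu_hat)^2))

kld_fit <- kld_hat(x_fit, mu_hat, sigma_hat)   # in-sample estimate
kld_val <- kld_hat(x_val, mu_hat, sigma_hat)   # held-out estimate
kld_val - kld_fit                              # gap between the two
```

Because the toy target is itself Gaussian, the in-sample estimate can never exceed zero here: the fit is tuned to the very draws it is scored on. The held-out estimate does not share that optimism, which is the point of the validation split.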
When the target's normalised property is FALSE or NA, the
importance-sampled kld_final and kld_trace measure
\(\widehat{KL}(f \Vert g) - \log Z(f)\) rather than the absolute
divergence. The fit's diagnostics list records this via
kld_is_shifted = TRUE and a kld_shift_explanation string. When the
target also supplies a finite log_normalizer, a corrected absolute
estimate is reported as kld_final_absolute.
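The effect of an unknown normalising constant can be checked numerically with a short base-R sketch. The densities and the constant below are arbitrary assumptions; offset stands in for the -log Z(f) term. The self-normalised weights, and therefore the objective Q, are unchanged by the constant, while the plug-in KLD estimate moves by exactly that amount.

```r
# Hypothetical toy check; all quantities below are illustrative assumptions.
set.seed(1)
x <- 2 * rt(500, df = 5)                              # draws from a proposal q
log_q <- dt(x / 2, df = 5, log = TRUE) - log(2)       # proposal log-density
log_f <- dnorm(x, log = TRUE)                         # normalised target
log_g <- dnorm(x, mean = 0.3, sd = 1.2, log = TRUE)   # some candidate fit
offset <- 3.7                                         # constant added to log f

# Self-normalised importance weights from log-weights.
snis <- function(lw) {
  w <- exp(lw - max(lw))
  w / sum(w)
}

w_norm <- snis(log_f - log_q)
w_shift <- snis(log_f + offset - log_q)

# Importance-weighted EM objective Q = sum_n W_n log g(x_n).
q_norm <- sum(w_norm * log_g)
q_shift <- sum(w_shift * log_g)

# Plug-in KLD estimate sum_n W_n (log f(x_n) - log g(x_n)).
kld_norm <- sum(w_norm * (log_f - log_g))
kld_shift <- sum(w_shift * (log_f + offset - log_g))

all.equal(q_norm, q_shift)                # Q ignores the constant
all.equal(kld_shift - kld_norm, offset)   # the KLD estimate carries it
```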
References
Cappé, O., Douc, R., Guillin, A., Marin, J.-M. and Robert, C. P. (2008) Adaptive importance sampling in general mixture classes. Statistics and Computing 18, 447–459. doi:10.1007/s11222-008-9059-x
Cornuet, J.-M., Marin, J.-M., Mira, A. and Robert, C. P. (2012) Adaptive multiple importance sampling. Scandinavian Journal of Statistics 39, 798–812. doi:10.1111/j.1467-9469.2011.00756.x
Owen, A. and Zhou, Y. (2000) Safe and effective importance sampling. Journal of the American Statistical Association 95(449), 135–143. doi:10.1080/01621459.2000.10473909
See also
Other fitting:
fit_em_samples(),
fit_moment_match(),
from_kde(),
from_objective(),
select_N()
Examples
tgt <- banana_target()
q <- is_mvt(n_dim = 2L, mean = c(0, 0),
            sigma = 4 * diag(2), df = 5)
fit <- fit_kld_em(tgt, N = 3L, proposal = q,
                  is_size = 1500L, max_iter = 25L, seed = 1L,
                  validation_size = 1500L)
fit@diagnostics$kld_final
#> [1] 0.006967304
fit@diagnostics$validation_kld
#> [1] 0.00926966