Gaussian-mixture proxies for densities you can evaluate but cannot sample
2026-05-15
The proxymix R package takes a density — a “shape of how likely things are” — and approximates it by adding a small number of bell-shaped curves together. We call the sum a Gaussian mixture. You will move two sliders and watch the bells stretch, shrink, and re-weight in real time, with the actual proxymix numbers behind every frame.
You do not need to know R to read this page. Every R snippet is hidden by default; toggle Show R code in the header if you are curious.
proxymix fits a parametric Gaussian-mixture proxy g(x; w, \mu, \Sigma) = \sum_{k=1}^K w_k \, \mathcal N(x; \mu_k, \Sigma_k) to a target density \pi(x) on \mathbb R^p by minimising \mathrm{KL}(g \,\|\, \pi), dispatching across three regimes from Hoek–Elliott (2026): closed-form moment matching (i), classical EM on samples (ii), and importance-sampling-driven KLD-EM (iii) for the case where \pi can be evaluated but not sampled. The two scenarios below exercise regime (iii) and the affine-Gaussian operator calculus (gmm_observe). All states are precomputed at fixed seed 1L against proxymix 0.3.0; the rendered numbers are the package’s actual output, not stylised diagrams.
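To fix intuition for the objective before the widgets, here is a toy Monte Carlo estimate of \mathrm{KL}(g \,\|\, \pi) in base R. Both densities are stand-ins chosen for illustration (g a standard normal, \pi a Student-t); proxymix's own validation estimator is the self-normalised importance-sampling version described under Scenario A.

# Toy stand-ins: g = N(0, 1), pi = Student-t with 3 degrees of freedom.
# KL(g || pi) = E_g[log g(x) - log pi(x)], estimated by sampling from g.
set.seed(1L)
x <- rnorm(5000L)
kl_hat <- mean(dnorm(x, log = TRUE) - dt(x, df = 3, log = TRUE))
kl_hat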
Can a small mixture of Gaussians stand in for a target whose shape is curved and whose interface gives us only a formula, not samples?
The banana below is the target density. proxymix is given only one thing about it: a way to ask “how tall is the density at this point?”. It is not given samples drawn from the banana. The job is to glue together K bell-shaped components into a mixture that matches the banana’s shape as closely as possible.
Move the K slider from 1 to 8. At K = 1, one bell cannot match the curved banana — it can only make an elliptical compromise. By K = 4 the mixture has enough flexibility to track the curve. The “validation KL” number on the right is the mismatch; smaller is better.
The banana is banana_target() from proxymix: an evaluate-only \mathcal{C}^\infty density on \mathbb R^2 defined by a non-linear transform of an isotropic Gaussian. The fit runs fit_proxymix(target, N = K, regime = "kld", seed = 1L, validation_size = 3000L, validation_seed = 2L). The reported KL is the held-out validation estimate \widehat{\mathrm{KL}}_{\mathrm{val}}(g \,\|\, \pi) computed by self-normalised importance sampling against an independent draw; its Monte Carlo standard error and the effective sample size of the importance proposal are exposed below. Note the non-monotone KL at K = 8 — a small over-fit / optimum-landscape artefact, not a bug; reproducible by re-running the precompute pipeline.
library(proxymix)
target <- banana_target()
fit <- fit_proxymix(target, N = 4, regime = "kld",
                    seed = 1L, validation_size = 3000L,
                    validation_seed = 2L)
print(fit)
ess_summary(fit)$validation_kld

Takeaway. One Gaussian cannot match a curved target. A small mixture (3 or 4 components) gets close. More is not always better — past a point the fit can wobble (look at K = 8).
Takeaway. Regime (iii) handles evaluate-only targets where classical EM is undefined (no samples) and rejection sampling is hopeless. The KL plateau around K = 4–6 suggests the effective intrinsic complexity of the banana is roughly four components; the mild K = 8 degradation is consistent with KLD-EM finding a local optimum the importance proposal does not adequately cover. ESS stays high (≈ 3 600) across K, indicating the heavy-tailed proposal is not the bottleneck.
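For readers reproducing the KL curve offline, the slider's K sweep can be approximated with the same calls shown above; a minimal sketch, assuming fit_proxymix accepts identical arguments at every K (exact values depend on proxymix internals and your machine's numerics).

# Re-fit at each K and collect the held-out validation KL (smaller is better).
library(proxymix)
kl_by_K <- sapply(1:8, function(K) {
  fit <- fit_proxymix(banana_target(), N = K, regime = "kld",
                      seed = 1L, validation_size = 3000L,
                      validation_seed = 2L)
  ess_summary(fit)$validation_kld
})
round(kl_by_K, 3)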
Take the four-component fit from above as a prior. Observe a noisy reading of one coordinate. What does the posterior look like?
We now use the K = 4 mixture from Scenario A as a prior belief about a 2-D quantity x = (x_1, x_2). Then someone tells us: “x_1 is approximately y, give or take noise of size σ.” The posterior — the updated belief after hearing the message — is itself a four-component Gaussian mixture, with each bell updated by a formula. No simulation, no MCMC.
Slide y to move the observation across x_1. Switch σ to make the measurement more or less noisy. Watch the centre panel.
Affine-Gaussian update. With prior g(x) = \sum_k w_k \mathcal N(\mu_k, \Sigma_k), observation matrix A = (1, 0), observation y \in \mathbb R, and noise covariance \sigma^2, the posterior is
g(x \mid y) \;=\; \sum_{k=1}^K w_k^\mathrm{post}\, \mathcal N\!\big(x;\; \mu_k^\mathrm{post},\; \Sigma_k^\mathrm{post}\big),
with per-component Kalman gain K_k = \Sigma_k A^\top (A \Sigma_k A^\top + \sigma^2)^{-1}, posterior covariance \Sigma_k^\mathrm{post} = \Sigma_k - K_k A \Sigma_k, posterior mean \mu_k^\mathrm{post} = \mu_k + K_k (y - A \mu_k), and re-mixed weights w_k^\mathrm{post} \propto w_k \, \mathcal N(y; A\mu_k, A\Sigma_k A^\top + \sigma^2). The log marginal evidence \log \pi(y) is reported below. The catastrophic-evidence warning fires when every component sees the observation at numerical density zero; it is not triggered anywhere in the precomputed grid.
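To make the algebra concrete, here is a from-scratch sketch of the per-component update in base R, independent of gmm_observe; the two-component prior is illustrative, not the K = 4 fit from Scenario A.

# Prior mixture (illustrative): weights, means, covariances on R^2.
w     <- c(0.6, 0.4)
mu    <- list(c(-1, 0.5), c(1, -0.5))
Sigma <- list(diag(2), 0.5 * diag(2))
A      <- matrix(c(1, 0), nrow = 1)   # observe the first coordinate
y      <- 0.8
sigma2 <- 0.1^2

upd <- lapply(seq_along(w), function(k) {
  S <- drop(A %*% Sigma[[k]] %*% t(A)) + sigma2          # innovation variance
  K <- Sigma[[k]] %*% t(A) / S                           # Kalman gain (2 x 1)
  list(mu    = drop(mu[[k]] + K %*% (y - A %*% mu[[k]])),
       Sigma = Sigma[[k]] - K %*% A %*% Sigma[[k]],
       lik   = dnorm(y, drop(A %*% mu[[k]]), sqrt(S)))   # N(y; A mu_k, A Sigma_k A^T + sigma^2)
})
lik    <- vapply(upd, `[[`, numeric(1), "lik")
w_post <- w * lik / sum(w * lik)                         # re-mixed weights
log(sum(w * lik))                                        # log marginal evidence log pi(y)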
prior <- fit_proxymix(banana_target(), N = 4, regime = "kld",
                      seed = 1L, validation_size = 3000L,
                      validation_seed = 2L)
A <- matrix(c(1, 0), nrow = 1)
post <- gmm_observe(prior, A = A, y = 0.8,
                    noise_cov = matrix(0.1^2, 1, 1))
post@metadata$log_marginal_evidence

- proxymix package on GitHub: https://github.com/max578/proxymix — installation, vignettes, examples.
- proxymix::vignette("operator_calculus") — full algebra of gmm_affine, gmm_observe, gmm_aggregate, gmm_missing.
- proxymix::vignette("from_kde") — Contract C with kernel density estimators (Scenario C in this primer, planned for v0.2).
- proxymix::vignette("three_regimes") — moment matching, sample EM, and KLD-EM compared on the same target.

| Term | Guided | Technical |
|---|---|---|
| Gaussian mixture | A sum of bells (Gaussians) that can approximate almost any 2-D density. | g(x) = \sum_{k=1}^K w_k \mathcal N(x; \mu_k, \Sigma_k) with w_k > 0, \sum w_k = 1. |
| Evaluate-only target | A density whose formula we can compute at any point, but for which sampling is not given. | \pi(x) exposed as a log_density(x) callable without a samples attribute; targets regime (iii). |
| KL divergence | A non-negative mismatch score; zero means identical. | \mathrm{KL}(g \,\|\, \pi) = \int g \log(g/\pi); non-negative, asymmetric. |
| Importance sampling | Estimating an average under one distribution using draws from a different, easier one. | \mathbb E_g[h] \approx \frac{1}{n}\sum_i \frac{g(x_i)}{q(x_i)} h(x_i) with x_i \sim q; variance controlled by ESS. |
| Effective sample size | Quality measure for an importance-sampling estimate; higher is better, capped at the raw count. | \mathrm{ESS} = (\sum_i w_i)^2 / \sum_i w_i^2; Owen’s bounds. |
| Kalman update | The exact formula for updating a Gaussian belief after a noisy linear observation. | Per-component conjugate update with gain K = \Sigma A^\top (A\Sigma A^\top + R)^{-1}. |
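A tiny worked instance of the ESS row above, with an arbitrary 1-D target and a heavy-tailed proposal chosen for illustration:

# Target pi = N(0, 1), proposal q = Student-t with 3 df (heavy-tailed).
set.seed(1L)
x   <- rt(3000L, df = 3)
w   <- dnorm(x) / dt(x, df = 3)        # importance weights pi(x) / q(x)
ess <- sum(w)^2 / sum(w^2)             # ESS = (sum w)^2 / sum w^2
ess                                    # close to 3000 when q covers pi well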
The two interactive widgets are driven by precomputed states that were generated once with the proxymix package. The page itself is a single HTML file with no Internet dependencies — it works on a shared drive, in an email attachment, or on a USB stick.
The precompute pipeline lives at scripts/precompute_states.R. It runs against proxymix 0.3.0, R 4.5.2, seed 1L, validation seed 2L, validation size 3 000. Both state files (states/banana.json, states/observe.json) carry an inputs_sha256 header; CI re-runs the script and gates on JSON structural equality. The inlined <script type="application/json"> blocks in the rendered HTML carry the same payload, so the page is authoritative for offline reading. Source repo: https://github.com/max578/proxymix-tutorial (planned).
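As a sketch of the CI gate, assuming jsonlite and a hypothetical re-run output path (states/banana_rerun.json is not a real artefact of the repo):

# Structural-equality gate: the committed state must match a fresh re-run.
library(jsonlite)
committed <- fromJSON("states/banana.json", simplifyVector = FALSE)
rerun     <- fromJSON("states/banana_rerun.json", simplifyVector = FALSE)  # hypothetical path
stopifnot(identical(committed$inputs_sha256, rerun$inputs_sha256))  # inputs_sha256 header, as described above
stopifnot(identical(committed, rerun))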