Gaussian-mixture proxies for densities you can evaluate but cannot sample
2026-05-15
The proxymix R package takes a density — a “shape of how likely things are” — and approximates it by adding a small number of bell-shaped curves together. We call the sum a Gaussian mixture. You will move two sliders and watch the bells stretch, shrink, and re-weight in real time, with the actual proxymix numbers behind every frame.
You do not need to know R to read this page. Every R snippet is hidden by default; toggle Show R code in the header if you are curious.
proxymix fits a parametric Gaussian-mixture proxy g(x; w, \mu, \Sigma) = \sum_{k=1}^K w_k \, \mathcal N(x; \mu_k, \Sigma_k) to a target density \pi(x) on \mathbb R^p by minimising \mathrm{KL}(g \,\|\, \pi), dispatching across three regimes from Hoek–Elliott (2026): closed-form moment matching (i), classical EM on samples (ii), and importance-sampling-driven KLD-EM (iii) for the case where \pi can be evaluated but not sampled. The two scenarios below exercise regime (iii) and the affine-Gaussian operator calculus (gmm_observe). All states are precomputed at fixed seed 1L against proxymix 0.3.0; the rendered numbers are the package’s actual output, not stylised diagrams.
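To fix intuition for the objective before the widgets, here is a toy Monte Carlo estimate of \mathrm{KL}(g \,\|\, \pi) in base R. Both densities are stand-ins chosen for illustration (g a standard normal, \pi a Student-t); proxymix's own validation estimator is the self-normalised importance-sampling version described under Scenario A.

# Toy stand-ins: g = N(0, 1), pi = Student-t with 3 degrees of freedom.
# KL(g || pi) = E_g[log g(x) - log pi(x)], estimated by sampling from g.
set.seed(1L)
x <- rnorm(5000L)
kl_hat <- mean(dnorm(x, log = TRUE) - dt(x, df = 3, log = TRUE))
kl_hat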
Can a small mixture of Gaussians stand in for a target whose shape is curved and whose interface gives us only a formula, not samples?
The banana below is the target density. proxymix is given only one thing about it: a way to ask “how tall is the density at this point?”. It is not given samples drawn from the banana. The job is to glue together K bell-shaped components into a mixture that matches the banana’s shape as closely as possible.
Move the K slider from 1 to 8. At K = 1, one bell cannot match the curved banana — it can only make an elliptical compromise. By K = 4 the mixture has enough flexibility to track the curve. The “validation KL” number on the right is the mismatch; smaller is better.
The banana is banana_target() from proxymix: an evaluate-only \mathcal{C}^\infty density on \mathbb R^2 defined by a non-linear transform of an isotropic Gaussian. The fit runs fit_proxymix(target, N = K, regime = "kld", seed = 1L, validation_size = 3000L, validation_seed = 2L). The reported KL is the held-out validation estimate \widehat{\mathrm{KL}}_{\mathrm{val}}(g \,\|\, \pi) computed by self-normalised importance sampling against an independent draw; its Monte Carlo standard error and the effective sample size of the importance proposal are exposed below. Note the non-monotone KL at K = 8 — a small over-fit / optimum-landscape artefact, not a bug; reproducible by re-running the precompute pipeline.
library(proxymix)
target <- banana_target()
fit <- fit_proxymix(target, N = 4, regime = "kld",
                    seed = 1L, validation_size = 3000L,
                    validation_seed = 2L)
print(fit)
ess_summary(fit)$validation_kld

Takeaway. One Gaussian cannot match a curved target. A small mixture (3 or 4 components) gets close. More is not always better — past a point the fit can wobble (look at K = 8).
Takeaway. Regime (iii) handles evaluate-only targets where classical EM is undefined (no samples) and rejection sampling is hopeless. The KL plateau around K = 4–6 suggests the effective intrinsic complexity of the banana is roughly four components; the mild K = 8 degradation is consistent with KLD-EM finding a local optimum the importance proposal does not adequately cover. ESS stays high (≈ 3 600) across K, indicating the heavy-tailed proposal is not the bottleneck.
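For readers reproducing the KL curve offline, the slider's K sweep can be approximated with the same calls shown above; a minimal sketch, assuming fit_proxymix accepts identical arguments at every K (exact values depend on proxymix internals and your machine's numerics).

# Re-fit at each K and collect the held-out validation KL (smaller is better).
library(proxymix)
kl_by_K <- sapply(1:8, function(K) {
  fit <- fit_proxymix(banana_target(), N = K, regime = "kld",
                      seed = 1L, validation_size = 3000L,
                      validation_seed = 2L)
  ess_summary(fit)$validation_kld
})
round(kl_by_K, 3)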
Take the four-component fit from above as a prior. Observe a noisy reading of one coordinate. What does the posterior look like?
We now use the K = 4 mixture from Scenario A as a prior belief about a 2-D quantity x = (x_1, x_2). Then someone tells us: “x_1 is approximately y, give or take noise of size σ.” The posterior — the updated belief after hearing the message — is itself a four-component Gaussian mixture, with each bell updated by a formula. No simulation, no MCMC.
Slide y to move the observation across x_1. Switch σ to make the measurement more or less noisy. Watch the centre panel.
Affine-Gaussian update. With prior g(x) = \sum_k w_k \mathcal N(\mu_k, \Sigma_k), observation matrix A = (1, 0), observation y \in \mathbb R, and noise covariance \sigma^2, the posterior is
g(x \mid y) \;=\; \sum_{k=1}^K w_k^\mathrm{post}\, \mathcal N\!\big(x;\; \mu_k^\mathrm{post},\; \Sigma_k^\mathrm{post}\big),
with per-component Kalman gain K_k = \Sigma_k A^\top (A \Sigma_k A^\top + \sigma^2)^{-1}, posterior covariance \Sigma_k^\mathrm{post} = \Sigma_k - K_k A \Sigma_k, posterior mean \mu_k^\mathrm{post} = \mu_k + K_k (y - A \mu_k), and re-mixed weights w_k^\mathrm{post} \propto w_k \, \mathcal N(y; A\mu_k, A\Sigma_k A^\top + \sigma^2). The log marginal evidence \log \pi(y) is reported below. The catastrophic-evidence warning fires when every component sees the observation at numerical density zero; it is not triggered anywhere in the precomputed grid.
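To make the algebra concrete, here is a from-scratch sketch of the per-component update in base R, independent of gmm_observe; the two-component prior is illustrative, not the K = 4 fit from Scenario A.

# Prior mixture (illustrative): weights, means, covariances on R^2.
w     <- c(0.6, 0.4)
mu    <- list(c(-1, 0.5), c(1, -0.5))
Sigma <- list(diag(2), 0.5 * diag(2))
A      <- matrix(c(1, 0), nrow = 1)   # observe the first coordinate
y      <- 0.8
sigma2 <- 0.1^2

upd <- lapply(seq_along(w), function(k) {
  S <- drop(A %*% Sigma[[k]] %*% t(A)) + sigma2          # innovation variance
  K <- Sigma[[k]] %*% t(A) / S                           # Kalman gain (2 x 1)
  list(mu    = drop(mu[[k]] + K %*% (y - A %*% mu[[k]])),
       Sigma = Sigma[[k]] - K %*% A %*% Sigma[[k]],
       lik   = dnorm(y, drop(A %*% mu[[k]]), sqrt(S)))   # N(y; A mu_k, A Sigma_k A^T + sigma^2)
})
lik    <- vapply(upd, `[[`, numeric(1), "lik")
w_post <- w * lik / sum(w * lik)                         # re-mixed weights
log(sum(w * lik))                                        # log marginal evidence log pi(y)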
prior <- fit_proxymix(banana_target(), N = 4, regime = "kld",
                      seed = 1L, validation_size = 3000L,
                      validation_seed = 2L)
A <- matrix(c(1, 0), nrow = 1)
post <- gmm_observe(prior, A = A, y = 0.8,
                    noise_cov = matrix(0.1^2, 1, 1))
post@metadata$log_marginal_evidence

- proxymix package on GitHub: https://github.com/max578/proxymix — installation, vignettes, examples.
- proxymix::vignette("operator_calculus") — full algebra of gmm_affine, gmm_observe, gmm_aggregate, gmm_missing.
- proxymix::vignette("from_kde") — Contract C with kernel density estimators (Scenario C in this primer, planned for v0.2).
- proxymix::vignette("three_regimes") — moment matching, sample EM, and KLD-EM compared on the same target.

| Term | Guided | Technical |
|---|---|---|
| Gaussian mixture | A sum of bells (Gaussians) that can approximate almost any 2-D density. | g(x) = \sum_{k=1}^K w_k \mathcal N(x; \mu_k, \Sigma_k) with w_k > 0, \sum w_k = 1. |
| Evaluate-only target | A density whose formula we can compute at any point, but for which sampling is not given. | \pi(x) exposed as a log_density(x) callable without a samples attribute; targets regime (iii). |
| KL divergence | A non-negative mismatch score; zero means identical. | \mathrm{KL}(g \,\|\, \pi) = \int g \log(g/\pi); non-negative, asymmetric. |
| Importance sampling | Estimating an average under one distribution using draws from a different, easier one. | \mathbb E_g[h] \approx \frac{1}{n}\sum_i \frac{g(x_i)}{q(x_i)} h(x_i) with x_i \sim q; variance controlled by ESS. |
| Effective sample size | Quality measure for an importance-sampling estimate; higher is better, capped at the raw count. | \mathrm{ESS} = (\sum_i w_i)^2 / \sum_i w_i^2; Owen’s bounds. |
| Kalman update | The exact formula for updating a Gaussian belief after a noisy linear observation. | Per-component conjugate update with gain K = \Sigma A^\top (A\Sigma A^\top + R)^{-1}. |
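A tiny worked instance of the ESS row above, with an arbitrary 1-D target and a heavy-tailed proposal chosen for illustration:

# Target pi = N(0, 1), proposal q = Student-t with 3 df (heavy-tailed).
set.seed(1L)
x   <- rt(3000L, df = 3)
w   <- dnorm(x) / dt(x, df = 3)        # importance weights pi(x) / q(x)
ess <- sum(w)^2 / sum(w^2)             # ESS = (sum w)^2 / sum w^2
ess                                    # close to 3000 when q covers pi well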
The two interactive widgets are driven by precomputed states that were generated once with the proxymix package. The page itself is a single HTML file with no Internet dependencies — it works on a shared drive, in an email attachment, or on a USB stick.
The precompute pipeline lives at scripts/precompute_states.R. It runs against proxymix 0.3.0, R 4.5.2, seed 1L, validation seed 2L, validation size 3 000. Both state files (states/banana.json, states/observe.json) carry an inputs_sha256 header; CI re-runs the script and gates on JSON structural equality. The inlined <script type="application/json"> blocks in the rendered HTML carry the same payload, so the page is authoritative for offline reading. Source repo: https://github.com/max578/proxymix-tutorial (planned).
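As a sketch of the CI gate, assuming jsonlite and a hypothetical re-run output path (states/banana_rerun.json is not a real artefact of the repo):

# Structural-equality gate: the committed state must match a fresh re-run.
library(jsonlite)
committed <- fromJSON("states/banana.json", simplifyVector = FALSE)
rerun     <- fromJSON("states/banana_rerun.json", simplifyVector = FALSE)  # hypothetical path
stopifnot(identical(committed$inputs_sha256, rerun$inputs_sha256))  # inputs_sha256 header, as described above
stopifnot(identical(committed, rerun))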