Compressing a Bayesian posterior you can evaluate but not sample
Source:vignettes/posterior_proxy.Rmd
posterior_proxy.RmdThis is the workflow the package exists for, end to end on real data: you have an unnormalised log-posterior – a likelihood times a prior, the normalising constant unknown – and you want a compact object you can marginalise, condition, integrate, and put error bars on, without running and tuning a Markov chain.
A real unnormalised posterior
A Bayesian logistic regression of transmission type on weight for the
built-in mtcars data, under a flat prior. The log-posterior
is the log-likelihood, known only up to its normalising constant (the
model evidence):
y <- mtcars$am
w <- mtcars$wt
log_post <- function(theta) {
if (is.null(dim(theta))) theta <- matrix(theta, ncol = 2L)
eta <- outer(rep(1, length(y)), theta[, 1L]) + outer(w, theta[, 2L])
colSums(y * eta - log1p(exp(eta)))
}
tgt <- gmm_target(
n_dim = 2L,
log_density = log_post,
normalised = FALSE,
name = "logistic(am ~ wt)"
)Fit the proxy
A cheap point fit locates the proposal (the usual workflow: any rough
guess of location and scale will do – with adapt = "pmc"
the proposal is refreshed from the fit itself, so it only has to be
survivable, not good):
mle <- stats::glm(am ~ wt, data = mtcars, family = stats::binomial())
q0 <- proposal_mvt(2L, mean = stats::coef(mle),
sigma = 9 * stats::vcov(mle), df = 5)
fit <- fit_proxymix(tgt, N = 2L, regime = "kld", proposal = q0,
is_size = 3000L, max_iter = 60L, seed = 1L,
adapt = "pmc")
#> Warning: The importance-weighted EM objective decreased by 0.0716 during fitting.
#> ℹ This usually signals ridge regularisation interacting with a near-singular
#> component.
gmm_fit_quality(fit)
#> $regime
#> [1] "kld"
#>
#> $converged
#> [1] TRUE
#>
#> $degenerate
#> [1] FALSE
#>
#> $ess
#> [1] 2427.338
#>
#> $ess_relative
#> [1] 0.8091127
#>
#> $min_component_ess
#> [1] 1728.417
#>
#> $max_weight
#> [1] 0.0005377459
#>
#> $support_fraction
#> [1] 1
#>
#> $kld_final
#> [1] -7.823271
#>
#> $validation_gap
#> [1] 0.008137811The certificate above is the part to read before anything else: the effective sample size of the importance weights, the degeneracy flag, and the held-out validation gap travel with the fit through every operation below.
The evidence, with an independent check
The fitted proxy doubles as the importance proposal for the model evidence (the log marginal likelihood under the flat prior). A Laplace approximation computed directly from the MLE gives an independent first-order check:
ev <- gmm_evidence(fit, n = 4000L, seed = 2L)
ev
#> $log_z
#> [1] -7.836408
#>
#> $se_log_z
#> [1] 0.002402278
#>
#> $n
#> [1] 4000
#>
#> $ess
#> [1] 3909.77
#>
#> $max_weight_share
#> [1] 0.00084703
#>
#> $top_decile_share
#> [1] 0.1246958
#>
#> $flagged
#> [1] FALSE
#>
#> attr(,"class")
#> [1] "proxymix_evidence"
## Laplace approximation: log f(theta_hat) + (d/2) log(2 pi)
## - (1/2) log det(-Hessian).
H <- -solve(stats::vcov(mle))
log_post(matrix(stats::coef(mle), nrow = 1L)) + log(2 * pi) -
0.5 * as.numeric(determinant(-H, logarithm = TRUE)$modulus)
#> [1] -7.927324Closed-form reads off the proxy
Everything below is exact algebra on the fitted mixture – no chains, no further target evaluations. The marginal posterior of the slope, the probability the slope is negative, and a central 90% interval:
slope <- gmm_marginalise(fit, keep = 2L)
pgmm(0, slope) # P(beta_wt < 0 | data)
#> [1] 0.9999166
qgmm(c(0.05, 0.95), slope) # central 90% credible interval
#> [1] -8.072877 -2.533571Error bars on the proxy itself
The numbers above condition on the fitted mixture. The bootstrap ensemble prices the fit’s own sampling variability – at zero new posterior evaluations, because the replicates re-weight the fit’s cached importance draws:
ens <- gmm_fit_ensemble(fit, B = 80L, seed = 3L)
proxy_functional_ci(ens, gmm_mean, level = 0.9)
#> term estimate conf.low conf.high
#> 1 f1 14.80838 14.646580 14.972430
#> 2 f2 -4.91444 -4.968041 -4.862262
proxy_functional_ci(ens, function(g) pgmm(0, gmm_marginalise(g, keep = 2L)),
level = 0.9)
#> term estimate conf.low conf.high
#> 1 f1 0.9999166 0.9998968 0.9999476When to use this, and when not to
If you can evaluate the unnormalised posterior you can always run
MCMC and fit a sample-based mixture to the draws. Reach for the direct
fit when what you want is the compact closed-form object –
deterministic given the seed, no chain tuning, evidence and error bars
included – and the dimension is moderate (the effective sample size of
importance sampling falls sharply beyond roughly
–;
every fit reports it). For high-dimensional posteriors, sample first and
use regime (ii) on the draws. The nearest CRAN neighbour,
AdMit, adaptively fits a mixture of Student-t densities to
an evaluable kernel for use as a proposal; the Gaussian family here is
what buys the closed-form operator calculus and the certificate that
travel with the result.
References
- Cappé, O., Douc, R., Guillin, A., Marin, J.-M. and Robert, C. P. (2008). Adaptive importance sampling in general mixture classes. Statistics and Computing 18, 447–459. doi:10.1007/s11222-008-9059-x.
- Hoek, J. van der and Elliott, R. J. (2024). Mixtures of multivariate Gaussians. Stochastic Analysis and Applications. doi:10.1080/07362994.2024.2372605.
- Owen, A. and Zhou, Y. (2000). Safe and effective importance sampling. Journal of the American Statistical Association 95(449), 135–143. doi:10.1080/01621459.2000.10473909.
- Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics 9(1), 130–134. doi:10.1214/aos/1176345338.