Importance-sampling estimate of \(Z = \int f(x)\, dx\) for the fit's
target \(f\), using the fitted mixture \(\hat g\) as the proposal:
$$\widehat{Z} = \frac{1}{n} \sum_{i=1}^n
\frac{f(x_i)}{\hat g(x_i)}, \qquad x_i \sim \hat g,$$
computed in the log domain. For a Bayesian posterior supplied as an
unnormalised likelihood \(\times\) prior, \(\log Z\) is the log marginal
likelihood, so the fitted proxy doubles as a model-comparison device.
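A minimal base-R sketch of why the log domain matters. It assumes only that \(\log f\) and \(\log \hat g\) can be evaluated at the draws. A one-dimensional Gaussian stands in for the fitted mixture, and the helper names log_mean_exp() and log_evidence_is() are hypothetical. This is not proxymix's implementation.

```r
# Minimal sketch (not proxymix source): importance-sampling log Z in the log domain.
# log_mean_exp() and log_evidence_is() are hypothetical helper names.
log_mean_exp <- function(a) {
  m <- max(a)                       # shift by the maximum so exp() cannot overflow
  m + log(mean(exp(a - m)))         # log((1/n) * sum(exp(a)))
}

log_evidence_is <- function(log_f, log_g, x) {
  log_w <- log_f(x) - log_g(x)      # log importance weights log f(x_i) - log g(x_i)
  log_mean_exp(log_w)
}

set.seed(1)
n <- 5000
x <- rnorm(n, mean = 0, sd = 1.5)                   # stand-in proposal g = N(0, 1.5^2)
log_g <- function(x) dnorm(x, sd = 1.5, log = TRUE)
log_f <- function(x) dnorm(x, log = TRUE) + 3       # unnormalised target, true log Z = 3
log_evidence_is(log_f, log_g, x)                    # close to 3

# With a large constant the naive ratio overflows; the log-domain version does not.
log_f_big <- function(x) dnorm(x, log = TRUE) + 800 # true log Z = 800
naive <- log(mean(exp(log_f_big(x) - log_g(x))))    # exp(~800) is Inf in double precision
c(naive = naive, log_domain = log_evidence_is(log_f_big, log_g, x))
```

Shifting by the largest log weight before exponentiating costs nothing in accuracy, because the shift is added back outside the logarithm.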
Arguments
- fit
A gmm_fit whose target carries a
log_density.
- n
Number of evidence draws from the fitted proxy.
- seed
Optional integer seed for the evidence draw.
Value
A list of class proxymix_evidence with elements log_z,
se_log_z, n, ess, max_weight_share, top_decile_share, and
flagged (the heavy-tail indicator).
Details
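The estimator \(\widehat{Z}\) is unbiased for \(Z\) whenever the proposal
dominates \(f\) (that is, \(\hat g(x) > 0\) wherever \(f(x) > 0\));
\(\log \widehat{Z}\) is only consistent, with a small downward bias from
Jensen's inequality. Its Monte Carlo error is driven by how closely
\(\hat g\) matches \(f\), which is precisely what the fit optimised. The
variance is finite only when \(\int f(x)^2 / \hat g(x)\, dx < \infty\). This
is guaranteed when the ratio \(f / \hat g\) is bounded, which in particular
requires \(\hat g\) to have tails at least as heavy as \(f\); a
lighter-tailed proxy can give infinite variance. Right-tail diagnostics of
the importance weights are therefore returned: the effective sample size,
and the share of the estimate carried by the largest ten percent of
weights, which is exactly 0.10 for equal weights and sits near 0.10 for a
well-matched proxy. When these indicate an untrustworthy tail, a classed
warning (proxymix_heavy_tail) is raised. Results also report the
delta-method standard error of \(\log \widehat{Z}\).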
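A base-R sketch of these weight diagnostics, under stated assumptions. The effective sample size is taken to be the Kish form \((\sum_i w_i)^2 / \sum_i w_i^2\), and the delta-method standard error is taken as \(\operatorname{se}(\log \widehat{Z}) \approx \operatorname{sd}(w) / (\sqrt{n}\, \bar w)\). The function is_diagnostics() is hypothetical and only mirrors the field names under Value; proxymix's exact definitions and warning thresholds are not reproduced. The sketch contrasts a proxy with heavier tails than an N(0, 1) target against one with variance 0.36, for which \(\int f^2 / \hat g\, dx\) diverges.

```r
# Minimal sketch (not proxymix source) of the importance-weight diagnostics.
# is_diagnostics() is a hypothetical name; its fields mirror the documented Value.
is_diagnostics <- function(log_w) {
  n <- length(log_w)
  w <- exp(log_w - max(log_w))        # rescale so the largest weight is 1 (avoids overflow)
  s <- sum(w)
  k <- ceiling(n / 10)                # number of weights in the top decile
  list(
    log_z            = max(log_w) + log(s / n),
    se_log_z         = sd(w) / (sqrt(n) * mean(w)),  # delta method: se(Z-hat) / Z-hat
    ess              = s^2 / sum(w^2),               # Kish effective sample size
    max_weight_share = max(w) / s,
    top_decile_share = sum(sort(w, decreasing = TRUE)[seq_len(k)]) / s
  )
}

set.seed(2)
n <- 5000
log_f <- function(x) dnorm(x, log = TRUE)            # normalised target N(0, 1), log Z = 0

x_wide <- rnorm(n, sd = 1.5)                          # heavier-tailed proxy: f / g <= 1.5
good <- is_diagnostics(log_f(x_wide) - dnorm(x_wide, sd = 1.5, log = TRUE))

x_thin <- rnorm(n, sd = 0.6)                          # lighter-tailed proxy: E[w^2] is infinite
bad <- is_diagnostics(log_f(x_thin) - dnorm(x_thin, sd = 0.6, log = TRUE))

round(rbind(good = unlist(good), bad = unlist(bad)), 3)
```

The heavier-tailed proxy keeps its top-decile share modestly above 0.10 and its effective sample size close to n. The lighter-tailed one concentrates far more of the estimate in a few draws. Because its weight variance is infinite, its diagnostics also keep drifting as n grows rather than settling.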
When the target declares itself normalised (normalised = TRUE), the
true value is \(\log Z = 0\) and the function still estimates it.
Comparing log_z with 0, relative to se_log_z, then gives a useful
end-to-end diagnostic of the fit.
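A hedged base-R illustration of that check, assuming a one-dimensional Gaussian stand-in for the fitted proxy. A proxy log-density that is off by the constant \(\log 2\) (a hypothetical slip in evaluating \(\log \hat g\)) shows up as \(\log \widehat{Z} \approx -\log 2\), many standard errors from 0, while the correctly evaluated proxy stays within a few. check_log_z() is a hypothetical helper, not part of proxymix.

```r
# Sketch: end-to-end check with a normalised target, where the true log Z is 0.
# Hypothetical setup: the "buggy" proxy density is off by the constant log(2),
# standing in for any slip in evaluating the fitted density.
set.seed(3)
n <- 5000
x <- rnorm(n, sd = 1.5)                      # draws from the stand-in proxy N(0, 1.5^2)
log_f <- dnorm(x, log = TRUE)                # normalised target N(0, 1)
log_g_ok <- dnorm(x, sd = 1.5, log = TRUE)
log_g_bug <- log_g_ok + log(2)               # mis-normalised proxy density

check_log_z <- function(log_w) {
  w <- exp(log_w - max(log_w))               # rescaled weights; se below is scale-free
  log_z <- max(log_w) + log(mean(w))
  se <- sd(w) / (sqrt(length(w)) * mean(w))  # delta-method se of log Z-hat
  c(log_z = log_z, z_score = log_z / se)     # distance from the true value 0
}

check_log_z(log_f - log_g_ok)    # |z_score| small
check_log_z(log_f - log_g_bug)   # log_z near -log(2), z_score far from 0
```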
Examples
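## An unnormalised target with a known constant:
## log f(x) = log N(x; 0, I) + 3, so log Z = 3.
tgt <- gmm_target(
  n_dim = 2L,
  log_density = function(x) {
    # accept a single point as well as an n x 2 matrix of points
    if (is.null(dim(x))) x <- matrix(x, ncol = 2L)
    # 2-D standard normal log-density, shifted by the constant 3
    -0.5 * rowSums(x^2) - log(2 * pi) + 3
  },
  normalised = FALSE, name = "shifted_gaussian"
)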
fit <- fit_kld_em(tgt, N = 1L, is_size = 2000L, max_iter = 40L, seed = 1L)
ev <- gmm_evidence(fit, n = 2000L, seed = 2L)
ev$log_z # close to 3
#> [1] 3.00043