
Importance-sampling estimate of \(Z = \int f(x)\, dx\) for the fit's target \(f\), using the fitted mixture \(\hat g\) as the proposal: $$\widehat{Z} = \frac{1}{n} \sum_{i=1}^n \frac{f(x_i)}{\hat g(x_i)}, \qquad x_i \sim \hat g,$$ computed in the log domain. For a Bayesian posterior supplied as likelihood \(\times\) prior, \(\log Z\) is the log marginal likelihood, so a fitted proxy doubles as a model-comparison device.
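To illustrate what "computed in the log domain" can look like, here is a minimal base-R sketch. It does not call the package and is not necessarily how gmm_evidence is implemented: the target log_f, the single-Gaussian proposal log_g (a stand-in for the fitted mixture), and all variable names are assumptions made for illustration.

```r
# Hypothetical 1-D stand-ins (not package objects):
# an unnormalised target with known log Z = 3, and a slightly wider proposal.
log_f <- function(x) dnorm(x, mean = 0, sd = 1, log = TRUE) + 3
log_g <- function(x) dnorm(x, mean = 0, sd = 1.2, log = TRUE)

set.seed(1)
n <- 4000L
x <- rnorm(n, mean = 0, sd = 1.2)   # x_i ~ g
log_w <- log_f(x) - log_g(x)        # log importance weights

# log of the mean weight via log-sum-exp: subtracting the maximum
# before exponentiating avoids overflow and underflow.
m <- max(log_w)
log_z_hat <- m + log(sum(exp(log_w - m))) - log(n)
log_z_hat
```

Working with log weights matters mostly in higher dimensions, where the raw ratios \(f(x_i)/\hat g(x_i)\) can exceed the floating-point range even when their mean is well behaved.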

Usage

gmm_evidence(fit, n = 4000L, seed = NULL)

Arguments

fit

A gmm_fit whose target carries a log_density.

n

Number of evidence draws from the fitted proxy.

seed

Optional integer seed for the evidence draw.

Value

A list of class proxymix_evidence with elements log_z, se_log_z, n, ess, max_weight_share, top_decile_share, and flagged (the heavy-tail indicator).

Details

The estimator \(\widehat{Z}\) is unbiased for any proposal that dominates \(f\), and its Monte Carlo error is driven by how well \(\hat g\) matches \(f\), which is precisely what the fit optimised. The variance is finite only when \(\hat g\) has tails at least as heavy as \(f\). A right-tail diagnostic is therefore returned: the effective sample size, and the share of the estimate carried by the largest ten percent of weights, which sits near 0.10 for a well-matched proxy. A classed warning (proxymix_heavy_tail) is raised when the diagnostic indicates an untrustworthy tail. The result also reports the delta-method standard error of \(\log \widehat{Z}\).
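The diagnostics described above can be sketched from a vector of log weights using standard importance-sampling formulas. This is an illustration in base R, not the package's code: the helper is_diagnostics is hypothetical, its element names merely mirror those listed under Value, and the exact definitions used by gmm_evidence may differ.

```r
# Hypothetical helper: weight diagnostics from log importance weights.
is_diagnostics <- function(log_w) {
  n <- length(log_w)
  w <- exp(log_w - max(log_w))   # rescaled weights; every quantity below is scale-free
  share <- w / sum(w)            # each draw's share of the estimate
  k <- ceiling(n / 10)           # size of the top decile
  list(
    ess = 1 / sum(share^2),                                   # Kish effective sample size
    max_weight_share = max(share),
    top_decile_share = sum(sort(share, decreasing = TRUE)[seq_len(k)]),
    se_log_z = sd(w) / (mean(w) * sqrt(n))                    # delta method: sd(w) / (mean(w) * sqrt(n))
  )
}

# Assumed toy setup: N(0, 1) target shifted by +3, N(0, 1.2^2) proposal.
set.seed(2)
x <- rnorm(4000, mean = 0, sd = 1.2)
log_w <- dnorm(x, log = TRUE) + 3 - dnorm(x, mean = 0, sd = 1.2, log = TRUE)
d <- is_diagnostics(log_w)
str(d)
```

With perfectly equal weights the effective sample size equals n and the top-decile share equals exactly 0.10; heavier right tails in the weights push the share up and the effective sample size down.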

When the target declares itself normalised (normalised = TRUE), the true value is \(\log Z = 0\) and the function still estimates it, which makes the result a useful end-to-end diagnostic of the fit.
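One way to read such a check is to compare the estimate with its standard error. The base-R sketch below assumes a normalised two-component Gaussian mixture as the target and a single wide Gaussian in place of the fitted proxy; none of these objects come from the package, and the threshold of four standard errors is an arbitrary choice for illustration.

```r
# Hypothetical normalised target: equal-weight mixture of N(-1, 0.5^2) and N(1, 0.5^2).
log_f <- function(x) log(0.5 * dnorm(x, -1, 0.5) + 0.5 * dnorm(x, 1, 0.5))

set.seed(3)
n <- 4000L
x <- rnorm(n, mean = 0, sd = 1.5)                        # stand-in proposal draws
log_w <- log_f(x) - dnorm(x, mean = 0, sd = 1.5, log = TRUE)

m <- max(log_w)
w <- exp(log_w - m)
log_z_hat <- m + log(mean(w))                            # should be close to 0
se <- sd(w) / (mean(w) * sqrt(n))                        # delta-method standard error
z_score <- log_z_hat / se                                # distance from the known value 0

c(log_z = log_z_hat, se = se, z = z_score)
```

An estimate many standard errors away from 0 suggests that the proposal misses mass of the target or that the weights are heavy-tailed, in which case the standard error itself should not be trusted.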

Examples

## An unnormalised target with a known constant: log f = log N(., 0, I) + 3.
tgt <- gmm_target(
  n_dim = 2L,
  log_density = function(x) {
    if (is.null(dim(x))) x <- matrix(x, ncol = 2L)
    -0.5 * rowSums(x^2) - log(2 * pi) + 3
  },
  normalised = FALSE, name = "shifted_gaussian"
)
fit <- fit_kld_em(tgt, N = 1L, is_size = 2000L, max_iter = 40L, seed = 1L)
ev <- gmm_evidence(fit, n = 2000L, seed = 2L)
ev$log_z   # close to 3
#> [1] 3.00043