
Raw GAM fits and per-cell metrics for a smoothed-association matrix
Source: R/janusplot.R
janusplot_data.Rd
Companion to janusplot() returning the raw list of GAM fits plus
per-cell metrics (EDF, F-test p-value, deviance explained, asymmetry
index, pairwise correlations, shape descriptors) without constructing
the ggplot. Useful for custom rendering or downstream analysis.
Usage
janusplot_data(
data,
vars = NULL,
adjust = NULL,
method = NULL,
k = -1L,
bs = "tp",
na_action = c("pairwise", "complete"),
parallel = FALSE,
keep_fits = FALSE,
derivatives = integer(),
derivative_ci = c("pointwise", "none", "simultaneous"),
derivative_ci_nsim = 1000L,
n_grid = NULL,
shape_cutoffs = janusplot_shape_cutoffs(),
k_check_thresholds = NULL,
auto_refit_k = FALSE,
k_max_iter = 2L,
engine = c("bam", "gam"),
discrete = FALSE,
nthreads = 1L,
...
)
Arguments
- data
A data frame with numeric columns to include.
- vars
Character vector of column names to use.
NULL (default) uses all numeric columns in data. Non-numeric columns trigger an error listing the offenders.
- adjust
A one-sided formula RHS giving additional covariates and/or random effects to include in every pairwise GAM. For example,
adjust = ~ s(age) + s(site, bs = "re") fits gam(y ~ s(x) + s(age) + s(site, bs = "re")) for each pair. Default NULL fits unadjusted pairwise smooths.
- method
Smoothing-parameter estimation method passed to the chosen fitting backend. Default
NULL resolves per engine: "fREML" for engine = "bam" (mgcv's recommended method at scale), "REML" for engine = "gam" (the v0.1.0 behaviour). Pass any valid mgcv method string to override.
- k
Integer, or named list mapping variable names to integers. Basis dimension for
s(). Default -1L (mgcv's automatic choice).
- bs
Basis type for
s(). Default "tp" (thin plate regression spline).
- na_action
One of
"pairwise" (default; per-cell complete observations) or "complete" (listwise; all cells use the same rows).
- parallel
Logical. If
TRUE, use future.apply::future_mapply() to fit pairs in parallel. Requires the future.apply package and a user-configured future::plan(). Default FALSE.
- keep_fits
Logical. If
TRUE, retain full mgcv::gam() model objects in the return value (large memory footprint for k above ~15). Default FALSE, which retains summary metrics and prediction grids only.
- derivatives
Integer vector of derivative orders to compute on every pair (subset of
1:2). Default integer(), meaning no derivatives. Unlike janusplot(), the data companion can return multiple orders from a single call for programmatic analysis; pass c(1L, 2L) to surface both.
- derivative_ci
One of
"pointwise", "none", or "simultaneous". The Usage signature above lists "pointwise" first, so that is the default here, matching the ci_type default described under Value. Controls whether, and how, a 95% confidence ribbon is drawn underneath the derivative curve when display %in% c("d1", "d2"). Ignored when display = "fit".
  - "none": no ribbon. The curve and the zero reference line are all you see.
  - "pointwise": 95% pointwise ribbon from \(\sqrt{\mathrm{diag}(D V_p D^\top)}\) (Wood 2017 §7.2.4). Valid marginally; not a simultaneous statement. Pointwise ribbons do not attain nominal coverage as a joint region and can invite over-reading of local features.
  - "simultaneous": 95% simultaneous band via the Monte Carlo construction of Ruppert, Wand & Carroll (2003), popularised for GAMs by Simpson (2018, Frontiers Ecol. Evol. 6:149): draw \(B\) samples \(\tilde{\boldsymbol\beta} \sim \mathcal{N}(\hat{\boldsymbol\beta}, V_p)\), compute \(\max_i |D_i(\tilde{\boldsymbol\beta} - \hat{\boldsymbol\beta})| / \mathrm{se}_i\) over the grid points \(i\) for each draw, and use the \((1-\alpha)\) quantile of these maxima as a critical multiplier on the pointwise SE. Valid for feature localisation ("where is \(\hat f'(x)\) significantly non-zero").
- derivative_ci_nsim
Integer. Number of Monte Carlo samples used when
derivative_ci = "simultaneous". Default 1000L, a compromise between coverage accuracy (Simpson 2018 uses 10000) and CPU budget across every pair in a medium-sized matrix. Ignored for any other derivative_ci.
- n_grid
Integer or
NULL. Number of equally-spaced points used to evaluate each fitted smooth (and its derivatives). Default NULL resolves to 100 when display = "fit" and 200 otherwise, because finite-difference second derivatives visibly degrade below \(\sim 150\) points on moderate-k smooths. Supplying n_grid directly overrides both defaults. Larger grids shift the numerical shape-metric values (\(M\), \(C\), turning / inflection counts) slightly because they are computed on this same grid. Shapes and asymmetry are the primary reading; \(M\), \(C\) and the counts are secondary diagnostics, and the grid-induced drift is tolerable.
- shape_cutoffs
Named list of classification thresholds used to map the continuous shape indices (
monotonicity_index, convexity_index) into discrete shape_category labels. Defaults from janusplot_shape_cutoffs().
- k_check_thresholds
Named list giving the three flag-criterion thresholds used by
mgcv::k.check()-style basis-dimension diagnostics. Required entries: edf_ratio (Wood's \(\widehat{\mathrm{edf}}/k'\) ratio above which the smooth is too close to its basis cap), k_index (residual-difference variance ratio below which the basis appears underspecified), and p (the simulation p-value below which the basis-deficiency signal is significant). The defaults (edf_ratio = 0.9, k_index = 1.0, p = 0.05) track mgcv::gam.check() and Wood (2017) §5.9.
- auto_refit_k
Logical. If
TRUE, every cell that the three k-check criteria (see k_check_thresholds) flag as underfit is refit in a doubling-k loop until the flag clears, the per-cell unique-x cap is reached, or k_max_iter iterations have passed. Default FALSE. The diagnostic (k_check_status, k_flag, k_prime, k_index, k_p) is always computed and surfaced regardless of this flag, but the refit is opt-in because it can multiply wall time on pathological data.
- k_max_iter
Integer. Maximum number of doublings allowed per cell when
auto_refit_k = TRUE. Default 2L (so a cell that starts at the mgcv default k = 10 will visit at most k = 20 and then k = 40, capped by the per-cell unique-x limit). Ignored when auto_refit_k = FALSE.
- engine
One of
"bam" (default, new in v0.1.1) or "gam". Selects mgcv's fitting backend:
  - "bam": mgcv::bam(). Block-Lanczos solve + fREML estimation + lower memory. ~3-10x speedup at janusplot's scale (k = 15-25 vars, 600+ pairwise fits per call). The default, and the one non-byte-identical change in v0.1.1: fREML differs from REML by ~1-3% in EDF on identical data, so the asymmetry index may shift by similar amounts vs v0.1.0 output. Recoverable verbatim via engine = "gam".
  - "gam": mgcv::gam(). The v0.1.0 backend. Use for backward-compatible reproduction, very small n (< 200) where bam's setup overhead exceeds its solve gain, or methodologically sensitive contexts that require REML rather than fREML.
- discrete
Logical.
bam-only opt-in to mgcv's covariate-discretisation optimisation. Further ~2-5x speedup at the cost of a small (sub-pixel at typical janusplot resolution) prediction shift. Default FALSE. Ignored when engine = "gam".
- nthreads
Integer.
bam-only intra-fit threading. Default 1L to avoid oversubscription when combined with parallel = TRUE (future.apply fans out pair fits across cores; nthreads > 1 within each pair would double-book CPUs). Raise above 1 only when parallel = FALSE. Ignored when engine = "gam".
- ...
Additional arguments passed to
mgcv::gam().
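The simultaneous-band construction described under derivative_ci can be sketched directly against mgcv. This is an illustration only, not janusplot's internal code (which this page does not show): the simulated data, the forward finite-difference operator D, and the step size eps are all assumptions made for the example.

```r
library(mgcv)

# Simulated data (assumption: any single-smooth GAM would do here)
set.seed(1)
n <- 200
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
fit <- gam(y ~ s(x, bs = "tp"), method = "REML")

# Forward finite-difference first-derivative operator D on an evaluation grid
grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
eps <- 1e-5
X0 <- predict(fit, newdata = grid, type = "lpmatrix")
X1 <- predict(fit, newdata = data.frame(x = grid$x + eps), type = "lpmatrix")
D <- (X1 - X0) / eps

beta <- coef(fit)
Vp <- vcov(fit)                         # Bayesian posterior covariance V_p
d_hat <- drop(D %*% beta)               # estimated f'(x) on the grid
se <- sqrt(rowSums((D %*% Vp) * D))     # sqrt(diag(D V_p D^T)), pointwise SE

# Monte Carlo critical multiplier: draw beta_tilde - beta_hat ~ N(0, V_p),
# take the max standardised absolute deviation over the grid for each draw,
# then use the 95% quantile of those maxima.
B <- 1000L
dev <- rmvn(B, rep(0, length(beta)), Vp)           # B x p matrix of deviations
max_abs <- apply(abs(D %*% t(dev)) / se, 2, max)   # one maximum per draw
crit <- quantile(max_abs, 0.95, names = FALSE)

# Simultaneous band: pointwise SE scaled by the critical multiplier
band <- data.frame(
  x = grid$x, fit = d_hat, se = se,
  lo = d_hat - crit * se, hi = d_hat + crit * se
)
```

Replacing crit with qnorm(0.975) gives the pointwise ribbon on the same grid; the simultaneous multiplier is the larger of the two, which is why the band is wider.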
Value
A list with components:
- vars
Character vector of variables used, in plotted order.
- pairs
List of per-pair results. Each element has
i, j, var_i, var_j, fit_yx, fit_xy (NULL if keep_fits = FALSE), pred_yx, pred_xy (data frames with x, fit, se, lo, hi), edf_yx, edf_xy, pvalue_yx, pvalue_xy, dev_exp_yx, dev_exp_xy, n_used, asymmetry_index, plus Pearson / Spearman / Kendall correlations (cor_pearson, cor_spearman, cor_kendall), the maximum tie ratio across x and y (tie_ratio), and per-direction shape descriptors (monotonicity_index_yx, convexity_index_yx, monotonicity_index_xy, convexity_index_xy, n_turning_yx, n_inflect_yx, n_turning_xy, n_inflect_xy, shape_yx, shape_xy). When derivatives is non-empty, each pair additionally carries deriv_yx and deriv_xy, each a named list keyed by order ("1", "2") whose entries are data frames with columns x, fit, se, lo, hi, ci_type, matching the schema of pred_yx / pred_xy. The ci_type column records whether the lo / hi columns are "pointwise" (default), "simultaneous" (Ruppert–Wand–Carroll / Simpson 2018 critical-multiplier bands), or "none". When derivative_ci = "simultaneous", each derivative frame also carries a "crit_multiplier" attribute giving the MC-derived critical multiplier used. See janusplot_shape_metrics() for the definition of the monotonicity and convexity indices.
- call
The matched call.
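For downstream analysis it is often convenient to flatten pairs into one row per pair. The helper below, pairs_to_df(), is hypothetical and not part of janusplot; it assumes each listed field is a length-one value, as the component descriptions above suggest.

```r
# Hypothetical helper (not exported by janusplot): one row per pair,
# keeping only scalar summary fields documented under Value.
pairs_to_df <- function(pairs) {
  rows <- lapply(pairs, function(p) {
    data.frame(
      var_i = p$var_i,
      var_j = p$var_j,
      n_used = p$n_used,
      edf_yx = p$edf_yx,
      edf_xy = p$edf_xy,
      asymmetry_index = p$asymmetry_index,
      cor_spearman = p$cor_spearman,
      shape_yx = p$shape_yx,
      shape_xy = p$shape_xy,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Usage, guarded so the snippet is harmless when janusplot is not installed
if (requireNamespace("janusplot", quietly = TRUE)) {
  out <- janusplot::janusplot_data(mtcars[, c("mpg", "hp", "wt")])
  tbl <- pairs_to_df(out$pairs)
  # Most asymmetric pairs first
  print(tbl[order(-abs(tbl$asymmetry_index)), ])
}
```

List-valued components such as pred_yx or deriv_yx are deliberately left out; they are data frames per pair and are better handled separately.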
See also
janusplot() for the ggplot front-end,
janusplot_shape_metrics() for the shape-metric primitives.
Other smooth-associations:
janusplot()
Examples
# Per-pair fits + metrics on a small mtcars slice
out <- janusplot_data(mtcars[, c("mpg", "hp", "wt")])
out$pairs[[1L]]$asymmetry_index
#> [1] 0.006028205
out$pairs[[1L]]$cor_spearman
#> [1] -0.8946646
out$pairs[[1L]]$shape_yx
#> [1] "s_shape"
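The basis-dimension check and doubling-k refit described under k_check_thresholds and auto_refit_k can be approximated with mgcv alone. This is a sketch under stated assumptions, not janusplot's implementation: it assumes the three criteria are combined with a logical AND (the page does not say how they are combined), it uses simulated data, and the helper is_flagged() is hypothetical.

```r
library(mgcv)

# Simulated wiggly signal (assumption: chosen so a small basis may be strained)
set.seed(42)
n <- 300
dat <- data.frame(x = runif(n))
dat$y <- sin(12 * pi * dat$x) + rnorm(n, sd = 0.2)

# Thresholds copied from the documented defaults
thresholds <- list(edf_ratio = 0.9, k_index = 1.0, p = 0.05)

# Hypothetical helper: TRUE when all three criteria trip for the first smooth.
# k.check() returns a matrix with columns "k'", "edf", "k-index", "p-value".
is_flagged <- function(fit, th) {
  kc <- k.check(fit)
  edf_ratio <- kc[1, "edf"] / kc[1, "k'"]
  isTRUE(
    edf_ratio > th$edf_ratio &&
      kc[1, "k-index"] < th$k_index &&
      kc[1, "p-value"] < th$p
  )
}

k <- 10L                                  # starting basis dimension
k_max_iter <- 2L                          # at most two doublings: 20, then 40
k_cap <- length(unique(dat$x)) - 1L       # assumed per-cell unique-x cap
iter <- 0L

fit <- gam(y ~ s(x, bs = "tp", k = k), data = dat, method = "REML")
while (is_flagged(fit, thresholds) && iter < k_max_iter && 2L * k <= k_cap) {
  k <- 2L * k
  fit <- gam(y ~ s(x, bs = "tp", k = k), data = dat, method = "REML")
  iter <- iter + 1L
}
```

After the loop, k holds the basis dimension of the last fit and iter the number of doublings performed, which mirrors the stopping rules listed for auto_refit_k.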