Roadmap
This page tracks features that are deliberately deferred from the current release. Each item has a stable position in the long-term design; future releases will land them behind explicit opt-in flags and new exports. Version numbers are intentionally not committed: the order in which deferred features land depends on user demand and the surface area each one needs.
What has already landed
- v0.2.0: first public surface (mask(), recipe, audit_mask(), apply_recipe() / unmask() round-trip).
- v0.3.0: detect_design() + plot_design_summary() + propose_roles(detect = TRUE) default.
- v0.4.0: synthesise_geospatial() (preserves site-count and within-site clustering without publishing real coordinates).
- v0.4.1: contract sharpening. Fail-closed unknown-level handling in apply_recipe() / unmask(); integrity-fingerprint check on apply_recipe(); atomic numeric pass-through in unmask(); honest exact_match_pct denominator in audit_mask(); synthesise_geospatial() NA-mask source authority.
Deferred — joint structure and shareability
Design-conditional synthesis
The current Gaussian copula uses a single global Pearson covariance over all numeric outcomes and covariates. For multi-environment trials where the genotype-by-environment interaction dominates the variance structure, this is the wrong unit of analysis.
Planned API:
The copula would be fit within each stratum defined by
condition_on, preserving the conditional structure that
mixed-effects models will see. Compute cost scales with the number of
strata.
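
To make the idea of per-stratum fitting concrete, here is a minimal base-R sketch. It is an illustration only, not the planned masque implementation. The helpers synth_stratum() and synth_by_stratum() are hypothetical names. The copula is fit on normal scores, which may differ from the package's own estimator. Each stratum is assumed to have more rows than numeric columns and no constant column.

```r
# Hypothetical helpers for illustration; neither is a masque export.

# Fit a Gaussian copula to the numeric columns of one stratum and draw n rows.
synth_stratum <- function(df, n = nrow(df)) {
  x <- as.matrix(df)
  p <- ncol(x)
  # Normal scores: ranks scaled into (0, 1), then mapped to z-scores.
  z <- apply(x, 2, function(col) qnorm(rank(col) / (length(col) + 1)))
  # Upper-triangular factor r with t(r) %*% r equal to the score correlation.
  r <- chol(cor(z))
  z_new <- matrix(rnorm(n * p), nrow = n, ncol = p) %*% r
  out <- matrix(NA_real_, nrow = n, ncol = p,
                dimnames = list(NULL, colnames(x)))
  for (j in seq_len(p)) {
    # Back-transform through the stratum's own empirical quantiles.
    out[, j] <- quantile(x[, j], probs = pnorm(z_new[, j]),
                         type = 8, names = FALSE)
  }
  as.data.frame(out)
}

# Split on the conditioning columns, synthesise each stratum, and restack.
synth_by_stratum <- function(data, numeric_cols, condition_on) {
  strata <- split(data, data[condition_on], drop = TRUE)
  pieces <- lapply(strata, function(s) {
    cbind(s[condition_on], synth_stratum(s[numeric_cols]))
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Toy trial: yield and height are positively related in E1, negatively in E2.
set.seed(1)
y1 <- rnorm(50)
y2 <- rnorm(50)
trial <- data.frame(
  env    = rep(c("E1", "E2"), each = 50),
  yield  = c(5 + y1, 8 + y2),
  height = c(100 + 2 * y1 + rnorm(50), 120 - 2 * y2 + rnorm(50))
)
synth <- synth_by_stratum(trial, numeric_cols = c("yield", "height"),
                          condition_on = "env")

# Within-environment correlations, which a single global fit would blur.
by(synth[c("yield", "height")], synth$env,
   function(d) cor(d$yield, d$height))
```

In this toy example the sign of the yield-height relationship flips between environments, so a single global covariance would average the two away. Fitting within each stratum keeps them separate, at the cost of one fit per stratum.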
Mixed-margin copula
The current package deliberately breaks the joint between categorical covariates and numerics (categoricals are row-permuted independently of the numeric copula). For datasets where the joint matters — e.g., soil type clustering yield outliers — a Gaussian copula with discrete margins (Smith and Khaled 2012; ordinal probit links) is the natural extension.
Planned API: roles$kind == "ordinal"
plus a covariate-level option to opt in.
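
As a rough picture of what an ordinal probit link does, the sketch below simulates one ordinal covariate and one numeric outcome from a shared latent Gaussian. It shows the generative direction only. Estimating such a model from data (the subject of Smith and Khaled 2012) is not shown. The level names, marginal proportions, latent correlation, and log-normal margin are all assumed values chosen for illustration.

```r
set.seed(42)

# Assumed inputs for illustration; none of these come from masque.
ord_levels <- c("low", "medium", "high")
ord_probs  <- c(0.5, 0.3, 0.2)  # marginal proportions of the ordinal covariate
rho        <- 0.6               # latent correlation between the two variables
n          <- 2000

# Probit cut points: the latent normal is sliced so each level gets its share.
cuts <- qnorm(cumsum(ord_probs)[-length(ord_probs)])

# Two standard normals with correlation rho.
z_ord <- rnorm(n)
z_num <- rho * z_ord + sqrt(1 - rho^2) * rnorm(n)

# Discrete margin: bin the latent variable at the cut points.
grade <- cut(z_ord, breaks = c(-Inf, cuts, Inf),
             labels = ord_levels, ordered_result = TRUE)

# Continuous margin: push the latent variable through a log-normal quantile.
yield <- qlnorm(pnorm(z_num), meanlog = 1.5, sdlog = 0.3)

# The joint survives: mean yield rises with the ordinal grade.
grade_means <- tapply(yield, grade, mean)
grade_means
```

Row-permuting the categorical column independently, as the current release does, would flatten these group means towards a common value.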
draw_new_synthetic(rec, n)
Today, regenerating synthetic data requires the original. The recipe
holds enough state to re-translate a pipeline but not to re-simulate. A
future release will persist simulator state under
save_recipe(rec, path, include_simulator = TRUE) (currently
a no-op) so:
rec <- read_recipe("path/to/rec.rds")
new_synth <- draw_new_synthetic(rec, n = 5000, seed = 99)
This is useful for cross-validation folds and bootstrap pipelines that need many synthetic draws without revealing the original.
Joint-treatment masking
mask() currently accepts at most one treatment column.
A future release will accept multiple treatment columns and produce
joint aliases (factorial trials, treatment combinations) with an
order-of-magnitude larger alias namespace and a corresponding update to
audit_mask()’s leakage thresholds.
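
A joint alias can be pictured as one opaque label per treatment combination. The base-R sketch below is a hypothetical illustration. The alias format (T01, T02, …), the shuffled assignment, and the column names are assumptions, not the recipe format masque will use.

```r
set.seed(7)

# Toy 3 x 2 factorial with two replicates per combination.
design <- expand.grid(
  nitrogen = c("N0", "N60", "N120"),
  variety  = c("A", "B"),
  rep      = 1:2,
  stringsAsFactors = FALSE
)

# One factor level per observed treatment combination.
combo <- interaction(design$nitrogen, design$variety, drop = TRUE)

# Assign opaque aliases in shuffled order so the label carries no ordering.
alias_key <- setNames(sprintf("T%02d", sample(nlevels(combo))), levels(combo))
design$treatment_alias <- unname(alias_key[as.character(combo)])

# Each nitrogen-by-variety cell maps to exactly one alias.
unique(design[c("nitrogen", "variety", "treatment_alias")])
```

The alias namespace grows as the product of the level counts (here 3 × 2 = 6), which is why the leakage thresholds would need revisiting.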
Deferred — collaboration ergonomics
Column-name aliasing in collaborate mode
The column_name_map slot in the recipe is currently
NULL. A future release will support
mask(..., alias_columns = TRUE) so that even column
identities are hidden behind opaque names (x_001,
x_002, …). apply_recipe() and
unmask() already invert column-name maps if present.
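
The round-trip that a column-name map enables can be sketched in a few lines of base R. The helpers alias_column_names() and restore_column_names() are hypothetical stand-ins, and the layout of the map (a named character vector from alias to original name) is an assumption about how column_name_map might be stored.

```r
# Hypothetical helpers for illustration; not masque exports.

# Replace column names with opaque aliases and keep the alias -> original map.
alias_column_names <- function(df) {
  aliases <- sprintf("x_%03d", seq_along(df))
  column_name_map <- setNames(names(df), aliases)
  names(df) <- aliases
  list(data = df, column_name_map = column_name_map)
}

# Invert the aliasing by looking each alias up in the map.
restore_column_names <- function(df, column_name_map) {
  names(df) <- unname(column_name_map[names(df)])
  df
}

plots  <- data.frame(site = c("S1", "S2"), yield = c(4.2, 5.1))
masked <- alias_column_names(plots)
names(masked$data)  # "x_001" "x_002"

restored <- restore_column_names(masked$data, masked$column_name_map)
identical(restored, plots)
```

Because the map lives with the recipe rather than with the shared data, a collaborator sees only the opaque names.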
Interactive role builder
propose_roles() is declarative — the user edits the
returned tibble. A thin interactive wrapper (using cli
prompts for ambiguous columns) would lower the barrier for one-off use.
The declarative core stays unchanged so scripts remain reproducible.
mask_csv() convenience verb
A wrapper that reads a CSV, runs propose_roles() with
sensible defaults, surfaces the role tibble for the user to confirm, and
returns the masque object. Targets non-R-fluent data custodians.
Out of scope, permanently
- Differential-privacy guarantees. A different package, different algorithms, different threat model. masque will not pretend.
- Public-release safety claims. Synthetic data from masque is for controlled sharing only; “safe to publish” is never a masque claim.
- Pipeline source-code rewriting. Translation happens through the data via apply_recipe() and unmask(), not by mutating R or Python source code.
References
- Smith, M. and Khaled, M. (2012). Estimation of copula models with discrete margins via Bayesian data augmentation. Journal of the American Statistical Association 107: 290-303.
- John, J. A. (1987). Statistical Analysis of Experiments with Different Numbers of Replicates per Treatment. CRC Press.
If you have a use case that does not fit cleanly in the current release, open an issue with a small reproducible example.