Roadmap

This page tracks features that are deliberately deferred from the current release. Each item has a stable position in the long-term design; future releases will land them behind explicit opt-in flags and new exports. Version numbers are intentionally not committed: the order in which deferred features land depends on user demand and the surface area each one needs.

What has already landed

v0.2.0 — first public surface: mask(), recipe, audit_mask(), apply_recipe() / unmask() round-trip.
v0.3.0 — detect_design() + plot_design_summary() + propose_roles(detect = TRUE) default.
v0.4.0 — synthesise_geospatial() (preserves site-count and within-site clustering without publishing real coordinates).
v0.4.1 — contract sharpening: fail-closed unknown-level handling in apply_recipe() / unmask(); integrity-fingerprint check on apply_recipe(); atomic numeric pass-through in unmask(); honest exact_match_pct denominator in audit_mask(); synthesise_geospatial() NA-mask source authority.

Deferred — joint structure and shareability

Design-conditional synthesis

The current Gaussian copula uses a single global Pearson covariance over all numeric outcomes and covariates. For multi-environment trials where the genotype-by-environment interaction dominates the variance structure, this is the wrong unit of analysis.

Planned API:

mask(df, roles,
     mode = "collaborate",
     condition_on = c("rep", "site:year"))

The copula would be fit within each stratum defined by condition_on, preserving the conditional structure that mixed-effects models will see. Compute cost scales with the number of strata.
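Fitting within strata can be sketched in a few lines of base R. The helper names copula_draw() and synthesise_by_stratum() are hypothetical and are not masque functions. The sketch makes two assumptions:

- The stratum is the crossing of all listed columns. This simplifies the planned condition_on syntax.
- Every stratum has more rows than numeric columns, so each correlation matrix is positive definite.

```r
# Hypothetical helpers for illustration only; not part of masque.

# Gaussian copula for one stratum: rank-based normal scores, one correlation
# matrix, then latent draws mapped back through each column's empirical quantiles.
copula_draw <- function(x, n) {
  x <- as.matrix(x)
  z <- apply(x, 2, function(col) qnorm(rank(col) / (length(col) + 1)))
  sigma <- cor(z)
  # Rows of latent are N(0, sigma): iid normals times the upper Cholesky factor.
  latent <- matrix(rnorm(n * ncol(x)), nrow = n) %*% chol(sigma)
  out <- vapply(seq_len(ncol(x)), function(j)
    quantile(x[, j], probs = pnorm(latent[, j]), names = FALSE),
    numeric(n))
  out <- matrix(out, nrow = n, dimnames = list(NULL, colnames(x)))
  as.data.frame(out)
}

# Fit and draw separately within each stratum, keeping the stratum labels
# and the per-stratum row counts of the original.
synthesise_by_stratum <- function(df, numeric_cols, strata_cols) {
  key <- interaction(df[strata_cols], drop = TRUE, sep = ":")
  pieces <- lapply(split(df, key), function(part) {
    cbind(part[strata_cols], copula_draw(part[numeric_cols], nrow(part)))
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}
```

Consider one site where two traits move together and another where they move in opposite directions. With one copula per stratum, each site keeps its own sign. A single pooled correlation would average the two toward zero.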

Mixed-margin copula

The current package deliberately breaks the joint distribution between categorical covariates and numeric columns: categoricals are row-permuted independently of the numeric copula. For datasets where that joint matters, such as soil type clustering yield outliers, the natural extension is a Gaussian copula with discrete margins (Smith and Khaled 2012; ordinal probit links).

Planned API: roles$kind == "ordinal" plus a covariate-level option to opt in.
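The sampling half of a Gaussian copula with a discrete margin can be sketched briefly: an ordinal variable is obtained by cutting a latent normal at probit thresholds. This is an illustration under stated assumptions, not the planned masque implementation.

- draw_mixed_margin() and its arguments are hypothetical.
- The latent correlation rho is taken as given.

Estimating rho from data with discrete margins is the hard part. Smith and Khaled (2012) address it with Bayesian data augmentation.

```r
# Hypothetical sketch, not masque code: simulate from a bivariate Gaussian
# copula whose first margin is ordinal (probit thresholds) and whose second
# margin is continuous (empirical quantiles of a reference sample).
draw_mixed_margin <- function(n, level_probs, numeric_ref, rho) {
  lv <- names(level_probs)
  # Cut points on the latent N(0, 1) scale chosen so that each level keeps
  # its target marginal proportion: the ordinal probit link.
  cuts <- c(-Inf, qnorm(cumsum(level_probs)[-length(level_probs)]), Inf)
  sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
  latent <- matrix(rnorm(2 * n), nrow = n) %*% chol(sigma)
  data.frame(
    soil  = cut(latent[, 1], breaks = cuts, labels = lv, ordered_result = TRUE),
    yield = quantile(numeric_ref, probs = pnorm(latent[, 2]), names = FALSE)
  )
}
```

With a positive rho, the higher soil levels come with systematically higher yields, while each margin still matches its target.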

draw_new_synthetic(rec, n)

Today, regenerating synthetic data requires the original. The recipe holds enough state to re-translate a pipeline but not to re-simulate. A future release will persist simulator state when the recipe is saved with save_recipe(rec, path, include_simulator = TRUE). That flag is currently a no-op. Once it is implemented, new draws will need only the saved recipe:

rec       <- read_recipe("path/to/rec.rds")
new_synth <- draw_new_synthetic(rec, n = 5000, seed = 99)

This is useful for cross-validation folds and bootstrap pipelines that need many synthetic draws without revealing the original.
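Persisted simulator state could take the form of a compact summary that is saved to disk and later drawn from without touching the original rows. In this sketch the summary is a latent correlation matrix plus a quantile grid per column. fit_state(), draw_from_state(), and the list layout are hypothetical illustrations, not the format save_recipe() will use.

```r
# Hypothetical sketch, not masque's recipe format.

# Summarise the original into a simulator state: latent correlation plus a
# coarse quantile grid per column. Note that the grid's end points are the
# observed min and max, so a real design must decide what detail is safe to store.
fit_state <- function(x, grid_size = 101) {
  x <- as.matrix(x)
  z <- apply(x, 2, function(col) qnorm(rank(col) / (length(col) + 1)))
  probs <- seq(0, 1, length.out = grid_size)
  list(
    corr = cor(z),
    quantiles = apply(x, 2, quantile, probs = probs, names = FALSE),
    probs = probs
  )
}

# Draw n new rows from the state alone; the original data is not needed.
draw_from_state <- function(state, n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  p <- ncol(state$corr)
  latent <- matrix(rnorm(n * p), nrow = n) %*% chol(state$corr)
  # Interpolate each column's quantile grid at the latent uniform scores.
  out <- vapply(seq_len(p), function(j)
    approx(state$probs, state$quantiles[, j], xout = pnorm(latent[, j]))$y,
    numeric(n))
  out <- matrix(out, nrow = n, dimnames = list(NULL, colnames(state$corr)))
  as.data.frame(out)
}
```

Typical use:

1. Save the state once with saveRDS().
2. Ship the saved file.
3. Call draw_from_state(readRDS(path), n = 5000, seed = 99) as often as the resampling scheme needs.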

Joint-treatment masking

mask() currently accepts at most one treatment column. A future release will accept multiple treatment columns and produce joint aliases for factorial trials and other treatment combinations. The alias namespace will be an order of magnitude larger, and audit_mask()'s leakage thresholds will be updated to match.
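Joint aliasing can be sketched by assigning one opaque alias to each observed combination of the treatment columns. alias_joint_treatment() and the T001-style alias format are hypothetical. The actual alias scheme and its interaction with audit_mask() are not specified by this roadmap.

```r
# Hypothetical sketch, not masque code: one opaque alias per observed
# combination of several treatment columns, plus a lookup to invert it.
alias_joint_treatment <- function(df, treatment_cols, prefix = "T") {
  combo <- interaction(df[treatment_cols], drop = TRUE, sep = " x ")
  lv <- levels(combo)
  # Shuffle alias numbers so alias order does not mirror level order.
  aliases <- sprintf("%s%03d", prefix, sample(length(lv)))
  list(
    alias  = unname(setNames(aliases, lv)[as.character(combo)]),
    lookup = setNames(lv, aliases)  # alias -> real combination
  )
}
```

For a 3 x 2 factorial, this yields six aliases where single-column masking would have three and two. The product over treatment columns is what enlarges the namespace.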

Deferred — collaboration ergonomics

Column-name aliasing in collaborate mode

The column_name_map slot in the recipe is currently NULL. A future release will support mask(..., alias_columns = TRUE) so that even column identities are hidden behind opaque names (x_001, x_002, …). apply_recipe() and unmask() already invert column-name maps if present.
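Here is a sketch of the column-name round trip, assuming the map is a plain named character vector. alias_columns(), invert_columns(), and the map layout are hypothetical. They do not describe the structure of the recipe's column_name_map slot.

```r
# Hypothetical sketch, not masque's column_name_map format: replace column
# names with opaque ones and keep a map for inverting them later.
alias_columns <- function(df) {
  opaque <- sprintf("x_%03d", seq_along(df))
  map <- setNames(names(df), opaque)  # opaque name -> real name
  names(df) <- opaque
  list(data = df, map = map)
}

# Invert only the names found in the map, so columns a pipeline added after
# masking (e.g. fitted values) pass through unchanged.
invert_columns <- function(df, map) {
  hit <- names(df) %in% names(map)
  names(df)[hit] <- unname(map[names(df)[hit]])
  df
}
```

Leaving unmapped names untouched is one possible design choice. It lets columns created downstream survive the inversion instead of raising an error.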

Interactive role builder

propose_roles() is declarative — the user edits the returned tibble. A thin interactive wrapper (using cli prompts for ambiguous columns) would lower the barrier for one-off use. The declarative core stays unchanged so scripts remain reproducible.

mask_csv() convenience verb

A wrapper that reads a CSV, runs propose_roles() with sensible defaults, surfaces the role tibble for the user to confirm, and returns the masque object. It is aimed at data custodians who are not fluent in R.

Out of scope, permanently

Differential-privacy guarantees. A different package, different algorithms, different threat model. masque will not pretend.
Public-release safety claims. Synthetic data from masque is for controlled sharing only; “safe to publish” is never a masque claim.
Pipeline source-code rewriting. Translation happens through the data via apply_recipe() and unmask(), not by mutating R or Python source code.

References

Smith, M. and Khaled, M. (2012). Estimation of copula models with discrete margins via Bayesian data augmentation. Journal of the American Statistical Association 107: 290-303.
John, J. A. (1987). Statistical Analysis of Experiments with Different Numbers of Replicates per Treatment. CRC Press.

If you have a use case that does not fit cleanly in the current release, open an issue with a small reproducible example.
