Changelog
Source: NEWS.md
masque 0.4.1
Maintenance release: contract-sharpening corrections plus the documentation and metadata that were prepared for v0.4.0 but not released. No new public exports. The two behaviour changes below are deliberate fail-closed corrections to existing exports; user code that depended on the silent failure mode will need to be updated.
Behaviour: fail-closed corrections
- apply_recipe() and unmask() now error when a non-NA value is not present in the recipe’s level map. Previously the row was silently coerced to NA, which could quietly poison downstream model matrices. Schema drift or a new treatment level in the input now fails closed with the offending values listed.
- apply_recipe() now verifies that the NA mask of original matches the recipe’s recorded integrity_fp. A mismatch errors with guidance. New check_integrity = TRUE parameter (default) gives an escape hatch (check_integrity = FALSE) for workflows where the missingness has legitimately changed since the recipe was built.
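The first correction can be pictured with a minimal base-R sketch. The helper translate_levels() and the example level map are hypothetical and are not masque's internal code; the sketch only shows how a lookup behaves when it fails closed instead of coercing unknown values to NA.

```r
# Hypothetical helper (not part of masque): translate values through a named
# level map, failing closed on any non-NA value the map does not know.
translate_levels <- function(x, level_map) {
  x_chr <- as.character(x)
  unknown <- setdiff(unique(x_chr[!is.na(x_chr)]), names(level_map))
  if (length(unknown) > 0) {
    stop("Values not present in the level map: ",
         paste(unknown, collapse = ", "), call. = FALSE)
  }
  unname(level_map[x_chr])  # NA inputs stay NA
}

# Assumed example map from original labels to opaque aliases.
level_map <- c(control = "trt_001", drought = "trt_002")

# Known levels and NA translate as usual.
known <- translate_levels(c("control", NA, "drought"), level_map)

# A level the map has never seen raises an error that names it.
res <- tryCatch(
  translate_levels(c("control", "heat"), level_map),
  error = function(e) conditionMessage(e)
)
```

Listing the offending values in the message is what makes schema drift quick to diagnose, compared with discovering unexplained NA rows later in a model matrix.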
Bug fixes
- unmask(x, rec) now passes through atomic numeric, integer, logical, and Date / POSIXct vectors unchanged, matching the documented numeric pass-through contract. Previously these inputs errored when the recipe held no level maps.
- audit_mask()’s exact_match_pct now divides by the number of jointly-observed comparable cells, not by nrow(df). Columns dominated by NAs no longer underreport leakage. The audit tibble gains a new comparable_n column for interpretability.
- synthesise_geospatial() now uses original’s NA mask as the authority for cell-level preservation (previously used synth’s mask, which could let synthesised coordinates leak into rows that the original had missing). Adds a nrow(synth) == nrow(original) check.
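The effect of the exact_match_pct denominator change can be seen on a toy column. This is an illustrative base-R calculation with made-up values, not the audit_mask() implementation; the variable names are assumptions, apart from comparable_n, which mirrors the new audit column.

```r
# Made-up column with 8 of 10 cells missing in both original and synthetic.
original <- c(1.5, NA, NA, NA, 2.0, NA, NA, NA, NA, NA)
synth    <- c(1.5, NA, NA, NA, 3.1, NA, NA, NA, NA, NA)

# Cells that are observed in both vectors are the only ones that can match.
comparable   <- !is.na(original) & !is.na(synth)
matches      <- sum(original[comparable] == synth[comparable])
comparable_n <- sum(comparable)

# Old denominator: all rows. New denominator: jointly observed cells.
pct_by_rows       <- 100 * matches / length(original)  # 10
pct_by_comparable <- 100 * matches / comparable_n      # 50
```

With the row-count denominator, one exact match out of two comparable cells reads as 10%; with the comparable-cell denominator it reads as 50%, which is the rate a reviewer actually needs to see.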
Documentation
- roles_validate() error message for the multiple-treatment case is refreshed: drops the stale “v0.2 / deferred to v0.3” wording and guides the user to either edit the roles tibble or call propose_roles(df, detect = FALSE) for byte-stable v0.2.x behaviour.
- Stale “arrive in build-order steps 6-7” comment in mask()’s roxygen removed.
- recipe_io.R doc and the recipe_anatomy vignette reword the include_simulator = TRUE no-op without pinning it to v0.2 / v0.3.
- roadmap vignette restructured around feature areas. The hard version pins (“v0.3”, “v0.4”) are gone: v0.3 / v0.4 shipped different features from the prior roadmap, so the pins were stale.
- getting_started vignette: the vignette(‘roadmap’) pointer, formerly described as “what’s planned for v0.3+”, is now described as “features deliberately deferred from the current release”.
Test suite
- Local MET integration tests (test-mask-end-to-end.R, test-mask-roundtrip-integration.R) call propose_roles(df, detect = FALSE) so the suite is clean against the maintainer’s local fixtures while the multi-treatment design decision remains on the roadmap.
- Three jitter tests that intentionally trigger the collaborate-mode HIGH-leakage warning now wrap with expect_warning("HIGH leakage") so future warning regressions remain visible.
- New tests cover: atomic numeric / integer / logical pass-through in unmask(); fail-closed unknown-level handling in apply_recipe() and unmask(); integrity_fp enforcement (positive, negative, and the check_integrity = FALSE escape hatch); synthesise_geospatial() NA-mask source authority and row-count check.
masque 0.4.0
Adds first-class geospatial synthesis. One new export, no breaking changes to the v0.3.0 surface.
New export
- synthesise_geospatial(synth, original, anchor_col, lat_col, lon_col, anchor_centroids, site_spread_deg, jitter_deg, seed): re-anchors the latitude / longitude columns in a masqued data frame at user-supplied centroids, while preserving (a) the count of distinct sites per anchor level, (b) the per-site replication distribution, and (c) within-site tight clustering with between-site spread. The original positions are never published; the function reads them only to count distinct sites. NA pattern in coordinates is preserved cell-by-cell. RNG hygiene via withr::local_preserve_seed(). Motivated by the masque release walkthrough, where state-centroid + uniform-jitter (per-walkthrough recipe) failed to preserve the within-state clustering of real trial sites.
CRAN and r-universe readiness
- Added cran-comments.md for first-submission notes.
- Added .github/workflows/R-CMD-check.yaml (r-lib standard matrix: Linux release / devel / oldrel-1, macOS release, Windows release).
- R CMD check --as-cran reports 0 errors, 0 warnings, 2 NOTEs (new-submission boilerplate and local HTML Tidy environmental).
masque 0.3.0
Adds automatic experimental-design detection and a sanity-check visualisation. New public surface: 3 exports, 1 vignette.
New exports
- detect_design(df, roles = NULL, interactive = FALSE, threshold = 0.5, tie_delta = 0.02): returns an S7 design_summary with the most likely design class (CRD, RCBD, IBD / alpha-lattice, row-column, split-plot, factorial, or none), per-rule scores, evidence, and a recommended_roles tibble. Rule engine, not ML.
- design_summary: S7 class wrapping the detection result. print() is cli-styled and surfaces top-3 alternates so the user can see how confident the call was. Slots include class_label, treatment_col, block_cols, whole_plot_col, sub_plot_col, spatial_cols, scores, evidence, recommended_roles, candidates, warnings.
- plot_design_summary(x, df, engine = c("base", "ggplot2")): also registered as an S7 plot() method. Base-graphics sanity-check visualisation dispatched per class: replication tile, spatial layout, factor-nesting tree, treatment-frequency + NA-pattern.
Behaviour change
- propose_roles(df) flips to detect = TRUE by default. The detected design’s recommended_roles are overlaid on the name-based proposal, promoting structurally-identified treatments and blocks even when their column names don’t match the design / treatment regexes (e.g., gen in an alpha-lattice). The design_summary is stashed as attr(roles, "design"). Pass detect = FALSE to recover the v0.2.x byte-stable behaviour.
Design philosophy
- Detection is read-only. mask() synthesis behaviour is unchanged. Only propose_roles() consumes detection output, and only as role hints.
- Rule engine over ML: each of the six rules is a pure function returning a score in [0, 1] with evidence; the orchestrator picks the top above threshold, breaking ties in favour of the simpler design (CRD < RCBD < factorial < IBD < row-column < split-plot).
- Visualisation is sanity-check grade. For publication-quality field layouts use desplot::desplot() or ggplot2-based packages.
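The orchestrator rule described above (top score above threshold, ties broken toward the simpler design) might look roughly like the following base-R sketch. The function pick_design(), the example scores, the use of >= against the threshold, and the exact way tie_delta defines a tie are all assumptions for illustration; only the default values and the simplicity ordering come from the notes above.

```r
# Simplicity ordering as stated in the release notes (simplest first).
simplicity_order <- c("CRD", "RCBD", "factorial", "IBD",
                      "row-column", "split-plot")

# Hypothetical orchestrator: 'scores' is a named numeric vector in [0, 1].
pick_design <- function(scores, threshold = 0.5, tie_delta = 0.02) {
  eligible <- scores[scores >= threshold]
  if (length(eligible) == 0) return("none")
  # Assumption: anything within tie_delta of the best score counts as tied.
  tied <- names(eligible)[eligible >= max(eligible) - tie_delta]
  # Among tied candidates, prefer the earliest entry in the simplicity order.
  tied[which.min(match(tied, simplicity_order))]
}

# Made-up scores: IBD narrowly beats RCBD, so the simpler RCBD is chosen.
scores <- c(CRD = 0.10, RCBD = 0.81, factorial = 0.30,
            IBD = 0.82, "row-column" = 0.40, "split-plot" = 0.20)
chosen <- pick_design(scores)
```

Because each rule is a pure scoring function, an orchestrator of this shape stays easy to test: feeding it a fixed score vector always yields the same label.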
Suggests
- agridat: canonical fixtures for tests and the new vignette.
- ggplot2: optional plot engine via engine = "ggplot2"; base graphics is the default and the fallback.
Limitations
- The detector cannot distinguish a true split-plot from a factorial-in-blocks: both have the same data layout. The whole-plot / sub-plot assignment uses cardinality (fewer levels = whole-plot), which is heuristic.
- Detection on fewer than ~20 rows is unreliable. Pass detect = FALSE for toy fixtures.
masque 0.2.0
First public release of masque — a structurally faithful development surrogate for tabular datasets. Successor to the unreleased synthPR v0.1.0 (folder-scanning multi-file API), rewritten around a single-file data-frame-first interface and a round-trippable recipe object.
masque is not an anonymisation or differential-privacy tool. It produces development surrogates suitable for building and debugging pipelines, and a private recipe that re-targets a pipeline built against the synthetic clone back onto the original data. See vignette("confidentiality") for the threat model.
Design
- Strict 5-role taxonomy for columns: design, treatment, outcome, covariate, ignore. Multi-outcome supported. Date / POSIX columns and PII-pattern column names default to ignore.
- Two modes with different safety postures:
  - local: realistic dev surrogate for the data owner. Column names and level vocabularies preserved. Treatment-level permutation is opt-in. Issues a load-time warning when the synthetic is extracted.
  - collaborate: give the synthetic to a collaborator while keeping the recipe private. Treatment + categorical-covariate levels are opaque-aliased (trt_001, <col>_L01). Numeric draws are jittered within column resolution; integer columns are stochastically rounded. ignore columns are dropped. audit_mask() runs automatically and warns on HIGH leakage.
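Stochastic rounding, as mentioned for integer columns in collaborate mode, is commonly defined as rounding up with probability equal to the fractional part. The sketch below shows that textbook definition in base R; stochastic_round() is a hypothetical helper, and whether masque uses exactly this scheme is an assumption.

```r
# Hypothetical helper: round each value down, then add 1 with probability
# equal to its fractional part, so the result is unbiased on average.
stochastic_round <- function(x) {
  lower <- floor(x)
  as.integer(lower + (runif(length(x)) < (x - lower)))
}

set.seed(42)
draws <- stochastic_round(rep(2.3, 10000))

table(draws)  # only the integers 2 and 3 appear
mean(draws)   # close to 2.3, unlike round(), which would always give 2
```

Values that are already whole numbers are returned unchanged, and NA inputs stay NA, so the NA mask of an integer column is not disturbed.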
Public API (11 exports)
- propose_roles(df): heuristics-driven role tibble; the user edits and passes to mask().
- roles_validate(roles, df): fail-closed structural + semantic check.
- mask(df, roles, mode, seed, ...): returns an S7 masque object.
- synthetic(m) / recipe(m): accessors that hide S7.
- apply_recipe(original, recipe): forward translate original-namespace data into the synthetic namespace.
- unmask(x, recipe, column = NULL): inverse on a data frame or atomic vector; round-trips a pipeline back to the original.
- save_recipe(rec, path, include_simulator = FALSE) / read_recipe(path): runtime-minimal .rds persistence (under 10 KB on a 17,000-row, 38-column MET fixture).
- audit_mask(m, original = NULL, print = TRUE): first-class leakage audit returning the per-column severity tibble.
- reveal_maps(recipe): explicit, banner-fenced unmasked-map reveal (never automatic; print(recipe) is redacted by default).
Synthesis engine
- Numeric: per-column empirical-quantile marginals + a single global Pearson copula correlation matrix sampled via Gaussian copula.
- Categorical: within-column row permutation that preserves the level set and marginal frequencies.
- NA mask: preserved cell-by-cell from the original.
- Design columns: byte-identical pass-through in both modes.
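The numeric step can be pictured with a textbook Gaussian copula in base R. This is a sketch under stated assumptions, not masque's engine: the toy data, the to_normal() helper, the choice to compute the Pearson matrix on normal scores, and the quantile type are all illustrative.

```r
# Toy numeric data with a skewed marginal and a strong dependence.
set.seed(1)
n  <- 500
x1 <- rexp(n)
x2 <- x1 + rnorm(n, sd = 0.5)
original <- data.frame(x1 = x1, x2 = x2)

# 1. Map each column to normal scores through its empirical ranks.
to_normal <- function(v) qnorm(rank(v) / (length(v) + 1))
z <- sapply(original, to_normal)

# 2. Estimate one global Pearson correlation matrix on the normal scores.
R <- cor(z)

# 3. Draw correlated standard normals using the Cholesky factor of R.
z_new <- matrix(rnorm(n * ncol(z)), nrow = n) %*% chol(R)

# 4. Map each synthetic column back through the empirical quantiles
#    of the matching original column.
synth <- as.data.frame(mapply(
  function(zcol, v) quantile(v, probs = pnorm(zcol), names = FALSE, type = 8),
  as.data.frame(z_new), original
))
names(synth) <- names(original)
```

Because step 4 draws from empirical quantiles, synthetic values stay inside the observed range of each column, while the dependence between columns comes only from the single correlation matrix.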
Confidentiality
- RNG hygiene throughout (withr::with_seed / local_preserve_seed); mask() does not mutate the caller’s .Random.seed.
- recipe is runtime-minimal by default: no copula matrix or raw marginals stored. SHA-256 NA-mask fingerprint provided as an integrity check, not a privacy primitive.
- print(recipe) redacted by default; reveal_maps() is the only unmasked path.
- audit_mask() flags retained PII-pattern columns, unaliased treatments under collaborate, rare-level leakage, and numeric exact-match rates above the per-role thresholds.
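The RNG-hygiene guarantee (seeded work that leaves the caller's .Random.seed untouched) can be illustrated without withr. The helper with_private_seed() below is a hypothetical base-R stand-in written for this note; masque itself is stated to rely on withr, and this sketch does not reproduce withr's implementation.

```r
# Hypothetical helper: evaluate 'expr' under a fixed seed, then restore the
# caller's RNG state exactly as it was found.
with_private_seed <- function(seed, expr) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_seed <- if (has_seed) get(".Random.seed", envir = globalenv()) else NULL
  on.exit({
    if (has_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  expr  # lazily evaluated here, under the private seed
}

# The caller's random stream is the same with or without the seeded call.
set.seed(123)
before <- runif(1)

set.seed(123)
inside <- with_private_seed(99, rnorm(3))
after  <- runif(1)
```

Here before and after are identical, which is the property that lets a seeded synthesis step sit inside a larger simulation without changing its results.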
Documentation
- Four vignettes: getting_started, confidentiality, recipe_anatomy, roadmap.
- inst/extdata/john_alpha.csv: 72-row, 7-column public fixture derived from agridat::john.alpha (John 1987, alpha design).