
What is a recipe?

The masque_recipe is the only artefact that must remain confidential alongside the original data. It is what makes the round-trip work: without the recipe you cannot translate a pipeline trained on the synthetic back onto the original.

A recipe is an S7 object, but users do not need to know that. The two exported accessors hide the class details:

m   <- mask(df, roles, mode = "collaborate", seed = 1)
rec <- recipe(m)
class(rec)
#> [1] "masque::masque_recipe" "S7_object"

Anatomy

A recipe holds runtime-minimal state by default:

  • masque_version — the package version that built the recipe.
  • created_at — wall-clock timestamp at construction.
  • mode — "local" or "collaborate".
  • seed — the seed passed to mask(), or NULL if not given.
  • roles — the per-column role tibble.
  • column_name_map — original-to-synthetic column-name map (currently NULL; reserved for a future opt-in column-aliasing flag — see vignette("roadmap")).
  • level_maps — per-column factor / character maps. The sensitive bit.
  • storage_classes — per-column R class of the original.
  • factor_meta — per-factor levels and ordered status.
  • warnings — text of any warnings raised at construction.
  • integrity_fp — SHA-256 of is.na(original). An integrity fingerprint, not a privacy guarantee.

What it deliberately does not hold:

  • Simulator state (the copula covariance matrix, the raw observed margins). Reserved for a future opt-in via save_recipe(..., include_simulator = TRUE); currently a no-op.
  • Raw observed values.
  • Source file paths, machine usernames, or absolute paths.

print(recipe) is redacted

The default print method shows the per-column role table and a marker indicating whether a level map exists for each column (* = mapped, = = no map), but never the actual level vocabularies.

rec
#> 
#> ── masque_recipe ───────────────────────────────────────────────────────────────────────────────────
#>  Created: 2026-05-18 02:33:42 UTC
#>  Mode: collaborate
#>  Seed: present (redacted)
#>  masque version: 0.4.1
#>  Integrity fingerprint: 0cec319ba9e2...
#> 
#> ── Columns (7 total; 1 level-map(s); 0 column-name map(s)) ──
#> 
#>   = design     plot                              (integer)
#>   = design     rep                               (factor)
#>   = design     block                             (factor)
#>   * treatment  gen                               (factor)
#>   = outcome    yield                             (numeric)
#>   = design     row                               (integer)
#>   = design     col                               (integer)
#> 
#>  PRIVATE - never share this recipe alongside the synthetic.
#> Use `reveal_maps(rec)` to inspect level maps explicitly.
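
The marker column in the redacted print can be pictured with a few lines of base R. This is only a sketch under assumptions: the `roles` data frame and `level_maps` list below are toy stand-ins, and the layout is an approximation of the output above, not masque's actual print method.

```r
# Toy stand-ins (hypothetical): a role table and a list of level maps.
roles <- data.frame(
  column = c("plot", "gen", "yield"),
  role   = c("design", "treatment", "outcome"),
  stringsAsFactors = FALSE
)
level_maps <- list(gen = c(G01 = "L01", G02 = "L02"))

# "*" when a level map exists for the column, "=" otherwise.
marker <- ifelse(roles$column %in% names(level_maps), "*", "=")

# Print only the marker, role and column name; the map contents never appear.
redacted <- sprintf("  %s %-10s %s", marker, roles$role, roles$column)
cat(redacted, sep = "\n")
```

The point of the design is that a reader of the printed recipe learns which columns are mapped, but not the vocabulary inside any map.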

If you need to inspect the maps — typically the data owner reviewing the recipe before saving — call reveal_maps() explicitly:

reveal_maps(rec)
#> ! Revealing sensitive level maps. Proceed at your discretion.
#> 
#> ── gen
#> 
#> ── seed

reveal_maps() prints a warning banner (“Revealing sensitive level maps. Proceed at your discretion.”) and then dumps every map and the seed value. Avoid writing its output to logs or files unless you have to.

Saving and loading

save_recipe() writes a single .rds file. The default is runtime-minimal: the file is small and can be stored next to the original data under the same security class.

tmp <- tempfile(fileext = ".rds")
save_recipe(rec, tmp)
file.info(tmp)$size
#> [1] 6810

read_recipe() validates the file. When the recorded masque_version differs from the currently installed package version, it emits an informational message rather than an error.

rec2 <- read_recipe(tmp)
identical(rec@integrity_fp, rec2@integrity_fp)
#> [1] TRUE
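
The "inform, do not error" behaviour on a version mismatch can be sketched in base R. The helper name `check_recorded_version()` and the message wording are hypothetical; this illustrates the pattern, not how read_recipe() is implemented.

```r
# Hypothetical helper, not part of masque: compare a recorded version string
# with the installed one and signal a message instead of stopping.
check_recorded_version <- function(recorded, installed) {
  recorded  <- package_version(recorded)
  installed <- package_version(installed)
  if (recorded != installed) {
    message(
      "Recipe was built with version ", as.character(recorded),
      "; installed version is ", as.character(installed), "."
    )
    return(invisible(FALSE))
  }
  invisible(TRUE)
}

same    <- check_recorded_version("0.4.1", "0.4.1")  # silent
differs <- check_recorded_version("0.4.1", "0.5.0")  # emits a message, no error
```

Using message() keeps the load usable across package upgrades while still leaving a trace the caller can see or suppress with suppressMessages().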

The integrity fingerprint

integrity_fp is digest::digest(is.na(original), algo = "sha256"). It lets a downstream consumer check that a recipe corresponds to the expected missingness pattern, without exposing any other information about the original data.

digest::digest(is.na(df), algo = "sha256") == rec@integrity_fp
#> [1] TRUE

It is not a privacy mechanism. The hash tells you whether two data frames share the same NA mask; it does not hide the underlying mask or its risks.
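
A small experiment with the formula quoted above shows what the fingerprint does and does not react to. The data frame and the wrapper name `fingerprint()` are made up for illustration; only digest::digest() and is.na() come from the text.

```r
library(digest)

# Hypothetical wrapper around the formula given in this section.
fingerprint <- function(d) digest::digest(is.na(d), algo = "sha256")

original <- data.frame(
  gen   = c("G01", "G02", NA),
  yield = c(4.1, NA, 3.8)
)

# Same NA positions, different observed value.
same_mask <- original
same_mask$yield[1] <- 9.9

# One additional NA, so the mask itself changes.
new_mask <- original
new_mask$yield[3] <- NA

fp_original <- fingerprint(original)
identical(fp_original, fingerprint(same_mask))
#> [1] TRUE
identical(fp_original, fingerprint(new_mask))
#> [1] FALSE
```

Because digest() hashes the serialized logical matrix, column names and column order are part of what is fingerprinted in this sketch, so a renamed or reordered data frame would not be expected to match.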

Round-trip the maps directly

The recipe is the bidirectional translator. apply_recipe() and unmask() both operate on it:

fwd  <- apply_recipe(df, rec)
back <- unmask(fwd, rec)
identical(as.character(back$gen), as.character(df$gen))
#> [1] TRUE
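
Conceptually, a level map is a lookup that can be read in both directions. The base R sketch below uses an invented `level_map` with made-up synthetic labels; it is meant to show why the round trip is lossless when the map is one-to-one, and makes no claim about how apply_recipe() or unmask() are written.

```r
# Toy factor standing in for a sensitive column.
gen <- factor(c("G01", "G02", "G01", "G03"))

# Hypothetical level map: original level -> synthetic label (one-to-one).
level_map <- setNames(sprintf("L%02d", seq_along(levels(gen))), levels(gen))

# Forward direction: replace each original level by its synthetic label.
forward <- unname(level_map[as.character(gen)])

# Backward direction: invert the map and look the labels up again.
inverse_map <- setNames(names(level_map), level_map)
restored    <- unname(inverse_map[forward])

identical(restored, as.character(gen))
#> [1] TRUE
```

Anyone holding `level_map` can run the backward step, which is why the maps are the sensitive part of a recipe.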

Future: include_simulator = TRUE

save_recipe(rec, path, include_simulator = TRUE) is accepted today but is currently a no-op (no simulator state is stored on the recipe). A future release will use this flag to persist enough state that draw_new_synthetic(rec, n) can produce fresh synthetic samples without access to the original. See vignette("roadmap") for the deferred items.