
Sensitive fields & masking

The point of synthetic data is that it can replace real data without the legal, audit, or breach risk. DataMaker treats this as a first-class concern: any field can be marked sensitive, and from that moment on the platform behaves differently for that field.

What sensitive: true does

When a field is sensitive, DataMaker:

  1. Substitutes a realistic, locale-correct fake at generation time. Real-shaped, never real.
  2. Refuses agent-driven exports unless explicitly approved. The AI agent will not copy a sensitive value into chat output, into a workspace file, or into a non-DataMaker API call.
  3. Tags exports so you have an audit log per template: which sensitive fields were present, which connection received them, when, and on whose authority.
  4. Excludes the value from logs — even live scenario logs that stream to the chat UI redact sensitive values.

Marking a field sensitive

In the template builder, click the gear next to the field and toggle Sensitive. Or in JSON:

{ "name": "tax_id", "type": "tax_id", "options": { "sensitive": true } }

You can mark any field type sensitive — built-in types (like email, iban, tax_id, ssn) come with the flag pre-set when you pick them.
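Since built-in PII types arrive with the flag pre-set, template tooling only needs to backfill it for fields that omit options. A minimal Python sketch of that defaulting rule (the PII_TYPES set and ensure_sensitive helper are illustrative assumptions, not DataMaker API):

```python
# Built-in PII types named in this doc; assumed list, not exhaustive.
PII_TYPES = {"email", "iban", "tax_id", "ssn"}

def ensure_sensitive(field: dict) -> dict:
    # Mirror the documented default: built-in PII types get
    # sensitive pre-set; explicit settings are never overridden.
    opts = field.setdefault("options", {})
    if field["type"] in PII_TYPES:
        opts.setdefault("sensitive", True)
    return field
```

An explicit `"sensitive": false` on a PII-typed field survives the defaulting pass, because `setdefault` only fills missing keys.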

Masking strategies

For each sensitive field you can pick how DataMaker substitutes the value:

replace (default)

Replace with a freshly generated fake of the same type. The output is not derivable from any real input — it’s a brand-new value.

real: alice.smith@example.com
fake: rachel.weber@example.de

format-preserve

Preserve the shape of the input. Useful when downstream systems validate format. We hash the input deterministically so you get a stable mapping (the same real value always produces the same fake) without ever storing the real one.

real: alice.smith@example.com
fake: fcbb1278@example.com ← same domain, scrambled local-part

redact

Replace with a fixed token ([REDACTED], ***, configurable). Use it when downstream systems don’t need the field’s value at all, only its presence.
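For an email field, the three strategies can be sketched in a few lines of Python. This is an illustrative model, not DataMaker’s implementation — mask_value and _fresh_fake_email are hypothetical names, and the real platform uses locale-correct generators for each field type:

```python
import hashlib
import random
import string

def _fresh_fake_email() -> str:
    # Stand-in generator; DataMaker would produce a locale-correct
    # fake of the field's actual type (assumption).
    local = "".join(random.choices(string.ascii_lowercase, k=8))
    return f"{local}@example.com"

def mask_value(value: str, strategy: str, token: str = "[REDACTED]") -> str:
    if strategy == "replace":
        # Brand-new value, not derivable from the input.
        return _fresh_fake_email()
    if strategy == "format-preserve":
        # Deterministic hash of the local part, domain kept,
        # so shape-based validation still passes.
        local, domain = value.split("@", 1)
        digest = hashlib.sha256(local.encode()).hexdigest()[:8]
        return f"{digest}@{domain}"
    if strategy == "redact":
        return token  # presence only
    raise ValueError(f"unknown masking strategy: {strategy}")
```

Note the format-preserve branch is stable: the same real value always maps to the same fake, without the real value ever being stored.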

Blocking exports

By default, sensitive fields can be exported only via:

  • The DataMaker UI to a downloaded file (the human is in the loop).
  • A scenario you authored (the script is the authorisation).

The agent cannot export sensitive fields without one of two opt-ins:

  • A workspace-level setting (Owner only): “Agent may export sensitive fields”.
  • An explicit per-chat override: confirm export of sensitive fields: yes.
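The opt-in rule reduces to a simple gate: either opt-in unblocks the agent, and non-sensitive exports are never blocked. A sketch (agent_may_export is a hypothetical name, not a DataMaker API):

```python
def agent_may_export(has_sensitive_fields: bool,
                     workspace_allows: bool,
                     chat_override: bool) -> bool:
    # Non-sensitive templates export freely; sensitive ones need
    # either the Owner-only workspace setting or the per-chat override.
    if not has_sensitive_fields:
        return True
    return workspace_allows or chat_override
```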

GDPR / audit

Every export of a template containing sensitive fields is logged. From Settings → Audit log you can filter by template, project, user, or date and export the log as CSV for your DPO.

The log records: timestamp, actor (user or agent session), template ID + version, count of sensitive fields, count of rows, target connection (or “download”), and outcome (success / partial / blocked).
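The fields listed above map onto a flat record, one per export. A hypothetical Python dataclass for one log entry (field names are illustrative, not the CSV export’s actual headers):

```python
from dataclasses import dataclass

@dataclass
class AuditRecord:
    timestamp: str              # ISO 8601
    actor: str                  # user or agent session
    template: str               # template ID + version
    sensitive_field_count: int
    row_count: int
    target: str                 # connection name, or "download"
    outcome: str                # "success" | "partial" | "blocked"
```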

Practical example

{
  "name": "Customer (regression-safe)",
  "fields": [
    { "name": "id", "type": "uuid" },
    { "name": "first_name", "type": "first_name" },
    { "name": "email", "type": "email",
      "options": { "sensitive": true, "masking": "format-preserve" } },
    { "name": "iban", "type": "iban",
      "options": { "sensitive": true, "masking": "replace", "country": "DE" } },
    { "name": "comments", "type": "paragraph",
      "options": { "sensitive": true, "masking": "redact" } }
  ]
}

A row generated from this template:

{
  "id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "first_name": "Lukas",
  "email": "fcbb1278@example.de",
  "iban": "DE89 3704 0044 0532 0130 00",
  "comments": "[REDACTED]"
}

The email is format-preserving (downstream regex validation still passes), the iban is freshly generated and MOD-97 valid, and comments is reduced to the redaction token.
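MOD-97 validity is checkable in a few lines: move the first four characters of the IBAN to the end, expand letters to two-digit numbers (A=10 … Z=35), and the resulting number mod 97 must equal 1. A small Python checker (iban_mod97_valid is an illustrative helper, not part of DataMaker):

```python
def iban_mod97_valid(iban: str) -> bool:
    # Normalise, rearrange, expand letters, then apply the
    # ISO 13616 MOD-97 check.
    s = iban.replace(" ", "").upper()
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1
```

Running it on the generated row's iban, "DE89 3704 0044 0532 0130 00", returns True.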

See also