
Sensitive fields & masking

The point of synthetic data is that it can replace real data without the legal, audit, or breach risk. DataMaker treats this as a first-class concern: any field can be marked sensitive, and from that moment on the platform behaves differently for that field.

What sensitive: true does

When a field is sensitive, DataMaker:

  1. Substitutes a realistic, locale-correct fake at generation time. Real-shaped, never real.
  2. Refuses agent-driven exports unless explicitly approved. The AI agent will not copy a sensitive value into chat output, into a workspace file, or into a non-DataMaker API call.
  3. Tags exports so you have an audit log per template: which sensitive fields were present, which connection received them, when, and on whose authority.
  4. Excludes the value from logs — even live scenario logs that stream to the chat UI redact sensitive values.

Marking a field sensitive

In the template builder, click the gear next to the field and toggle Sensitive. Or in JSON:

{ "name": "tax_id", "type": "tax_id", "options": { "sensitive": true } }

You can mark any field type sensitive — built-in types (like email, iban, tax_id, ssn) come with the flag pre-set when you pick them.
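Since built-in PII types arrive with the flag pre-set, template tooling only needs to backfill it for fields that omit options. A minimal Python sketch of that defaulting rule (the PII_TYPES set and ensure_sensitive helper are illustrative assumptions, not DataMaker API):

```python
# Built-in PII types named in this doc; assumed list, not exhaustive.
PII_TYPES = {"email", "iban", "tax_id", "ssn"}

def ensure_sensitive(field: dict) -> dict:
    # Mirror the documented default: built-in PII types get
    # sensitive pre-set; explicit settings are never overridden.
    opts = field.setdefault("options", {})
    if field["type"] in PII_TYPES:
        opts.setdefault("sensitive", True)
    return field
```

An explicit `"sensitive": false` on a PII-typed field survives the defaulting pass, because `setdefault` only fills missing keys.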

Masking strategies

For each sensitive field you can pick how DataMaker substitutes the value:

replace (default)

Replace with a freshly generated fake of the same type. The output is not derivable from any real input — it’s a brand-new value.

real: alice.smith@example.com
fake: rachel.weber@example.de

format-preserve

Preserve the shape of the input. Useful when downstream systems validate format. We hash the input deterministically so you get a stable mapping (the same real value always produces the same fake) without ever storing the real one.

real: alice.smith@example.com
fake: fcbb1278@example.com ← same domain, scrambled local-part

redact

Replace with a fixed token ([REDACTED], ***, configurable). Use it when downstream systems don’t need the field’s value at all, only its presence.
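For an email field, the three strategies can be sketched in a few lines of Python. This is an illustrative model, not DataMaker’s implementation — mask_value and _fresh_fake_email are hypothetical names, and the real platform uses locale-correct generators for each field type:

```python
import hashlib
import random
import string

def _fresh_fake_email() -> str:
    # Stand-in generator; DataMaker would produce a locale-correct
    # fake of the field's actual type (assumption).
    local = "".join(random.choices(string.ascii_lowercase, k=8))
    return f"{local}@example.com"

def mask_value(value: str, strategy: str, token: str = "[REDACTED]") -> str:
    if strategy == "replace":
        # Brand-new value, not derivable from the input.
        return _fresh_fake_email()
    if strategy == "format-preserve":
        # Deterministic hash of the local part, domain kept,
        # so shape-based validation still passes.
        local, domain = value.split("@", 1)
        digest = hashlib.sha256(local.encode()).hexdigest()[:8]
        return f"{digest}@{domain}"
    if strategy == "redact":
        return token  # presence only
    raise ValueError(f"unknown masking strategy: {strategy}")
```

Note the format-preserve branch is stable: the same real value always maps to the same fake, without the real value ever being stored.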

Blocking exports

By default, sensitive fields can be exported only via:

  • The DataMaker UI to a downloaded file (the human is in the loop).
  • A scenario you authored (the script is the authorisation).

The agent cannot export sensitive fields without one of two opt-ins:

  • A workspace-level setting (Owner only): “Agent may export sensitive fields”.
  • An explicit per-chat override: confirm export of sensitive fields: yes.
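The opt-in rule reduces to a simple gate: either opt-in unblocks the agent, and non-sensitive exports are never blocked. A sketch (agent_may_export is a hypothetical name, not a DataMaker API):

```python
def agent_may_export(has_sensitive_fields: bool,
                     workspace_allows: bool,
                     chat_override: bool) -> bool:
    # Non-sensitive templates export freely; sensitive ones need
    # either the Owner-only workspace setting or the per-chat override.
    if not has_sensitive_fields:
        return True
    return workspace_allows or chat_override
```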

GDPR / audit

Every export of a template containing sensitive fields is logged. From Settings → Audit log you can filter by template, project, user, or date and export the log as CSV for your DPO.

The log records: timestamp, actor (user or agent session), template ID + version, count of sensitive fields, count of rows, target connection (or “download”), and outcome (success / partial / blocked).
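The fields listed above map onto a flat record, one per export. A hypothetical Python dataclass for one log entry (field names are illustrative, not the CSV export’s actual headers):

```python
from dataclasses import dataclass

@dataclass
class AuditRecord:
    timestamp: str              # ISO 8601
    actor: str                  # user or agent session
    template: str               # template ID + version
    sensitive_field_count: int
    row_count: int
    target: str                 # connection name, or "download"
    outcome: str                # "success" | "partial" | "blocked"
```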

Practical example

{
  "name": "Customer (regression-safe)",
  "fields": [
    { "name": "id", "type": "uuid" },
    { "name": "first_name", "type": "first_name" },
    { "name": "email", "type": "email",
      "options": { "sensitive": true, "masking": "format-preserve" } },
    { "name": "iban", "type": "iban",
      "options": { "sensitive": true, "masking": "replace", "country": "DE" } },
    { "name": "comments", "type": "paragraph",
      "options": { "sensitive": true, "masking": "redact" } }
  ]
}

A row generated from this template:

{
  "id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "first_name": "Lukas",
  "email": "fcbb1278@example.de",
  "iban": "DE89 3704 0044 0532 0130 00",
  "comments": "[REDACTED]"
}

The email is format-preserving (downstream regex validation still passes), the iban is freshly generated and MOD-97 valid, and comments is reduced to the redaction token.
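MOD-97 validity is checkable in a few lines: move the first four characters of the IBAN to the end, expand letters to two-digit numbers (A=10 … Z=35), and the resulting number mod 97 must equal 1. A small Python checker (iban_mod97_valid is an illustrative helper, not part of DataMaker):

```python
def iban_mod97_valid(iban: str) -> bool:
    # Normalise, rearrange, expand letters, then apply the
    # ISO 13616 MOD-97 check.
    s = iban.replace(" ", "").upper()
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1
```

Running it on the generated row's iban, "DE89 3704 0044 0532 0130 00", returns True.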

See also