Sensitive fields & masking
The point of synthetic data is that it can replace real data without the legal, audit, or breach risk. DataMaker treats this as a first-class concern: any field can be marked sensitive, and from that moment on the platform behaves differently for that field.
What sensitive: true does
When a field is sensitive, DataMaker:
- Substitutes a realistic, locale-correct fake at generation time. Real-shaped, never real.
- Refuses agent-driven exports unless explicitly approved. The AI agent will not copy a sensitive value into chat output, into a workspace file, or into a non-DataMaker API call.
- Tags exports so you have an audit log per template: which sensitive fields were present, which connection received them, when, and on whose authority.
- Excludes the value from logs — even live scenario logs that stream to the chat UI redact sensitive values.
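As a mental model for the log-redaction behaviour, here is a minimal sketch (Python; the field names and the [REDACTED] token are illustrative, not DataMaker's internal implementation):

```python
# Sketch: strip sensitive values from a record before it reaches any log sink.
# The REDACTED token and field names are illustrative only.
REDACTED = "[REDACTED]"

def redact_for_logs(record: dict, sensitive_fields: set) -> dict:
    """Return a copy of `record` that is safe to log: sensitive values replaced."""
    return {
        key: (REDACTED if key in sensitive_fields else value)
        for key, value in record.items()
    }

row = {"id": "42", "email": "alice.smith@example.com", "first_name": "Alice"}
print(redact_for_logs(row, {"email"}))
# {'id': '42', 'email': '[REDACTED]', 'first_name': 'Alice'}
```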
Marking a field sensitive
In the template builder, click the gear next to the field and toggle Sensitive. Or in JSON:
```json
{ "name": "tax_id", "type": "tax_id", "options": { "sensitive": true } }
```

You can mark any field type sensitive — built-in types (like email, iban, tax_id, ssn) come with the flag pre-set when you pick them.
Masking strategies
For each sensitive field you can pick how DataMaker substitutes the value:
replace (default)
Replace with a freshly generated fake of the same type. The output is not derivable from any real input — it’s a brand-new value.
real: alice.smith@example.com
fake: rachel.weber@example.de

format-preserve
Preserve the shape of the input. Useful when downstream systems validate format. We hash the input deterministically so you get a stable mapping (the same real value always produces the same fake) without ever storing the real one.
real: alice.smith@example.com
fake: fcbb1278@example.com ← same domain, scrambled local-part

redact
Replace with a fixed token ([REDACTED], ***, configurable). Use when downstream systems don’t need the field’s value at all, only its presence.
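A deterministic, format-preserving email mask can be sketched like this (Python; the salt and the 8-hex-character local part are assumptions for illustration, not DataMaker's exact scheme):

```python
import hashlib

def mask_email(real: str, salt: str = "workspace-secret") -> str:
    """Deterministically mask an email: keep the domain, replace the local
    part with a short salted hash. The same input always yields the same
    output, and the real value never needs to be stored. The salt is a
    hypothetical per-workspace secret, not a documented DataMaker setting."""
    local, _, domain = real.partition("@")
    digest = hashlib.sha256(f"{salt}:{local}".encode()).hexdigest()[:8]
    return f"{digest}@{domain}"

# Stable mapping: the same real value always produces the same fake.
assert mask_email("alice.smith@example.com") == mask_email("alice.smith@example.com")
```

Because the hash is one-way and salted, the fake still passes downstream format validation without being derivable back to the real address.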
Blocking exports
By default, sensitive fields can be exported only via:
- The DataMaker UI to a downloaded file (the human is in the loop).
- A scenario you authored (the script is the authorisation).
The agent cannot export sensitive fields without one of two opt-ins:
- A workspace-level setting (Owner only): “Agent may export sensitive fields”.
- An explicit per-chat override: “confirm export of sensitive fields: yes”.
GDPR / audit
Every export of a template containing sensitive fields is logged. From Settings → Audit log you can filter by template, project, user, or date and export the log as CSV for your DPO.
The log records: timestamp, actor (user or agent session), template ID + version, count of sensitive fields, count of rows, target connection (or “download”), and outcome (success / partial / blocked).
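Once exported, the CSV is easy to slice for a DPO report. For example, pulling out blocked export attempts (the column names below are assumptions based on the fields listed above; check the header row of your actual export):

```python
import csv
import io

# Illustrative audit-log rows; real exports come from Settings -> Audit log.
SAMPLE = """timestamp,actor,template,sensitive_fields,rows,target,outcome
2024-05-01T10:00:00Z,agent:s1,Customer v3,3,1000,postgres-staging,success
2024-05-01T11:30:00Z,agent:s2,Customer v3,3,500,chat,blocked
2024-05-02T09:15:00Z,user:dana,Customer v3,3,200,download,success
"""

def blocked_exports(csv_text: str) -> list:
    """Return rows where an export of sensitive fields was blocked."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["outcome"] == "blocked"]

for row in blocked_exports(SAMPLE):
    print(row["timestamp"], row["actor"], row["target"])
# 2024-05-01T11:30:00Z agent:s2 chat
```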
Practical example
```json
{
  "name": "Customer (regression-safe)",
  "fields": [
    { "name": "id", "type": "uuid" },
    { "name": "first_name", "type": "first_name" },
    { "name": "email", "type": "email",
      "options": { "sensitive": true, "masking": "format-preserve" } },
    { "name": "iban", "type": "iban",
      "options": { "sensitive": true, "masking": "replace", "country": "DE" } },
    { "name": "comments", "type": "paragraph",
      "options": { "sensitive": true, "masking": "redact" } }
  ]
}
```

A row generated from this template:
```json
{
  "id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "first_name": "Lukas",
  "email": "fcbb1278@example.de",
  "iban": "DE89 3704 0044 0532 0130 00",
  "comments": "[REDACTED]"
}
```

The email is format-preserving (downstream regex validation passes), the iban is freshly generated and MOD-97 valid, and comments carries only the redaction token.
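You can verify the MOD-97 claim yourself; the ISO 13616 checksum is a few lines of Python (this checks the checksum only, not whether the bank code exists):

```python
def iban_mod97_ok(iban: str) -> bool:
    """ISO 13616 checksum: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), and check value mod 97 == 1."""
    s = iban.replace(" ", "").upper()
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1

print(iban_mod97_ok("DE89 3704 0044 0532 0130 00"))  # True
```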
See also
- Workflows → Mask PII / GDPR — full pipelines that combine sensitive flags, scenarios, and audited exports.
- Reference → Limits — how long audit logs are retained per plan.