Analyse existing data
Sometimes you don’t want to build a template — you have data, and you want a template that produces more like it. DataMaker can analyse:
- CSV files (one or many).
- JSON documents (objects or arrays of objects).
- YAML / XML (best-effort).
- OpenAPI schemas (`components.schemas`).
- Live databases — point at a connection and a table; we read the schema + sample rows.
- Live SAP OData — `$metadata` plus sample rows from `$top=100`.
The output is a real DataMaker template you can edit, version, and reuse.
CSV / JSON upload
In any chat (Agent mode), drag a file onto the prompt area. The agent runs:
```
create_template_from_csv   (or _json, _openapi)
  ↓
- infer the column / property types
- sample distributions for numeric / date columns
- detect IDs, emails, phones, IBANs by pattern
- mark obvious PII fields sensitive (name, email, tax_id, dob…)
  ↓
preview_template
  ↓
show 3 rows for sanity check
```

You see what it inferred and can correct anything. Saving creates a template.
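The pattern-detection step can be pictured with a small sketch. This is not DataMaker's implementation; the regexes, the `infer_column` helper, and the PII hint list are illustrative assumptions about how this kind of inference typically works:

```python
import re

# Illustrative patterns only; the real detectors are richer.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "uuid": re.compile(
        r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I),
    "iban": re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,20}$"),
}
PII_HINTS = {"name", "email", "phone", "tax_id", "dob", "iban"}

def infer_column(name: str, samples: list[str]) -> dict:
    """Guess a generation rule for one column from its header and sample values."""
    values = [s for s in samples if s]  # ignore empty cells
    for type_name, rx in PATTERNS.items():
        if values and all(rx.match(v) for v in values):
            # Pattern types that identify a person get the sensitive flag.
            return {"type": type_name,
                    "sensitive": type_name in {"email", "phone", "iban"}}
    # No pattern hit: fall back to the header name for PII flagging.
    sensitive = any(hint in name.lower() for hint in PII_HINTS)
    return {"type": "string", "sensitive": sensitive}

print(infer_column("email", ["a@b.co", "c@d.io"]))
# -> {'type': 'email', 'sensitive': True}
```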
OpenAPI schema
For services with an OpenAPI spec, the agent can build a template from a named schema:
```
agent: create a DataMaker template from the Customer schema in petstore.yaml

→ create_template_from_openapi(file="petstore.yaml", schema="Customer")
  - 12 properties found
  - inferred types: 9 strings, 2 numbers, 1 enum
  - 2 properties have format hints we'll honour: email, date-time
→ template "Customer (from OpenAPI)" created

agent: previewing 3 rows...
```

Required vs optional fields in the schema map to nullable rules in the template.
Format-tagged strings (`email`, `uuid`, `uri`, `ipv4`, …) become the matching DataMaker types.
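As a rough picture of that mapping, here is a hypothetical sketch in Python. The `format` names come from the OpenAPI spec itself; the `FORMAT_MAP` table, the target type names, and the `template_fields` helper are assumptions, not DataMaker's actual internals:

```python
# Hypothetical mapping from OpenAPI format hints to generator types.
FORMAT_MAP = {
    "email": "email", "uuid": "uuid", "uri": "url",
    "ipv4": "ipv4", "date-time": "datetime", "date": "date",
}

def template_fields(schema: dict) -> dict:
    """Map one OpenAPI object schema to per-field generation rules."""
    required = set(schema.get("required", []))
    fields = {}
    for prop, spec in schema.get("properties", {}).items():
        if "enum" in spec:
            rule = {"type": "enum", "values": spec["enum"]}
        else:
            # Honour the format hint if present, else fall back to the base type.
            rule = {"type": FORMAT_MAP.get(spec.get("format"),
                                           spec.get("type", "string"))}
        rule["nullable"] = prop not in required  # optional -> may be null
        fields[prop] = rule
    return fields

customer = {
    "required": ["id", "email"],
    "properties": {
        "id":    {"type": "string", "format": "uuid"},
        "email": {"type": "string", "format": "email"},
        "tier":  {"type": "string", "enum": ["basic", "gold"]},
    },
}
print(template_fields(customer))
```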
Live database analysis
For a connected database, the agent can sample a table and infer:
```
agent: build a template from the customers table in conn_prod_postgres_readonly

→ fetch_endpoint_fields → 14 columns
→ execute("SELECT * FROM customers LIMIT 1000")
  - 14 columns inferred
  - id: UUID
  - email: pattern-matched as email; sensitive
  - balance: distribution=long-tail (Pareto), alpha~1.4
  - country: enum from observed distribution {DE: 71%, AT: 14%, CH: 8%, others}
  - created_at: distribution=recent_bias, halflife≈45 days
→ template "Customer (from prod)" created (sensitive flags applied)
```

Distributions are inferred from the sample, not guessed. The more rows you give it, the better it gets.
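A long-tail `alpha` like the one above is the kind of number a standard maximum-likelihood (Hill) estimator produces. A minimal sketch, assuming only that sampled values are fitted against candidate distributions; `pareto_alpha` is an illustrative name, not a DataMaker function:

```python
import math
import random

def pareto_alpha(values: list[float]) -> float:
    """MLE for the Pareto shape parameter:
    alpha = n / sum(ln(x_i / x_min)) over positive values."""
    xs = [v for v in values if v > 0]
    x_min = min(xs)
    return len(xs) / sum(math.log(x / x_min) for x in xs)

# Synthetic long-tail balances: the estimator recovers the shape.
random.seed(0)
balances = [random.paretovariate(1.4) * 100 for _ in range(1000)]
print(round(pareto_alpha(balances), 2))  # ~1.4 on a sample this size
```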
Live SAP OData
```
agent: build a template from A_BusinessPartner in conn_s4_sandbox

→ fetch_sap_metadata → 28 properties
→ fetch_sap_records_filtered (top=100)
  - 28 properties inferred
  - BusinessPartner: ID
  - Country: enum {DE: 88%, AT: 9%, CH: 3%}
  - TaxNumber1: pattern-matched as DE USt-ID; sensitive
  - …
→ template "A_BusinessPartner" created
```

This is the foundation of the SAP regression workflow — the auto-generated template lets you generate more records that look like the real ones, without ever touching production again.
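Under the hood these are two plain OData requests. A sketch with `requests`, where the base URL and credentials are placeholders for whatever `conn_s4_sandbox` resolves to:

```python
import requests

# Placeholder connection details; in practice these come from the connection.
BASE = "https://s4-sandbox.example.com/sap/opu/odata/sap/API_BUSINESS_PARTNER"
AUTH = ("user", "pass")

# 1. Entity definitions: property names, EDM types, nullability.
metadata = requests.get(f"{BASE}/$metadata", auth=AUTH)
print(metadata.status_code, metadata.headers.get("Content-Type"))  # EDMX XML

# 2. A sample of real rows to infer enums, patterns, and distributions from.
rows = requests.get(
    f"{BASE}/A_BusinessPartner",
    params={"$top": 100, "$format": "json"},
    auth=AUTH,
)
sample = rows.json()["d"]["results"]  # OData v2 envelope used by S/4 services
print(len(sample), "records sampled")
```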
What gets inferred well, what doesn’t
Reliable:
- Type — string vs number vs date vs boolean.
- Format-tagged strings — email, UUID, URL, IBAN.
- Categorical fields — when there are <30 unique values, we infer an `enum`.
- Numeric distributions — uniform / gaussian / long-tail.
- Sensitive fields — names, emails, national IDs, IBANs are flagged automatically.
Worth reviewing:
- Free-text fields — we'll usually pick `paragraph` or `sentence`; you may want something more structured.
- Cross-field relationships — `email = first_name@company.com` patterns aren't always caught; consider a derived field.
- Domain-specific codes — industry codes, internal IDs. We'll fall back to `regex`; refine to `enum` if you have the catalogue (see the sketch after this list).
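The `enum` rule above is simple enough to sketch. The <30-unique-values threshold matches the text; the observed-frequency weighting and the regex fallback pattern are illustrative assumptions:

```python
from collections import Counter

ENUM_MAX_UNIQUE = 30  # per the rule above

def categorical_rule(values: list[str]) -> dict:
    """Enum with observed weights if the column is low-cardinality,
    otherwise fall back to a permissive regex to refine by hand."""
    counts = Counter(v for v in values if v)
    if len(counts) < ENUM_MAX_UNIQUE:
        total = sum(counts.values())
        return {"type": "enum",
                "weights": {v: n / total for v, n in counts.most_common()}}
    return {"type": "regex", "pattern": r"[A-Z0-9_-]+"}

print(categorical_rule(["DE"] * 71 + ["AT"] * 14 + ["CH"] * 8 + ["FR"] * 7))
# {'type': 'enum', 'weights': {'DE': 0.71, 'AT': 0.14, 'CH': 0.08, 'FR': 0.07}}
```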
Tip
After analysis, run `preview_template` 5-10 times before saving. Each preview generates 3 sample rows; if every set looks plausible, the template is well-shaped.
If you spot oddities, ask the agent to refine a specific field.