Distributions
For numeric fields, dates, and enum picks, DataMaker lets you choose how values
are sampled. The default is uniform random, but real-world data is rarely uniform:
ages cluster around a mean, country distributions are skewed, and prices follow a long tail.
Uniform random
The default. Every value in the range is equally likely.
{ "name": "age", "type": "number", "options": { "min": 18, "max": 80 } }
Weighted (for enum)
Bias an enum toward certain values with a weights list that lines up with values: each weight applies to the value at the same position.
{ "name": "country", "type": "enum", "options": { "values": ["DE", "AT", "CH", "FR"], "weights": [10, 2, 1, 1] }}
About 71% of rows will be DE here (10 / (10+2+1+1) ≈ 71%), then roughly 14% AT, 7% CH, and 7% FR. Adjust the weights as needed.
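To see where those percentages come from, here is a minimal Python sketch that samples the same enum with the standard library's random.choices. It assumes DataMaker's weighted pick behaves like ordinary weighted sampling with replacement; sample_country and expected_shares are hypothetical helpers, not part of DataMaker's API.

```python
import random
from collections import Counter

# Values and weights copied from the JSON example above.
VALUES = ["DE", "AT", "CH", "FR"]
WEIGHTS = [10, 2, 1, 1]

def sample_country(rng: random.Random, n: int) -> list[str]:
    # Hypothetical helper (assumption): weighted sampling with replacement,
    # where each weight applies to the value at the same position in VALUES.
    return rng.choices(VALUES, weights=WEIGHTS, k=n)

def expected_shares() -> dict[str, float]:
    # Each value's probability is its weight divided by the total weight (14 here).
    total = sum(WEIGHTS)
    return {v: w / total for v, w in zip(VALUES, WEIGHTS)}

rng = random.Random(42)
rows = sample_country(rng, 100_000)
counts = Counter(rows)
for value, share in expected_shares().items():
    print(f"{value}: expected {share:.1%}, observed {counts[value] / len(rows):.1%}")
```

Only the ratios between weights matter: [10, 2, 1, 1] and [20, 4, 2, 2] produce the same shares.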
Gaussian (normal)
Available for number, float, date, and currency fields. Specify mean and stddev; values
that fall outside [min, max] are clamped to the nearest bound.
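{ "name": "balance", "type": "currency", "options": { "min": 0, "max": 50000, "mean": 8500, "stddev": 4200 }}
Rows cluster around the mean with a bell-curve fall-off. About two-thirds of balances land within one standard deviation of the mean (roughly €4.3k to €12.7k); €50k is about ten standard deviations above the mean, so values anywhere near it are vanishingly rare.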
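A minimal Python sketch of this behaviour, assuming "clamped" means an out-of-range draw is pinned to the nearest bound rather than redrawn. clamped_gauss is a hypothetical helper, not DataMaker's implementation.

```python
import random

def clamped_gauss(rng: random.Random, mean: float, stddev: float,
                  lo: float, hi: float) -> float:
    # Assumption: draw from a normal distribution, then clamp to [lo, hi];
    # out-of-range draws are set to the nearest bound, not resampled.
    return min(hi, max(lo, rng.gauss(mean, stddev)))

rng = random.Random(7)
# Parameters taken from the "balance" example above.
balances = [clamped_gauss(rng, 8500, 4200, 0, 50000) for _ in range(50_000)]

within_one_sd = sum(4300 <= b <= 12700 for b in balances) / len(balances)
at_zero = sum(b == 0 for b in balances) / len(balances)
print(f"within one stddev: {within_one_sd:.1%}, clamped to 0: {at_zero:.1%}")
```

One side effect of clamping shows up here: the mean is only about two standard deviations above min, so roughly 2% of raw draws fall below 0 and come out as exactly 0. If that spike matters for your fixtures, raise the mean or lower the stddev.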
Long-tail (Pareto)
Useful when you want a few high-value rows and many low-value ones, as is typical for e-commerce orders, transaction sizes, and follower counts.
{ "name": "order_total", "type": "currency", "options": { "min": 1, "max": 10000, "distribution": "pareto", "alpha": 1.5 }}
Lower alpha = fatter tail.
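The effect of alpha is easy to check with the standard library's random.paretovariate, scaled so that min is the smallest possible value. How DataMaker handles max for a Pareto field isn't stated above, so clamping to it is an assumption here, and pareto_value and share_above are hypothetical helpers.

```python
import random

def pareto_value(rng: random.Random, alpha: float, lo: float, hi: float) -> float:
    # random.paretovariate(alpha) returns values >= 1 with P(X > x) = x ** -alpha.
    # Scaling by lo makes lo the minimum; clamping at hi is an assumption.
    return min(hi, lo * rng.paretovariate(alpha))

def share_above(alpha: float, threshold: float, n: int = 50_000) -> float:
    # Fraction of samples above threshold; a fixed seed keeps the result repeatable.
    rng = random.Random(1)
    samples = [pareto_value(rng, alpha, 1, 10_000) for _ in range(n)]
    return sum(s > threshold for s in samples) / n

# Lower alpha = fatter tail: more orders above 100 with alpha 1.0 than with 1.5.
print(share_above(1.0, 100), share_above(1.5, 100))
```

With min 1, the tail share is about threshold ** -alpha: roughly 1% of orders exceed 100 at alpha 1.0, versus about 0.1% at alpha 1.5.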
Date distributions
Dates support uniform, gaussian, and recent_bias. With recent_bias, more recent dates are more
likely, which suits created_at columns where most rows are from the last few days.
{ "name": "created_at", "type": "datetime", "options": { "min": "2024-01-01", "max": "now", "distribution": "recent_bias", "halflife_days": 30 }}
Custom (Python)
When the built-in distributions aren’t enough, drop into Python. The function receives the template-level RNG, so its output is reproducible if you set a seed.
# Per-field Python: `dm` is the DataMaker context, `rng` is the seeded RNG.
def value(rng, dm):
    # Bimodal: half the rows cluster around 1200, half around 7500
    if rng.random() < 0.5:
        return rng.gauss(1200, 200)
    return rng.gauss(7500, 800)
See Custom Python per field for the full API.
Reproducibility
Set a seed on the template (Settings → Seed) to make every generation deterministic.
This is useful when CI runs need stable test fixtures.
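To illustrate why a seed makes runs repeatable, here is a Python sketch that treats each generation as a fresh random.Random seeded with the template seed and feeds it to the bimodal function from the Custom (Python) example. That per-run seeding model, the generate helper, and passing None for dm are assumptions for illustration, not DataMaker internals.

```python
import random

def value(rng, dm):
    # Same bimodal function as the Custom (Python) example.
    if rng.random() < 0.5:
        return rng.gauss(1200, 200)
    return rng.gauss(7500, 800)

def generate(seed: int, n: int) -> list[float]:
    # Hypothetical stand-in for one generation run: a fresh RNG seeded from the
    # template seed, with None in place of the DataMaker context `dm`.
    rng = random.Random(seed)
    return [value(rng, None) for _ in range(n)]

run_a = generate(1234, 1000)
run_b = generate(1234, 1000)  # same seed: identical rows
run_c = generate(5678, 1000)  # different seed: different rows
print(run_a == run_b, run_a == run_c)
```

In plain Python, only random() is guaranteed to produce the same sequence for the same seed across interpreter versions, so pin the Python version in CI if your fixtures rely on values from helpers such as gauss.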