Distributions

For numeric fields, dates, and enum picks, DataMaker lets you choose how values are sampled. The default is uniform random, but real-world data is rarely uniform: ages cluster around a mean, country distributions are skewed, and prices follow a long tail.

Uniform random

The default. Every value in the range is equally likely.

{ "name": "age", "type": "number", "options": { "min": 18, "max": 80 } }

Weighted (for enum)

Bias an enum toward certain values. Supply a weights list with one entry per item in values, in the same order.

{
  "name": "country",
  "type": "enum",
  "options": {
    "values": ["DE", "AT", "CH", "FR"],
    "weights": [10, 2, 1, 1]
  }
}

About 71% of rows will be DE here (10 / (10+2+1+1) ≈ 71%); AT gets about 14%, and CH and FR about 7% each. Adjust the weights as needed.
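The arithmetic above can be checked with a short Python sketch. It normalizes the example's weights into shares and draws a sample with the standard library's `random.choices`. This illustrates the math only and is not DataMaker's internal sampler; the helper name `weight_shares` is hypothetical.

```python
import random
from collections import Counter

values = ["DE", "AT", "CH", "FR"]
weights = [10, 2, 1, 1]

def weight_shares(values, weights):
    """Return each value's expected share: its weight divided by the sum of weights."""
    total = sum(weights)
    return {v: w / total for v, w in zip(values, weights)}

shares = weight_shares(values, weights)
for value, share in shares.items():
    print(f"{value}: {share:.1%}")

# Empirical check with a seeded RNG: observed counts should track the shares.
rng = random.Random(42)
sample = rng.choices(values, weights=weights, k=10_000)
counts = Counter(sample)
```

Only the ratio between weights matters: `[10, 2, 1, 1]` and `[20, 4, 2, 2]` describe the same split.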

Gaussian (normal)

Available for number, float, date, and currency fields. Specify mean and stddev; values are clamped to [min, max].

{
  "name": "balance",
  "type": "currency",
  "options": {
    "min": 0,
    "max": 50000,
    "mean": 8500,
    "stddev": 4200
  }
}

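Rows cluster around the mean, with a bell-curve fall-off. About two thirds of balances fall within one standard deviation of the mean (roughly €4.3k to €12.7k); values near the €50k maximum are almost ten standard deviations out, so in practice they will not appear.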
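To get a feel for these numbers, here is a minimal sketch using Python's standard `random` module. It assumes that "clamped" means out-of-range draws are clipped to the nearest bound rather than redrawn; DataMaker's actual behavior may differ. The helper name `clamped_gauss` is hypothetical.

```python
import random

def clamped_gauss(rng, mean, stddev, lo, hi):
    """Draw from a normal distribution, then clip the result into [lo, hi]."""
    return min(hi, max(lo, rng.gauss(mean, stddev)))

rng = random.Random(7)
balances = [clamped_gauss(rng, 8500, 4200, 0, 50000) for _ in range(10_000)]

# Share of rows within one standard deviation of the mean (8500 ± 4200).
within_one_sd = sum(4300 <= b <= 12700 for b in balances) / len(balances)
# Share of rows that hit the lower bound because the raw draw was negative.
at_floor = sum(b == 0 for b in balances) / len(balances)
print(f"within one stddev: {within_one_sd:.0%}, clamped to 0: {at_floor:.1%}")
```

Under the clipping assumption, a small pile-up at the lower bound is expected whenever min sits only about two standard deviations below the mean, as it does here.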

Long-tail (Pareto)

Useful when you want a few high-value rows and many low-value ones — typical for e-commerce orders, transaction sizes, follower counts.

{
  "name": "order_total",
  "type": "currency",
  "options": {
    "min": 1,
    "max": 10000,
    "distribution": "pareto",
    "alpha": 1.5
  }
}

A lower alpha gives a fatter tail: more rows land at large values. A higher alpha concentrates rows near min.
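The effect of alpha can be seen in a rough sketch built on the standard library's `random.paretovariate`. It assumes min acts as the scale of the distribution and that draws above max are capped; how DataMaker maps min and max onto the Pareto curve is not specified here, so treat this as an illustration. The helpers `bounded_pareto` and `share_above` are hypothetical.

```python
import random

def bounded_pareto(rng, alpha, lo, hi):
    """Scale a Pareto draw so it starts at lo, then cap it at hi."""
    # random.paretovariate(alpha) returns values >= 1 with shape alpha.
    return min(hi, lo * rng.paretovariate(alpha))

def share_above(alpha, threshold, n=20_000, seed=1):
    """Fraction of n draws in [1, 10000] that exceed threshold."""
    rng = random.Random(seed)
    draws = [bounded_pareto(rng, alpha, 1, 10_000) for _ in range(n)]
    return sum(d > threshold for d in draws) / n

heavy = share_above(alpha=1.5, threshold=10)
light = share_above(alpha=3.0, threshold=10)
print(f"alpha=1.5: {heavy:.2%} above 10; alpha=3.0: {light:.2%} above 10")
```

For an unbounded Pareto with scale 1, the chance of exceeding x is x^(-alpha), so alpha=1.5 puts roughly 3% of rows above 10 while alpha=3.0 puts about 0.1% there.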

Date distributions

Dates support uniform, gaussian, and recent_bias (more recent dates are more likely — useful for created_at columns where most rows are from the last few days).

{
  "name": "created_at",
  "type": "datetime",
  "options": {
    "min": "2024-01-01",
    "max": "now",
    "distribution": "recent_bias",
    "halflife_days": 30
  }
}
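One plausible reading of halflife_days is exponential decay: a date that is one half-life older is half as likely to be picked. The sketch below implements that reading with the standard library. It is an assumption about the semantics, not DataMaker's documented algorithm, and the helper name `recent_bias` plus the fixed stand-in for "now" are illustrative.

```python
import math
import random
from datetime import datetime, timedelta

def recent_bias(rng, start, end, halflife_days):
    """Pick a datetime in [start, end]; density halves every halflife_days of age."""
    rate = math.log(2) / halflife_days
    span_days = (end - start).total_seconds() / 86400
    while True:
        age_days = rng.expovariate(rate)
        if age_days <= span_days:  # reject draws that would fall before start
            return end - timedelta(days=age_days)

rng = random.Random(3)
start = datetime(2024, 1, 1)
end = datetime(2025, 1, 1)  # fixed stand-in for "now" so the run is repeatable
stamps = [recent_bias(rng, start, end, 30) for _ in range(10_000)]

# Share of rows created within one half-life of the end of the range.
recent_share = sum(end - s <= timedelta(days=30) for s in stamps) / len(stamps)
print(f"rows from the last 30 days: {recent_share:.0%}")
```

Under this reading, about half of the rows fall within the last halflife_days, a quarter in the half-life before that, and so on. A smaller halflife_days packs rows closer to max.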

Custom (Python)

When the built-in distributions aren’t enough, drop into Python. The function gets the template-level RNG so the output is reproducible if you set a seed.

# Per-field Python: `dm` is the DataMaker context, `rng` is the seeded RNG.
def value(rng, dm):
    # Bimodal: half cluster around 1200, half around 7500
    if rng.random() < 0.5:
        return rng.gauss(1200, 200)
    return rng.gauss(7500, 800)

See Custom Python per field for the full API.

Reproducibility

Set seed on the template (Settings → Seed) to make every generation deterministic. Useful when CI runs need stable test fixtures.
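The effect of a seed can be sketched outside DataMaker with Python's `random.Random`, which offers the same `random()` and `gauss()` methods the custom field function calls. The `generate` driver below is hypothetical and only stands in for a template run. Passing `dm=None` is a placeholder, since this function never touches the context.

```python
import random

def value(rng, dm):
    # Bimodal field from the custom Python example.
    if rng.random() < 0.5:
        return rng.gauss(1200, 200)
    return rng.gauss(7500, 800)

def generate(seed, rows=5):
    """Hypothetical driver: one seeded RNG shared across all rows."""
    rng = random.Random(seed)
    return [value(rng, dm=None) for _ in range(rows)]

first = generate(seed=42)
second = generate(seed=42)
other = generate(seed=43)

print(first == second)  # same seed, same rows
print(first == other)   # different seed, different rows
```

The guarantee holds only if every random draw goes through the provided rng. Calling the module-level `random.random()` or reading the clock inside a field function would make runs diverge even with a fixed seed.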