Python environment

Scenarios run on DataMaker-hosted Python workers. You don’t manage the environment — just the script.

Runtime

  • Python 3.12 (current stable).
  • One process per scenario run. Cold start ~600 ms; warm starts effectively instant if you re-run within ~5 minutes.
  • Each run gets its own working directory, mounted at /workspace. Files survive the duration of the run only — for persistence across runs, see Workspace files.
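A minimal sketch of using the run's scratch space. On a worker your script starts with /workspace as its current directory, so relative paths land there (the filename below is just an example):

```python
from pathlib import Path

# The worker starts your script inside the run's /workspace mount,
# so a relative path like this lands in the scratch directory.
scratch = Path("rows.csv")
scratch.write_text("id,name\n1,Ada\n")

contents = scratch.read_text()
print(contents)

scratch.unlink()  # optional: everything here vanishes when the run ends anyway
```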

Pre-installed packages

Every worker comes with these:

  • datamaker: the official DataMaker SDK (the entrypoint of every script).
  • requests: generic HTTP client. Use this for non-DataMaker REST calls.
  • httpx: async HTTP client, if you'd rather use async/await.
  • psycopg[binary]: Postgres driver (if you want raw SQL alongside dm.connection).
  • pymysql: MySQL driver.
  • pymongo: MongoDB driver.
  • pandas: DataFrame manipulation. Useful for transforming generated rows.
  • pyyaml: YAML parsing.
  • python-dateutil: date parsing and arithmetic.
  • faker: Python's Faker. Most cases use dm.generate() instead, but Faker is there if you need a quick one-off.
  • pyodata: OData client. Mostly redundant, since dm.connection.fetch() is the supported path.
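As a sketch of the pandas use case: the rows below are hypothetical stand-ins for what a dm.generate() call might hand you, aggregated per country before loading:

```python
import pandas as pd

# Hypothetical rows standing in for the output of dm.generate()
rows = [
    {"id": 1, "country": "DE", "amount": 120.0},
    {"id": 2, "country": "FR", "amount": 80.0},
    {"id": 3, "country": "DE", "amount": 200.0},
]

df = pd.DataFrame(rows)
totals = df.groupby("country")["amount"].sum().to_dict()
print(totals)  # {'DE': 320.0, 'FR': 80.0}
```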

Adding packages

Each scenario can declare its own requirements:

# top of your scenario:
# requirements: arrow~=1.3, polars~=0.20
import arrow, polars as pl
# normal scenario code follows

DataMaker reads the # requirements: comment, resolves the dependency tree against PyPI, and installs into the worker’s .venv before running. Subsequent runs reuse the cached install.

Pinning rules: we accept any PEP 440 version specifier (==1.2.3, ~=1.3, >=2,<3). For reproducibility, prefer ~= (compatible release).
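To check locally what a specifier matches, you can use the packaging library (not in the pre-installed list above; install it yourself for this experiment):

```python
from packaging.specifiers import SpecifierSet

# ~=1.3 is PEP 440's "compatible release" clause: >=1.3, <2.0
spec = SpecifierSet("~=1.3")

print("1.4" in spec)    # True:  minor bumps are allowed
print("1.3.5" in spec)  # True:  patch releases too
print("2.0" in spec)    # False: the next major version is excluded
```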

Environment variables

Scenarios have access to:

  • DataMaker context: DM_PROJECT_ID, DM_TEAM_ID, DM_SCENARIO_ID, DM_RUN_ID — set automatically; you usually don’t read them directly.
  • Workspace secrets: anything you set under Settings → Workspace secrets is available as os.environ["YOUR_KEY"]. Use this for third-party API tokens.
  • Run parameters: passed via the scenario’s API call ({"params": {...}}) and available as dm.params (a dict).

import os

slack_token = os.environ["SLACK_BOT_TOKEN"]  # workspace secret
env = dm.params.get("environment", "dev")    # run-time param

What’s not there

To keep workers fast and isolated:

  • No shell. No subprocess.run() of arbitrary binaries (we block it at runtime).
  • No persistent filesystem outside /workspace.
  • No outbound network to private IPs unless you’ve configured a VPN connector (Enterprise plans).

Running locally

You can develop scenarios against your local Python:

pip install datamaker
export DM_API_KEY=your_api_key
python my_scenario.py

The same SDK works locally and in the worker. The only differences:

  • dm.params is empty unless you read CLI args yourself.
  • Workspace files are not mounted; use dm.workspace_file().download() if you need them locally.
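One way to bridge the dm.params gap locally is to accept a JSON blob on the command line. This is a sketch; local_params is a hypothetical helper, not part of the SDK:

```python
import json

# Hypothetical local-run shim: the worker fills dm.params from the API call,
# but locally it's empty, so accept a JSON blob as the first CLI argument:
#   python my_scenario.py '{"environment": "prod"}'
def local_params(argv):
    return json.loads(argv[1]) if len(argv) > 1 else {}

params = local_params(["my_scenario.py", '{"environment": "prod"}'])
env = params.get("environment", "dev")
print(env)  # prod
```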

For more, see API & SDKs → Python SDK.