Python environment

Scenarios run on DataMaker-hosted Python workers. You don’t manage the environment — just the script.

Runtime

  • Python 3.12 (current stable).
  • One process per scenario run. Cold start ~600 ms; warm starts effectively instant if you re-run within ~5 minutes.
  • Each run gets its own working directory, mounted at /workspace. Files survive the duration of the run only — for persistence across runs, see Workspace files.
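A minimal sketch of using the run's scratch space. On a worker your script starts with /workspace as its current directory, so relative paths land there (the filename below is just an example):

```python
from pathlib import Path

# The worker starts your script inside the run's /workspace mount,
# so a relative path like this lands in the scratch directory.
scratch = Path("rows.csv")
scratch.write_text("id,name\n1,Ada\n")

contents = scratch.read_text()
print(contents)

scratch.unlink()  # optional: everything here vanishes when the run ends anyway
```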

Pre-installed packages

Every worker comes with these:

  • datamaker: the official DataMaker SDK (the entrypoint of every script).
  • requests: generic HTTP client. Use this for non-DataMaker REST calls.
  • httpx: async HTTP client, if you'd rather use async/await.
  • psycopg[binary]: Postgres driver (if you want raw SQL alongside dm.connection).
  • pymysql: MySQL driver.
  • pymongo: MongoDB driver.
  • pandas: DataFrame manipulation. Useful for transforming generated rows.
  • pyyaml: YAML parsing.
  • python-dateutil: date parsing and arithmetic.
  • faker: Python's Faker. Most cases use dm.generate() instead, but Faker is there if you need a quick one-off.
  • pyodata: OData client. Mostly redundant, since dm.connection.fetch() is the supported path.
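As a sketch of the pandas use case: the rows below are hypothetical stand-ins for what a dm.generate() call might hand you, aggregated per country before loading:

```python
import pandas as pd

# Hypothetical rows standing in for the output of dm.generate()
rows = [
    {"id": 1, "country": "DE", "amount": 120.0},
    {"id": 2, "country": "FR", "amount": 80.0},
    {"id": 3, "country": "DE", "amount": 200.0},
]

df = pd.DataFrame(rows)
totals = df.groupby("country")["amount"].sum().to_dict()
print(totals)  # {'DE': 320.0, 'FR': 80.0}
```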

Adding packages

Each scenario can declare its own requirements:

# top of your scenario:
# requirements: arrow~=1.3, polars~=0.20
import arrow, polars as pl
# normal scenario code follows

DataMaker reads the # requirements: comment, resolves the dependency tree against PyPI, and installs into the worker’s .venv before running. Subsequent runs reuse the cached install.

Pinning rules: we accept any PEP 440 version specifier (==1.2.3, ~=1.3, >=2,<3). For reproducibility, prefer ~= (compatible release).
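To check locally what a specifier matches, you can use the packaging library (not in the pre-installed list above; install it yourself for this experiment):

```python
from packaging.specifiers import SpecifierSet

# ~=1.3 is PEP 440's "compatible release" clause: >=1.3, <2.0
spec = SpecifierSet("~=1.3")

print("1.4" in spec)    # True:  minor bumps are allowed
print("1.3.5" in spec)  # True:  patch releases too
print("2.0" in spec)    # False: the next major version is excluded
```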

Environment variables

Scenarios have access to:

  • DataMaker context: DM_PROJECT_ID, DM_TEAM_ID, DM_SCENARIO_ID, DM_RUN_ID — set automatically; you usually don’t read them directly.
  • Workspace secrets: anything you set under Settings → Workspace secrets is available as os.environ["YOUR_KEY"]. Use this for third-party API tokens.
  • Run parameters: passed via the scenario’s API call ({"params": {...}}) and available as dm.params (a dict).

import os

slack_token = os.environ["SLACK_BOT_TOKEN"]  # workspace secret
env = dm.params.get("environment", "dev")    # run-time param

What’s not there

To keep workers fast and isolated:

  • No shell. No subprocess.run() of arbitrary binaries (we block it at runtime).
  • No persistent filesystem outside /workspace.
  • No outbound network to private IPs unless you’ve configured a VPN connector (Enterprise plans).

Running locally

You can develop scenarios against your local Python:

pip install datamaker
export DM_API_KEY=your_api_key
python my_scenario.py

The same SDK works locally and in the worker. The only differences:

  • dm.params is empty unless you read CLI args yourself.
  • Workspace files are not mounted; use dm.workspace_file().download() if you need them locally.
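One way to bridge the dm.params gap locally is to accept a JSON blob on the command line. This is a sketch; local_params is a hypothetical helper, not part of the SDK:

```python
import json

# Hypothetical local-run shim: the worker fills dm.params from the API call,
# but locally it's empty, so accept a JSON blob as the first CLI argument:
#   python my_scenario.py '{"environment": "prod"}'
def local_params(argv):
    return json.loads(argv[1]) if len(argv) > 1 else {}

params = local_params(["my_scenario.py", '{"environment": "prod"}'])
env = params.get("environment", "dev")
print(env)  # prod
```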

For more, see API & SDKs → Python SDK.