
Seed CI/CD pipelines

The goal: every pipeline run starts with a known, predictable, isolated dataset. No shared staging database, no hand-maintained seed.sql, no fixtures drifting out of sync with your schema.

This page walks through three patterns, in increasing order of sophistication.

Pattern 1: One template per run, downloaded as a file

The simplest approach. Useful for unit / integration tests that read from a JSON fixture.

.github/workflows/test.yml

- name: Fetch seed data
  run: |
    # --fail: a non-2xx response fails the step instead of writing an error body into the fixture
    curl --fail -X POST https://api.datamaker.automators.com/templates/$TEMPLATE_ID/generate \
      -H "Authorization: Bearer ${{ secrets.DM_API_KEY }}" \
      -H "Content-Type: application/json" \
      -d '{"count": 200, "format": "json"}' \
      > tests/fixtures/customers.json
- run: pnpm test

Pros: zero state, no DB to clean up. Cons: the fixture is read-only, so tests can’t exercise inserts, updates, or deletes.
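For reference, a consuming test might look like the sketch below, using Node’s built-in test runner (run it with node --test plus a TypeScript loader such as tsx). The email field is an assumption about your template’s shape:

tests/customers.fixture.test.ts

import { test } from "node:test";
import assert from "node:assert/strict";
import { readFile } from "node:fs/promises";

test("fixture has the shape the app expects", async () => {
  const customers = JSON.parse(
    await readFile("tests/fixtures/customers.json", "utf8"),
  );
  assert.ok(Array.isArray(customers));
  assert.equal(customers.length, 200); // matches "count" in the workflow
  for (const c of customers) {
    assert.equal(typeof c.email, "string"); // hypothetical field
  }
});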

Pattern 2: Scenario seeds an ephemeral DB

For integration tests that need a real database. Bring up a Postgres container, point DataMaker at it, run a scenario that seeds it.

services:
  postgres:
    image: postgres:16
    env:
      POSTGRES_PASSWORD: test
    ports:
      - 5432:5432
    # don't start the steps until Postgres accepts connections
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5

steps:
  - name: Run schema migrations
    run: pnpm migrate
  - name: Seed via DataMaker scenario
    run: |
      curl --fail -X POST https://api.datamaker.automators.com/scenarios/$SCN_ID/run \
        -H "Authorization: Bearer ${{ secrets.DM_API_KEY }}" \
        -H "Content-Type: application/json" \
        -d '{"params": {"db_url": "postgresql://postgres:test@localhost:5432/postgres"}}'
  - run: pnpm test:integration

The scenario uses dm.params["db_url"] to open an ad-hoc connection at runtime. Each job spins up its own throwaway Postgres container, so parallel PRs never collide.
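To catch a scenario that silently seeded nothing, you can count rows before handing off to the test suite. A minimal sketch, assuming the scenario writes a customers table (psql is preinstalled on GitHub’s Ubuntu runners):

  - name: Verify seed
    run: |
      # "customers" is a hypothetical table name; match your scenario's output
      ROWS=$(psql postgresql://postgres:test@localhost:5432/postgres -tAc "SELECT count(*) FROM customers")
      test "$ROWS" -gt 0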

Pattern 3: MCP server in the test job

For tests that themselves need to ask for data — e.g. a property-based test that generates inputs against your schema. Drop the DataMaker MCP server in the same container as your tests and let the test framework call it.

- name: Install DataMaker MCP
  run: pipx install datamaker-mcp
- name: Run property tests with MCP
  env:
    DM_API_KEY: ${{ secrets.DM_API_KEY }}
    DM_PROJECT_ID: proj_xxx
  run: pnpm test:property

Your property-test framework can hold an MCP client open to the DataMaker server and stream new examples per test case. See MCP → Tools.
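A sketch of such a client, using the official TypeScript MCP SDK (@modelcontextprotocol/sdk). The tool name and arguments below are assumptions; MCP → Tools lists the real ones:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// launch the pipx-installed server over stdio
const transport = new StdioClientTransport({
  command: "datamaker-mcp",
  env: {
    DM_API_KEY: process.env.DM_API_KEY!,
    DM_PROJECT_ID: process.env.DM_PROJECT_ID!,
  },
});
const client = new Client({ name: "property-tests", version: "0.0.1" });
await client.connect(transport);

// "generate_from_template" is a hypothetical tool name
const example = await client.callTool({
  name: "generate_from_template",
  arguments: { template_id: process.env.TEMPLATE_ID, count: 1 },
});
console.log(example.content);

await client.close();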

Best practices

Use a dedicated CI project

Create a project named ci (or regression) in DataMaker. Issue an API key scoped to just that project. CI failures don’t pollute your dev project; quotas don’t compete.

Avoid mutating the same template across PRs

If two PRs touch the same template (e.g. both add a field), your pipeline becomes order-dependent. Have CI read the template by version:

curl ... /templates/$TEMPLATE_ID@v7/generate ...

Pin the version in the repo so a template change is a deliberate code change.
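One way to pin it, assuming a workflow-level env var (the same variable then feeds the cache key in the next tip):

env:
  TEMPLATE_VERSION: v7   # bumping this is a deliberate, reviewed change

steps:
  - name: Fetch seed data
    run: |
      curl ... /templates/$TEMPLATE_ID@$TEMPLATE_VERSION/generate ...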

Cache when you can

For large generations, cache the output by the template version + count + seed:

- name: Cache seed data
  uses: actions/cache@v4
  with:
    path: tests/fixtures/customers.json
    key: dm-${{ env.TEMPLATE_VERSION }}-${{ env.SEED }}-200rows

Bumping the pinned TEMPLATE_VERSION changes the key, so a template change invalidates the cache automatically.
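Caching only pays off if a hit lets you skip regeneration. Give the cache step an id (say id: cache) and guard the fetch step with actions/cache's cache-hit output:

- name: Fetch seed data
  if: steps.cache.outputs.cache-hit != 'true'
  run: |
    curl ... # same generate call as in Pattern 1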

Idempotent inserts

If your scenario can re-run on retry, make inserts idempotent (ON CONFLICT DO NOTHING or unique IDs from dm.counter()). See Scenarios → Logs & retries.
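As an illustration of the ON CONFLICT route, a hypothetical customers insert (the conflict target must carry a unique constraint):

INSERT INTO customers (id, email)
VALUES (1, 'dm-1@example.com')
ON CONFLICT (id) DO NOTHING;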
