
Seed CI/CD pipelines

The goal: every pipeline run starts with a known, predictable, isolated dataset. No shared staging database, no hand-maintained seed.sql, no fixtures drifting out of sync with your schema.

This page walks through three patterns, in increasing order of sophistication.

Pattern 1: One template per run, downloaded as a file

The simplest approach. Useful for unit / integration tests that read from a JSON fixture.

.github/workflows/test.yml

- name: Fetch seed data
  run: |
    # --fail: a non-2xx response fails the step instead of writing an error body into the fixture
    curl --fail -X POST https://api.datamaker.automators.com/templates/$TEMPLATE_ID/generate \
      -H "Authorization: Bearer ${{ secrets.DM_API_KEY }}" \
      -H "Content-Type: application/json" \
      -d '{"count": 200, "format": "json"}' \
      > tests/fixtures/customers.json
- run: pnpm test

Pros: zero state, no DB to clean up. Cons: the fixture is read-only, so tests can’t exercise inserts, updates, or deletes.
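For reference, a consuming test might look like the sketch below, using Node’s built-in test runner (run it with node --test plus a TypeScript loader such as tsx). The email field is an assumption about your template’s shape:

tests/customers.fixture.test.ts

import { test } from "node:test";
import assert from "node:assert/strict";
import { readFile } from "node:fs/promises";

test("fixture has the shape the app expects", async () => {
  const customers = JSON.parse(
    await readFile("tests/fixtures/customers.json", "utf8"),
  );
  assert.ok(Array.isArray(customers));
  assert.equal(customers.length, 200); // matches "count" in the workflow
  for (const c of customers) {
    assert.equal(typeof c.email, "string"); // hypothetical field
  }
});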

Pattern 2: Scenario seeds an ephemeral DB

For integration tests that need a real database. Bring up a Postgres container, point DataMaker at it, run a scenario that seeds it.

services:
  postgres:
    image: postgres:16
    env:
      POSTGRES_PASSWORD: test
    ports:
      - 5432:5432
    # don't start the steps until Postgres accepts connections
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5

steps:
  - name: Run schema migrations
    run: pnpm migrate
  - name: Seed via DataMaker scenario
    run: |
      curl --fail -X POST https://api.datamaker.automators.com/scenarios/$SCN_ID/run \
        -H "Authorization: Bearer ${{ secrets.DM_API_KEY }}" \
        -H "Content-Type: application/json" \
        -d '{"params": {"db_url": "postgresql://postgres:test@localhost:5432/postgres"}}'
  - run: pnpm test:integration

The scenario uses dm.params["db_url"] to open an ad-hoc connection at runtime. Each job spins up its own throwaway Postgres container, so parallel PRs never collide.
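To catch a scenario that silently seeded nothing, you can count rows before handing off to the test suite. A minimal sketch, assuming the scenario writes a customers table (psql is preinstalled on GitHub’s Ubuntu runners):

  - name: Verify seed
    run: |
      # "customers" is a hypothetical table name; match your scenario's output
      ROWS=$(psql postgresql://postgres:test@localhost:5432/postgres -tAc "SELECT count(*) FROM customers")
      test "$ROWS" -gt 0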

Pattern 3: MCP server in the test job

For tests that themselves need to ask for data — e.g. a property-based test that generates inputs against your schema. Drop the DataMaker MCP server in the same container as your tests and let the test framework call it.

- name: Install DataMaker MCP
  run: pipx install datamaker-mcp
- name: Run property tests with MCP
  env:
    DM_API_KEY: ${{ secrets.DM_API_KEY }}
    DM_PROJECT_ID: proj_xxx
  run: pnpm test:property

Your property-test framework can hold an MCP client open to the DataMaker server and stream new examples per test case. See MCP → Tools.
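A sketch of such a client, using the official TypeScript MCP SDK (@modelcontextprotocol/sdk). The tool name and arguments below are assumptions; MCP → Tools lists the real ones:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// launch the pipx-installed server over stdio
const transport = new StdioClientTransport({
  command: "datamaker-mcp",
  env: {
    DM_API_KEY: process.env.DM_API_KEY!,
    DM_PROJECT_ID: process.env.DM_PROJECT_ID!,
  },
});
const client = new Client({ name: "property-tests", version: "0.0.1" });
await client.connect(transport);

// "generate_from_template" is a hypothetical tool name
const example = await client.callTool({
  name: "generate_from_template",
  arguments: { template_id: process.env.TEMPLATE_ID, count: 1 },
});
console.log(example.content);

await client.close();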

Best practices

Use a dedicated CI project

Create a project named ci (or regression) in DataMaker. Issue an API key scoped to just that project. CI failures don’t pollute your dev project; quotas don’t compete.

Avoid mutating the same template across PRs

If two PRs touch the same template (e.g. both add a field), your pipeline becomes order-dependent. Have CI read the template by version:

curl ... /templates/$TEMPLATE_ID@v7/generate ...

Pin the version in the repo so a template change is a deliberate code change.
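One way to pin it, assuming a workflow-level env var (the same variable then feeds the cache key in the next tip):

env:
  TEMPLATE_VERSION: v7   # bumping this is a deliberate, reviewed change

steps:
  - name: Fetch seed data
    run: |
      curl ... /templates/$TEMPLATE_ID@$TEMPLATE_VERSION/generate ...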

Cache when you can

For large generations, cache the output by the template version + count + seed:

- name: Cache seed data
  uses: actions/cache@v4
  with:
    path: tests/fixtures/customers.json
    key: dm-${{ env.TEMPLATE_VERSION }}-${{ env.SEED }}-200rows

Bumping the pinned TEMPLATE_VERSION changes the key, so a template change invalidates the cache automatically.
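Caching only pays off if a hit lets you skip regeneration. Give the cache step an id (say id: cache) and guard the fetch step with actions/cache's cache-hit output:

- name: Fetch seed data
  if: steps.cache.outputs.cache-hit != 'true'
  run: |
    curl ... # same generate call as in Pattern 1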

Idempotent inserts

If your scenario can re-run on retry, make inserts idempotent (ON CONFLICT DO NOTHING or unique IDs from dm.counter()). See Scenarios → Logs & retries.
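As an illustration of the ON CONFLICT route, a hypothetical customers insert (the conflict target must carry a unique constraint):

INSERT INTO customers (id, email)
VALUES (1, 'dm-1@example.com')
ON CONFLICT (id) DO NOTHING;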
