# Seed CI/CD pipelines
The goal: every pipeline run starts with a known, predictable, isolated dataset. No
shared staging environment, no `seed.sql` to maintain, no fixtures drifting from your schema.
This page walks through three patterns, in increasing order of sophistication.
## Pattern 1: One template per run, downloaded as a file
The simplest approach, useful for unit and integration tests that read from a JSON fixture.
```yaml
- name: Fetch seed data
  run: |
    curl -X POST https://api.datamaker.automators.com/templates/$TEMPLATE_ID/generate \
      -H "Authorization: Bearer ${{ secrets.DM_API_KEY }}" \
      -H "Content-Type: application/json" \
      -d '{"count": 200, "format": "json"}' \
      > tests/fixtures/customers.json

- run: pnpm test
```

Pros: zero state, no DB to clean up. Cons: tests can't insert, update, or delete.
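A failed or rate-limited API call can still write a body to the fixture file, so it can pay to sanity-check the download before `pnpm test` runs. A minimal sketch (the `check_fixture` helper and demo path are illustrative, not part of DataMaker):

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_fixture FILE EXPECTED_ROWS
# Fails unless FILE is a JSON array with exactly EXPECTED_ROWS elements.
# python3 is preinstalled on GitHub-hosted runners; jq would work equally well.
check_fixture() {
  python3 -c "import json,sys; rows=json.load(open(sys.argv[1])); sys.exit(0 if len(rows)==int(sys.argv[2]) else 1)" "$1" "$2"
}

# Demo with a stand-in fixture; in CI you would check
# tests/fixtures/customers.json against the count from the fetch step.
printf '[{"id": 1}, {"id": 2}]' > /tmp/customers.json
check_fixture /tmp/customers.json 2 && echo "fixture ok"
```

Run the check as its own step between the fetch and the tests so a bad download fails fast with a clear message.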
## Pattern 2: Scenario seeds an ephemeral DB
For integration tests that need a real database. Bring up a Postgres container, point DataMaker at it, and run a scenario that seeds it.
```yaml
services:
  postgres:
    image: postgres:16
    env:
      POSTGRES_PASSWORD: test
    ports:
      - 5432:5432

steps:
  - name: Run schema migrations
    run: pnpm migrate

  - name: Seed via DataMaker scenario
    run: |
      curl -X POST https://api.datamaker.automators.com/scenarios/$SCN_ID/run \
        -H "Authorization: Bearer ${{ secrets.DM_API_KEY }}" \
        -d "{\"params\": {\"db_url\": \"postgresql://postgres:test@localhost:5432/postgres\"}}"

  - run: pnpm test:integration
```

The scenario uses `dm.params["db_url"]` to construct an ad-hoc connection at runtime.
Each PR gets its own DB, so parallel jobs don’t collide.
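Per-job service containers isolate jobs automatically, but if several jobs share one Postgres instance you need distinct database names. One lightweight sketch, assuming GitHub Actions' built-in `GITHUB_RUN_ID` (the `ci_` prefix is an arbitrary choice):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Derive a database name unique to this run so parallel jobs never share state.
# GITHUB_RUN_ID is set automatically on GitHub Actions; default it for local runs.
RUN_ID="${GITHUB_RUN_ID:-local}"
DB_NAME="ci_${RUN_ID}"
DB_URL="postgresql://postgres:test@localhost:5432/${DB_NAME}"

echo "$DB_URL"
# Pass $DB_URL as the scenario's db_url param instead of the fixed
# postgres database shown above (after a createdb "$DB_NAME").
```
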
## Pattern 3: MCP server in the test job
For tests that themselves need to ask for data — e.g. a property-based test that generates inputs against your schema. Drop the DataMaker MCP server in the same container as your tests and let the test framework call it.
```yaml
- name: Install DataMaker MCP
  run: pipx install datamaker-mcp

- name: Run property tests with MCP
  env:
    DM_API_KEY: ${{ secrets.DM_API_KEY }}
    DM_PROJECT_ID: proj_xxx
  run: pnpm test:property
```

Your property-test framework can hold an MCP client open to the DataMaker server and stream new examples per test case. See MCP → Tools.
## Best practices

### Use a dedicated CI project
Create a project named ci (or regression) in DataMaker. Issue an API key scoped to
just that project. CI failures don’t pollute your dev project; quotas don’t compete.
### Avoid mutating the same template across PRs
If two PRs touch the same template (e.g. both add a field), your pipeline becomes order-dependent. Have CI read the template by version:
```bash
curl ... /templates/$TEMPLATE_ID@v7/generate ...
```

Pin the version in the repo so a template change is a deliberate code change.
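One way to keep the pin under code review is a version file in the repo, so bumping the template version is a visible diff. A sketch; the file name and template ID are assumptions, not DataMaker conventions:

```shell
#!/usr/bin/env bash
set -euo pipefail

# In a real repo this file is committed and changed deliberately in a PR;
# it is created here only so the sketch runs standalone.
echo "v7" > .datamaker-template-version

TEMPLATE_ID="tmpl_123"  # placeholder id
TEMPLATE_VERSION="$(cat .datamaker-template-version)"
URL="https://api.datamaker.automators.com/templates/${TEMPLATE_ID}@${TEMPLATE_VERSION}/generate"

echo "$URL"
```
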
### Cache when you can
For large generations, cache the output by the template version + count + seed:
```yaml
- name: Cache seed data
  uses: actions/cache@v4
  with:
    path: tests/fixtures/customers.json
    key: dm-${{ env.TEMPLATE_VERSION }}-${{ env.SEED }}-200rows
```

A template change invalidates the cache automatically.
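If the key grows to many inputs, hashing them into one token keeps it short while staying deterministic. A sketch (the variable names mirror the cache step above; the hashing scheme is an assumption, not a DataMaker requirement):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Collapse all generation inputs into a single short, deterministic token.
TEMPLATE_VERSION="v7"
SEED="42"
COUNT="200"

KEY="dm-$(printf '%s-%s-%s' "$TEMPLATE_VERSION" "$SEED" "$COUNT" | sha256sum | cut -c1-12)"
echo "$KEY"
```

Any change to any input changes the hash, so the cache invalidates exactly when the generated data would differ.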
### Idempotent inserts
If your scenario can re-run on retry, make inserts idempotent (`ON CONFLICT DO NOTHING`
or unique IDs from `dm.counter()`). See Scenarios → Logs & retries.
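For the Postgres case, the seed statement might look like the following sketch (table and column names are illustrative, not from a real scenario; the `psql` call is shown commented so the sketch runs without a database):

```shell
#!/usr/bin/env bash
set -euo pipefail

# A retry re-runs the same INSERT; ON CONFLICT makes the second run a no-op
# instead of a duplicate-key failure or a duplicated row.
SEED_SQL=$(cat <<'SQL'
INSERT INTO customers (id, email)
VALUES (1001, 'alice@example.com')
ON CONFLICT (id) DO NOTHING;
SQL
)

echo "$SEED_SQL"
# In the scenario this would run as:  psql "$DB_URL" -c "$SEED_SQL"
```

The conflict target (`id` here) must have a unique or primary-key constraint for `ON CONFLICT` to apply.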
## See also
- Scenarios for the orchestration model.
- Workflows → SAP regression for the SAP-specific case (fetch + replay, not generate-and-seed).