
Pull SAP regression data

Regression suites need data shaped like the records they will actually meet: real-world distributions, real edge cases, real codes. Synthetic data is great for unit tests; for end-to-end regression you usually want to start from records that already exist in the system.

This is the workflow:

┌── filter SAP via $filter ──┐     ┌── save as named set ──┐     ┌── replay in CI ──┐
│ Country = DE               │  →  │ reg_bp_de_2026q2      │  →  │ GET /sets/...    │
│ BPRole = FLCU01            │     │ (25 rows, masked)     │     │                  │
│ CreatedOn last 90 days     │     │                       │     │                  │
└────────────────────────────┘     └───────────────────────┘     └──────────────────┘

1. Configure the SAP OData connection

This is a one-time setup. See Connections → SAP OData.

2. Fetch with $filter

In a chat (Agent mode):

Pull 25 existing SAP Business Partners from the S/4 sandbox where Country = DE and BPRole = FLCU01 and CreatedOn within the last 90 days. Required fields: BusinessPartner, BusinessPartnerName, Country, Industry, TaxNumber1.

The agent calls fetch_sap_records_filtered with a generated $filter. Or in code:

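from datamaker import DataMaker

dm = DataMaker()
sap = dm.connection("conn_s4_sandbox")
records = sap.fetch(
    entity="A_BusinessPartner",
    filter=(
        "Country eq 'DE' "
        # OData v4 lambda: keep partners with at least one FLCU01 role
        "and to_BusinessPartnerRole/any(r: r/BusinessPartnerRole eq 'FLCU01') "
        # 90-day cutoff, hard-coded here; compute it at run time for a rolling window
        "and CreationDate ge 2026-01-26"
    ),
    select=["BusinessPartner", "BusinessPartnerName", "Country", "Industry", "TaxNumber1"],
    top=200,  # fetch more than needed so there is room to sample down
)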

We get more than 25 to start; the next step samples down.
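The filter above hard-codes the cutoff date, so it stops meaning "the last 90 days" the day after it is written. A minimal sketch that computes the cutoff at run time follows. build_bp_filter is a hypothetical helper, not part of DataMaker, and it assumes the service accepts OData v4 syntax: the any() lambda on the to_BusinessPartnerRole navigation property and a bare ISO date literal. OData v2 services, including many SAP Gateway endpoints, have no lambda operators and expect datetime'...' literals, so check which version your connection speaks.

```python
from datetime import date, timedelta


def build_bp_filter(country: str, role: str, today: date, days: int = 90) -> str:
    """Build a $filter for Business Partners created in the last `days` days.

    Hypothetical helper. Assumes an OData v4 service: lambda any() on the
    to_BusinessPartnerRole navigation property and a bare ISO date literal.
    """
    cutoff = today - timedelta(days=days)

    def lit(s: str) -> str:
        # OData string literals escape a single quote by doubling it.
        return "'" + s.replace("'", "''") + "'"

    return (
        f"Country eq {lit(country)} "
        f"and to_BusinessPartnerRole/any(r: r/BusinessPartnerRole eq {lit(role)}) "
        f"and CreationDate ge {cutoff.isoformat()}"
    )


print(build_bp_filter("DE", "FLCU01", today=date(2026, 4, 26)))
# prints: Country eq 'DE' and to_BusinessPartnerRole/any(r: r/BusinessPartnerRole eq 'FLCU01') and CreationDate ge 2026-01-26
```

In the fetch call you would pass filter=build_bp_filter("DE", "FLCU01", today=date.today()) so every run looks back exactly 90 days.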

3. Sample down

import random
sample = random.sample(records, k=25)

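Or stratified, keeping each country's share of the sample proportional to its share of the fetched records. This only makes sense if the fetch spans more than one country; the filter above returns DE records only, so widen it first (for example, to (Country eq 'DE' or Country eq 'AT')).

de = [r for r in records if r["Country"] == "DE"]
at = [r for r in records if r["Country"] == "AT"]
# Hard-coded 23/2 split: assumes DE is roughly 92% of the fetched records
sample = random.sample(de, k=23) + random.sample(at, k=2)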
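The hard-coded 23/2 split only stays proportional while the data does. A minimal sketch that derives the per-stratum quotas from the fetched records instead, using largest-remainder rounding so the quotas sum to exactly k, and a seeded random.Random so the same records always yield the same sample. stratified_sample is a hypothetical helper, not part of DataMaker.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, k, seed=None):
    """Sample k records so each stratum keeps roughly its share of `records`.

    Quotas use largest-remainder rounding, so they always sum to exactly k.
    A fixed seed makes the sample reproducible for the same input records.
    """
    if not 0 <= k <= len(records):
        raise ValueError(f"k={k} but only {len(records)} records available")
    strata = defaultdict(list)
    for r in records:
        strata[r[key]].append(r)

    n = len(records)
    exact = {s: k * len(rows) / n for s, rows in strata.items()}
    quotas = {s: int(q) for s, q in exact.items()}  # floor of each exact quota
    # Give the leftover slots to the strata with the largest fractional parts.
    leftover = k - sum(quotas.values())
    for s in sorted(exact, key=lambda t: exact[t] - quotas[t], reverse=True)[:leftover]:
        quotas[s] += 1

    rng = random.Random(seed)
    sample = []
    for s, rows in strata.items():
        sample.extend(rng.sample(rows, quotas[s]))
    return sample
```

With 92 DE and 8 AT records and k=25, the quotas come out to 23 and 2, the same as the manual split; if the mix shifts, the quotas follow it.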

4. Mask sensitive fields

Real records contain real PII. Mask sensitive fields before persisting the set:

masked = dm.mask(
    sample,
    fields=["TaxNumber1", "BusinessPartnerName"],
    strategy="format-preserve",
)

The format-preserve strategy keeps each value's shape, so format validators downstream still pass, but substitutes a replacement derived from a deterministic hash: the same input always produces the same fake value. The real value is never stored.
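To make the format-preserve behavior concrete, here is a minimal sketch of deterministic, shape-preserving substitution driven by a keyed hash (HMAC-SHA256). It is an illustration under assumptions, not DataMaker's implementation: mask_format_preserving, _keystream, and the secret key are hypothetical, and this is one-way substitution rather than a standards-based format-preserving encryption scheme such as NIST FF1.

```python
import hashlib
import hmac
import string


def _keystream(key: bytes, data: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from (key, data) with HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hmac.new(key, data + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:n]


def mask_format_preserving(value: str, key: bytes) -> str:
    """Deterministically replace digits with digits and letters with letters.

    Length, letter case, and separators (spaces, '/', '-') are kept, so shape
    checks downstream still pass. Same (key, value) -> same output.
    Non-ASCII letters such as 'ü' become ASCII letters of the same case.
    """
    stream = _keystream(key, value.encode("utf-8"), len(value))
    out = []
    for ch, b in zip(value, stream):
        # b % 10 and b % 26 are slightly biased; acceptable for test data.
        if ch in string.digits:
            out.append(string.digits[b % 10])
        elif ch.isalpha():
            alphabet = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(alphabet[b % 26])
        else:
            out.append(ch)  # keep separators unchanged
    return "".join(out)
```

Keying the hash is the important design choice: an unkeyed hash of a short, structured value such as a tax number can be reversed by hashing every possible input and comparing.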

See Templates → Sensitive fields for strategies.

5. Save the set

dm.save_set(name="reg_bp_de_2026q2", rows=masked)

Saved sets are project-scoped, named, and immutable: saving again under the same name creates a new version instead of overwriting the existing one. Load a set from any other scenario or chat:

prior = dm.load_set("reg_bp_de_2026q2")
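The save/load semantics described above (named, immutable, new version on re-save) can be modeled in a few lines. This is a hypothetical in-memory model for illustration only: SetStore and its version argument are not DataMaker API, and whether dm.load_set returns the latest version or accepts a pinned one is not covered here.

```python
import copy


class SetStore:
    """Hypothetical in-memory model of named, versioned, immutable sets."""

    def __init__(self):
        self._versions = {}  # name -> list of snapshots, oldest first

    def save_set(self, name, rows):
        # Never overwrite: append a deep-copied snapshot and return its
        # 1-based version number.
        snapshots = self._versions.setdefault(name, [])
        snapshots.append(copy.deepcopy(list(rows)))
        return len(snapshots)

    def load_set(self, name, version=None):
        snapshots = self._versions[name]  # KeyError if the set does not exist
        if version is None:
            snapshot = snapshots[-1]  # latest version
        elif 1 <= version <= len(snapshots):
            snapshot = snapshots[version - 1]
        else:
            raise ValueError(f"{name} has no version {version}")
        # Hand out a copy so callers cannot mutate the stored snapshot.
        return copy.deepcopy(snapshot)
```

Returning copies is what makes "immutable" hold in practice: a test that edits the rows it loaded cannot corrupt the set for the next test.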

6. Replay in CI

Your regression suite reads the saved set instead of regenerating data. In a GitHub Actions workflow, for example:

- name: Load regression dataset
  run: |
    curl --fail --silent --show-error \
      https://api.datamaker.automators.com/sets/reg_bp_de_2026q2 \
      -H "Authorization: Bearer ${{ secrets.DM_API_KEY }}" \
      > tests/fixtures/regression_bp.json
- run: pnpm test:regression

Or, if your tests run end-to-end, POST the set back into a sandbox SAP system:

sap.post(entity="A_BusinessPartner", rows=dm.load_set("reg_bp_de_2026q2"))
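If a step in your suite needs the set from Python rather than curl, a minimal sketch of the same GET using only the standard library follows, plus a fail-fast check on the rows. It assumes the endpoint returns a JSON array of row objects, which this page does not specify; build_request, validate_rows, and download_set are hypothetical helpers.

```python
import json
import urllib.request
from urllib.parse import quote

# Endpoint taken from the curl step; the response format is an assumption.
BASE_URL = "https://api.datamaker.automators.com/sets/"


def build_request(name, api_key):
    """Same call as the curl step: GET /sets/<name> with a bearer token."""
    return urllib.request.Request(
        BASE_URL + quote(name, safe=""),
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def validate_rows(rows, required):
    """Fail fast if the payload is empty or any row lacks a required field."""
    if not isinstance(rows, list) or not rows:
        raise ValueError("expected a non-empty JSON array of rows")
    for i, row in enumerate(rows):
        missing = [f for f in required if f not in row]
        if missing:
            raise ValueError(f"row {i} is missing {missing}")
    return rows


def download_set(name, api_key, path, required, timeout=30):
    """Fetch a saved set, validate it, and write it as a JSON fixture."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, like curl --fail.
    with urllib.request.urlopen(build_request(name, api_key), timeout=timeout) as resp:
        rows = json.load(resp)  # assumption: body is a JSON array of row objects
    validate_rows(rows, required)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)
    return len(rows)
```

For example, download_set("reg_bp_de_2026q2", os.environ["DM_API_KEY"], "tests/fixtures/regression_bp.json", ["BusinessPartner", "Country"]). Because urlopen raises on error status codes, a bad key fails the step instead of writing an error body into the fixture.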

When to refresh the set

A saved set is a snapshot in time. Refresh it when:

  • The shape of real-world data has shifted (new countries, new industry codes, new business roles).
  • A regression bug shows the set doesn’t trigger a path you care about.
  • Quarterly, as routine: set a reminder, re-run the scenario, and save under a new name.
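The first trigger, shifted real-world data, can be checked mechanically: compare a categorical field such as Industry between the saved set and a fresh fetch. A minimal sketch follows; category_drift is a hypothetical helper, and what counts as too much shift is your call.

```python
from collections import Counter


def category_drift(saved_rows, fresh_rows, field):
    """Compare one categorical field between a saved set and a fresh fetch.

    Returns (new_values, max_shift): values seen only in the fresh data, and
    the largest absolute change in any single value's share of rows.
    """
    if not saved_rows or not fresh_rows:
        raise ValueError("both row lists must be non-empty")
    saved = Counter(r.get(field) for r in saved_rows)
    fresh = Counter(r.get(field) for r in fresh_rows)
    new_values = sorted(set(fresh) - set(saved), key=str)
    n_saved, n_fresh = len(saved_rows), len(fresh_rows)
    # Counter returns 0 for missing keys, so a value absent on one side has share 0.
    shifts = [
        abs(fresh[v] / n_fresh - saved[v] / n_saved)
        for v in set(saved) | set(fresh)
    ]
    return new_values, max(shifts)
```

A rule such as "refresh if new_values is non-empty or max_shift > 0.1" is one option; the 0.1 threshold is an arbitrary example, not a recommendation.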

The naming convention reg_<entity>_<scope>_<period> (e.g. reg_bp_de_2026q2) makes stale sets easy to spot and retire.
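A minimal sketch of using the convention to find retirement candidates. It assumes the period is always a quarter in <year>q<n> form and that entity and scope contain no underscores; set_name, sets_to_retire, and the keep-two-quarters default are hypothetical choices, not DataMaker behavior.

```python
import re
from datetime import date

# Assumed shape: reg_<entity>_<scope>_<year>q<quarter>, no underscores inside parts.
NAME_RE = re.compile(
    r"^reg_(?P<entity>[a-z0-9]+)_(?P<scope>[a-z0-9]+)_(?P<year>\d{4})q(?P<quarter>[1-4])$"
)


def set_name(entity: str, scope: str, year: int, quarter: int) -> str:
    return f"reg_{entity}_{scope}_{year}q{quarter}"


def quarter_of(d: date):
    return d.year, (d.month - 1) // 3 + 1


def sets_to_retire(names, today: date, keep_quarters: int = 2):
    """Return names whose period is keep_quarters or more quarters before today's.

    With keep_quarters=2, the current and previous quarter are kept.
    Names that don't follow the convention are skipped, not retired.
    """
    year, quarter = quarter_of(today)
    current = year * 4 + (quarter - 1)
    retire = []
    for name in names:
        m = NAME_RE.match(name)
        if m is None:
            continue
        idx = int(m["year"]) * 4 + (int(m["quarter"]) - 1)
        if current - idx >= keep_quarters:
            retire.append(name)
    return retire
```

With today in 2026 Q3, reg_bp_de_2026q2 is kept and reg_bp_de_2026q1 is flagged; a set that ignores the convention is never flagged, which errs on the side of keeping data.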