Pull SAP regression data
Regression suites need data that matches the shape of real test cases — real-world distributions, real edge cases, real codes. Synthetic data is great for unit tests; for end-to-end regression you usually want to start from records that already exist in the system.
This is the workflow:
    ┌── filter SAP via $filter ──┐   ┌── save as named set ──┐   ┌── replay in CI ──┐
    │ Country = DE               │ → │ reg_bp_de_2026q2      │ → │ GET /sets/...    │
    │ BPRole = FLCU01            │   │ (25 rows, masked)     │   │                  │
    │ CreatedOn last 90 days     │   │                       │   │                  │
    └────────────────────────────┘   └───────────────────────┘   └──────────────────┘

1. Configure the SAP OData connection
This is a one-time setup. See Connections → SAP OData.
2. Fetch with $filter
In a chat (Agent mode):
Pull 25 existing SAP Business Partners from the S/4 sandbox where Country = DE and BPRole = FLCU01 and CreatedOn within the last 90 days. Required fields: BusinessPartner, BusinessPartnerName, Country, Industry, TaxNumber1.
The agent calls fetch_sap_records_filtered with a generated $filter. Or in code:

    from datamaker import DataMaker

    dm = DataMaker()
    sap = dm.connection("conn_s4_sandbox")

    # Over-fetch (up to 200 rows) so the next step has a pool to sample from.
    records = sap.fetch(
        entity="A_BusinessPartner",
        filter=(
            "Country eq 'DE' "
            "and any(BusinessPartnerRole/BusinessPartnerRole eq 'FLCU01') "
            "and CreationDate ge 2026-01-26"
        ),
        select=["BusinessPartner", "BusinessPartnerName", "Country", "Industry", "TaxNumber1"],
        top=200,
    )

This fetches up to 200 records, more than the 25 needed; the next step samples down.
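The hard-coded cutoff date in the filter above goes stale each time the scenario is re-run. As a minimal sketch, the cutoff can be computed from a rolling 90-day window instead. `build_bp_filter` is a hypothetical helper for illustration, not part of the DataMaker SDK, and it reuses the filter syntax from the example above unchanged.

```python
from datetime import date, timedelta


def build_bp_filter(country, role, days=90, today=None):
    """Build the $filter string with a rolling CreationDate cutoff.

    Hypothetical helper for illustration; not a DataMaker SDK function.
    Values are interpolated as-is, so only pass trusted codes.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    return (
        f"Country eq '{country}' "
        f"and any(BusinessPartnerRole/BusinessPartnerRole eq '{role}') "
        f"and CreationDate ge {cutoff.isoformat()}"
    )


# Pinning `today` makes the result reproducible; omit it to use the current date.
print(build_bp_filter("DE", "FLCU01", today=date(2026, 4, 26)))
```

With `today` pinned to 2026-04-26, the cutoff comes out as 2026-01-26, the date used in the example above. The returned string can be passed as the `filter` argument of `sap.fetch`.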
3. Sample down
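
    import random

    sample = random.sample(records, k=25)

Or stratified, to keep the country distribution proportional. This variant assumes the fetch covered more than one country (here DE and AT); with the DE-only filter from step 2, the AT list is empty and random.sample raises ValueError:

    de = [r for r in records if r["Country"] == "DE"]
    at = [r for r in records if r["Country"] == "AT"]
    sample = random.sample(de, k=23) + random.sample(at, k=2)

4. Mask sensitive fields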
Real records contain real PII. Mask the sensitive fields before persisting the set:

    masked = dm.mask(
        sample,
        fields=["TaxNumber1", "BusinessPartnerName"],
        strategy="format-preserve",
    )

format-preserve preserves shape (so format validators downstream still pass) but
substitutes the value through a deterministic hash. Same input → same fake. The
real value is never stored.
See Templates → Sensitive fields for strategies.
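To make the idea concrete, here is a minimal sketch of deterministic, shape-preserving substitution. It is not DataMaker's implementation of format-preserve, whose internals are not described here; `mask_value` and the `secret` key are hypothetical. The sketch uses an HMAC-SHA256 digest of the whole value to pick replacement characters, so digits stay digits, ASCII letters stay letters of the same case, and everything else is left alone.

```python
import hashlib
import hmac
import string


def mask_value(value: str, secret: bytes = b"demo-key") -> str:
    """Replace each ASCII letter or digit with one derived from a keyed hash.

    Illustrative only: a hypothetical helper, not the DataMaker strategy.
    """
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        # Cycle through the 32 digest bytes so long inputs are covered too.
        b = digest[i % len(digest)]
        if ch in string.digits:
            out.append(string.digits[b % 10])
        elif ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[b % 26])
        elif ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[b % 26])
        else:
            # Separators, spaces and non-ASCII characters pass through unchanged.
            out.append(ch)
    return "".join(out)


print(mask_value("DE123456789"))
```

For a given secret the same input always yields the same output, which is the "same input → same fake" property described above. A production strategy would also need to handle non-ASCII letters, which this sketch leaves readable.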
5. Save the set

    dm.save_set(name="reg_bp_de_2026q2", rows=masked)

Saved sets are project-scoped, named, and immutable (re-running with the same name creates a new version). Recall in any other scenario or chat:

    prior = dm.load_set("reg_bp_de_2026q2")

6. Replay in CI
Your regression test suite reads the set instead of regenerating:

    - name: Load regression dataset
      run: |
        curl -X GET https://api.datamaker.automators.com/sets/reg_bp_de_2026q2 \
          -H "Authorization: Bearer ${{ secrets.DM_API_KEY }}" \
          > tests/fixtures/regression_bp.json
    - run: pnpm test:regression

Or POST it back into a sandbox SAP if your tests run end-to-end:

    sap.post(entity="A_BusinessPartner", rows=dm.load_set("reg_bp_de_2026q2"))

When to refresh the set
A saved set is a snapshot of one point in time. Refresh it when:
- The shape of real-world data has shifted (new countries, new industry codes, new business roles).
- A regression bug shows the set doesn’t trigger a path you care about.
- Quarterly, as a routine — set a reminder, re-run the scenario, save with a new name.
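The first trigger in the list above can be checked mechanically rather than by eye. As a rough sketch, the function below reports field values that appear in a fresh fetch but never in the saved set. `unseen_values` is a hypothetical helper, and the rows are made-up stand-ins for the results of `dm.load_set(...)` and `sap.fetch(...)`.

```python
def unseen_values(saved_rows, fresh_rows, fields):
    """Return {field: sorted values found in fresh_rows but not in saved_rows}."""
    report = {}
    for field in fields:
        saved = {row.get(field) for row in saved_rows}
        fresh = {row.get(field) for row in fresh_rows}
        new = sorted(v for v in fresh - saved if v is not None)
        if new:
            report[field] = new
    return report


# Made-up rows for illustration only.
saved = [{"Country": "DE", "Industry": "0001"}, {"Country": "DE", "Industry": "0002"}]
fresh = [{"Country": "DE", "Industry": "0001"}, {"Country": "AT", "Industry": "0003"}]
print(unseen_values(saved, fresh, ["Country", "Industry"]))
# {'Country': ['AT'], 'Industry': ['0003']}
```

Compare only fields that were not masked (such as Country or Industry); masked fields will differ from live values by design.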
The naming convention reg_<entity>_<scope>_<period> (e.g. reg_bp_de_2026q2) makes
old sets obvious to retire.
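A small sketch of how the convention could be enforced and used to flag stale sets. The regular expression and the year-plus-quarter period format are assumptions inferred from the single example reg_bp_de_2026q2; `parse_set_name` and `is_stale` are hypothetical helpers, not DataMaker functions.

```python
import re
from datetime import date

# Assumed pattern: reg_<entity>_<scope>_<YYYY>q<1-4>, based on reg_bp_de_2026q2.
SET_NAME = re.compile(
    r"^reg_(?P<entity>[a-z0-9]+)_(?P<scope>[a-z0-9]+)_(?P<year>\d{4})q(?P<quarter>[1-4])$"
)


def parse_set_name(name):
    """Split a set name into its parts, or raise ValueError if it breaks the convention."""
    m = SET_NAME.match(name)
    if m is None:
        raise ValueError(f"not a reg_<entity>_<scope>_<period> name: {name!r}")
    return m["entity"], m["scope"], int(m["year"]), int(m["quarter"])


def is_stale(name, today):
    """True if the set's period is earlier than the quarter containing `today`."""
    _, _, year, quarter = parse_set_name(name)
    current = (today.year, (today.month - 1) // 3 + 1)
    return (year, quarter) < current


print(parse_set_name("reg_bp_de_2026q2"))               # ('bp', 'de', 2026, 2)
print(is_stale("reg_bp_de_2026q2", date(2026, 8, 1)))   # True
```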
Related
- Workflows → Mask PII / GDPR for masking standalone, outside of a regression flow.
- Connections → SAP OData for the connection setup.