Pull SAP regression data

Regression suites need data shaped like the records your system actually holds: real-world distributions, real edge cases, real codes. Synthetic data works well for unit tests; for end-to-end regression you usually want to start from records that already exist in the system.

This is the workflow:

┌── filter SAP via $filter ──┐    ┌── mask sensitive fields ──┐    ┌── replay in CI ───┐
│ Country = DE               │ →  │  TaxNumber1, Email…       │ →  │ run the scenario  │
│ BPRole = FLCU01            │    │  (format-preserving)      │    │ → fixture / SAP   │
│ CreatedOn last 90 days     │    │                           │    │                   │
└────────────────────────────┘    └───────────────────────────┘    └───────────────────┘

1. Configure the SAP OData connection

You only need to do this once. See Connections → SAP OData.

2. Fetch with `$filter`

In a chat (Agent mode):

Pull 25 existing SAP Business Partners from the S/4 sandbox where Country = DE and BPRole = FLCU01 and CreatedOn within the last 90 days. Required fields: BusinessPartner, BusinessPartnerName, Country, Industry, TaxNumber1.

The agent generates the $filter and runs its SAP fetch tool against the connection. Ask for a few more rows than the final set needs so the next step has something to sample from: if you want 25 rows in the end, request more than 25 here.
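
To make the generated filter less of a black box, here is a minimal sketch of how the three conditions from the prompt could be turned into a `$filter` string. It assumes OData V2 literal syntax (`datetime'…'`) and treats Country, BPRole and CreatedOn as flat properties of one entity, as in the prompt above; the real service may expose them under other names or on related entities, and `build_filter` is a hypothetical helper, not part of the product.

```python
from datetime import date, timedelta


def quote(value: str) -> str:
    """Wrap a string as an OData literal; single quotes are escaped by doubling."""
    return "'" + value.replace("'", "''") + "'"


def build_filter(country: str, bp_role: str, days: int, today: date) -> str:
    """Build a $filter for 'country AND role AND created within the last N days'.

    Property names come from the example prompt and are assumptions about
    the target entity. The date literal uses OData V2 syntax.
    """
    cutoff = today - timedelta(days=days)
    clauses = [
        f"Country eq {quote(country)}",
        f"BPRole eq {quote(bp_role)}",
        f"CreatedOn ge datetime'{cutoff.isoformat()}T00:00:00'",
    ]
    return " and ".join(clauses)


print(build_filter("DE", "FLCU01", 90, date(2026, 4, 1)))
# Country eq 'DE' and BPRole eq 'FLCU01' and CreatedOn ge datetime'2026-01-01T00:00:00'
```

Passing `today` explicitly keeps the function testable; a scenario would normally pass `date.today()`.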

3. Sample down

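Ask for the size and distribution you want, e.g. “sample down to 25, keeping the country mix proportional (≈23 DE, 2 AT).” A country mix like this assumes the fetch covered more than one country; with the Country = DE filter from step 2, every row is German. The agent trims the set; in the saved scenario the trimming is plain Python, e.g. random.sample.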
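
As an illustration of what "keeping the mix proportional" can mean in plain Python, the sketch below allocates the target size across groups in proportion to their share of the fetched rows, then draws each group's quota with `random.sample`. The function name, the dict-per-row shape, and the fixed seed are assumptions for the example; the code in a saved scenario may look different.

```python
import random
from collections import Counter, defaultdict


def sample_proportional(rows, key, n, seed=0):
    """Pick n rows so that each value of `key` keeps its share of the input."""
    if n > len(rows):
        raise ValueError("cannot sample more rows than were fetched")
    rng = random.Random(seed)  # fixed seed keeps the sample repeatable
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    total = len(rows)
    counts, remainders = {}, {}
    for value, members in groups.items():
        # Integer share of n for this group, plus the leftover fraction.
        counts[value], remainders[value] = divmod(len(members) * n, total)

    # Hand out the seats lost to rounding, largest remainder first.
    leftover = n - sum(counts.values())
    for value in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[value] += 1

    sample = []
    for value, members in groups.items():
        sample.extend(rng.sample(members, counts[value]))
    return sample


# 50 placeholder rows: 46 DE and 4 AT.
fetched = [{"Country": "DE"}] * 46 + [{"Country": "AT"}] * 4
picked = sample_proportional(fetched, "Country", 25)
print(Counter(row["Country"] for row in picked))
# Counter({'DE': 23, 'AT': 2})
```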

4. Mask sensitive fields

Real records contain real PII, so mask it before persisting the set. Mark the sensitive columns (TaxNumber1, BusinessPartnerName, …) as sensitive fields. Format-preserving masking then keeps each value's shape so downstream validators still pass, substitutes values deterministically (the same input always yields the same fake value), and never stores the real value. Ask the agent to mask the PII columns as part of the flow.
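
To show what "format-preserving" and "deterministic" mean in practice, here is a small sketch that replaces every digit with a digit and every letter with a letter of the same case, driven by a keyed hash of the input. It illustrates the idea only: it is not the product's masking algorithm and not a standards-grade format-preserving encryption scheme, and `mask_preserving_format` and the key handling are hypothetical.

```python
import hashlib
import hmac
import string


def mask_preserving_format(value: str, key: bytes) -> str:
    """Return a fake value with the same length and character classes.

    The substitution is derived from HMAC-SHA256(key, value), so the same
    input and key always produce the same output. The real value is never
    stored, only hashed.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        byte = digest[i % len(digest)]
        if ch.isdigit():
            out.append(string.digits[byte % 10])
        elif ch.isalpha():
            alphabet = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(alphabet[byte % 26])
        else:
            out.append(ch)  # keep separators such as '/', '-' and spaces
    return "".join(out)


key = b"example-key-not-a-real-secret"
# Placeholder input, not a real tax number.
print(mask_preserving_format("12/345/67890", key))
```

Because separators stay in place and character classes are preserved, a validator that checks length or a pattern such as `NN/NNN/NNNNN` still accepts the masked value. Validators that verify a check digit would need a masking step that recomputes it.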

5. Package it as a scenario

Save the fetch-and-mask steps as a scenario so the dataset is repeatable. From a chat, ask the agent to save the flow as a scenario once the run looks right (or use Save as scenario on the finished run). The scenario is a Python script that re-runs the same flow on demand: fetch with $filter, mask, and emit the rows. See Scenarios.
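
The fetch, mask, emit structure can be pictured roughly as follows. This is only a sketch of the shape of such a script: `run_scenario`, the stub fetch, and the placeholder mask are all hypothetical stand-ins, and a generated scenario will call the platform's own fetch and masking steps instead.

```python
import json
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, str]


def run_scenario(
    fetch: Callable[[], Iterable[Row]],
    mask: Callable[[str], str],
    sensitive: Set[str],
) -> List[Row]:
    """Fetch rows, mask the sensitive fields, and return the result."""
    masked_rows = []
    for row in fetch():
        masked_rows.append(
            {field: mask(value) if field in sensitive else value
             for field, value in row.items()}
        )
    return masked_rows


def stub_fetch() -> List[Row]:
    # Stand-in for the SAP fetch; values are placeholders.
    return [{"BusinessPartner": "0000000001", "Country": "DE", "TaxNumber1": "0000000000"}]


def stub_mask(value: str) -> str:
    # Stand-in for format-preserving masking: same length, no real data.
    return "X" * len(value)


rows = run_scenario(stub_fetch, stub_mask, {"TaxNumber1"})
print(json.dumps(rows, indent=2))  # the emitted rows become the fixture
```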

6. Replay in CI

Trigger the scenario from CI and capture its output as a fixture, instead of hand-rolling test data:

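- name: Generate the regression dataset
  run: |
    # --fail makes curl exit non-zero on HTTP errors (4xx/5xx), so the
    # step fails instead of saving an error response as the fixture.
    curl --fail --silent --show-error -X POST "$DM_API/scenarios/$SCENARIO_ID/run" \
      -H "Authorization: Bearer ${{ secrets.DM_API_KEY }}" \
      > tests/fixtures/regression_bp.json
- run: pnpm test:regression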

The endpoint above is illustrative; check the live OpenAPI reference for the exact run/trigger endpoint. If your tests run end-to-end, the scenario can also POST the masked rows straight back into a sandbox SAP entity as its final step.
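
Before the regression tests consume the fixture, it can help to confirm that the captured file really is a dataset and not, say, an error payload. The sketch below assumes the scenario emits a JSON array of flat objects and that the required fields are the ones listed in the step 2 prompt; both are assumptions, and `check_fixture` is a hypothetical helper.

```python
import json

REQUIRED_FIELDS = {
    "BusinessPartner",
    "BusinessPartnerName",
    "Country",
    "Industry",
    "TaxNumber1",
}


def check_fixture(text: str, expected_rows: int) -> list:
    """Parse fixture text and verify row count and required fields."""
    rows = json.loads(text)
    if not isinstance(rows, list):
        raise ValueError("fixture is not a JSON array of rows")
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(rows)}")
    for index, row in enumerate(rows):
        if not isinstance(row, dict):
            raise ValueError(f"row {index} is not an object")
        missing = REQUIRED_FIELDS - set(row)
        if missing:
            raise ValueError(f"row {index} is missing {sorted(missing)}")
    return rows


# Placeholder fixture content with one row.
sample = json.dumps([{field: "placeholder" for field in REQUIRED_FIELDS}])
print(len(check_fixture(sample, expected_rows=1)))
# 1
```

In CI this could run as a small step between generating the fixture and `pnpm test:regression`, reading `tests/fixtures/regression_bp.json` and passing its contents to `check_fixture`.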

When to refresh

Re-run the scenario when:

The shape of real-world data has shifted (new countries, new industry codes, new business roles).
A regression bug shows the set doesn’t trigger a path you care about.
A quarter has passed. As a routine, set a reminder, re-run the scenario, and save the result under a new name.

The naming convention reg_<entity>_<scope>_<period> (e.g. reg_bp_de_2026q2) makes it obvious which old sets to retire.
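
If you generate the names from a script, the convention is easy to build and parse. The sketch below assumes the period is always a calendar quarter written as `<year>q<n>` and that entity and scope are lowercase alphanumerics; the helper names are hypothetical.

```python
import re
from datetime import date

NAME_PATTERN = re.compile(
    r"^reg_(?P<entity>[a-z0-9]+)_(?P<scope>[a-z0-9]+)_(?P<year>\d{4})q(?P<quarter>[1-4])$"
)


def quarter_of(day: date) -> int:
    """Calendar quarter (1 to 4) for a date."""
    return (day.month - 1) // 3 + 1


def set_name(entity: str, scope: str, day: date) -> str:
    """Build reg_<entity>_<scope>_<period> for the quarter containing `day`."""
    return f"reg_{entity}_{scope}_{day.year}q{quarter_of(day)}"


def is_from_earlier_quarter(name: str, day: date) -> bool:
    """True if the set's period is before the quarter containing `day`."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a regression set name: {name!r}")
    period = (int(match["year"]), int(match["quarter"]))
    return period < (day.year, quarter_of(day))


today = date(2026, 5, 15)
print(set_name("bp", "de", today))
# reg_bp_de_2026q2
print(is_from_earlier_quarter("reg_bp_de_2026q1", today))
# True
```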

See Workflows → Mask PII / GDPR for standalone masking, outside of a regression flow.
See Connections → SAP OData for the connection setup.