Versioned Demo Dataset Playbook

Use this playbook to build a realistic but fake historical parcel dataset (for example: parcels created in 2000, then edited, split, and merged over time).

Goal

Create one dedicated dataset that demonstrates:

  • initial historical load
  • normal attribute and geometry evolution
  • split events (one parcel becomes multiple)
  • merge events (multiple parcels become one)
  • clean timeline replay using the State As Of map view

Use a deterministic timeline and replay events in chronological order through existing versioned workflows (upload, batch delete, edit).
Do not use direct SQL or manual row editing.

Why this is preferred:

  • exercises the same business paths as real usage
  • keeps valid_from / valid_to / record_status consistent
  • reproducible for demos, QA, and training

Timeline Design

Define a simple event list before loading any data.

Example timeline:

  Step  Effective At       Event                Description
  1     2000-01-01T00:00   Initial load         Create base parcels P-100 to P-130
  2     2005-07-01T00:00   Geometry correction  Replace geometry for P-105
  3     2010-03-15T00:00   Split                P-110 split into P-110A + P-110B
  4     2016-06-01T00:00   Merge                P-120 + P-121 merged into P-120M
  5     2022-11-01T00:00   Attribute update     Update names/status for selected parcels

Data Prep Conventions

  • Use one GeoPackage or ZIP per event (small, focused files are easier to verify).
  • Include at least: parcel_id, name, parcel_type, geometry.
  • Keep parcel_id stable for same logical parcel history.
  • Use new parcel_id values for split children / merge result parcels.

Replay Procedure (UI-First)

  1. Create a dedicated dataset (for example demo-versioned-history) and activate it.
  2. Initial load:
     • Upload the base file in /parcels/upload/.
     • Choose keep_existing.
     • Set Effective At = 2000-01-01T00:00.
  3. Normal edit event (existing parcel IDs):
     • Upload the changed rows.
     • Choose replace_matched.
     • Set the event timestamp (for example 2005-07-01T00:00).
  4. Split event:
     • Batch delete (void) the parent parcel at the split timestamp.
     • Upload the child parcels (P-110A, P-110B) with the same timestamp.
     • Use keep_existing for the child creation file.
  5. Merge event:
     • Batch delete (void) the source parcels at the merge timestamp.
     • Upload the merged parcel (P-120M) with the same timestamp.
     • Use keep_existing for the merged parcel creation file.
  6. Repeat for later events.
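A split (or merge) is just "void the parents, create the successors" at a single timestamp. The transition can be modeled with a toy in-memory version store (a sketch only, not the real service code; field and method names are illustrative):

```python
from datetime import datetime

MAX_TS = datetime.max  # stand-in for an open-ended valid_to

class VersionStore:
    """Toy version store with [valid_from, valid_to) rows per parcel_id."""

    def __init__(self):
        self.rows = []  # each row: dict with parcel_id, valid_from, valid_to

    def create(self, parcel_id, at):
        self.rows.append({"parcel_id": parcel_id,
                          "valid_from": at, "valid_to": MAX_TS})

    def void(self, parcel_id, at):
        for row in self.rows:
            if row["parcel_id"] == parcel_id and row["valid_to"] == MAX_TS:
                row["valid_to"] = at  # close the active row at the event timestamp

    def active_ids(self, ref_ts):
        return sorted(r["parcel_id"] for r in self.rows
                      if r["valid_from"] <= ref_ts < r["valid_to"])

def split(store, parent, children, at):
    store.void(parent, at)      # step: batch delete (void) the parent
    for child in children:      # step: upload children at the same timestamp
        store.create(child, at)

store = VersionStore()
store.create("P-110", datetime(2000, 1, 1))
split(store, "P-110", ["P-110A", "P-110B"], datetime(2010, 3, 15))
```

Because the void and the creations share one timestamp, a State As Of query just before the split shows only the parent, and a query at the split timestamp shows only the children.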

Important timestamp rule:

  • The system uses [valid_from, valid_to) semantics.
  • If the old row closes at T and the new row starts at T, a map query with ref_ts = T shows the new state.
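The boundary rule can be checked with a tiny lookup function (a sketch of the half-open semantics, not the production query):

```python
from datetime import datetime

T = datetime(2005, 7, 1)

# Two versions of the same parcel: the old row closes at T, the new one opens at T.
rows = [
    {"version": 1, "valid_from": datetime(2000, 1, 1), "valid_to": T},
    {"version": 2, "valid_from": T, "valid_to": datetime.max},
]

def version_as_of(rows, ref_ts):
    """Half-open [valid_from, valid_to) lookup: exactly one row matches."""
    matches = [r for r in rows if r["valid_from"] <= ref_ts < r["valid_to"]]
    assert len(matches) == 1, "intervals must not overlap or leave gaps"
    return matches[0]["version"]
```

At exactly T the old row is excluded (ref_ts < valid_to fails) and the new row is included, so there is never a gap or a double match at an event boundary.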

Validation Checklist

  1. Run ./.venv/bin/python manage.py verify_parcel_versioning.
  2. Open /parcels/map/ and test State As Of on each event date.
  3. Open parcel detail pages and verify the version history order.
  4. Confirm there is at most one active row per dataset + parcel_id.
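The "at most one active row per dataset + parcel_id" invariant is easy to spot-check offline. A minimal sketch (the record_status values "active"/"voided" are assumptions for illustration):

```python
from collections import Counter

def check_single_active(rows):
    """rows: (dataset_slug, parcel_id, record_status) tuples.
    Returns the (dataset, parcel_id) keys with more than one active row."""
    counts = Counter((r[0], r[1]) for r in rows if r[2] == "active")
    return [key for key, n in counts.items() if n > 1]

rows = [
    ("demo-versioned-history", "P-100", "active"),
    ("demo-versioned-history", "P-100", "voided"),   # closed history row: fine
    ("demo-versioned-history", "P-105", "active"),
]
```

An empty result means the invariant holds; any returned key points at a parcel whose history was corrupted (for example by skipping the void step during a split or merge).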

Automation Command (Implemented)

Use the management command for repeatable seeding:

./.venv/bin/python manage.py seed_versioned_demo_data \
  --dataset-slug demo-versioned-history \
  --dataset-name "Demo Versioned History" \
  --owner-username demo-seeder \
  --reset --yes

What it does:

  • creates/updates target dataset
  • enforces safe reset with --reset --yes
  • replays a built-in fake historical timeline (2000+ with edits, split, merge, attribute updates)
  • uses version_parcel_edit and void_parcel_version for versioned transitions
  • validates dataset invariants after replay

Use a Custom Timeline JSON

./.venv/bin/python manage.py seed_versioned_demo_data \
  --dataset-slug demo-versioned-custom \
  --timeline-file tmp/demo_timeline.json \
  --reset --yes

Supported event types:

  • initial_load / create
  • edit
  • group_edit
  • void
  • split
  • merge

Minimal example:

{
  "events": [
    {
      "type": "create",
      "effective_at": "2000-01-01T00:00:00",
      "parcels": [
        {
          "parcel_id": "A-100",
          "name": "Alpha 100",
          "parcel_type": "residential",
          "bbox": [-62.85, 17.35, -62.84, 17.36]
        }
      ]
    },
    {
      "type": "edit",
      "effective_at": "2005-01-01T00:00:00",
      "parcel_id": "A-100",
      "payload": { "name": "Alpha 100 Updated" }
    }
  ]
}

Notes:

  • Timeline order must be chronological (effective_at non-decreasing).
  • Geometry can be provided as bbox, geometry_wkt, or geometry.
  • split and merge void parent parcels and create new child parcel rows at the same timestamp.
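A timeline file can be pre-flighted before handing it to the command. This validator is a sketch covering only the two rules above (known event types, non-decreasing effective_at); the management command's own validation is authoritative:

```python
import json
from datetime import datetime

ALLOWED_TYPES = {"initial_load", "create", "edit", "group_edit",
                 "void", "split", "merge"}

def validate_timeline(text):
    """Return a list of problems found in a timeline JSON document."""
    doc = json.loads(text)
    errors = []
    last_ts = None
    for i, event in enumerate(doc.get("events", [])):
        if event.get("type") not in ALLOWED_TYPES:
            errors.append(f"event {i}: unknown type {event.get('type')!r}")
        try:
            ts = datetime.fromisoformat(event["effective_at"])
        except (KeyError, ValueError):
            errors.append(f"event {i}: missing or malformed effective_at")
            continue
        if last_ts is not None and ts < last_ts:
            errors.append(f"event {i}: effective_at decreases")
        last_ts = ts
    return errors
```

Running this over tmp/demo_timeline.json before seeding catches ordering mistakes early instead of partway through a replay.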

High-Volume Realistic Mode (GPKG + Auto History)

Use this when you want realistic geometry from OSM-derived parcels and rich synthetic history at scale.

Step 1: Generate realistic base parcels as GeoPackage.

python scripts/generate_realistic_parcels_gpkg.py \
  --place "Basseterre, Saint Kitts and Nevis" \
  --out tmp/demo_parcels.gpkg \
  --layer parcels_demo \
  --max-parcels 1200 \
  --seed 42

Step 2: Seed versioned history from that base GeoPackage.

./.venv/bin/python manage.py seed_versioned_demo_data2 \
  --base-gpkg tmp/demo_parcels.gpkg \
  --base-layer parcels_demo \
  --dataset-slug demo-versioned-history2 \
  --dataset-name "Demo Versioned History 2" \
  --owner-username demo-seeder2 \
  --start-effective-at 2000-01-01T00:00:00 \
  --period-years 3 \
  --period-count 8 \
  --attr-edit-rate 0.05 \
  --geometry-edit-rate 0.02 \
  --split-rate 0.006 \
  --merge-rate 0.003 \
  --void-rate 0.001 \
  --seed 42 \
  --reset --yes

This command:

  • loads base parcels as version-1 rows at the start timestamp
  • generates deterministic periodic edits, geometry updates, splits, merges, and void events
  • uses versioned write paths (version_parcel_edit, void_parcel_version)
  • runs dataset-level invariant checks at the end
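The "rates plus seed" idea behind the deterministic event generation can be illustrated in a few lines. This is a simplification of whatever sampling the command actually uses; the function name and the one-event-per-parcel-per-period rule are assumptions for illustration:

```python
import random

def sample_events(parcel_ids, rates, seed):
    """For one period, deterministically assign at most one event per parcel.
    rates maps event name -> probability; the same seed yields the same plan."""
    rng = random.Random(seed)
    plan = {}
    for pid in parcel_ids:
        roll = rng.random()
        cumulative = 0.0
        for event, rate in rates.items():
            cumulative += rate
            if roll < cumulative:
                plan[pid] = event
                break
        # rolls above the summed rates leave the parcel unchanged this period
    return plan

rates = {"attr_edit": 0.05, "geom_edit": 0.02, "split": 0.006,
         "merge": 0.003, "void": 0.001}
```

Seeding the generator (here --seed 42) is what makes reruns reproducible: the same base parcels and the same seed always produce the same history.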

Workflow-Oriented Demo Mode (seed_workflow_demo_data2)

Use this when you need realistic geometry plus parcel workflow examples for training reviewers/approvers and UI walkthroughs.

Command

./.venv/bin/python manage.py seed_workflow_demo_data2 <dataset_name> --count <N> [--source-file <PATH>] [--seed <INT>] [--mode service|fast-db] [--tx-chunk-size <N>]

For live progress output during long runs, add -v 3:

./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training 500" \
  --count 500 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42 \
  -v 3

Fast direct-insert mode (one transaction):

./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training Fast 500" \
  --count 500 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42 \
  --mode fast-db

Fast direct-insert mode with chunked commits (example: 500 parcels/transaction):

./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training Fast 2000" \
  --count 2000 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42 \
  --mode fast-db \
  --tx-chunk-size 500 \
  -v 3

Examples:

./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training 500" \
  --count 500 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42
./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training 2000" \
  --count 2000 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42

Behavior

  • Creates a brand-new dataset (fails if dataset name already exists).
  • Loads realistic snapshot geometry from the source GeoPackage.
  • Creates base current official parcel versions.
  • In default --mode service, adds version history/workflow examples through existing versioning and ParcelDraft services.
  • In --mode fast-db, writes the same seeded shape through direct ORM bulk inserts/updates to reduce workflow-service overhead.
  • Prints one parse-friendly summary line with key counts.
  • With -v 3, prints incremental progress lines during base/history/workflow loops.

Fast Path (--mode fast-db)

Use --mode fast-db when throughput matters more than strict reuse of workflow service-layer code paths.

What fast path does:

  • Uses direct ORM bulk inserts/updates for history/workflow/overlay enrichment.
  • Keeps the same seeded output shape and summary counters contract.
  • Relies on database constraints for integrity enforcement.

What fast path intentionally avoids:

  • Per-step workflow service orchestration overhead.
  • Additional denied-transition audit writes after a failing step.

Failure semantics:

  • --mode fast-db without --tx-chunk-size: one all-or-nothing transaction.
  • --mode fast-db --tx-chunk-size N: each chunk commits independently; a failure rolls back only the chunk in progress, and no further chunks are written.
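The chunked-commit behavior can be sketched with sqlite3: each chunk runs in its own transaction, so a failing chunk rolls back alone while earlier chunks stay committed. This is illustrative only; the real command goes through the Django ORM:

```python
import sqlite3

def seed_in_chunks(conn, rows, chunk_size):
    """Commit every chunk independently; a failure rolls back only that chunk."""
    committed = 0
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        try:
            with conn:  # one transaction per chunk: commit or rollback on exit
                conn.executemany(
                    "INSERT INTO parcel (parcel_id) VALUES (?)", chunk)
            committed += len(chunk)
        except sqlite3.Error:
            break  # abort the run; earlier chunks remain committed
    return committed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcel (parcel_id TEXT PRIMARY KEY)")
```

With a duplicate key in the second chunk, the first chunk survives and the second disappears entirely, which mirrors the partial-commit semantics described above.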

When to choose it:

  • Large training/demo seeds where runtime is more important than service-level parity.
  • Fresh disposable datasets that can be dropped/recreated.

Safety Rules

  • No append/merge into existing dataset names.
  • --mode service and --mode fast-db (without chunking) run as one all-or-nothing transaction: a fatal error rolls back the dataset and all seeded rows.
  • --mode fast-db --tx-chunk-size N commits each chunk separately; a fatal error rolls back only the currently running chunk.
  • Only existing workflow transitions/roles are used.
  • Rejected/cancelled workflow samples include reasons for training context.

Effort Estimate

  • First pass (timeline files + first replay): ~4 to 8 hours
  • Automated command + tests: +0.5 to 1.5 days
  • Subsequent reruns: minutes