# Versioned Demo Dataset Playbook
Use this playbook to build a realistic but fake historical parcel dataset (for example: created in 2000, then edited, split, and merged over time).
## Goal
Create one dedicated dataset that demonstrates:
- initial historical load
- normal attribute and geometry evolution
- split events (one parcel becomes multiple)
- merge events (multiple parcels become one)
- clean timeline replay using the `State As Of` map view
## Recommended Approach: Event Replay
Use a deterministic timeline and replay events in chronological order through existing versioned workflows (upload, batch delete, edit).
Do not use direct SQL or manual row editing.
Why this is preferred:
- exercises the same business paths as real usage
- keeps `valid_from` / `valid_to` / `record_status` consistent
- reproducible for demos, QA, and training
## Timeline Design
Define a simple event list before loading any data.
Example timeline:
| Step | Effective At | Event | Description |
|---|---|---|---|
| 1 | 2000-01-01T00:00 | Initial load | Create base parcels P-100 to P-130 |
| 2 | 2005-07-01T00:00 | Geometry correction | Replace geometry for P-105 |
| 3 | 2010-03-15T00:00 | Split | P-110 split into P-110A + P-110B |
| 4 | 2016-06-01T00:00 | Merge | P-120 + P-121 merged into P-120M |
| 5 | 2022-11-01T00:00 | Attribute update | Update names/status for selected parcels |
## Data Prep Conventions
- Use one GeoPackage or ZIP per event (small, focused files are easier to verify).
- Include at least: `parcel_id`, `name`, `parcel_type`, `geometry`.
- Keep `parcel_id` stable across the same logical parcel's history.
- Use new `parcel_id` values for split children and merge result parcels.
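The required-column convention above can be checked before upload with a small sketch. `missing_fields` is a hypothetical helper written for this playbook, not part of the application; only the field names come from the conventions listed here.

```python
# Minimal column check for one event-file row; REQUIRED_FIELDS mirrors the
# convention above. missing_fields() is illustrative, not part of the app.
REQUIRED_FIELDS = {"parcel_id", "name", "parcel_type", "geometry"}

def missing_fields(row: dict) -> set:
    """Return the required columns absent from a single event-file row."""
    return REQUIRED_FIELDS - set(row)

row = {"parcel_id": "P-100", "name": "Base 100", "geometry": "POINT(0 0)"}
print(missing_fields(row))  # {'parcel_type'}
```

Running a check like this on each small per-event file keeps verification cheap, which is the point of the one-file-per-event convention.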
## Replay Procedure (UI-First)
- Create a dedicated dataset (for example `demo-versioned-history`) and activate it.
- Initial load:
  - Upload the base file in `/parcels/upload/`.
  - Choose `keep_existing`.
  - Set `Effective At = 2000-01-01T00:00`.
- Normal edit event (existing parcel IDs):
  - Upload the changed rows.
  - Choose `replace_matched`.
  - Set the event timestamp (for example `2005-07-01T00:00`).
- Split event:
  - Batch delete (void) the parent parcel at the split timestamp.
  - Upload the child parcels (`P-110A`, `P-110B`) with the same timestamp.
  - Use `keep_existing` for the child creation file.
- Merge event:
  - Batch delete (void) the source parcels at the merge timestamp.
  - Upload the merged parcel (`P-120M`) with the same timestamp.
  - Use `keep_existing` for the merged parcel creation file.
- Repeat for later events.
Important timestamp rule:
- The system uses `[valid_from, valid_to)` semantics.
- If the old row closes at `T` and the new row starts at `T`, the map at `ref_ts = T` shows the new state.
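The half-open interval rule can be sketched with plain Python dictionaries standing in for version rows; the field names follow this playbook, but nothing here is the real model or query code.

```python
from datetime import datetime

# Each version row carries [valid_from, valid_to); valid_to=None means open-ended.
rows = [
    {"parcel_id": "P-110",  "valid_from": datetime(2000, 1, 1),  "valid_to": datetime(2010, 3, 15)},
    {"parcel_id": "P-110A", "valid_from": datetime(2010, 3, 15), "valid_to": None},
    {"parcel_id": "P-110B", "valid_from": datetime(2010, 3, 15), "valid_to": None},
]

def visible_at(rows, ref_ts):
    """Rows whose half-open validity interval contains ref_ts."""
    return [
        r["parcel_id"]
        for r in rows
        if r["valid_from"] <= ref_ts and (r["valid_to"] is None or ref_ts < r["valid_to"])
    ]

# Exactly at the split timestamp, the parent is already closed
# (valid_to is exclusive) and the children are visible.
print(visible_at(rows, datetime(2010, 3, 15)))  # ['P-110A', 'P-110B']
```

This is why parent voids and child uploads must share the same timestamp: any `ref_ts` sees either the parent or the children, never both and never neither.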
## Validation Checklist
- Run: `./.venv/bin/python manage.py verify_parcel_versioning`
- Open `/parcels/map/` and test `State As Of` on each event date.
- Open parcel detail pages and verify version history order.
- Confirm there is at most one active row per `dataset + parcel_id`.
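The last checklist item can be sketched as a pure-Python invariant check. The field names `dataset`, `parcel_id`, and `record_status` follow this playbook; the status values `active` and `superseded` are assumptions for illustration.

```python
from collections import Counter

def active_row_violations(rows):
    """Return (dataset, parcel_id) keys that have more than one active row."""
    counts = Counter(
        (r["dataset"], r["parcel_id"])
        for r in rows
        if r["record_status"] == "active"   # status value is an assumed example
    )
    return [key for key, n in counts.items() if n > 1]

rows = [
    {"dataset": "demo", "parcel_id": "P-100", "record_status": "active"},
    {"dataset": "demo", "parcel_id": "P-100", "record_status": "superseded"},
    {"dataset": "demo", "parcel_id": "P-105", "record_status": "active"},
]
print(active_row_violations(rows))  # [] -> invariant holds
```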
## Automation Command (Implemented)
Use the management command for repeatable seeding:
```bash
./.venv/bin/python manage.py seed_versioned_demo_data \
  --dataset-slug demo-versioned-history \
  --dataset-name "Demo Versioned History" \
  --owner-username demo-seeder \
  --reset --yes
```
What it does:
- creates/updates the target dataset
- enforces a safe reset with `--reset --yes`
- replays a built-in fake historical timeline (2000 onward, with edits, a split, a merge, and attribute updates)
- uses `version_parcel_edit` and `void_parcel_version` for versioned transitions
- validates dataset invariants after replay
## Use a Custom Timeline JSON
```bash
./.venv/bin/python manage.py seed_versioned_demo_data \
  --dataset-slug demo-versioned-custom \
  --timeline-file tmp/demo_timeline.json \
  --reset --yes
```
Supported event types:
- `initial_load` / `create`
- `edit`
- `group_edit`
- `void`
- `split`
- `merge`
Minimal example:
```json
{
  "events": [
    {
      "type": "create",
      "effective_at": "2000-01-01T00:00:00",
      "parcels": [
        {
          "parcel_id": "A-100",
          "name": "Alpha 100",
          "parcel_type": "residential",
          "bbox": [-62.85, 17.35, -62.84, 17.36]
        }
      ]
    },
    {
      "type": "edit",
      "effective_at": "2005-01-01T00:00:00",
      "parcel_id": "A-100",
      "payload": { "name": "Alpha 100 Updated" }
    }
  ]
}
```
Notes:
- Timeline order must be chronological (`effective_at` non-decreasing).
- Geometry can be provided as `bbox`, `geometry_wkt`, or `geometry`.
- `split` and `merge` void the parent parcels and create new child parcel rows at the same timestamp.
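A minimal sketch of the chronological-order check, assuming ISO-8601 `effective_at` strings (which sort lexicographically in timestamp order); `check_chronological` is an illustrative helper, not the command's own validator:

```python
import json

def check_chronological(timeline: dict) -> bool:
    """True when event effective_at values are non-decreasing.

    ISO-8601 timestamps compare correctly as strings, so no parsing is needed.
    """
    stamps = [e["effective_at"] for e in timeline["events"]]
    return all(a <= b for a, b in zip(stamps, stamps[1:]))

timeline = json.loads("""
{"events": [
  {"type": "create", "effective_at": "2000-01-01T00:00:00"},
  {"type": "edit",   "effective_at": "2005-01-01T00:00:00"}
]}
""")
print(check_chronological(timeline))  # True
```

Running a check like this on `tmp/demo_timeline.json` before seeding catches ordering mistakes without a full replay.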
## High-Volume Realistic Mode (GPKG + Auto History)
Use this when you want realistic geometry from OSM-derived parcels and rich synthetic history at scale.
Step 1: Generate realistic base parcels as GeoPackage.
```bash
python scripts/generate_realistic_parcels_gpkg.py \
  --place "Basseterre, Saint Kitts and Nevis" \
  --out tmp/demo_parcels.gpkg \
  --layer parcels_demo \
  --max-parcels 1200 \
  --seed 42
```
Step 2: Seed versioned history from that base GeoPackage.
```bash
./.venv/bin/python manage.py seed_versioned_demo_data2 \
  --base-gpkg tmp/demo_parcels.gpkg \
  --base-layer parcels_demo \
  --dataset-slug demo-versioned-history2 \
  --dataset-name "Demo Versioned History 2" \
  --owner-username demo-seeder2 \
  --start-effective-at 2000-01-01T00:00:00 \
  --period-years 3 \
  --period-count 8 \
  --attr-edit-rate 0.05 \
  --geometry-edit-rate 0.02 \
  --split-rate 0.006 \
  --merge-rate 0.003 \
  --void-rate 0.001 \
  --seed 42 \
  --reset --yes
```
This command:
- loads base parcels as version-1 rows at the start timestamp
- generates deterministic periodic edits, geometry updates, splits, merges, and void events
- uses versioned write paths (`version_parcel_edit`, `void_parcel_version`)
- runs dataset-level invariant checks at the end
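Rough expected-event arithmetic for the rate flags above, assuming each rate is applied per parcel per period (an assumption about the command's internals, not documented behavior):

```python
# 1200 parcels over 8 periods, with the per-period rates from the command line.
parcels, periods = 1200, 8
rates = {
    "attr_edit": 0.05,
    "geometry_edit": 0.02,
    "split": 0.006,
    "merge": 0.003,
    "void": 0.001,
}

# Under the per-parcel-per-period assumption, expected counts are simply
# parcels * periods * rate (so ~480 attribute edits, ~10 voids, etc.).
expected = {name: round(parcels * periods * rate) for name, rate in rates.items()}
print(expected)
```

Sanity-checking these magnitudes before a run helps tune the rates: at these defaults splits and merges stay rare (tens of events) while attribute edits dominate.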
## Workflow-Oriented Demo Mode (`seed_workflow_demo_data2`)
Use this when you need realistic geometry plus parcel workflow examples for training reviewers/approvers and UI walkthroughs.
### Command
```bash
./.venv/bin/python manage.py seed_workflow_demo_data2 <dataset_name> --count <N> [--source-file <PATH>] [--seed <INT>] [--mode service|fast-db] [--tx-chunk-size <N>]
```
For live progress output during long runs, add `-v 3`:
```bash
./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training 500" \
  --count 500 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42 \
  -v 3
```
Fast direct-insert mode (one transaction):
```bash
./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training Fast 500" \
  --count 500 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42 \
  --mode fast-db
```
Fast direct-insert mode with chunked commits (example: 500 parcels/transaction):
```bash
./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training Fast 2000" \
  --count 2000 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42 \
  --mode fast-db \
  --tx-chunk-size 500 \
  -v 3
```
Examples:
```bash
./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training 500" \
  --count 500 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42

./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training 2000" \
  --count 2000 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42
```
### Behavior
- Creates a brand-new dataset (fails if dataset name already exists).
- Loads realistic snapshot geometry from the source GeoPackage.
- Creates base current official parcel versions.
- In the default `--mode service`, adds version history/workflow examples through the existing versioning and ParcelDraft services.
- In `--mode fast-db`, writes the same seeded shape through direct ORM bulk inserts/updates to reduce workflow-service overhead.
- Prints one parse-friendly summary line with key counts.
- With `-v 3`, prints incremental progress lines during the base/history/workflow loops.
### Fast Path (`--mode fast-db`)
Use `--mode fast-db` when throughput matters more than strict reuse of the workflow service-layer code paths.
What fast path does:
- Uses direct ORM bulk inserts/updates for history/workflow/overlay enrichment.
- Keeps the same seeded output shape and summary counters contract.
- Relies on database constraints for integrity enforcement.
What fast path intentionally avoids:
- Per-step workflow service orchestration overhead.
- Additional denied-transition audit writes after a failing step.
Failure semantics:
- `--mode fast-db` without `--tx-chunk-size`: one all-or-nothing transaction.
- `--mode fast-db --tx-chunk-size N`: each chunk commits independently; a failure aborts the current chunk, and no further writes are attempted.
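The chunked-commit contract can be sketched with a toy transaction model; this is pure Python standing in for the real database transactions, and `seed_in_chunks`/`write` are hypothetical names:

```python
# Toy model of --tx-chunk-size semantics: earlier chunks stay committed,
# a fatal error discards only the currently running chunk and stops the run.
def seed_in_chunks(items, chunk_size, write, committed):
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        staged = []  # stands in for uncommitted writes inside one transaction
        try:
            for item in chunk:
                write(item)
                staged.append(item)
        except Exception:
            break  # "rollback": staged writes are dropped, run ends
        committed.extend(staged)  # "commit" the whole chunk
    return committed

committed = []

def write(item):
    if item == 7:  # simulate a fatal error mid-chunk
        raise RuntimeError("fatal")

seed_in_chunks(list(range(10)), 3, write, committed)
print(committed)  # [0, 1, 2, 3, 4, 5] -> two full chunks kept, failing chunk dropped
```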
When to choose it:
- Large training/demo seeds where runtime is more important than service-level parity.
- Fresh disposable datasets that can be dropped/recreated.
### Safety Rules
- No append/merge into existing dataset names.
- `--mode service` and `--mode fast-db` (without chunking) run as one all-or-nothing transaction (fatal errors roll back the dataset and seeded rows).
- `--mode fast-db --tx-chunk-size N` commits each chunk separately; a fatal error rolls back only the currently running chunk.
- Only existing workflow transitions/roles are used.
- Rejected/cancelled workflow samples include reasons for training context.
## Effort Estimate
- First pass (timeline files + first replay): ~4 to 8 hours
- Automated command + tests: +0.5 to 1.5 days
- Subsequent reruns: minutes