# Versioned Demo Dataset Playbook
Use this playbook to build a realistic but fake historical parcel dataset (for example: created in 2000, then edited, split, and merged over time).
## Goal
Create one dedicated dataset that demonstrates:
- initial historical load
- normal attribute and geometry evolution
- split events (one parcel becomes multiple)
- merge events (multiple parcels become one)
- clean timeline replay using the `State As Of` map view
## Recommended Approach: Event Replay
Use a deterministic timeline and replay events in chronological order through existing versioned workflows (upload, batch delete, edit).
Do not use direct SQL or manual row editing.
Why this is preferred:
- exercises the same business paths as real usage
- keeps `valid_from` / `valid_to` / `record_status` consistent
- reproducible for demos, QA, and training
## Timeline Design
Define a simple event list before loading any data.
Example timeline:
| Step | Effective At | Event | Description |
|---|---|---|---|
| 1 | 2000-01-01T00:00 | Initial load | Create base parcels P-100 to P-130 |
| 2 | 2005-07-01T00:00 | Geometry correction | Replace geometry for P-105 |
| 3 | 2010-03-15T00:00 | Split | P-110 split into P-110A + P-110B |
| 4 | 2016-06-01T00:00 | Merge | P-120 + P-121 merged into P-120M |
| 5 | 2022-11-01T00:00 | Attribute update | Update names/status for selected parcels |
## Data Prep Conventions
- Use one GeoPackage or ZIP per event (small, focused files are easier to verify).
- Include at least: `parcel_id`, `name`, `parcel_type`, `geometry`.
- Keep `parcel_id` stable across the same logical parcel's history.
- Use new `parcel_id` values for split children and merge result parcels.
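The required-column convention above can be checked before upload with a small sketch. `missing_fields` is a hypothetical helper written for this playbook, not part of the application; only the field names come from the conventions listed here.

```python
# Minimal column check for one event-file row; REQUIRED_FIELDS mirrors the
# convention above. missing_fields() is illustrative, not part of the app.
REQUIRED_FIELDS = {"parcel_id", "name", "parcel_type", "geometry"}

def missing_fields(row: dict) -> set:
    """Return the required columns absent from a single event-file row."""
    return REQUIRED_FIELDS - set(row)

row = {"parcel_id": "P-100", "name": "Base 100", "geometry": "POINT(0 0)"}
print(missing_fields(row))  # {'parcel_type'}
```

Running a check like this on each small per-event file keeps verification cheap, which is the point of the one-file-per-event convention.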
## Replay Procedure (UI-First)
- Create a dedicated dataset (for example `demo-versioned-history`) and activate it.
- Initial load:
  - Upload the base file in `/parcels/upload/`.
  - Choose `keep_existing`.
  - Set `Effective At = 2000-01-01T00:00`.
- Normal edit event (existing parcel IDs):
  - Upload the changed rows.
  - Choose `replace_matched`.
  - Set the event timestamp (for example `2005-07-01T00:00`).
- Split event:
  - Batch delete (void) the parent parcel at the split timestamp.
  - Upload the child parcels (`P-110A`, `P-110B`) with the same timestamp.
  - Use `keep_existing` for the child creation file.
- Merge event:
  - Batch delete (void) the source parcels at the merge timestamp.
  - Upload the merged parcel (`P-120M`) with the same timestamp.
  - Use `keep_existing` for the merged parcel creation file.
- Repeat for later events.
Important timestamp rule:
- The system uses `[valid_from, valid_to)` semantics.
- If the old row closes at `T` and the new row starts at `T`, the map at `ref_ts = T` shows the new state.
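The half-open interval rule can be sketched with plain Python dictionaries standing in for version rows; the field names follow this playbook, but nothing here is the real model or query code.

```python
from datetime import datetime

# Each version row carries [valid_from, valid_to); valid_to=None means open-ended.
rows = [
    {"parcel_id": "P-110",  "valid_from": datetime(2000, 1, 1),  "valid_to": datetime(2010, 3, 15)},
    {"parcel_id": "P-110A", "valid_from": datetime(2010, 3, 15), "valid_to": None},
    {"parcel_id": "P-110B", "valid_from": datetime(2010, 3, 15), "valid_to": None},
]

def visible_at(rows, ref_ts):
    """Rows whose half-open validity interval contains ref_ts."""
    return [
        r["parcel_id"]
        for r in rows
        if r["valid_from"] <= ref_ts and (r["valid_to"] is None or ref_ts < r["valid_to"])
    ]

# Exactly at the split timestamp, the parent is already closed
# (valid_to is exclusive) and the children are visible.
print(visible_at(rows, datetime(2010, 3, 15)))  # ['P-110A', 'P-110B']
```

This is why parent voids and child uploads must share the same timestamp: any `ref_ts` sees either the parent or the children, never both and never neither.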
## Validation Checklist
- Run: `./.venv/bin/python manage.py verify_parcel_versioning`
- Open `/parcels/map/` and test `State As Of` on each event date.
- Open parcel detail pages and verify version history order.
- Confirm there is at most one active row per `dataset + parcel_id`.
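The last checklist item can be sketched as a pure-Python invariant check. The field names `dataset`, `parcel_id`, and `record_status` follow this playbook; the status values `active` and `superseded` are assumptions for illustration.

```python
from collections import Counter

def active_row_violations(rows):
    """Return (dataset, parcel_id) keys that have more than one active row."""
    counts = Counter(
        (r["dataset"], r["parcel_id"])
        for r in rows
        if r["record_status"] == "active"   # status value is an assumed example
    )
    return [key for key, n in counts.items() if n > 1]

rows = [
    {"dataset": "demo", "parcel_id": "P-100", "record_status": "active"},
    {"dataset": "demo", "parcel_id": "P-100", "record_status": "superseded"},
    {"dataset": "demo", "parcel_id": "P-105", "record_status": "active"},
]
print(active_row_violations(rows))  # [] -> invariant holds
```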
## Automation Command (Implemented)
Use the management command for repeatable seeding:
```bash
./.venv/bin/python manage.py seed_versioned_demo_data \
  --dataset-slug demo-versioned-history \
  --dataset-name "Demo Versioned History" \
  --owner-username demo-seeder \
  --reset --yes
```
What it does:
- creates/updates the target dataset
- enforces a safe reset with `--reset --yes`
- replays a built-in fake historical timeline (2000 onward, with edits, a split, a merge, and attribute updates)
- uses `version_parcel_edit` and `void_parcel_version` for versioned transitions
- validates dataset invariants after replay
## Use a Custom Timeline JSON
```bash
./.venv/bin/python manage.py seed_versioned_demo_data \
  --dataset-slug demo-versioned-custom \
  --timeline-file tmp/demo_timeline.json \
  --reset --yes
```
Supported event types:
- `initial_load` / `create`
- `edit`
- `group_edit`
- `void`
- `split`
- `merge`
Minimal example:
```json
{
  "events": [
    {
      "type": "create",
      "effective_at": "2000-01-01T00:00:00",
      "parcels": [
        {
          "parcel_id": "A-100",
          "name": "Alpha 100",
          "parcel_type": "residential",
          "bbox": [-62.85, 17.35, -62.84, 17.36]
        }
      ]
    },
    {
      "type": "edit",
      "effective_at": "2005-01-01T00:00:00",
      "parcel_id": "A-100",
      "payload": { "name": "Alpha 100 Updated" }
    }
  ]
}
```
Notes:
- Timeline order must be chronological (`effective_at` non-decreasing).
- Geometry can be provided as `bbox`, `geometry_wkt`, or `geometry`.
- `split` and `merge` void the parent parcels and create new child parcel rows at the same timestamp.
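A minimal sketch of the chronological-order check, assuming ISO-8601 `effective_at` strings (which sort lexicographically in timestamp order); `check_chronological` is an illustrative helper, not the command's own validator:

```python
import json

def check_chronological(timeline: dict) -> bool:
    """True when event effective_at values are non-decreasing.

    ISO-8601 timestamps compare correctly as strings, so no parsing is needed.
    """
    stamps = [e["effective_at"] for e in timeline["events"]]
    return all(a <= b for a, b in zip(stamps, stamps[1:]))

timeline = json.loads("""
{"events": [
  {"type": "create", "effective_at": "2000-01-01T00:00:00"},
  {"type": "edit",   "effective_at": "2005-01-01T00:00:00"}
]}
""")
print(check_chronological(timeline))  # True
```

Running a check like this on `tmp/demo_timeline.json` before seeding catches ordering mistakes without a full replay.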
## High-Volume Realistic Mode (GPKG + Auto History)
Use this when you want realistic geometry from OSM-derived parcels and rich synthetic history at scale.
Step 1: Generate realistic base parcels as GeoPackage.
```bash
python scripts/generate_realistic_parcels_gpkg.py \
  --place "Basseterre, Saint Kitts and Nevis" \
  --out tmp/demo_parcels.gpkg \
  --layer parcels_demo \
  --max-parcels 1200 \
  --seed 42
```
Step 2: Seed versioned history from that base GeoPackage.
```bash
./.venv/bin/python manage.py seed_versioned_demo_data2 \
  --base-gpkg tmp/demo_parcels.gpkg \
  --base-layer parcels_demo \
  --dataset-slug demo-versioned-history2 \
  --dataset-name "Demo Versioned History 2" \
  --owner-username demo-seeder2 \
  --start-effective-at 2000-01-01T00:00:00 \
  --period-years 3 \
  --period-count 8 \
  --attr-edit-rate 0.05 \
  --geometry-edit-rate 0.02 \
  --split-rate 0.006 \
  --merge-rate 0.003 \
  --void-rate 0.001 \
  --seed 42 \
  --reset --yes
```
This command:
- loads base parcels as version-1 rows at the start timestamp
- generates deterministic periodic edits, geometry updates, splits, merges, and void events
- uses versioned write paths (`version_parcel_edit`, `void_parcel_version`)
- runs dataset-level invariant checks at the end
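Rough expected-event arithmetic for the rate flags above, assuming each rate is applied per parcel per period (an assumption about the command's internals, not documented behavior):

```python
# 1200 parcels over 8 periods, with the per-period rates from the command line.
parcels, periods = 1200, 8
rates = {
    "attr_edit": 0.05,
    "geometry_edit": 0.02,
    "split": 0.006,
    "merge": 0.003,
    "void": 0.001,
}

# Under the per-parcel-per-period assumption, expected counts are simply
# parcels * periods * rate (so ~480 attribute edits, ~10 voids, etc.).
expected = {name: round(parcels * periods * rate) for name, rate in rates.items()}
print(expected)
```

Sanity-checking these magnitudes before a run helps tune the rates: at these defaults splits and merges stay rare (tens of events) while attribute edits dominate.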
## Workflow-Oriented Demo Mode (`seed_workflow_demo_data2`)
Use this when you need realistic geometry plus parcel workflow examples for training reviewers/approvers and UI walkthroughs.
### Command
```bash
./.venv/bin/python manage.py seed_workflow_demo_data2 <dataset_name> --count <N> [--source-file <PATH>] [--seed <INT>] [--mode service|fast-db] [--tx-chunk-size <N>]
```
For live progress output during long runs, add `-v 3`:
```bash
./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training 500" \
  --count 500 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42 \
  -v 3
```
Fast direct-insert mode (one transaction):
```bash
./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training Fast 500" \
  --count 500 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42 \
  --mode fast-db
```
Fast direct-insert mode with chunked commits (example: 500 parcels/transaction):
```bash
./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training Fast 2000" \
  --count 2000 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42 \
  --mode fast-db \
  --tx-chunk-size 500 \
  -v 3
```
Examples:
```bash
./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training 500" \
  --count 500 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42

./.venv/bin/python manage.py seed_workflow_demo_data2 "Demo Workflow Training 2000" \
  --count 2000 \
  --source-file tmp/demo_parcels.gpkg \
  --seed 42
```
### Behavior
- Creates a brand-new dataset (fails if dataset name already exists).
- Loads realistic snapshot geometry from the source GeoPackage.
- Creates base current official parcel versions.
- In the default `--mode service`, adds version history/workflow examples through the existing versioning and ParcelDraft services.
- In `--mode fast-db`, writes the same seeded shape through direct ORM bulk inserts/updates to reduce workflow-service overhead.
- Prints one parse-friendly summary line with key counts.
- With `-v 3`, prints incremental progress lines during the base/history/workflow loops.
### Fast Path (`--mode fast-db`)
Use `--mode fast-db` when throughput matters more than strict reuse of the workflow service-layer code paths.
What fast path does:
- Uses direct ORM bulk inserts/updates for history/workflow/overlay enrichment.
- Keeps the same seeded output shape and summary counters contract.
- Relies on database constraints for integrity enforcement.
What fast path intentionally avoids:
- Per-step workflow service orchestration overhead.
- Additional denied-transition audit writes after a failing step.
Failure semantics:
- `--mode fast-db` without `--tx-chunk-size`: one all-or-nothing transaction.
- `--mode fast-db --tx-chunk-size N`: each chunk commits independently; a failure aborts the current chunk, and no further writes are attempted.
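The chunked-commit contract can be sketched with a toy transaction model; this is pure Python standing in for the real database transactions, and `seed_in_chunks`/`write` are hypothetical names:

```python
# Toy model of --tx-chunk-size semantics: earlier chunks stay committed,
# a fatal error discards only the currently running chunk and stops the run.
def seed_in_chunks(items, chunk_size, write, committed):
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        staged = []  # stands in for uncommitted writes inside one transaction
        try:
            for item in chunk:
                write(item)
                staged.append(item)
        except Exception:
            break  # "rollback": staged writes are dropped, run ends
        committed.extend(staged)  # "commit" the whole chunk
    return committed

committed = []

def write(item):
    if item == 7:  # simulate a fatal error mid-chunk
        raise RuntimeError("fatal")

seed_in_chunks(list(range(10)), 3, write, committed)
print(committed)  # [0, 1, 2, 3, 4, 5] -> two full chunks kept, failing chunk dropped
```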
When to choose it:
- Large training/demo seeds where runtime is more important than service-level parity.
- Fresh disposable datasets that can be dropped/recreated.
### Safety Rules
- No append/merge into existing dataset names.
- `--mode service` and `--mode fast-db` (without chunking) run as one all-or-nothing transaction (fatal errors roll back the dataset and seeded rows).
- `--mode fast-db --tx-chunk-size N` commits each chunk separately; a fatal error rolls back only the currently running chunk.
- Only existing workflow transitions/roles are used.
- Rejected/cancelled workflow samples include reasons for training context.
## Effort Estimate
- First pass (timeline files + first replay): ~4 to 8 hours
- Automated command + tests: +0.5 to 1.5 days
- Subsequent reruns: minutes