UN 1:1 population synthesis QA cycle
Country-specific layer for synthetic people in households, dwellings, real building stock where available, hidden-population overlays, and work/school assignment evidence.
Board: synthestat-population-qa · Tenant: synthestat · Country: UN · Task workflow status: ready · Artifact completion: no_review_bundle
Ideal-country quality criteria: impossible 1:1 benchmark
This is the common gold-standard benchmark for an ideal country. It is intentionally impossible to fully satisfy: complete success would mean a 1:1 replica of the real population where every person, household, dwelling, attribute, and assignment is exactly represented. The QA page uses it as an asymptote and gap taxonomy, not as a release promise.
Apply this same rubric to this country’s latest run, then report which needs are measured, constrained, modelled, unavailable, or blocked.
| Need | Unachievable ideal | QA evidence we require instead | Why perfection cannot be achieved |
|---|---|---|---|
| Complete de jure resident coverage | Every real resident represented exactly once in the right country, municipality, small area, household, and dwelling. | Synthetic person count equals official population at all enforced geographies; no unexplained duplicate, missing, or out-of-universe people. | A true 1:1 resident list is a confidential population register and changes continuously; Synthestat can only match official aggregates and declared source universes. |
| Complete attribute truth | Each synthetic person has the same age, sex, household role, education, occupation, industry, origin, health proxy, income proxy, and lifecycle state as the corresponding real person. | Published marginal and cross-tab constraints pass within HARD/FIRM/SOFT tolerances; modelled fields carry uncertainty and measured/constrained/modelled provenance. | Official releases do not expose a complete individual joint distribution, and many attributes are survey-derived, lagged, suppressed, or unavailable at fine geography. |
| Perfect household and family structure | Every household contains the exact real members and relationships, including multi-generation, partnership, child, shared, institutional, and edge-case arrangements. | Household totals, household-type distributions, age/sex/role consistency, fertility/child constraints, and structural invariants pass with explicit residuals. | Household membership is sensitive microdata; public sources usually expose only aggregate household/family tables and partial cross-tabs. |
| Exact dwelling and building grounding | Every household is assigned to its real dwelling and building with exact occupancy, vacancy, dwelling type, floor area, tenure, and address-level geography. | Dwelling/building capacity checks pass; vacancy/second-home/institutional dwellings are represented or explicitly unavailable; building links have source provenance. | Many countries lack open address-level registers; dwelling occupancy is confidential and time-varying. |
| Complete de facto and hidden-population overlays | Homeless, undocumented, refugees, students away from home, seasonal, institutional, tourists, and daytime populations are all represented with exact location and timing. | Overlay layers use interval estimates, source-specific quality flags, and never silently modify de jure HARD constraints. | Hidden populations are partly unobserved by definition; ethical/privacy constraints forbid exact person-level labels. |
| Exact school, workplace, facility, and mobility assignment | Every person is assigned to the real school, workplace, care provider, commute, and daily activity chain they use. | Assignment layers use official registers/OD flows where available; modelled assignments are flagged and validated only against aggregate flows/capacities. | Operational assignments are usually protected registers or dynamic behavioural data; Phase 1 must not imply they are known. |
| Full joint-distribution realism | The full multivariate joint distribution is identical to reality across all attributes, households, geography, and rare subgroups. | High-priority marginals/cross-tabs pass; sparse zones and prior-dominated attributes are clearly marked with quality tiers and credible intervals. | The joint distribution is non-identifiable from published marginals; IPF/BN/hierarchical pooling choose plausible distributions, not truth. |
| Zero uncertainty and zero lag | All values are current today and known without error. | Every output records reference period, retrieval timestamp, lag, confidence, uncertainty bounds, and degradation decisions. | Official statistics are lagged, revised, sampled, suppressed, and harmonized after collection. |
| Privacy-safe yet maximally detailed release | The system releases maximum useful detail while creating zero re-identification risk. | Release mode, k-anonymity/cell safeguards, perturbation/aggregation policy, and sensitive-field treatment are explicit. | Fine-area synthetic microdata can still create structurally unique records; synthetic does not mean anonymous. |
| Perfect reproducibility and auditability | Any user can trace every output record to exact source snapshots, transformations, constraints, relaxations, seeds, and code versions. | Run manifests, source provenance, checksums, frozen extracts, seeds, versioned crosswalks, validation reports, and relaxation logs are complete. | This is approachable but never final: source portals, classifications, geography, and code keep changing, so audits must be continuously renewed. |
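
The coverage evidence in the first row of the rubric (synthetic counts equal official counts at every enforced geography, with no duplicate or out-of-universe people) can be sketched as a small check. This is a minimal illustration, not Synthestat's implementation: the function name `coverage_report`, the `(person_id, zone_id)` input shape, and the report keys are all hypothetical.

```python
from collections import Counter


def coverage_report(person_zone_pairs, official_counts):
    """Compare synthetic people against official zone totals.

    person_zone_pairs: iterable of (person_id, zone_id) tuples (assumed shape).
    official_counts: dict mapping zone_id to the official population count.
    """
    pairs = list(person_zone_pairs)

    # A person ID that appears more than once is a duplicate.
    id_counts = Counter(person_id for person_id, _ in pairs)
    duplicates = sorted(pid for pid, n in id_counts.items() if n > 1)

    # Zones with synthetic people but no official target are out of universe.
    synthetic = Counter(zone for _, zone in pairs)
    out_of_universe = sorted(z for z in synthetic if z not in official_counts)

    # Residual = synthetic minus official; only non-zero residuals are reported.
    residuals = {
        zone: synthetic.get(zone, 0) - target
        for zone, target in official_counts.items()
    }
    mismatched = {zone: r for zone, r in residuals.items() if r != 0}

    return {
        "duplicates": duplicates,
        "out_of_universe_zones": out_of_universe,
        "count_residuals": mismatched,
        "passes": not (duplicates or out_of_universe or mismatched),
    }
```

Duplicates are counted in the zone totals here, so a duplicated person also shows up as a positive count residual; a real check might report the two separately.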
Population artifact output status (separate from task status)
| Artifact completion | Row count source | People | Target population | National coverage | Absolute shortfall | Households | Dwellings | Houses/buildings | Max marginal deviation | HARD status | Run |
|---|---|---|---|---|---|---|---|---|---|---|---|
| no_review_bundle | none | — | — | — | — | — | — | — | — | — | no_bundle_yet |
This table describes emitted population artifacts only. It is intentionally independent from the Kanban task workflow status below: a country can have all tasks done while its artifact is still only a seeded slice, or a passing review bundle can still lack national target completion. Deviation is the maximum absolute relative error across collected HARD/FIRM/SOFT marginal constraints in the latest review bundle. GUIDE/INFORMATIONAL priors are excluded. National target/coverage are read from build_manifest.json when available and override any visual impression of completion.
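
The deviation rule described above can be sketched as follows: take the absolute relative error of each HARD/FIRM/SOFT constraint and report the maximum, ignoring GUIDE/INFORMATIONAL priors. The record fields (`tier`, `target`, `synthetic`) and the handling of zero targets are assumptions made for this illustration; the actual review-bundle schema is not shown on this page.

```python
# Tiers that count toward the reported deviation, per the description above.
ENFORCED_TIERS = {"HARD", "FIRM", "SOFT"}


def relative_error(target, synthetic):
    """Absolute relative error of a synthetic value against its target."""
    if target == 0:
        # Assumption: a zero target is only matched by a zero synthetic value.
        return 0.0 if synthetic == 0 else float("inf")
    return abs(synthetic - target) / abs(target)


def max_marginal_deviation(constraints):
    """Return the worst relative error across enforced constraints.

    constraints: iterable of dicts with hypothetical keys
    'tier', 'target', and 'synthetic'.
    Returns None when no enforced constraint was collected.
    """
    errors = [
        relative_error(c["target"], c["synthetic"])
        for c in constraints
        if c["tier"] in ENFORCED_TIERS
    ]
    return max(errors) if errors else None


example = [
    {"id": "persons_total", "tier": "HARD", "target": 1000, "synthetic": 1000},
    {"id": "households_total", "tier": "FIRM", "target": 400, "synthetic": 404},
    {"id": "age_80_plus", "tier": "SOFT", "target": 50, "synthetic": 47},
    {"id": "income_prior", "tier": "GUIDE", "target": 100, "synthetic": 150},
]
worst = max_marginal_deviation(example)
```

In this made-up example the GUIDE prior has the largest error but is excluded, so the SOFT age constraint (3 people off a target of 50) determines the result.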
Kanban task workflow status (not artifact completion)
These cards count board tasks only. They do not certify that the country-level population artifact is nationally complete or reviewer-approved.
Datasets and distributions
Lists come from the latest run bundle: source_provenance.json, distribution_diagnostics.json, and build_manifest.json.
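
A minimal sketch of checking that a run directory contains the three bundle files named above. It assumes the files sit at the top level of the run directory, which this page does not confirm; the helper name `missing_bundle_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the sentence above; their location is an assumption.
REQUIRED_BUNDLE_FILES = (
    "source_provenance.json",
    "distribution_diagnostics.json",
    "build_manifest.json",
)


def missing_bundle_files(run_dir):
    """Return the required bundle files that are absent from run_dir."""
    run_path = Path(run_dir)
    return [
        name for name in REQUIRED_BUNDLE_FILES
        if not (run_path / name).is_file()
    ]
```

An empty result means all three files are present; it says nothing about whether their contents are valid.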
Summary
| Metric | Value |
|---|---|
| Datasets used | 0 |
| Distributions available | 0 |
| Constraints/distributions used in synthesis | 0 |
| Constraint types | — |
| Dataset variants | — |
| Finest-geography status | — |
Source gaps
- No source gaps listed.
Datasets used
| Dataset/source ID |
|---|
| None listed yet. |
Best source by distribution family
| Distribution family | Dataset/source ID |
|---|---|
| None listed yet. | |
Available distributions / priors in registry
| Spec | Label | Type | Geo | Status | Variant | Confidence | Data URI |
|---|---|---|---|---|---|---|---|
| None listed yet. | | | | | | | |
Constraints/distributions used in synthesis manifest
| Constraint or distribution ID |
|---|
| None listed yet. |
Current country tasks
| ID | Title | Assignee | Status | Created | Latest summary |
|---|---|---|---|---|---|
| t_f12df715 | Resolve GR/EL publication mapping for Greece population QA status | synth-modeler | ready | 2026-05-19 22:05:23 CEST | |
| t_1aacf742 | Add guardrail test: release_review cannot pass on tiny seeded population artifacts | synth-modeler | ready | 2026-05-19 22:05:22 CEST | |
| t_757abce4 | Fix release/status language so seeded population slices cannot surface as national PASS | synth-modeler | ready | 2026-05-19 22:05:21 CEST | |
| t_e0e72e3c | manager: supervise Italy/Greece/Baltic population-QA expansion wave | synth-manager | todo | 2026-05-19 20:17:46 CEST | |
| t_1aeadf3a | Reviewer sanity sweep: national-scale claims versus actual artifact row counts | synth-reviewer | done | 2026-05-19 22:00:13 CEST | Completed reviewer sanity sweep and wrote the country-level matrix to /home/synthestat/output/reviews/national_scale_claims_sanity_sweep_2026-05-19.md. Verdict is NEEDS_MODEL_FIX for release/status presentation: top-level release surfaces still expose seeded-slice population counts and DK/FI/NL/NO/S |
| t_49963fe5 | manager: immediate active control loop after memory-scale spawn request | synth-manager | done | 2026-05-19 21:59:28 CEST | Completed active population-QA manager control loop: checked population-QA plus synthestat infra board, verified country status by actual parquet row counts/latest-run artifacts, routed PASS countries to final internal notes, and built continuation chains around stale blocked parents for IT/NL/LU/MT |
| t_42457a02 | Re-review CZ repaired household-prior bundle | synth-reviewer | done | 2026-05-19 21:42:47 CEST | Re-reviewed CZ repaired household-prior bundle at /home/synthestat/output/runs/CZ/cz_population_fixed_household_priors_36c1d87e_seed420987. Verdict PASS for the declared repaired household-prior/internal-review scope: required bundle files are present, HARD persons and FIRM households are exact, hou |
| t_4afa8f5b | Fix CZ household type-size/dependent-child prior integration | synth-modeler | done | 2026-05-19 21:00:24 CEST | Built repaired CZ household-prior review bundle at /home/synthestat/output/runs/CZ/cz_population_fixed_household_priors_36c1d87e_seed420987. HARD persons/FIRM households pass exact, household size/exclusive leaf type/dependent-child residuals pass exact, composition and person-household size mismatc |
| t_5bc37015 | Re-review XK candidate after residual bucket diagnostics fix | synth-reviewer | done | 2026-05-19 20:55:51 CEST | Re-reviewed XK rebuilt national candidate after the residual-bucket diagnostics fix and returned PASS for the declared internal municipality-level C-tier national-core scope. Independent parquet/JSON checks verified required bundle presence, exact HARD person/household totals, +12 FIRM dwelling resi |
| t_bc746218 | Fix XK candidate unresolved residual bucket diagnostics inconsistency | synth-modeler | done | 2026-05-19 20:47:46 CEST | Fixed and rebuilt the XK national candidate residual bucket diagnostics: parquet readback, household_diagnostics, uncertainty_summary, national_candidate_summary, and model_notes now agree on 432 unresolved bucket rows / 21,510 persons using the emitted-row size-sum definition. Added a parquet-vs-JS |
| t_efd44c77 | Review fixed XK national candidate household realism and diagnostics | synth-reviewer | done | 2026-05-19 20:43:20 CEST | Reviewed the fixed XK national candidate and returned BLOCKED_INVALID_OUTPUT. The row-level household realism fixes are verified, but the bundle is internally inconsistent: diagnostics/uncertainty report 18,497 unresolved residual persons while actual unresolved bucket household rows sum to 21,510; |
| t_40e91e8d | Review LT cycle-4 household-family realism model-fix bundle | synth-reviewer | done | 2026-05-19 20:39:53 CEST | Reviewed LT cycle-4 bundle at /home/synthestat/output/runs/LT/lt_population_review_cycle4_888b7387_seed420987 and returned PASS for the declared internal research-review scope. Independent checks confirmed complete contract files, 2,810,761 persons / 1,215,360 households / 1,215,360 dwellings, exact |
| t_20be5c35 | Review CZ household/non-demographic prior integration | synth-reviewer | done | 2026-05-19 20:26:46 CEST | Reviewed CZ targeted-prior bundle and returned NEEDS_MODEL_FIX. National person/household counts are exact and unsupported fine joints/assignments remain flagged, but household type-size/dependent-child prior integration is invalid with large residuals and impossible household compositions; handoff |
| t_9465d714 | manager: reconcile country-level population QA status vs fixture outputs | synth-manager | done | 2026-05-19 20:12:25 CEST | Reconciled country-level population QA status against current persons.parquet counts and reviewer/task history for CZ, SE, NL, LT, LU, AD, PT, MT, SI, BG, and XK. Wrote the status matrix at /home/synthestat/workspace/manager_handoffs/population_qa_country_status/latest.md, appended manager_updates.m |
| t_67b5031e | Route LU seeded-slice PASS to final human review/delivery | synth-manager | done | 2026-05-19 20:11:06 CEST | Routed the LU cycle-2 PASS into a final human-facing delivery/readiness-note task without expanding scope. The downstream note task must preserve that the PASS applies only to the internal seeded two-zone LU slice and is not nationwide Luxembourg 1:1 production readiness or external-release approval |
| t_e09e08ce | manager: continuous board oversight and reaction loop | synth-manager | done | 2026-05-19 19:56:07 CEST | Completed a population-QA board supervisor pass: inspected active/done/blocked tasks, routed stalled NL/AD/LU/LT/SE continuations, created final PASS delivery-note tasks for XK/PT/MT/SI, and kept CZ pressure on the >8-person national synthesis blocker with a dependent re-review. Wrote the concise su |
Process
- synth-manager creates and controls the country loop.
- synth-modeler generates the review bundle: people, households, dwellings/buildings or unavailable markers, overlays, assignments, manifests, residuals, diagnostics, uncertainty, and provenance.
- synth-reviewer audits constraints, marginals, household/family realism, hidden populations, dwelling/building grounding, work/school assignment, uncertainty, provenance, and privacy.
- The reviewer verdict decides the next step: PASS finalizes the cycle; NEEDS_MODEL_FIX routes back to synth-modeler; NEEDS_MORE_SOURCES routes to the marginal/distribution researchers and then to the downloader; exhausted evidence or a model plateau stops the loop for a human decision.
Quality gates and stop conditions
- PASS: the output is satisfactory for the declared country evidence tier and internal review mode.
- NEEDS_MODEL_FIX: there is an issue in model logic, the bundle, uncertainty, or household/dwelling/assignment handling.
- NEEDS_MORE_SOURCES: marginal or joint/conditional evidence is missing; the task goes to the researchers and then to the downloader.
- EVIDENCE_EXHAUSTED_HUMAN_REVIEW: further source search cannot responsibly improve the output.
- MODEL_IMPROVEMENT_EXHAUSTED_HUMAN_REVIEW: the modeler cannot materially improve the output, or the diagnostics have plateaued.
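
The verdict routing described in the Process and quality-gate lists can be sketched as a lookup table. The verdict names come from this page; the step labels, the `ROUTES` table, and the `next_steps` helper are hypothetical and only illustrate the flow, not the board's actual automation.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    NEEDS_MODEL_FIX = "NEEDS_MODEL_FIX"
    NEEDS_MORE_SOURCES = "NEEDS_MORE_SOURCES"
    EVIDENCE_EXHAUSTED_HUMAN_REVIEW = "EVIDENCE_EXHAUSTED_HUMAN_REVIEW"
    MODEL_IMPROVEMENT_EXHAUSTED_HUMAN_REVIEW = "MODEL_IMPROVEMENT_EXHAUSTED_HUMAN_REVIEW"


# Ordered hand-offs per verdict; the step labels are illustrative only.
ROUTES = {
    Verdict.PASS: ["finalize"],
    Verdict.NEEDS_MODEL_FIX: ["synth-modeler"],
    Verdict.NEEDS_MORE_SOURCES: ["researchers", "downloader"],
    Verdict.EVIDENCE_EXHAUSTED_HUMAN_REVIEW: ["human-decision"],
    Verdict.MODEL_IMPROVEMENT_EXHAUSTED_HUMAN_REVIEW: ["human-decision"],
}


def next_steps(verdict_name):
    """Return the ordered hand-offs for a reviewer verdict string.

    Raises ValueError for a verdict that is not one of the listed gates,
    so an unrecognized label is never routed by guesswork.
    """
    return list(ROUTES[Verdict(verdict_name)])
```

Rejecting unknown labels is a deliberate choice in this sketch: a verdict string outside the five gates listed above should be handled explicitly rather than silently mapped to one of them.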