SE 1:1 population synthesis QA cycle
Country-specific layer for synthetic people in households, dwellings, real building stock where available, hidden-population overlays, and work/school assignment evidence.
Board: synthestat-population-qa · Tenant: synthestat · Country: SE · Overall status: running
Ideal-country quality criteria: impossible 1:1 benchmark
This is the common gold-standard benchmark for an ideal country. It is intentionally impossible to fully satisfy: complete success would mean a 1:1 replica of the real population where every person, household, dwelling, attribute, and assignment is exactly represented. The QA page uses it as an asymptote and gap taxonomy, not as a release promise.
Apply this same rubric to this country’s latest run, then report which needs are measured, constrained, modelled, unavailable, or blocked.
| Need | Unachievable ideal | QA evidence we require instead | Why perfection cannot be achieved |
|---|---|---|---|
| Complete de jure resident coverage | Every real resident represented exactly once in the right country, municipality, small area, household, and dwelling. | Synthetic person count equals official population at all enforced geographies; no unexplained duplicate, missing, or out-of-universe people. | A true 1:1 resident list is a confidential population register and changes continuously; Synthestat can only match official aggregates and declared source universes. |
| Complete attribute truth | Each synthetic person has the same age, sex, household role, education, occupation, industry, origin, health proxy, income proxy, and lifecycle state as the corresponding real person. | Published marginal and cross-tab constraints pass within HARD/FIRM/SOFT tolerances; modelled fields carry uncertainty and measured/constrained/modelled provenance. | Official releases do not expose a complete individual joint distribution, and many attributes are survey-derived, lagged, suppressed, or unavailable at fine geography. |
| Perfect household and family structure | Every household contains the exact real members and relationships, including multi-generation, partnership, child, shared, institutional, and edge-case arrangements. | Household totals, household-type distributions, age/sex/role consistency, fertility/child constraints, and structural invariants pass with explicit residuals. | Household membership is sensitive microdata; public sources usually expose only aggregate household/family tables and partial cross-tabs. |
| Exact dwelling and building grounding | Every household is assigned to its real dwelling and building with exact occupancy, vacancy, dwelling type, floor area, tenure, and address-level geography. | Dwelling/building capacity checks pass; vacancy/second-home/institutional dwellings are represented or explicitly unavailable; building links have source provenance. | Many countries lack open address-level registers; dwelling occupancy is confidential and time-varying. |
| Complete de facto and hidden-population overlays | Homeless, undocumented, refugees, students away from home, seasonal, institutional, tourists, and daytime populations are all represented with exact location and timing. | Overlay layers use interval estimates, source-specific quality flags, and never silently modify de jure HARD constraints. | Hidden populations are partly unobserved by definition; ethical/privacy constraints forbid exact person-level labels. |
| Exact school, workplace, facility, and mobility assignment | Every person is assigned to the real school, workplace, care provider, commute, and daily activity chain they use. | Assignment layers use official registers/OD flows where available; modelled assignments are flagged and validated only against aggregate flows/capacities. | Operational assignments are usually protected registers or dynamic behavioural data; Phase 1 must not imply they are known. |
| Full joint-distribution realism | The full multivariate joint distribution is identical to reality across all attributes, households, geography, and rare subgroups. | High-priority marginals/cross-tabs pass; sparse zones and prior-dominated attributes are clearly marked with quality tiers and credible intervals. | The joint distribution is non-identifiable from published marginals; IPF/BN/hierarchical pooling choose plausible distributions, not truth. |
| Zero uncertainty and zero lag | All values are current today and known without error. | Every output records reference period, retrieval timestamp, lag, confidence, uncertainty bounds, and degradation decisions. | Official statistics are lagged, revised, sampled, suppressed, and harmonized after collection. |
| Privacy-safe yet maximally detailed release | The system releases maximum useful detail while creating zero re-identification risk. | Release mode, k-anonymity/cell safeguards, perturbation/aggregation policy, and sensitive-field treatment are explicit. | Fine-area synthetic microdata can still create structurally unique records; synthetic does not mean anonymous. |
| Perfect reproducibility and auditability | Any user can trace every output record to exact source snapshots, transformations, constraints, relaxations, seeds, and code versions. | Run manifests, source provenance, checksums, frozen extracts, seeds, versioned crosswalks, validation reports, and relaxation logs are complete. | This is approachable but never final: source portals, classifications, geography, and code keep changing, so audits must be continuously renewed. |
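The first row of the rubric asks for synthetic counts that equal official counts at every enforced geography, with no duplicate, missing, or out-of-universe people. A minimal sketch of that check follows. It assumes persons arrive as `(person_id, zone_id)` pairs and official targets as a plain dict; the function name `coverage_report` and both input shapes are hypothetical and are not taken from the Synthestat bundle contract.

```python
from collections import Counter


def coverage_report(persons, official_counts):
    """Compare synthetic persons against official per-zone population targets.

    persons: iterable of (person_id, zone_id) pairs (hypothetical layout).
    official_counts: dict mapping zone_id to the official population.
    """
    persons = list(persons)

    # A person ID that occurs more than once is a duplicate.
    id_counts = Counter(pid for pid, _ in persons)
    duplicates = sorted(pid for pid, n in id_counts.items() if n > 1)

    # People placed in zones that have no official target are out of universe.
    per_zone = Counter(zone for _, zone in persons)
    out_of_universe = sorted(z for z in per_zone if z not in official_counts)

    # Residual = synthetic - official; only nonzero residuals are reported.
    residuals = {
        zone: per_zone.get(zone, 0) - target
        for zone, target in official_counts.items()
        if per_zone.get(zone, 0) != target
    }
    return {
        "duplicates": duplicates,
        "out_of_universe": out_of_universe,
        "residuals": residuals,
    }
```

An empty report (no duplicates, no out-of-universe zones, no residuals) is what an exact HARD pass would look like under these assumptions.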
Population output status
| People | Target population | National coverage | Absolute shortfall | Households | Dwellings | Houses/buildings | Max marginal deviation | HARD status | Run |
|---|---|---|---|---|---|---|---|---|---|
| 91,030 | — | — | — | 43,739 | 43,739 | — | 0.00% | pass_exact | se_population_review_cycle2_3a9d999a_seed420987 |
Max marginal deviation is the largest absolute relative error across the HARD/FIRM/SOFT marginal constraints collected in the latest review bundle; GUIDE/INFORMATIONAL priors are excluded. The national target and coverage are read from build_manifest.json when available, and they take precedence over any visual impression of completion.
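The deviation figure described above can be sketched as follows. The record layout (`tier`, `target`, `synthetic` keys) is an assumption made for illustration, not the actual schema of the review bundle, and the treatment of zero targets is likewise a choice made here rather than documented behaviour.

```python
ENFORCED_TIERS = {"HARD", "FIRM", "SOFT"}  # GUIDE/INFORMATIONAL are excluded


def max_marginal_deviation(constraints):
    """Largest absolute relative error over enforced-tier constraints.

    constraints: iterable of dicts with hypothetical keys
    'tier', 'target', and 'synthetic'.
    """
    worst = 0.0
    for c in constraints:
        if c["tier"] not in ENFORCED_TIERS:
            continue  # priors do not contribute to the deviation
        target, synthetic = c["target"], c["synthetic"]
        if target == 0:
            # Assumption: relative error is undefined for a zero target, so
            # treat an exact match as 0 and anything else as infinite.
            err = 0.0 if synthetic == 0 else float("inf")
        else:
            err = abs(synthetic - target) / abs(target)
        worst = max(worst, err)
    return worst
```

A result of 0.0 corresponds to the 0.00% shown in the table. It says nothing about constraints that were never collected into the bundle.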
Datasets and distributions
Lists come from the latest run bundle: source_provenance.json, distribution_diagnostics.json, and build_manifest.json.
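A minimal sketch of reading those three files from a run bundle directory, tolerating files that are absent. Only the file names come from this page; the function names and the decision to represent a missing file as `None` are assumptions, and nothing is assumed about the JSON contents.

```python
import json
from pathlib import Path

BUNDLE_FILES = (
    "source_provenance.json",
    "distribution_diagnostics.json",
    "build_manifest.json",
)


def load_bundle(bundle_dir):
    """Return {file name: parsed JSON or None when the file is absent}."""
    bundle_dir = Path(bundle_dir)
    loaded = {}
    for name in BUNDLE_FILES:
        path = bundle_dir / name
        if path.is_file():
            loaded[name] = json.loads(path.read_text(encoding="utf-8"))
        else:
            loaded[name] = None  # absent: the page would show an empty list
    return loaded


def missing_files(loaded):
    """List the bundle files that could not be found."""
    return [name for name, content in loaded.items() if content is None]
```

Reporting missing files explicitly helps distinguish "no datasets were used" from "the provenance file was not found", which a bare count of 0 cannot do.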
Summary
| Metric | Value |
|---|---|
| Datasets used | 0 |
| Distributions available | 0 |
| Constraints/distributions used in synthesis | 3 |
| Constraint types | — |
| Dataset variants | — |
| Finest-geography status | — |
Source gaps
- No source gaps listed.
Datasets used
| Dataset/source ID |
|---|
| None listed yet. |
Best source by distribution family
| Distribution family | Dataset/source ID |
|---|---|
| None listed yet. | |
Available distributions / priors in registry
| Spec | Label | Type | Geo | Status | Variant | Confidence | Data URI |
|---|---|---|---|---|---|---|---|
| None listed yet. | | | | | | | |
Constraints/distributions used in synthesis manifest
| Constraint or distribution ID |
|---|
| D01_population_age_sex_deso_HARD_exact_for_selected_zones |
| national_household_size_prior_FIRM_MODELLED |
| DeSO/municipality/national fallback metadata per row |
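Two of the three entries above appear to embed their constraint tier as an underscore-separated token (`HARD`, `FIRM`). A small sketch of recovering the tier from an ID follows. The naming convention is inferred from these examples only and is an assumption; the helper `tier_from_id` is hypothetical.

```python
TIERS = ("HARD", "FIRM", "SOFT", "GUIDE", "INFORMATIONAL")


def tier_from_id(constraint_id):
    """Return the first tier token found in an ID, or None if there is none.

    Assumes tiers appear as whole underscore-separated tokens, as in
    'national_household_size_prior_FIRM_MODELLED'.
    """
    for token in constraint_id.split("_"):
        if token in TIERS:
            return token
    return None
```

Entries that return `None`, such as the free-text fallback metadata line, would need their tier recorded elsewhere before they could count toward the deviation figure.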
Current country tasks
| ID | Title | Assignee | Status | Created | Latest summary |
|---|---|---|---|---|---|
| t_1bbf9f63 | SE population QA cycle 2 downloader: freeze/catalogue SCB, household-prior, geography, anchor sources | synth-downloader | running | 2026-05-19 20:00:15 CEST | Froze/catalogued the SE population QA cycle-2 source bundle with a 33-record manifest, CSV index, downloader handoff, and latest snapshot. Complete model-ready assets include the reused full SCB P0 mirror plus newly downloaded/checksummed SCB DeSO/RegSO geodata; absent/licensed/proxy/hidden-overlay |
| t_f8b6f296 | SE cycle 2 review after other_synthesis-informed rerun | synth-reviewer | running | 2026-05-19 20:00:09 CEST | Reviewed SE cycle-2 bundle and returned BLOCKED_INVALID_OUTPUT. It is non-toy and contract-complete, but invalid: claimed SCB HARD age-sex residuals fail independent source comparison, household formation creates 11,526 children-only households, zone degradation flags contradict unavailable building |
| t_dd8de9c4 | SE cycle 2 invalid-output remediation: SCB target residuals, household realism, degraded-zone flags | synth-modeler | todo | 2026-05-19 20:54:39 CEST | |
| t_66bdf062 | SE population QA cycle 2 reviewer: inspect non-toy/source-upgraded bundle | synth-reviewer | todo | 2026-05-19 20:00:51 CEST | |
| t_29a0c9c4 | SE population QA cycle 2 modeler: build non-toy review bundle from frozen sources and local methodology | synth-modeler | todo | 2026-05-19 20:00:33 CEST | |
| t_f9075c12 | SE population QA cycle 2 distribution research: household/family/assignment/hidden priors | synth-distributions-researcher | done | 2026-05-19 19:59:57 CEST | Completed Sweden cycle-2 distribution prior research and wrote the requested SE population_distribution_priors handoff plus mirrored distribution findings/extraction/latest files. Key result: the claimed local sweden_household_composition bundle is absent and must be rebuilt/mirrored; SCB household/ |
| t_ff30afb9 | SE population QA cycle 2 source research: SCB DeSO/household/building/hidden marginal targets | synth-marginals-researcher | done | 2026-05-19 19:59:41 CEST | Completed SE population marginal/source research and wrote downloader-ready source handoff at /home/synthestat/workspace/manager_handoffs/researcher/2026-05-19_2014_SE_population_marginals_sources.md. The P0 SCB marginal/geography backbone is mostly downloadable now, while Lantmäteriet anchors, stal |
| t_82562c50 | SE cycle 2 non-toy model rerun after other_synthesis source freeze | synth-modeler | done | 2026-05-19 19:59:38 CEST | Built the SE cycle-2 non-toy population review bundle at /home/synthestat/output/runs/SE/se_population_review_cycle2_3a9d999a_seed420987. The bundle is an explicitly labelled stratified multi-DeSO candidate (42 DeSO zones, 91,030 persons, 43,739 households) with exact PASS residuals for frozen SCB 2 |
| t_e666395d | SE cycle 2 household/family/assignment distribution source closure | synth-distributions-researcher | done | 2026-05-19 19:58:48 CEST | Completed SE cycle-2 distribution source closure and wrote findings/extraction/latest handoffs under /home/synthestat/workspace/manager_handoffs/distributions. The handoff identifies SCB household/family, LE0102 child/family graph, YREG labour/occupation, income, origin/background and assignment-pri |
| t_fd61cf13 | SE cycle 2 marginal and hidden-population source closure | synth-marginals-researcher | done | 2026-05-19 19:58:47 CEST | Completed SE cycle-2 marginal and hidden-population source closure. Wrote sources/needs/latest handoffs under /home/synthestat/workspace/manager_handoffs/marginals/, verified SCB DeSO/RegSO core controls plus official hidden-overlay source families, detected the existing 20/20 SE P0 downloader freez |
| t_1878ab53 | SE cycle 2 source freeze from other_synthesis ingest | synth-downloader | done | 2026-05-19 19:58:30 CEST | Froze the SE cycle-2 P0 source bundle from the other_synthesis ingest: 20/20 requested records are frozen with checksums, retrieval timestamps, source IDs, table/API metadata, and official/inherited/synthetic labels. SCB public table payloads are ready for modeller; exact remaining gap is official r |
| t_20de640c | SE cycle 1b: ingest user-provided other_synthesis methodology before rerun | synth-modeler | done | 2026-05-19 16:11:56 CEST | Ingested the now-readable Sweden /home/synthestat/other_synthesis methodology and wrote the SE modeller handoff. The handoff extracts SCB/DeSO sources, hierarchical fitting/fallback logic, household/home/building/anchor modelling approach, hidden-population gaps, next model changes, and downloader/r |
| t_33ff07f7 | SE population synthesis QA cycle 1: review bundle | synth-reviewer | done | 2026-05-19 15:56:38 CEST | Reviewed SE cycle-1 population bundle at /home/synthestat/output/runs/SE/se_population_review_cycle1_a5ad12d7_seed420987. Verdict NEEDS_MORE_SOURCES: bundle is contract-complete and HARD/FIRM residuals pass in the seeded slice, but it is only an 8-person/two-test-zone fixture with hidden overlays, w |
| t_987d9d2d | SE population synthesis QA cycle 1: build review bundle | synth-modeler | done | 2026-05-19 15:56:20 CEST | Built and verified the SE population QA cycle 1 review bundle at /home/synthestat/output/runs/SE/se_population_review_cycle1_a5ad12d7_seed420987. The bundle is contract-complete for internal review, with persons/households/dwellings/building assignments present and hidden-population plus work/school |
| t_0c611b3b | orchestrate SE population synthesis QA loop | synth-manager | done | 2026-05-19 15:55:07 CEST | Branched SE population QA after local other_synthesis ingestion: created parallel marginals/distributions research tasks, a dependent downloader freeze task, then dependent cycle-2 modeler and reviewer tasks. Updated /home/synthestat/workspace/manager_handoffs/manager_updates.md with bottleneck summ |
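The cycle-2 review (t_f8b6f296) reports children-only households as one reason the bundle is invalid. A structural invariant of that kind can be sketched as below. The adult-age threshold of 18, the `(household_id, age)` input shape, and the function name are assumptions for illustration; the reviewer's actual rule is not shown on this page.

```python
from collections import defaultdict

ADULT_AGE = 18  # assumption: threshold used to call a member a child


def children_only_households(persons):
    """Return sorted IDs of households whose oldest member is under ADULT_AGE.

    persons: iterable of (household_id, age) pairs (hypothetical layout).
    """
    ages = defaultdict(list)
    for household_id, age in persons:
        ages[household_id].append(age)
    return sorted(h for h, member_ages in ages.items() if max(member_ages) < ADULT_AGE)
```

This sketch does not distinguish institutional or other legitimate edge-case arrangements, so flagged households would still need review rather than automatic rejection.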
Process
synth-manager creates and controls the country loop.
synth-modeler generates the review bundle: people, households, dwellings/buildings or unavailable markers, overlays, assignments, manifests, residuals, diagnostics, uncertainty, provenance.
synth-reviewer audits constraints, marginals, household/family realism, hidden populations, dwelling/building grounding, work/school assignment, uncertainty, provenance, and privacy.
A PASS verdict finalizes the cycle. NEEDS_MODEL_FIX routes back to synth-modeler. NEEDS_MORE_SOURCES routes to the marginal and distribution researchers, then to the downloader. When the evidence is exhausted or the model plateaus, the loop stops for a human decision.
Quality gates and stop conditions
- PASS: the output is satisfactory for the declared country evidence tier and internal review mode.
- NEEDS_MODEL_FIX: the problem lies in model logic, bundle contents, uncertainty handling, or household/dwelling/assignment output.
- NEEDS_MORE_SOURCES: marginal or joint/conditional evidence is missing; the researchers work first, then the downloader.
- EVIDENCE_EXHAUSTED_HUMAN_REVIEW: further source search cannot responsibly improve the output.
- MODEL_IMPROVEMENT_EXHAUSTED_HUMAN_REVIEW: the modeler cannot materially improve the output, or the diagnostics have plateaued.
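The routing implied by these gates can be sketched as a lookup table. The verdict names come from the list above and the role names from the task table; the `ROUTES` mapping, the `finalize` and `human` step labels, and the choice to reject unlisted verdicts are assumptions, not the manager's actual implementation.

```python
ROUTES = {
    "PASS": ["finalize"],
    "NEEDS_MODEL_FIX": ["synth-modeler"],
    "NEEDS_MORE_SOURCES": [
        "synth-marginals-researcher",
        "synth-distributions-researcher",
        "synth-downloader",
    ],
    "EVIDENCE_EXHAUSTED_HUMAN_REVIEW": ["human"],
    "MODEL_IMPROVEMENT_EXHAUSTED_HUMAN_REVIEW": ["human"],
}


def next_steps(verdict):
    """Return the ordered steps that follow a reviewer verdict.

    Raises ValueError for verdicts that are not defined in the gate list,
    so an unexpected verdict cannot be routed silently.
    """
    try:
        return list(ROUTES[verdict])
    except KeyError:
        raise ValueError(f"no route defined for verdict {verdict!r}") from None
```

Failing loudly on an unknown verdict is a deliberate choice in this sketch: a verdict string that appears in a task summary but not in the gate list would surface as an error instead of being dropped.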