Synthestat · Estonia · population QA

EE 1:1 population synthesis QA cycle

Country-specific layer for synthetic people in households, dwellings, real building stock where available, hidden-population overlays, and work/school assignment evidence.

Board: synthestat-population-qa · Tenant: synthestat · Country: EE · Task workflow status: ready · Artifact completion: review_bundle_metrics_partial

Ideal-country quality criteria: impossible 1:1 benchmark

This is the common gold-standard benchmark for an ideal country. It is intentionally impossible to fully satisfy: complete success would mean a 1:1 replica of the real population where every person, household, dwelling, attribute, and assignment is exactly represented. The QA page uses it as an asymptote and gap taxonomy, not as a release promise.

Apply this same rubric to this country’s latest run, then report which needs are measured, constrained, modelled, unavailable, or blocked.

Need | Unachievable ideal | QA evidence we require instead | Why perfection cannot be achieved
Complete de jure resident coverage | Every real resident represented exactly once in the right country, municipality, small area, household, and dwelling. | Synthetic person count equals official population at all enforced geographies; no unexplained duplicate, missing, or out-of-universe people. | A true 1:1 resident list is a confidential population register and changes continuously; Synthestat can only match official aggregates and declared source universes.
Complete attribute truth | Each synthetic person has the same age, sex, household role, education, occupation, industry, origin, health proxy, income proxy, and lifecycle state as the corresponding real person. | Published marginal and cross-tab constraints pass within HARD/FIRM/SOFT tolerances; modelled fields carry uncertainty and measured/constrained/modelled provenance. | Official releases do not expose a complete individual joint distribution, and many attributes are survey-derived, lagged, suppressed, or unavailable at fine geography.
Perfect household and family structure | Every household contains the exact real members and relationships, including multi-generation, partnership, child, shared, institutional, and edge-case arrangements. | Household totals, household-type distributions, age/sex/role consistency, fertility/child constraints, and structural invariants pass with explicit residuals. | Household membership is sensitive microdata; public sources usually expose only aggregate household/family tables and partial cross-tabs.
Exact dwelling and building grounding | Every household is assigned to its real dwelling and building with exact occupancy, vacancy, dwelling type, floor area, tenure, and address-level geography. | Dwelling/building capacity checks pass; vacancy/second-home/institutional dwellings are represented or explicitly unavailable; building links have source provenance. | Many countries lack open address-level registers; dwelling occupancy is confidential and time-varying.
Complete de facto and hidden-population overlays | Homeless, undocumented, refugees, students away from home, seasonal, institutional, tourists, and daytime populations are all represented with exact location and timing. | Overlay layers use interval estimates, source-specific quality flags, and never silently modify de jure HARD constraints. | Hidden populations are partly unobserved by definition; ethical/privacy constraints forbid exact person-level labels.
Exact school, workplace, facility, and mobility assignment | Every person is assigned to the real school, workplace, care provider, commute, and daily activity chain they use. | Assignment layers use official registers/OD flows where available; modelled assignments are flagged and validated only against aggregate flows/capacities. | Operational assignments are usually protected registers or dynamic behavioural data; Phase 1 must not imply they are known.
Full joint-distribution realism | The full multivariate joint distribution is identical to reality across all attributes, households, geography, and rare subgroups. | High-priority marginals/cross-tabs pass; sparse zones and prior-dominated attributes are clearly marked with quality tiers and credible intervals. | The joint distribution is non-identifiable from published marginals; IPF/BN/hierarchical pooling choose plausible distributions, not truth.
Zero uncertainty and zero lag | All values are current today and known without error. | Every output records reference period, retrieval timestamp, lag, confidence, uncertainty bounds, and degradation decisions. | Official statistics are lagged, revised, sampled, suppressed, and harmonized after collection.
Privacy-safe yet maximally detailed release | The system releases maximum useful detail while creating zero re-identification risk. | Release mode, k-anonymity/cell safeguards, perturbation/aggregation policy, and sensitive-field treatment are explicit. | Fine-area synthetic microdata can still create structurally unique records; synthetic does not mean anonymous.
Perfect reproducibility and auditability | Any user can trace every output record to exact source snapshots, transformations, constraints, relaxations, seeds, and code versions. | Run manifests, source provenance, checksums, frozen extracts, seeds, versioned crosswalks, validation reports, and relaxation logs are complete. | This is approachable but never final: source portals, classifications, geography, and code keep changing, so audits must be continuously renewed.
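
The privacy row notes that synthetic microdata can still contain structurally unique records. One way such a k-anonymity/cell safeguard might be checked is sketched below. This is an illustration only, not Synthestat's release policy: the function name, the field names (`area`, `age_band`, `sex`), the toy records, and the choice of `k` are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def cells_below_k(
    records: Iterable[Mapping[str, object]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> Dict[Tuple[object, ...], int]:
    """Count records per quasi-identifier combination; keep cells with fewer than k."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {cell: n for cell, n in counts.items() if n < k}


# Toy records with hypothetical field names, not taken from any bundle.
toy_records = [
    {"area": "A", "age_band": "30-39", "sex": "F"},
    {"area": "A", "age_band": "30-39", "sex": "F"},
    {"area": "A", "age_band": "90+", "sex": "M"},
]
print(cells_below_k(toy_records, ["area", "age_band", "sex"], k=2))
# {('A', '90+', 'M'): 1}
```

A cell with a count of 1 is a structurally unique record; whether it is then aggregated, perturbed, or suppressed depends on the declared release mode.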

Population artifact output status (separate from task status)

Artifact completion | Row count source | People | Target population | National coverage | Absolute shortfall | Households | Dwellings | Houses/buildings | Max marginal deviation | HARD status | Run
review_bundle_metrics_partial | parquet_metadata_review_bundle | 561,655 | ee_population_private_household_national_2021_cycle2_seed420987

This table describes emitted population artifacts only. It is intentionally independent of the Kanban task workflow status below: a country can have all tasks done while its artifact is still only a seeded slice, and a passing review bundle can still lack national target completion. Deviation is the maximum absolute relative error across the HARD/FIRM/SOFT marginal constraints collected in the latest review bundle; GUIDE/INFORMATIONAL priors are excluded. National target and coverage are read from build_manifest.json when available and take precedence over any visual impression of completion.
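
The deviation rule in the paragraph above can be sketched as follows. This is a minimal illustration, not the pipeline's code: the `Constraint` record, its field names, and the sample values are hypothetical. Only the tier names and the rule itself (maximum absolute relative error over HARD/FIRM/SOFT, with GUIDE/INFORMATIONAL excluded) come from the text. Skipping zero-valued targets is an added assumption to avoid division by zero.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Tiers that count toward the headline deviation, per the paragraph above.
SCORED_TIERS = {"HARD", "FIRM", "SOFT"}


@dataclass(frozen=True)
class Constraint:
    """Hypothetical record for one marginal constraint in a review bundle."""
    constraint_id: str
    tier: str          # HARD, FIRM, SOFT, GUIDE, or INFORMATIONAL
    target: float      # official value
    synthetic: float   # value measured on the synthetic population


def max_marginal_deviation(constraints: Iterable[Constraint]) -> Optional[float]:
    """Return max |synthetic - target| / |target| over the scored tiers.

    Returns None when no scored constraint with a non-zero target exists,
    so an empty bundle is not mistaken for a perfect one.
    """
    errors = [
        abs(c.synthetic - c.target) / abs(c.target)
        for c in constraints
        if c.tier in SCORED_TIERS and c.target != 0
    ]
    return max(errors) if errors else None


# Made-up values for illustration only.
sample = [
    Constraint("households_total", "HARD", 1000, 1000),
    Constraint("age_sex_cell", "FIRM", 200, 190),
    Constraint("prior_only", "GUIDE", 50, 10),  # excluded from the maximum
]
print(max_marginal_deviation(sample))  # 0.05
```

Returning None rather than 0.0 for an empty constraint set keeps a partially populated bundle from reading as a zero-deviation result.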

Kanban task workflow status (not artifact completion)

ready: 1
done: 7

These cards count board tasks only. They do not certify that the country-level population artifact is nationally complete or reviewer-approved.

Datasets and distributions

Lists come from the latest run bundle: source_provenance.json, distribution_diagnostics.json, and build_manifest.json.

Summary

Datasets used: 0
Distributions available: 0
Constraints/distributions used in synthesis: 3
Constraint types
Dataset variants
Finest-geography status

Source gaps

  • No source gaps listed.

Datasets used

Dataset/source ID
None listed yet.

Best source by distribution family

Distribution family | Dataset/source ID
None listed yet.

Available distributions / priors in registry

Spec | Label | Type | Geo | Status | Variant | Confidence | Data URI
None listed yet.

Constraints/distributions used in synthesis manifest

Constraint or distribution ID
EE_STAT_RL21707_private_households_total
EE_STAT_RL21707_private_household_members_total
EE_STAT_RL21703_private_household_age_sex

Current country tasks

ID | Title | Assignee | Status | Created | Latest summary
t_8026f58d | EE final human/internal review note for private-household national PASS | synth-reviewer | ready | 2026-05-19 22:03:45 CEST |
t_b967c410 | EE reviewer pass: inspect cycle-2 national private-household model fix | synth-reviewer | done | 2026-05-19 20:59:14 CEST | Reviewed EE cycle-2 national private-household bundle at /home/synthestat/output/runs/EE/ee_population_private_household_national_2021_cycle2_seed420987 and returned PASS for the declared private-household scope. Independent parquet QA verified 1,317,666 persons, 561,655 households, 561,655 dwelling
t_1bf7ff30 | EE model fix: repair invalid child-alone household assignment in national private-household bundle | synth-modeler | done | 2026-05-19 20:51:16 CEST | Built the EE cycle-2 national private-household review bundle at /home/synthestat/output/runs/EE/ee_population_private_household_national_2021_cycle2_seed420987. The model now assigns one adult reference per private household before filling remaining slots, preserving exact private-household counts
t_8f902059 | EE population QA reviewer: inspect national-scale non-fixture rerun | synth-reviewer | done | 2026-05-19 20:17:33 CEST | EE review complete: the rerun is a real national-scale private-household artifact (1,317,666 persons; 561,655 households), not the 8-person fixture, and it matches declared private-household HARD counts. Verdict is NEEDS_MODEL_FIX because household assignment is structurally invalid (174,570 child-w
t_4eb48eb2 | EE population QA modeler: national-scale non-fixture synthesis rerun | synth-modeler | done | 2026-05-19 20:17:32 CEST | Built EE non-fixture national-scale private-household review bundle at /home/synthestat/output/runs/EE/ee_population_private_household_national_2021_seed420987. It explicitly rejects the old 8-person /output/EE fixture as national, matches 2021 Statistics Estonia private-household HARD controls exac
t_fdbf1017 | EE population QA downloader: freeze exact official payloads for national-scale rerun | synth-downloader | done | 2026-05-19 20:17:31 CEST | Froze the EE national-scale population QA official payload bundle: 48/48 approved Statistics Estonia PxWeb tables and 2/2 Maa-amet WFS layers completed with manifests, checksums, row/feature counts, schema sketches, and downloader-local catalogue handoffs. Updated downloader latest.md and appended t
t_9140b2f8 | EE population QA distribution closure: joint priors for non-fixture synthesis | synth-distributions-researcher | done | 2026-05-19 20:17:30 CEST | Completed EE population QA distribution closure with verdict DISTRIBUTION_READY_FOR_MODEL_FIX. Wrote dated EE findings and extraction specs, preserved/updated the shared latest evidence board with an EE section, and appended a manager update.
t_bcf523c3 | EE population QA source closure: exact national marginals for non-fixture synthesis | synth-marginals-researcher | done | 2026-05-19 20:17:29 CEST | Completed EE population QA source closure and wrote dated sources/needs handoffs plus latest.md under /home/synthestat/workspace/manager_handoffs/marginals/. Estonia is SOURCE_READY_FOR_MODEL_FIX after downloader freezes official Statistics Estonia PXWeb/Maa-amet WFS payloads; national target is 1,3
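
The model-fix tasks above turn on one structural invariant: every private household needs an adult reference person. A reviewer-side check for that invariant might look like the sketch below. It is not the reviewer's actual QA code; the function name, the `(household_id, age)` input shape, and the adult threshold of 18 are assumptions, since the summaries do not state them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ADULT_AGE = 18  # assumption: the task summaries do not state the threshold


def households_without_adult(persons: Iterable[Tuple[str, int]]) -> List[str]:
    """Return sorted IDs of households whose members are all below ADULT_AGE.

    `persons` yields (household_id, age) pairs, one per synthetic person.
    """
    has_adult: Dict[str, bool] = defaultdict(bool)
    for household_id, age in persons:
        # Assigning on every row also registers child-only households.
        has_adult[household_id] = has_adult[household_id] or age >= ADULT_AGE
    return sorted(h for h, ok in has_adult.items() if not ok)


# Toy data for illustration only.
toy_persons = [("h1", 34), ("h1", 5), ("h2", 7), ("h3", 16), ("h3", 41)]
print(households_without_adult(toy_persons))  # ['h2']
```

An empty result is the passing condition; any listed household would correspond to a NEEDS_MODEL_FIX finding of the kind described in the summaries.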

Process

Manager kickoff

synth-manager creates and controls the country loop.

Model build

synth-modeler generates the review bundle: people, households, dwellings/buildings or unavailable markers, overlays, assignments, manifests, residuals, diagnostics, uncertainty, provenance.

Reviewer gate

synth-reviewer audits constraints, marginals, household/family realism, hidden populations, dwelling/building grounding, work/school assignment, uncertainty, provenance, and privacy.

Branch

PASS finalizes; NEEDS_MODEL_FIX routes back to the modeler; NEEDS_MORE_SOURCES routes to the marginals and distribution researchers, then the downloader; exhausted evidence or a model plateau stops the loop for a human decision.
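
The branch rule can be read as a small routing function. The sketch below is one interpretation, not the manager's implementation: the verdict names and agent names come from this page, while the `plateau` flag and the `finalize` and `human-decision` labels are hypothetical.

```python
from enum import Enum
from typing import List


class Verdict(Enum):
    PASS = "PASS"
    NEEDS_MODEL_FIX = "NEEDS_MODEL_FIX"
    NEEDS_MORE_SOURCES = "NEEDS_MORE_SOURCES"


def next_steps(verdict: Verdict, plateau: bool = False) -> List[str]:
    """Return the ordered hand-offs that follow a reviewer verdict.

    `plateau` stands for exhausted evidence or a model plateau (hypothetical flag).
    """
    if verdict is Verdict.PASS:
        return ["finalize"]
    if plateau:
        # No further automatic iteration: stop for a human decision.
        return ["human-decision"]
    if verdict is Verdict.NEEDS_MODEL_FIX:
        return ["synth-modeler"]
    # NEEDS_MORE_SOURCES: researchers first, then the downloader.
    return [
        "synth-marginals-researcher",
        "synth-distributions-researcher",
        "synth-downloader",
    ]


print(next_steps(Verdict.NEEDS_MODEL_FIX))  # ['synth-modeler']
```

Checking PASS before the plateau flag reflects the reading that a passing bundle finalizes regardless of how many iterations preceded it.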

Quality gates and stop conditions