
CZ downloader follow-up: ingest CZSO LKOD/Census 2021, CUZK/RUIAN, MPSV/UZIS/MŠMT official sources

Status: done · Assignee: synth-downloader

Task metadata

id: t_ae2f8661
title: CZ downloader follow-up: ingest CZSO LKOD/Census 2021, CUZK/RUIAN, MPSV/UZIS/MŠMT official sources
assignee: synth-downloader
status: done
tenant: synthestat
priority: 100
workspace_kind: scratch
workspace_path: /home/synthestat/.hermes/kanban/boards/synthestat-population-qa/workspaces/t_ae2f8661
created_by: synth-marginals-researcher
created_at: 2026-05-19 18:47:23 CEST
started_at: 2026-05-19 18:47:49 CEST
completed_at: 2026-05-19 19:03:29 CEST

Latest summary

Completed CZ official-source ingestion/provenance freeze: downloaded 16/16 requested CZSO LKOD datasets plus schemas, MPSV RPSS JSON/schema, UZIS NRPZS route+CSV exports, MŠMT register/data route evidence, and CUZK VDP/VFR route pages. Wrote downloader handoffs and catalogue artifacts with checksums, retrieval timestamps, usage terms, row geography audit, and evidence-exhausted flags preserved.

Body

Follow-up from CZ marginals source gap closure task t_6869427a.

Primary handoff files:
- /home/synthestat/workspace/manager_handoffs/marginals/2026-05-19_1639_sources.md
- /home/synthestat/workspace/manager_handoffs/marginals/2026-05-19_1639_needs.md
- /home/synthestat/workspace/manager_handoffs/marginals/latest.md

Acceptance criteria:
1. Implement/fix CZSO LKOD downloader path using catalogue https://vdb.czso.cz/pll/eweb/lkod_ld.seznam and metadata endpoint https://vdb.czso.cz/pll/eweb/lkod_ld.datova_sada?id=<dataset_id>.
2. Download/schema-verify these CZSO datasets at minimum: sldb2021_vek1_pohlavi, sldb2021_domacnosti_clenu_typ, sldb2021_pocty_clenu_typ, sldb2021_vzdelani_vek_pohlavi, sldb2021_aktivita_vek1_pohlavi, sldb2021_odvetvi_vek1_pohlavi, sldb2021_klasif_pohlavi, sldb2021_dojizdka_obce, sldb2021_vyjizdka_pracoviste_cil_pohlavi, sldb2021_byty_obydlenost_druhdomu, sldb2021_domy_obydlen_druh, sldb2021_bydleni_vek_pohlavi, 130181r25, 160061, 110079, 290038r25.
3. Preserve provenance: source URL, dataset ID, distribution URL, schema URL, retrieval timestamp, licence/usage terms (`podmínky_užití`), geography fields, reference period, checksum, and quality flags.
4. Implement/validate CUZK/RUIAN VDP/VFR retrieval route for building/address grounding from https://vdp.cuzk.gov.cz/vdp/ruian/vymennyformat and https://cuzk.gov.cz/vfr; confirm building object/address point/dwelling-unit semantics.
5. Download/inspect MPSV RPSS JSON/schema (https://data.mpsv.cz/od/soubory/rpss/rpss.json and .schema.json), UZIS NRPZS open data, and MŠMT school register routes for facility rosters/capacities.
6. Report exact row-level geography for each CZSO table; do not silently promote national/ORP/region rows to obec/ZSJ.
7. Leave EVIDENCE_EXHAUSTED flags in place where schemas confirm no current-obec age/sex, ISCO-3 occupation, small-area income, homelessness/undocumented/seasonal/Syrian-specific, or school catchment/attendance data.
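
The row-level geography requirement in criterion 6 can be illustrated with a minimal sketch. It assumes each parsed row carries a geography-level label under a hypothetical key `geo_level` with hypothetical values such as `"stat"`, `"kraj"`, `"orp"`, and `"obec"`; the real CZSO column names and codes must be taken from each dataset's schema. The helper names `audit_geography` and `select_level` are illustrative and are not taken from the downloader.

```python
from collections import Counter
from typing import Dict, Iterable, List


def audit_geography(rows: Iterable[dict], level_key: str = "geo_level") -> Dict[str, int]:
    """Count rows per geography level exactly as they appear in the table.

    `level_key` is a hypothetical column name; rows without it are
    reported as "unknown" rather than being assigned a level.
    """
    counts = Counter(row.get(level_key, "unknown") for row in rows)
    return dict(counts)


def select_level(rows: Iterable[dict], wanted: str, level_key: str = "geo_level") -> List[dict]:
    """Return only rows published at the requested level.

    Raises LookupError when the table has no rows at that level, so a
    coarser row (national, region, ORP) is never silently used as a
    stand-in for obec or ZSJ.
    """
    selected = [row for row in rows if row.get(level_key) == wanted]
    if not selected:
        raise LookupError(f"no rows at geography level {wanted!r}")
    return selected
```

Failing loudly when the requested level is absent keeps the audit honest: the caller has to record the gap instead of inheriting a value from a coarser area.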

Do not model or rerun the population bundle; this is ingestion/provenance only.
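
The provenance fields listed in criterion 3 could be captured in a record like the one sketched below. This is an assumption-laden illustration, not the downloader's actual code: the class `ProvenanceRecord`, the function `build_record`, and the field names are hypothetical, and the choice of SHA-256 for the checksum is an assumption, since the task does not name an algorithm. Only the metadata endpoint pattern comes from criterion 1. The function performs no network access; it takes bytes that were already downloaded.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Sequence

# Metadata endpoint pattern from acceptance criterion 1.
LKOD_METADATA_URL = "https://vdb.czso.cz/pll/eweb/lkod_ld.datova_sada?id={dataset_id}"


@dataclass
class ProvenanceRecord:
    """Hypothetical record layout; field names are illustrative."""
    dataset_id: str
    source_url: str
    distribution_url: str
    schema_url: str
    retrieved_at: str          # ISO 8601 timestamp, UTC
    usage_terms: str           # value of the dataset's usage-terms field
    geography_fields: List[str]
    reference_period: str
    sha256: str                # checksum of the downloaded bytes (assumed algorithm)
    quality_flags: List[str] = field(default_factory=list)


def build_record(
    dataset_id: str,
    payload: bytes,
    distribution_url: str,
    schema_url: str,
    usage_terms: str,
    geography_fields: Sequence[str],
    reference_period: str,
    quality_flags: Sequence[str] = (),
) -> ProvenanceRecord:
    """Build a provenance record for one already-downloaded distribution."""
    return ProvenanceRecord(
        dataset_id=dataset_id,
        source_url=LKOD_METADATA_URL.format(dataset_id=dataset_id),
        distribution_url=distribution_url,
        schema_url=schema_url,
        retrieved_at=datetime.now(timezone.utc).isoformat(),
        usage_terms=usage_terms,
        geography_fields=list(geography_fields),
        reference_period=reference_period,
        sha256=hashlib.sha256(payload).hexdigest(),
        quality_flags=list(quality_flags),
    )
```

Passing such a record through `dataclasses.asdict` yields a plain dictionary that can be written to a JSON catalogue, with flags such as `EVIDENCE_EXHAUSTED` carried in `quality_flags`.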

Parents

[
  "t_6869427a"
]

Children

[]

Runs

ID | Profile | Status | Outcome | Started | Ended | Summary/error
42 | synth-downloader | done | completed | 2026-05-19 18:47:49 CEST | 2026-05-19 19:03:29 CEST | Completed CZ official-source ingestion/provenance freeze: downloaded 16/16 requested CZSO LKOD datasets plus schemas, MPSV RPSS JSON/schema, UZIS NRPZS route+CSV exports, MŠMT register/data route evidence, and CUZK VDP/VFR route pages. Wrote downloader handoffs and catalogue artifacts with checksums, retrieval timestamps, usage terms, row geography audit, and evidence-exhausted flags preserved.

Events

Time | Kind | Payload
2026-05-19 18:47:23 CEST | created | { "assignee": "synth-downloader", "status": "todo", "parents": [ "t_6869427a" ], "tenant": "synthestat", "skills": null }
2026-05-19 18:47:39 CEST | promoted | null
2026-05-19 18:47:49 CEST | claimed | { "lock": "vmi3188806:1590352", "expires": 1779210169, "run_id": 42 }
2026-05-19 18:47:49 CEST | spawned | { "pid": 1625266 }
2026-05-19 19:03:01 CEST | claim_extended | { "reason": "pid_alive", "worker_pid": 1625266, "claim_lock": "vmi3188806:1590352", "claim_expires_was": 1779210169, "claim_expires_now": 1779211081, "last_heartbeat_at": null }
2026-05-19 19:03:29 CEST | completed | { "result_len": 0, "summary": "Completed CZ official-source ingestion/provenance freeze: downloaded 16/16 requested CZSO LKOD datasets plus schemas, MPSV RPSS JSON/schema, UZIS NRPZS route+CSV exports, MŠMT register/data route evidence, and CUZK VDP/VFR route pages. Wrote downloader handoffs and catalogue artifacts with checksums, retrieval timestamps, usage terms, row geography audit, and evidence-exhausted flags preserved.", "artifacts": [ "/home/synthestat/workspace/manager_handoffs/downloader/2026-05-19_1849_download_log.md", "/home/synthestat/workspace/manager_handoffs/downloader/2026-05-19_1849_catalogue_updates.md", "/home/synthestat/workspace/manager_handoffs/downloader/latest.md", "/home/synthestat/output/catalogue/cz_official_sources_catalogue_latest.json", "/home/synthestat/output/catalogue/cz_czso_geography_audit_latest.json" ] }
