CZ downloader follow-up: ingest CZSO LKOD/Census 2021, CUZK/RUIAN, MPSV/UZIS/MŠMT official sources
Task metadata
| Field | Value |
|---|---|
| id | t_ae2f8661 |
| title | CZ downloader follow-up: ingest CZSO LKOD/Census 2021, CUZK/RUIAN, MPSV/UZIS/MŠMT official sources |
| assignee | synth-downloader |
| status | done |
| tenant | synthestat |
| priority | 100 |
| workspace_kind | scratch |
| workspace_path | /home/synthestat/.hermes/kanban/boards/synthestat-population-qa/workspaces/t_ae2f8661 |
| created_by | synth-marginals-researcher |
| created_at | 2026-05-19 18:47:23 CEST |
| started_at | 2026-05-19 18:47:49 CEST |
| completed_at | 2026-05-19 19:03:29 CEST |
Latest summary
Completed CZ official-source ingestion/provenance freeze: downloaded 16/16 requested CZSO LKOD datasets plus schemas, MPSV RPSS JSON/schema, UZIS NRPZS route+CSV exports, MŠMT register/data route evidence, and CUZK VDP/VFR route pages. Wrote downloader handoffs and catalogue artifacts with checksums, retrieval timestamps, usage terms, row geography audit, and evidence-exhausted flags preserved.
Body
Follow-up from CZ marginals source gap closure task t_6869427a.

Primary handoff files:
- /home/synthestat/workspace/manager_handoffs/marginals/2026-05-19_1639_sources.md
- /home/synthestat/workspace/manager_handoffs/marginals/2026-05-19_1639_needs.md
- /home/synthestat/workspace/manager_handoffs/marginals/latest.md

Acceptance criteria:
1. Implement/fix CZSO LKOD downloader path using catalogue https://vdb.czso.cz/pll/eweb/lkod_ld.seznam and metadata endpoint https://vdb.czso.cz/pll/eweb/lkod_ld.datova_sada?id=<dataset_id>.
2. Download/schema-verify these CZSO datasets at minimum: sldb2021_vek1_pohlavi, sldb2021_domacnosti_clenu_typ, sldb2021_pocty_clenu_typ, sldb2021_vzdelani_vek_pohlavi, sldb2021_aktivita_vek1_pohlavi, sldb2021_odvetvi_vek1_pohlavi, sldb2021_klasif_pohlavi, sldb2021_dojizdka_obce, sldb2021_vyjizdka_pracoviste_cil_pohlavi, sldb2021_byty_obydlenost_druhdomu, sldb2021_domy_obydlen_druh, sldb2021_bydleni_vek_pohlavi, 130181r25, 160061, 110079, 290038r25.
3. Preserve provenance: source URL, dataset ID, distribution URL, schema URL, retrieval timestamp, licence/usage terms (`podmínky_užití`), geography fields, reference period, checksum, and quality flags.
4. Implement/validate CUZK/RUIAN VDP/VFR retrieval route for building/address grounding from https://vdp.cuzk.gov.cz/vdp/ruian/vymennyformat and https://cuzk.gov.cz/vfr; confirm building object/address point/dwelling-unit semantics.
5. Download/inspect MPSV RPSS JSON/schema (https://data.mpsv.cz/od/soubory/rpss/rpss.json and .schema.json), UZIS NRPZS open data, and MŠMT school register routes for facility rosters/capacities.
6. Report exact row-level geography for each CZSO table; do not silently promote national/ORP/region rows to obec/ZSJ.
7. Leave EVIDENCE_EXHAUSTED flags in place where schemas confirm no current-obec age/sex, ISCO-3 occupation, small-area income, homelessness/undocumented/seasonal/Syrian-specific, or school catchment/attendance data.

Do not model or rerun the population bundle; this is ingestion/provenance only.
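
Criteria 1 and 3 can be illustrated with a minimal Python sketch. It is not the downloader that produced the artifacts of this task. The `ProvenanceRecord` class, its field names, the `build_record` helper, the `example.invalid` URLs, and the sample values are all hypothetical; only the two CZSO endpoint URLs and the dataset ID come from the task body.

```python
import hashlib
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoints quoted in acceptance criterion 1.
LKOD_CATALOGUE_URL = "https://vdb.czso.cz/pll/eweb/lkod_ld.seznam"
LKOD_DATASET_URL = "https://vdb.czso.cz/pll/eweb/lkod_ld.datova_sada"


def metadata_url(dataset_id: str) -> str:
    """Build the per-dataset metadata URL (…/lkod_ld.datova_sada?id=<dataset_id>)."""
    return f"{LKOD_DATASET_URL}?{urlencode({'id': dataset_id})}"


@dataclass
class ProvenanceRecord:
    # Hypothetical field names mirroring the list in criterion 3.
    dataset_id: str
    source_url: str
    distribution_url: str
    schema_url: str
    retrieved_at: str          # ISO 8601, UTC
    usage_terms: str           # value of `podmínky_užití`
    geography_fields: list
    reference_period: str
    sha256: str                # checksum of the downloaded bytes
    quality_flags: list = field(default_factory=list)


def build_record(dataset_id, payload, *, distribution_url, schema_url,
                 usage_terms, geography_fields, reference_period,
                 quality_flags=()):
    """Freeze provenance for one already-downloaded payload (bytes)."""
    return ProvenanceRecord(
        dataset_id=dataset_id,
        source_url=metadata_url(dataset_id),
        distribution_url=distribution_url,
        schema_url=schema_url,
        retrieved_at=datetime.now(timezone.utc).isoformat(timespec="seconds"),
        usage_terms=usage_terms,
        geography_fields=list(geography_fields),
        reference_period=reference_period,
        sha256=hashlib.sha256(payload).hexdigest(),
        quality_flags=list(quality_flags),
    )


# Usage with placeholder values; nothing below is real downloaded data.
record = build_record(
    "160061",
    b"placeholder,csv,bytes\n",
    distribution_url="https://example.invalid/160061.csv",
    schema_url="https://example.invalid/160061.schema.json",
    usage_terms="placeholder usage terms",
    geography_fields=["placeholder_geo_column"],
    reference_period="placeholder period",
    quality_flags=["EVIDENCE_EXHAUSTED"],
)
print(asdict(record)["source_url"])
```

Hashing the raw bytes before any parsing keeps the checksum tied to exactly what the server returned, so a later re-download can be compared against the frozen record.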
Parents
[ "t_6869427a" ]
Children
[]
Runs
| ID | Profile | Status | Outcome | Started | Ended | Summary/error |
|---|---|---|---|---|---|---|
| 42 | synth-downloader | done | completed | 2026-05-19 18:47:49 CEST | 2026-05-19 19:03:29 CEST | Completed CZ official-source ingestion/provenance freeze: downloaded 16/16 requested CZSO LKOD datasets plus schemas, MPSV RPSS JSON/schema, UZIS NRPZS route+CSV exports, MŠMT register/data route evidence, and CUZK VDP/VFR route pages. Wrote downloader handoffs and catalogue artifacts with checksums, retrieval timestamps, usage terms, row geography audit, and evidence-exhausted flags preserved. |
Events
| Time | Kind | Payload |
|---|---|---|
| 2026-05-19 18:47:23 CEST | created | {
"assignee": "synth-downloader",
"status": "todo",
"parents": [
"t_6869427a"
],
"tenant": "synthestat",
"skills": null
} |
| 2026-05-19 18:47:39 CEST | promoted | null |
| 2026-05-19 18:47:49 CEST | claimed | {
"lock": "vmi3188806:1590352",
"expires": 1779210169,
"run_id": 42
} |
| 2026-05-19 18:47:49 CEST | spawned | {
"pid": 1625266
} |
| 2026-05-19 19:03:01 CEST | claim_extended | {
"reason": "pid_alive",
"worker_pid": 1625266,
"claim_lock": "vmi3188806:1590352",
"claim_expires_was": 1779210169,
"claim_expires_now": 1779211081,
"last_heartbeat_at": null
} |
| 2026-05-19 19:03:29 CEST | completed | {
"result_len": 0,
"summary": "Completed CZ official-source ingestion/provenance freeze: downloaded 16/16 requested CZSO LKOD datasets plus schemas, MPSV RPSS JSON/schema, UZIS NRPZS route+CSV exports, MŠMT register/data route evidence, and CUZK VDP/VFR route pages. Wrote downloader handoffs and catalogue artifacts with checksums, retrieval timestamps, usage terms, row geography audit, and evidence-exhausted flags preserved.",
"artifacts": [
"/home/synthestat/workspace/manager_handoffs/downloader/2026-05-19_1849_download_log.md",
"/home/synthestat/workspace/manager_handoffs/downloader/2026-05-19_1849_catalogue_updates.md",
"/home/synthestat/workspace/manager_handoffs/downloader/latest.md",
"/home/synthestat/output/catalogue/cz_official_sources_catalogue_latest.json",
"/home/synthestat/output/catalogue/cz_czso_geography_audit_latest.json"
]
} |
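
The `expires`, `claim_expires_was`, and `claim_expires_now` values in the event payloads look like Unix epoch seconds. Under that assumption, the short Python sketch below converts them to the CEST wall-clock format used in the Time column. The helper name `claim_expiry_local` is hypothetical, and the fixed UTC+2 offset is only valid for dates within summer time.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+2 offset labelled CEST; an assumption that holds for this May 2026 log.
CEST = timezone(timedelta(hours=2), "CEST")


def claim_expiry_local(epoch_seconds: int) -> str:
    """Render an epoch-seconds claim expiry in the log's timestamp format."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=CEST)
    return moment.strftime("%Y-%m-%d %H:%M:%S %Z")


print(claim_expiry_local(1779210169))  # 2026-05-19 19:02:49 CEST
print(claim_expiry_local(1779211081))  # 2026-05-19 19:18:01 CEST
```

Read this way, the original claim expired 15 minutes after the 18:47:49 `claimed` event, and the `claim_extended` event at 19:03:01 moved the expiry to 15 minutes after that heartbeat.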
Comments
No comments yet.