Programmatic access to open statistical data for population studies: The SDMX standard (by Frans Willekens)
Demographic Research (IF 2.1), published 2023-12-13
Frans Willekens

Background: The public sector publishes vast amounts of open data and metadata. APIs (application programming interfaces) are transforming the way these data are collected, documented, and disseminated. The transformation is slow, however, because data providers differ in communication protocols, data definitions, and data formats. This development is particularly relevant to demography, a data-intensive science: it paves the way for automating data acquisition and for integrating data acquisition with data analysis. Combined with the parallel development of literate programming, which integrates text and computer code in a single document, programmatic access to data makes workflows transparent, verifiable, and easy for others to replicate. The Statistical Data and Metadata Exchange (SDMX) standard has emerged as a popular option for exchanging data and metadata, and it makes finding and retrieving both easy and swift. Queries are expressed as URLs that follow a standardised syntax.

Objective: This paper describes the SDMX standard and demonstrates its benefits to our profession by retrieving demographic data and the associated metadata from online databases disseminated by a variety of data providers. The software environment used is R.

Contribution: This is the first review of the SDMX standard aimed at population studies. The paper includes the R code to access databases and download data and metadata, as well as hyperlinks to relevant documents issued by data providers, giving readers immediate access to the referenced material.
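To make the "standardised syntax" of SDMX queries concrete, the sketch below assembles query URLs following the SDMX 2.1 RESTful conventions as we understand them. A data query has the form `{entry point}/data/{dataflow}/{key}`: the key lists one selection per dimension, separated by dots, with several codes joined by `+` and an empty position acting as a wildcard. A metadata (structure) query has the form `{entry point}/{resource}/{agency}/{id}/{version}`. The paper itself works in R; Python is used here only to show how the URL is built. The entry point `https://example.org/sdmx/2.1`, the dataflow `DEMO_FLOW`, and its dimension order `FREQ.SEX.AGE.GEO` are hypothetical placeholders, and the functions only construct URLs without sending any requests.

```python
from urllib.parse import urlencode


def sdmx_data_url(entry_point, flow_ref, key_parts, **params):
    """Build an SDMX 2.1 RESTful data query URL (illustrative sketch).

    key_parts holds one selection per dimension, in the order declared by the
    dataflow's data structure definition: a list of codes (joined with '+'),
    or None for a wildcard (left empty in the key).
    """
    key = ".".join("+".join(codes) if codes else "" for codes in key_parts)
    url = f"{entry_point.rstrip('/')}/data/{flow_ref}/{key}"
    # Optional query parameters, e.g. startPeriod and endPeriod.
    if params:
        url += "?" + urlencode(params)
    return url


def sdmx_structure_url(entry_point, resource, agency="all", resource_id="all",
                       version="latest", **params):
    """Build an SDMX 2.1 RESTful structure (metadata) query URL.

    resource is a structural artefact type such as 'dataflow',
    'datastructure' or 'codelist'; the defaults request everything.
    """
    url = f"{entry_point.rstrip('/')}/{resource}/{agency}/{resource_id}/{version}"
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    base = "https://example.org/sdmx/2.1"  # hypothetical entry point
    # Hypothetical dataflow with dimensions FREQ.SEX.AGE.GEO:
    # annual data, females and males, all ages (wildcard), Netherlands.
    print(sdmx_data_url(base, "DEMO_FLOW", [["A"], ["F", "M"], None, ["NL"]],
                        startPeriod="2010", endPeriod="2020"))
    # Metadata query listing every dataflow the provider publishes.
    print(sdmx_structure_url(base, "dataflow"))
```

Run as a script, this prints `https://example.org/sdmx/2.1/data/DEMO_FLOW/A.F+M..NL?startPeriod=2010&endPeriod=2020` and `https://example.org/sdmx/2.1/dataflow/all/all/latest`. Because the same pattern applies to any SDMX-compliant provider, switching providers mainly means changing the entry point and the dataflow identifier.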

