Linked samples and measurement error in historical US census data, Explorations in Economic History

Linked samples and measurement error in historical US census data
Explorations in Economic History (IF 2.6), Pub Date: 2024-02-03, DOI: 10.1016/j.eeh.2024.101579
Sam Il Myoung Hwang, Munir Squires

The quality of historical US census data is critical to the performance of linking algorithms. We use genealogical profiles to correct measurement error in census names and ages. Our findings suggest that one in every two records has an error in name or age, and human capital is correlated with lower error rates. While errors in age decline across subsequent census rounds from 1850 to 1930, errors in names do not exhibit such trends. Fixing all transcription errors, hence leaving only those errors made at the time of enumeration, would reduce error rates in names by 41 percent. Correcting all names and ages using genealogical profiles leads to 20%–36% more links and fewer false positives. Reassuringly, we find that reducing such errors has a negligible effect on estimates of intergenerational mobility.

Updated: 2024-02-03
