A Systematic Review of Sophisticated Predictive and Prescriptive Analytics in Child Welfare: Accuracy, Equity, and Bias
Child and Adolescent Social Work Journal (IF 1.4). Pub Date: 2023-05-23. DOI: 10.1007/s10560-023-00931-2
Seventy F. Hall, Melanie Sage, Carol F. Scott, Kenneth Joseph
Child welfare agencies increasingly use machine learning models to predict outcomes and inform decisions. These tools are intended to increase accuracy and fairness, but they can also amplify bias. This systematic review explores how researchers addressed ethics, equity, bias, and model performance in their design and evaluation of predictive and prescriptive algorithms in child welfare. We searched EBSCO databases, Google Scholar, and reference lists for journal articles, conference papers, dissertations, and book chapters published between January 2010 and March 2020. To be included, sources had to report on the use of algorithms to predict child welfare-related outcomes and either suggest prescriptive responses or apply their models to decision-making contexts. We calculated descriptive statistics and conducted Mann-Whitney U tests and Spearman's rank correlations to summarize and synthesize findings. Of the 15 included articles, fewer than half considered ethics, equity, or bias, or engaged participatory design principles as part of model development and evaluation. Only one-third involved cross-disciplinary teams. Model performance was positively associated with the number of algorithms tested and with sample size; no other statistical tests were significant. Interest in algorithmic decision-making in child welfare is growing, yet there remains no gold standard for ameliorating bias, inequity, and other ethics concerns. Our review demonstrates that these efforts are not reported consistently in the literature and that a uniform reporting protocol may be needed to guide research. In the meantime, computer scientists might collaborate with content experts and stakeholders to ensure they account for the practical implications of using algorithms in child welfare settings.
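For readers unfamiliar with the tests named in the abstract, the sketch below shows how such an analysis might be run in Python with scipy. It uses synthetic placeholder data, not the review's actual dataset or code: a Spearman rank correlation relating per-study model performance to sample size and number of algorithms tested, and a Mann-Whitney U test comparing performance between studies that did and did not report addressing ethics, equity, or bias.

```python
# Illustrative sketch only: synthetic placeholder values, not the review's data.
from scipy.stats import mannwhitneyu, spearmanr

# Hypothetical per-study records: reported model performance (e.g., AUC),
# sample size, number of algorithms tested, and whether the study
# reported addressing ethics/equity/bias.
auc              = [0.71, 0.76, 0.81, 0.69, 0.88, 0.74, 0.79]
sample_size      = [1200, 5400, 23000, 900, 150000, 3100, 8700]
n_algorithms     = [1, 2, 4, 1, 6, 2, 3]
addressed_ethics = [False, True, True, False, True, False, True]

# Spearman rank correlations: is performance associated with sample size
# or with the number of algorithms tested?
rho_n, p_n = spearmanr(auc, sample_size)
rho_a, p_a = spearmanr(auc, n_algorithms)
print(f"AUC vs. sample size: rho={rho_n:.2f}, p={p_n:.3f}")
print(f"AUC vs. #algorithms: rho={rho_a:.2f}, p={p_a:.3f}")

# Mann-Whitney U: does performance differ between studies that did and
# did not report addressing ethics, equity, or bias?
group_yes = [a for a, e in zip(auc, addressed_ethics) if e]
group_no  = [a for a, e in zip(auc, addressed_ethics) if not e]
u, p_u = mannwhitneyu(group_yes, group_no, alternative="two-sided")
print(f"Mann-Whitney U={u:.1f}, p={p_u:.3f}")
```

Nonparametric tests such as these are a natural choice here because, with only 15 studies and skewed measures like sample size, rank-based methods avoid the distributional assumptions of t-tests and Pearson correlations.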