Uncovering student profiles. An explainable cluster analysis approach to PISA 2022, Computers & Education

Computers & Education (IF 8.9), Pub Date: 2024-09-20, DOI: 10.1016/j.compedu.2024.105166
Miguel Alvarez-Garcia, Mar Arenas-Parra, Raquel Ibar-Alonso

Educational data mining (EDM) applied to the wealth of data generated by international large-scale assessments (ILSAs) shows potential for identifying successful educational initiatives. Although research on clustering methods in ILSAs is limited, using these methods to uncover student profiles can inform decisions when designing tailored programs. This study aims to identify and characterize profiles of 15-year-old students using PISA 2022 data and to reveal how these profiles relate to factors such as ICT availability and use, gender, academic performance, and educational expectations. We analyzed PISA 2022 data for Spanish students (n = 30,800) with a selection of 74 contextual variables, applying an end-to-end explainable cluster analysis methodology that integrates different machine learning (ML) and explainable artificial intelligence (XAI) techniques. This methodology covered data pre-processing, dimensionality reduction, clustering, and classification to ensure data quality and result explainability. We obtained 16 derived variables, 7 student clusters, and an optimal XGBoost classifier with a global accuracy of 0.8643. Using local and global SHAP values, we interpreted the clusters and found that socio-economic status and ICT availability and use at home are the most important factors differentiating student profiles. Our findings suggest the need to emphasize (i) proper ICT accessibility and use, as well as student support networks, to improve academic performance, (ii) gender-specific well-being programs, and (iii) the encouragement of educational expectations tailored to students’ gender and their exposure to higher education. These results pave the way for personalized academic policies and programs through ML-based tools for uncovering student profiles.
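The core idea behind "explainable cluster analysis" is to group students first and then ask which variables separate each group from the rest. The sketch below illustrates only that idea under heavy simplifying assumptions; it is not the authors' method. k-means is an assumption (the abstract does not name the clustering algorithm), the dimensionality-reduction step is omitted, and the per-cluster ranking by mean z-score is a crude, model-free stand-in swapped in for the XGBoost classifier and SHAP values the study actually used. The data, the variable names `feat_a`/`feat_b`/`feat_c`, and all helper functions are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so variables on different scales carry equal weight."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]  # guard sd == 0
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the point
    farthest from its nearest chosen centroid (a non-random relative of k-means++)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


def profile(z_points, labels, names):
    """For each cluster, rank variables by |mean z-score|, i.e. how far the cluster
    sits from the overall average (a crude, model-free stand-in for SHAP)."""
    result = {}
    for j in sorted(set(labels)):
        members = [p for p, lab in zip(z_points, labels) if lab == j]
        z_means = [mean(col) for col in zip(*members)]
        result[j] = sorted(zip(names, z_means), key=lambda t: abs(t[1]), reverse=True)
    return result


# Hypothetical toy data: 8 "students" x 3 made-up contextual variables.
NAMES = ["feat_a", "feat_b", "feat_c"]
ROWS = [
    [0, 2, 5], [1, 3, 6], [0, 4, 5], [1, 3, 6],    # toy group 1
    [9, 5, 6], [10, 6, 5], [9, 7, 6], [10, 6, 5],  # toy group 2
]

Z = standardize(ROWS)
labels = kmeans(Z, k=2)
profiles = profile(Z, labels, NAMES)

if __name__ == "__main__":
    print(labels)  # → [0, 0, 0, 0, 1, 1, 1, 1]
    for j, ranked in profiles.items():
        print(j, [(name, round(z, 2)) for name, z in ranked])
```

On this toy data the two groups are recovered, `feat_a` ranks as the most distinguishing variable in both clusters, and `feat_c` (same distribution in both groups) ranks last. In the pipeline the abstract describes, this last step would instead train a classifier to predict cluster membership and interpret it through local and global SHAP values, which can capture interactions that a per-variable mean comparison misses.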

Updated: 2024-09-20