Predicting problem-solving success in an office simulation applying N-grams and a random forest to behavioral process data

Computers & Education (IF 8.9), Pub Date: 2024-06-03, DOI: 10.1016/j.compedu.2024.105093
Sabrina Ludwig, Andreas Rausch, Viola Deutscher, Jürgen Seifried

Predicting students' problem-solving success in computer-based simulations at an early stage allows adaptive educational systems to provide learners with personalized support. In this paper, we predict students' problem-solving success by applying a machine-learning model, the random forest, to produce a binary classification (more vs. less successful students). During a business-related problem scenario that lasted 55 min, the early behavioral data of 234 trainees (during the first 5, 10, and 20 min) were recorded, such as mouse clicks and keyboard strokes (approx. 29,800 clickstream and keystroke events during the first 20 min), mirroring the students' problem-solving behavior. We used the n-gram sequence mining technique, which was originally introduced within the emerging disciplines of natural language processing, text mining, and machine learning and has proven to be effective, particularly in the examination of online behavior. We trained the random forest model with training datasets that included all features (bigrams), as well as with datasets restricted to selected features (the most predictive bigrams explaining inter-group differences). Our results show that early predictions based on the first 10 and 20 min contained sufficient information to accurately predict problem-solving success, while predictions that were too early (based on the first 5 min) did not. As the initial time window expanded, the classification performance improved. Moreover, selecting the most predictive features improved the models' performance for all three time intervals. The model trained with only selected robust features that occurred in the first 20 min achieved the highest ROC AUC score of almost 0.70. This result falls within the range of performance scores observed in similar studies. From the instructor's perspective, predictions help in the early identification of weak students, who can then be given personalized learning prompts. For more successful students, tasks can be enriched adaptively.
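
To illustrate the feature construction described in the abstract, the following minimal sketch counts bigrams (pairs of consecutive actions) within the first 5, 10, and 20 min of a logged session. It is not the authors' implementation: the event format (seconds since start plus an action label), the action names, and the helper functions `early_window` and `bigram_counts` are assumptions made for this example.

```python
from collections import Counter
from typing import Iterable

# Hypothetical event format: (seconds since scenario start, action label).
Event = tuple[float, str]


def early_window(events: Iterable[Event], minutes: float) -> list[str]:
    """Return the action labels logged before `minutes` have elapsed, in time order."""
    cutoff = minutes * 60
    return [action for t, action in sorted(events) if t < cutoff]


def bigram_counts(actions: list[str]) -> Counter:
    """Count each pair of consecutive actions (bigram) in the sequence."""
    return Counter(zip(actions, actions[1:]))


# Invented toy log for one trainee; the labels are illustrative only.
events = [
    (30, "open_email"),
    (95, "open_spreadsheet"),
    (240, "type"),
    (410, "open_email"),
    (520, "open_spreadsheet"),
    (900, "type"),
]

# One bigram-count feature set per early time window (5, 10, and 20 min).
features = {m: bigram_counts(early_window(events, m)) for m in (5, 10, 20)}

for minutes, counts in features.items():
    print(minutes, dict(counts))
```

In a setup like the one the abstract describes, one such count vector per trainee and time window would presumably serve as input to the classifier, with feature selection keeping only the bigrams that best separate the two groups.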


Updated: 2024-06-03
