Music teachers’ labeling accuracy and quality ratings of lesson plans by artificial intelligence (AI) and humans, International Journal of Music Education

International Journal of Music Education (IF 1.3), Pub Date: 2024-05-08, DOI: 10.1177/02557614241249163
Patrick K Cooper

This study explored the potential of artificial intelligence (ChatGPT) to generate lesson plans for music classes that were indistinguishable from music lesson plans created by humans, with current music teachers as assessors. Fifty-six assessors made a total of 410 ratings across eight lesson plans, assigning a quality score to each lesson plan and labeling if they believed each lesson plan was created by a human or generated by AI. Despite the human-made lesson plans being rated higher in quality as a group (p < .01, d = 0.44), assessors were unable to accurately label if a lesson plan was created by a human or generated by AI (55% accurate overall). Labeling accuracy was positively predicted by quality scores on human-made lesson plans and previous personal use of AI, while accuracy was negatively predicted by quality scores on AI-generated lesson plans and perception of how useful AI will be in the future. Open-ended responses from 42 teachers suggested assessors used three factors when making evaluations: specific details, evidence of classroom knowledge, and wording. Implications provide suggestions for how music teachers can use prompt engineering with a GPT model to create a virtual assistant or Intelligent Tutor System (ITS) for their classroom.

Updated: 2024-05-08
