Taking It Easy: Off-the-Shelf Versus Fine-Tuned Supervised Modeling of Performance Appraisal Text
Organizational Research Methods (IF 8.9) | Pub Date: 2024-08-28 | DOI: 10.1177/10944281241271249 | Andrew B. Speer, James Perrotta, Tobias L. Kordsmeyer
When assessing text, supervised natural language processing (NLP) models have traditionally been used to measure targeted constructs in the organizational sciences. However, these models require significant resources to develop. Emerging "off-the-shelf" large language models (LLMs) offer a way to evaluate organizational constructs without building customized models. However, it is unclear whether off-the-shelf LLMs accurately score organizational constructs and what evidence is necessary to infer validity. In this study, we compared the validity of supervised NLP models to off-the-shelf LLMs (ChatGPT-3.5 and ChatGPT-4). Across six organizational datasets and thousands of comments, we found that supervised NLP models produced scores that were more reliable than those of human coders. We also found that, even though they were not specifically developed for this purpose, off-the-shelf LLMs produced psychometric properties similar to, though slightly less favorable than, those of supervised models. We connect these findings to broader validation considerations and present a decision chart to guide researchers and practitioners on how they can use off-the-shelf LLMs to score targeted constructs, including guidance on how psychometric evidence can be "transported" to new contexts.
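The abstract does not specify the prompts the authors used, but the "off-the-shelf" scoring approach it describes can be sketched as a zero-shot prompt plus a parser for the model's numeric reply. The prompt wording, the 1–5 scale, and both function names below are illustrative assumptions, not the study's actual materials; the LLM call itself is omitted so the sketch stays self-contained.

```python
import re

def build_scoring_prompt(comment: str, construct: str, scale_max: int = 5) -> str:
    """Hypothetical zero-shot prompt asking an off-the-shelf LLM to rate
    one performance appraisal comment on a targeted construct."""
    return (
        f"Rate the following performance appraisal comment on the construct "
        f"'{construct}' using an integer from 1 to {scale_max}. "
        f"Respond with the number only.\n\n"
        f"Comment: {comment}"
    )

def parse_score(reply: str, scale_max: int = 5):
    """Extract the first integer from the model's reply.
    Returns None if no integer is found or it falls outside the scale."""
    match = re.search(r"\d+", reply)
    if not match:
        return None
    score = int(match.group())
    return score if 1 <= score <= scale_max else None

# The prompt would be sent to a chat-completion API (e.g., ChatGPT-3.5/4),
# and the parsed scores could then be correlated with human-coder ratings
# to obtain the kind of psychometric evidence the study examines.
prompt = build_scoring_prompt("Consistently helps teammates meet deadlines.",
                              "teamwork")
```

Parsing defensively matters in practice: even when instructed to reply with a number only, a chat model may return text such as "Score: 4", so the parser accepts any reply containing one in-range integer and rejects the rest.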
Updated: 2024-08-28