Machine Learning as a Tool for Hypothesis Generation
The Quarterly Journal of Economics (IF 11.1), Pub Date: 2024-01-11, DOI: 10.1093/qje/qjad055
Jens Ludwig, Sendhil Mullainathan

While hypothesis testing is a highly formalized activity, hypothesis generation remains largely informal. We propose a systematic procedure to generate novel hypotheses about human behavior, which uses the capacity of machine learning algorithms to notice patterns people might not. We illustrate the procedure with a concrete application: judge decisions about whom to jail. We begin with a striking fact: the defendant’s face alone matters greatly for the judge’s jailing decision. In fact, an algorithm given only the pixels in the defendant’s mug shot accounts for up to half of the predictable variation. We develop a procedure that allows human subjects to interact with this black-box algorithm to produce hypotheses about what in the face influences judge decisions. The procedure generates hypotheses that are both interpretable and novel: they are not explained by demographics (e.g., race) or existing psychology research, nor are they already known (even if tacitly) to people or even experts. Though these results are specific, our procedure is general. It provides a way to produce novel, interpretable hypotheses from any high-dimensional dataset (e.g., cell phones, satellites, online behavior, news headlines, corporate filings, and high-frequency time series). A central tenet of our paper is that hypothesis generation is in and of itself a valuable activity, and we hope this encourages future work in this largely “prescientific” stage of science.

Updated: 2024-01-11