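Machine learning prediction of factors affecting Major League Baseball (MLB) game attendance: algorithm comparison and the macroeconomic factor of unemployment
International Journal of Sports Marketing and Sponsorship (IF 3.0), Pub Date: 2024-02-08, DOI: 10.1108/ijsms-06-2023-0129
Juho Park, Junghwan Cho, Alex C. Gang, Hyun-Woo Lee, Paul M. Pedersen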
Purpose
This study aims to identify a highly accurate automated machine learning algorithm that sport practitioners can use to determine the specific factors that predict Major League Baseball (MLB) attendance. Furthermore, by predicting attendance for each league (American League and National League) and each division in MLB, the authors identify the factors that increase predictive accuracy, discuss them and draw implications for marketing strategy for academics and practitioners in sport.
Design/methodology/approach
This study used six years of daily MLB game data (2014–2019). The predictors collected included game performance, weather and the unemployment rate, and the attendance rate served as the outcome (observed) variable. Random Forest, Lasso regression and XGBoost were used to build the prediction models, and the analysis was conducted in Python 3.7.
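As a rough illustration of how one game might be turned into a model input, the sketch below builds a target value and a feature vector in plain Python. It rests on several assumptions that the abstract does not confirm: the attendance rate is taken to be attendance divided by stadium capacity, the field names in `GameRecord` are hypothetical, categorical predictors are one-hot encoded, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GameRecord:
    # Hypothetical field names; the abstract does not list the dataset's columns.
    home_team: str
    day_of_week: str
    is_night_game: bool
    rank: int
    temperature_f: float
    unemployment_rate: float
    attendance: int
    capacity: int


def attendance_rate(game: GameRecord) -> float:
    """Assumed definition of the target: attendance divided by stadium capacity."""
    if game.capacity <= 0:
        raise ValueError("capacity must be positive")
    return game.attendance / game.capacity


def to_features(game: GameRecord, teams: List[str], days: List[str]) -> List[float]:
    """One-hot encode the categorical predictors, then append the numeric ones."""
    team_flags = [1.0 if game.home_team == t else 0.0 for t in teams]
    day_flags = [1.0 if game.day_of_week == d else 0.0 for d in days]
    numeric = [
        float(game.is_night_game),
        float(game.rank),
        game.temperature_f,
        game.unemployment_rate,
    ]
    return team_flags + day_flags + numeric


# Made-up example values, not taken from the study's data.
game = GameRecord("BOS", "Sat", True, 2, 71.0, 3.4, 30000, 40000)
target = attendance_rate(game)
features = to_features(game, teams=["BOS", "NYY"], days=["Fri", "Sat"])
```

Rows built this way could be passed to any of the three model families named above; tree-based models such as Random Forest and XGBoost do not require feature scaling, whereas Lasso regression usually benefits from standardized inputs.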
Findings
After hyperparameter tuning, the XGBoost model performed best in forecasting the attendance rate, with an RMSE of 0.14 and an R² of 0.62. The most influential variable in the model was “Rank” (0.247), followed in order by “Day of the week”, “Home team” and “Day/Night game”. “Unemployment rate”, a macroeconomic factor, had a value of 0.06, and the weather factors had a combined value of 0.147.
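For readers less familiar with the two reported metrics, the following minimal sketch shows the standard definitions of RMSE and R² in plain Python. The function names and the toy observed and predicted attendance rates are illustrative assumptions, not the study's code or data.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the mean squared residual."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Toy attendance rates (made up for illustration only).
observed = [0.50, 0.75, 1.00, 0.25]
predicted = [0.50, 0.50, 0.75, 0.50]
error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

Because RMSE is expressed in the units of the target, an RMSE of 0.14 on an attendance rate would correspond to a typical error of about 14 percentage points if the rate is measured on a 0 to 1 scale, which the abstract does not state explicitly.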
Originality/value
This research highlights the unemployment rate as a determinant of MLB game attendance rates. Beyond contextual elements such as climate, the findings underscore the significance of economic factors, particularly the unemployment rate, and call for further investigation of these factors to gain a more comprehensive understanding of game attendance.