Forecasting the COVID-19 epidemic integrating symptom search behavior: an infodemiology study
medRxiv - Health Informatics Pub Date : 2021-03-12 , DOI: 10.1101/2021.03.09.21253186
Alessandro Rabiolo , Eugenio Alladio , Esteban Morales , Andrew Ian McNaught , Francesco Bandello , Abdelmonem A Afifi , Alessandro Marchese

Background: Previous studies have suggested associations between web search trends and traditional COVID-19 metrics. It remains unclear whether models that incorporate digital search trends lead to better predictions.

Methods: An open-access web application was developed to evaluate Google Trends and traditional COVID-19 metrics via an interactive framework based on principal component analysis (PCA) and time series modelling. The app facilitates the analysis of symptom search behavior associated with COVID-19 in 188 countries. In this study, we selected data from eight countries as case studies to represent all continents. PCA was used for dimensionality reduction, and three time series models (error, trend, seasonality [ETS]; autoregressive integrated moving average [ARIMA]; and feed-forward neural network autoregression) were used to predict COVID-19 metrics over the following 14 days. The models were compared in terms of predictive ability using the root-mean-square error (RMSE) of the first principal component (PC1). The predictive ability of models fitted with both Google Trends data and conventional COVID-19 metrics was compared with that of models fitted with conventional COVID-19 metrics only.

Findings: The degree of correlation and the best time lag varied with the selected country and the topic searched; in general, the optimal time lag was within 15 days. Overall, predictions of PC1 based on both search terms and traditional COVID-19 metrics performed better than those not including Google searches (median RMSE [IQR]: 1.43 [0.74-2.36] vs. 1.78 [0.95-2.88], respectively), but the improvement in prediction varied with the selected country and timeframe. The best model varied with country, time range, and period of time selected. Models based on a 7-day moving average led to considerably smaller RMSE values than those calculated with raw data (median [IQR]: 0.74 [0.47-1.22] vs. 2.15 [1.55-3.89], respectively).

Interpretation: The inclusion of digital online searches in statistical models may improve the prediction of the COVID-19 epidemic.
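The abstract names three building blocks without giving formulas: smoothing with a 7-day moving average, searching for the best time lag between search activity and COVID-19 metrics, and scoring forecasts with RMSE. The Python sketch below illustrates those three steps only. It is not the authors' code: the function names are hypothetical, the trailing window and the use of Pearson correlation are assumptions the abstract does not confirm, and the PCA step and the ETS, ARIMA, and neural network models are not reproduced.

```python
from math import exp, sqrt


def moving_average(values, window=7):
    """Trailing moving average; the first window-1 points are dropped.

    Assumption: a trailing window. The abstract only says "7-day moving average".
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lag(searches, metric, max_lag=15):
    """Return (lag, r) maximising the correlation of searches[t] with metric[t + lag].

    Assumption: searches lead the metric, so only non-negative lags are tried.
    """
    if len(searches) != len(metric):
        raise ValueError("series must have the same length")
    if max_lag >= len(searches) - 1:
        raise ValueError("max_lag is too large for the series length")
    best = (0, pearson(searches, metric))
    for lag in range(1, max_lag + 1):
        r = pearson(searches[:-lag], metric[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


def rmse(actual, predicted):
    """Root-mean-square error between observed and forecast values."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


if __name__ == "__main__":
    # Synthetic data for illustration only: a single wave of searches,
    # with the metric following the same curve five days later.
    searches = [exp(-((day - 20) / 6) ** 2) for day in range(60)]
    metric = [0.0] * 5 + searches[:-5]
    print(best_lag(searches, metric, max_lag=15))
    print(rmse(metric[-14:], moving_average(metric, 7)[-14:]))
```

In the study itself the RMSE is computed on PC1 over a 14-day forecast horizon; here `rmse` is applied to a plain series only to show the calculation.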

Updated: 2021-03-12