npj Digital Medicine (IF 12.4), Pub Date: 2024-07-20, DOI: 10.1038/s41746-024-01160-2
Aarzoo Dhiman 1,2, Elad Yom-Tov 3,4, Lorenzo Pellis 5, Michael Edelstein 6, Richard Pebody 7, Andrew Hayward 8, Thomas House 5, Thomas Finnie 7, David Guzman 1, Vasileios Lampos 1, Ingemar J Cox 1,9
We propose a method to estimate the household secondary attack rate (hSAR) of COVID-19 in the United Kingdom based on activity on the social media platform X, formerly known as Twitter. Conventional methods of hSAR estimation are resource intensive, requiring regular contact tracing of COVID-19 cases. Our proposed framework provides a complementary method that relies on neither conventional contact tracing nor laboratory work such as the collection, processing, and analysis of biological samples. We use a text classifier to identify tweets in which people report that they and/or members of their household have a COVID-19 infection. A probabilistic analysis then estimates the hSAR from the number of tweets reporting infection in the user or the household and the number reporting infection in both the user and the household. The analysis adjusts for the reluctance of Twitter users to tweet about household members and for the possibility that the secondary infection was not acquired within the household. Monthly and weekly results for the UK are reported for the period from January 2020 to February 2022. Our results agree with previously reported hSAR estimates and vary with the primary variants of concern, e.g., Delta and Omicron. The serial interval (SI) is estimated from the time between the two tweets that indicate a primary and a secondary infection. The resulting SI estimates, though larger than the consensus values, are qualitatively similar to them. The estimation of hSAR and SI using social media data constitutes a new tool that may help in characterizing, forecasting, and managing outbreaks and pandemics in a faster, more affordable, and more efficient manner.
Title: Estimating the household secondary attack rate and serial interval of COVID-19 using social media
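The abstract describes the estimation only in words, so the following is a minimal sketch of the general idea rather than the authors' probabilistic model. It assumes a simple ratio estimator: the count of reports covering both self and household, divided by the count covering self or household. The function names, the parameters `p_report_household` (probability that a household infection gets tweeted about) and `p_outside` (probability that a secondary infection was acquired outside the household), and all numbers are hypothetical and chosen only for illustration.

```python
from datetime import datetime
from statistics import mean


def estimate_hsar(n_either, n_both, p_report_household=1.0, p_outside=0.0):
    """Crude hSAR estimate from report counts (illustrative, not the paper's model).

    n_either: reports of infection in self OR household (includes n_both)
    n_both:   reports of infection in self AND household
    p_report_household: assumed probability that a household infection is tweeted
    p_outside: assumed probability the secondary infection came from outside
    """
    if n_either <= 0:
        raise ValueError("n_either must be positive")
    if not 0 < p_report_household <= 1:
        raise ValueError("p_report_household must be in (0, 1]")
    if not 0 <= p_outside <= 1:
        raise ValueError("p_outside must be in [0, 1]")
    # Scale up for users who are reluctant to tweet about household members.
    adjusted_both = n_both / p_report_household
    # Discount secondary infections that were not acquired within the household.
    within_household = adjusted_both * (1.0 - p_outside)
    # Clamp to 1.0 because an attack rate is a proportion.
    return min(1.0, within_household / n_either)


def mean_serial_interval_days(tweet_pairs):
    """Mean gap in days between the primary and secondary infection tweets."""
    gaps = [
        (secondary - primary).total_seconds() / 86400
        for primary, secondary in tweet_pairs
    ]
    return mean(gaps)


if __name__ == "__main__":
    # Made-up counts, not data from the study.
    print(estimate_hsar(n_either=200, n_both=40))
    print(estimate_hsar(200, 40, p_report_household=0.8, p_outside=0.1))
    pairs = [
        (datetime(2021, 1, 1), datetime(2021, 1, 5)),
        (datetime(2021, 1, 10), datetime(2021, 1, 16)),
    ]
    print(mean_serial_interval_days(pairs))
```

Under these assumptions, lowering `p_report_household` raises the estimate and raising `p_outside` lowers it, which mirrors the direction of the two adjustments named in the abstract. Because the interval here is measured between tweets rather than symptom onsets, any delay in posting would shift the estimated SI.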