Information Systems and E-Business Management (IF 2.3) Pub Date: 2023-03-11, DOI: 10.1007/s10257-023-00631-5 Jose Ramon Saura, Daniel Palacios-Marqués, Domingo Ribeiro-Soriano
In a digital ecosystem where large amounts of data related to user actions are generated every day, important concerns have emerged about the collection, management, and analysis of these data and, accordingly, about user privacy. In recent years, users have become accustomed to organizing in and relying on digital communities to support and achieve their goals. In this context, the present study aims to identify the main privacy concerns in user communities on social media and how these concerns affect users' online behavior. To better understand online communities in social networks, privacy concerns, and their connection to user behavior, we developed an original methodology that combines elements of machine learning as a technical contribution. First, a complex network visualization algorithm known as ForceAtlas2 was used through the open-source software Gephi to visually identify the nodes that form the main communities in the sample of user-generated content (UGC) collected from Twitter. Then, sentiment analysis was applied with TextBlob, a tool that works with machine learning, on which experiments were run with support vector classifier (SVC), multinomial naïve Bayes (MNB), logistic regression (LR), and random forest classifier (RFC) models under the theoretical frameworks of computer-aided text analysis (CATA) and natural language processing (NLP). As a result, a total of 11 user communities were identified: the positive communities (protection software, cybersecurity, and eCommerce), the negative communities (privacy settings, personal information, and social engineering), and the neutral communities (privacy concerns, hacking, false information, impersonation, and cookies data). The paper concludes with a discussion of the results and their relation to user behavior in digital environments, and it outlines valuable and practical insights into some techniques and challenges related to users' personal data.
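As a rough illustration of the sentiment-classification step, the sketch below implements a small multinomial naïve Bayes (MNB) text classifier, one of the model families named in the abstract, using only the Python standard library. It is not the authors' implementation: the training posts, the labels, the whitespace tokenizer, and the function names `train_mnb` and `predict` are all assumptions made for this example, and the toy data covers only two of the three sentiment classes.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy posts with hand-assigned labels (NOT data from the study).
TRAIN = [
    ("great antivirus keeps my account safe", "positive"),
    ("love the secure checkout on this shop", "positive"),
    ("worried my personal information was leaked", "negative"),
    ("privacy settings are confusing and unsafe", "negative"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_mnb(samples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text, alpha=1.0):
    """Return the class with the highest log-posterior under MNB.

    alpha is the Laplace (add-one) smoothing constant; words that never
    appeared in training are skipped.
    """
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: share of training documents carrying this label.
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for tok in tokenize(text):
            if tok in vocab:
                # Smoothed log likelihood of the token given the class.
                score += math.log((word_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_mnb(TRAIN)
print(predict(model, "my personal information is unsafe"))
print(predict(model, "safe and secure shop"))
```

In a real setting the training set would be far larger and would include neutral posts, and a library implementation would normally be preferred over hand-written counting; the sketch only makes the scoring rule explicit.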
Title: Privacy concerns in social media UGC communities: Understanding user behavior sentiments in complex networks