Complex & Intelligent Systems (IF 5.0) Pub Date: 2024-11-12, DOI: 10.1007/s40747-024-01626-6 Yongqiang Peng, Xiaoliang Chen, Duoqian Miao, Xiaolin Qin, Xu Gu, Peng Lu
User Alignment (UA) is a crucial problem in social network analysis. Its objective is to identify and link accounts belonging to the same user across different social networks, even when there are no explicit interconnections between them. UA plays a pivotal role in synthesising coherent user profiles and in studying user behaviour across platforms. However, traditional approaches have limitations. Single-embedding techniques fall short of fully capturing the semantics of user profile attributes. Furthermore, classification-based embedding methods lack definitive criteria for categorisation, which constrains both the efficacy and the applicability of these models. This paper presents a novel unsupervised Gradient Semantic Model for User Alignment (GSMUA) for identifying common user identities across social networks. GSMUA categorises user profile information into weak, sub, and strong gradients based on the semantic intensity of attributes. Depending on the gradient level, feature extraction attends to literal features, semantic features, or a combination of both, thereby achieving a full semantic representation of user attributes. For strongly semantic long texts, GSMUA employs Named Entity Recognition (NER) to mitigate the otherwise inefficient handling of such texts. Furthermore, GSMUA compensates for missing user profile attributes by drawing on the profile information of a user's neighbours, thereby reducing the negative impact of missing attributes on model performance. Extensive experiments on four pairs of real datasets demonstrate the superiority of the approach. Compared with the most effective previously developed unsupervised methods, GSMUA improves hit-precision by 5.32% to 12.17%. Compared with supervised methods, the improvements range from 0.71% to 11.79%.
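The abstract reports its results in hit-precision but does not define the metric. As a rough illustration only, the sketch below uses a definition that is common in the user-alignment literature: a source user scores 1.0 if the true counterpart is ranked first among the top-k candidates, the score falls linearly to 1/k at rank k, and it is 0.0 if the counterpart is not in the top k. Whether GSMUA uses exactly this formula is an assumption, and the function names `hit_precision` and `mean_hit_precision` as well as the toy account identifiers are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def hit_precision(ranked_candidates: Sequence[Hashable],
                  true_match: Hashable,
                  k: int) -> float:
    """Hit-precision for one source user (assumed definition, not taken from the paper).

    Returns (k - (rank - 1)) / k when the true match appears at 1-based
    position `rank` within the top k candidates, and 0.0 otherwise.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(ranked_candidates[:k])
    if true_match not in top_k:
        return 0.0
    rank = top_k.index(true_match) + 1  # 1-based rank of the true match
    return (k - (rank - 1)) / k


def mean_hit_precision(rankings: Dict[Hashable, Sequence[Hashable]],
                       ground_truth: Dict[Hashable, Hashable],
                       k: int) -> float:
    """Average hit-precision over all source users that have a known match."""
    if not ground_truth:
        return 0.0
    scores = [
        # A user with no candidate list contributes a score of 0.0.
        hit_precision(rankings.get(user, []), match, k)
        for user, match in ground_truth.items()
    ]
    return sum(scores) / len(scores)


# Toy data: accounts a* on one network, candidate accounts b* on another.
toy_rankings = {
    "a1": ["b1", "b2", "b3"],  # true match ranked first
    "a2": ["b3", "b2", "b1"],  # true match ranked second
    "a3": ["b1", "b2", "b4"],  # true match missing from the top 3
}
toy_truth = {"a1": "b1", "a2": "b2", "a3": "b3"}
toy_score = mean_hit_precision(toy_rankings, toy_truth, k=3)
```

Under this definition a method is rewarded for placing the correct account near the top of the candidate list, not only for an exact top-1 match, which is why the metric suits ranking-based alignment models.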
Unveiling user identity across social media: a novel unsupervised gradient semantic model for accurate and efficient user alignment