Semantic Search on Text and Knowledge Bases
Foundations and Trends in Information Retrieval (IF 8.3), Pub Date: 2016-6-21, DOI: 10.1561/1500000032
Hannah Bast, Björn Buchhold, Elmar Haussmann

This article provides a comprehensive overview of the broad area of semantic search on text and knowledge bases. In a nutshell, semantic search is "search with meaning". This "meaning" can refer to various parts of the search process: understanding the query (instead of just finding matches of its components in the data), understanding the data (instead of just searching it for such matches), or representing knowledge in a way suitable for meaningful retrieval. Semantic search is studied in many communities, each with its own view of the problem. In this survey, we classify this work according to two dimensions: the type of data (text, knowledge bases, combinations of these) and the kind of search (keyword, structured, natural language). We consider all nine combinations. The focus is on fundamental techniques, concrete systems, and benchmarks. The survey also considers advanced issues: ranking, indexing, ontology matching and merging, and inference. It also provides a succinct overview of fundamental natural language processing techniques: POS-tagging, named-entity recognition and disambiguation, sentence parsing, and distributional semantics. The survey is as self-contained as possible and should thus also serve as a good tutorial for newcomers to this fascinating and highly topical field.
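The two-dimensional classification described above can be pictured as a 3 × 3 grid. The sketch below is only an illustration of that grid, not code from the survey; the names `DataType`, `SearchType`, and `CATEGORIES` are hypothetical labels chosen here for readability. It takes the cross product of the three data types and the three kinds of search and confirms that this yields the nine combinations the survey covers.

```python
from enum import Enum
from itertools import product


class DataType(Enum):
    """First dimension: the type of data being searched (hypothetical label)."""
    TEXT = "text"
    KNOWLEDGE_BASE = "knowledge bases"
    COMBINED = "combinations of text and knowledge bases"


class SearchType(Enum):
    """Second dimension: the kind of search (hypothetical label)."""
    KEYWORD = "keyword"
    STRUCTURED = "structured"
    NATURAL_LANGUAGE = "natural language"


# Every (data type, search type) pair is one category of the classification.
CATEGORIES = list(product(DataType, SearchType))

if __name__ == "__main__":
    for data, search in CATEGORIES:
        print(f"{search.value} search on {data.value}")
    print(len(CATEGORIES))  # 9 = 3 data types x 3 kinds of search
```

Using two enums and a cross product keeps the two dimensions independent, which mirrors the survey's point that any kind of search can, in principle, be applied to any type of data.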
