A manually annotated corpus in French for the study of urbanization and the natural risk prevention
Scientific Data ( IF 5.8 ) Pub Date : 2023-11-22 , DOI: 10.1038/s41597-023-02705-y
Maksim Koptelov 1, 2, 3 , Margaux Holveck 4 , Bruno Cremilleux 1 , Justine Reynaud 1 , Mathieu Roche 3, 5 , Maguelonne Teisseire 2, 3

Land artificialization is a serious problem for civilization, and urban planning and natural risk management aim to mitigate it. In France, these practices rely on Local Land Plans (PLU – Plan Local d’Urbanisme) and Natural Risk Prevention Plans (PPRn – Plan de Prévention des Risques naturels), which contain land use rules. To facilitate automatic extraction of these rules, we manually annotated a number of such documents concerning Montpellier, a rapidly evolving agglomeration exposed to natural risks. We defined a format for labeled examples in which each entry includes a title and a subtitle. In addition, we proposed a hierarchical representation of class labels to generalize the use of our corpus. Our corpus consists of 1934 textual segments, each labeled with one of 4 classes (Verifiable, Non-verifiable, Informative and Not pertinent), and is the first French-language corpus in the fields of urban planning and natural risk management. Along with presenting the corpus, we tested a state-of-the-art text classification approach to demonstrate the corpus's usability for automatic rule extraction.




Updated: 2023-11-22